ACYL-tRNA SYNTHETASES FIELD OF THE INVENTION The invention relates to acyl-tRNA synthetases with altered activities, for instance beta-amino-acid-tRNA synthetases, nucleic acids encoding said synthetases, cells expressing said synthetases, and uses and methods thereof. The invention also relates to beta-hydroxy-tRNA synthetases, nucleic acids encoding said synthetases, cells expressing said synthetases, and uses and methods thereof. Furthermore, the invention relates to α,α-disubstituted-amino-acid-acyl-tRNA synthetases, nucleic acids encoding said synthetases, cells expressing said synthetases, and uses and methods thereof. The invention also relates to methods of producing polymers comprising the genetic incorporation of α,α-disubstituted-amino acids. BACKGROUND OF THE INVENTION The genetic code of living cells has been reprogrammed to enable the site-specific incorporation of hundreds of non-canonical amino acids (ncAAs) into proteins1,2, and the encoded synthesis of non-canonical polymers and macrocyclic peptides and depsipeptides3-6. Despite remarkable progress, the monomers that can be site-specifically incorporated into proteins in cells are essentially limited to α-L amino acids with variant side chains, and closely related hydroxy acids. While a wider range of monomers has been incorporated in in vitro translation reactions7-10 – primarily into short peptides – these in vitro approaches cannot be extended to living cells. (S)β3-paraBromo-homophenylalanine ((S)β3-pBrhF) has been incorporated at very low levels in competition with phenylalanine (Phe) at Phe codons in E. coli, using forcing conditions of Phe starvation11; this approach leads to a mixture of amino acids at all Phe codons and does not enable the site-specific incorporation of (S)β3-pBrhF at a single position in response to a single codon, as required to reprogram the genetic code.
The encoded, site specific, incorporation of a non-canonical monomer (ncM) via cellular translation requires both the acylation of an orthogonal tRNA with the ncM by an orthogonal synthetase, and ribosomal polymerization of the ncM into a polymer chain (Fig.1). Current methods for engineering aminoacyl-tRNA synthetases that acylate new monomers rely on translational readouts12,13 and therefore require the monomers to be ribosomal substrates for incorporation, often at specific sites in proteins. Since many ncMs of interest are poor ribosomal substrates10,11,14-18, this creates an evolutionary deadlock in cells; an orthogonal synthetase cannot be evolved to acylate an orthogonal tRNA with ncMs that are poor ribosomal substrates, and ribosomes cannot be evolved to polymerize ncMs that cannot be acylated onto orthogonal tRNAs. The inventors previously described tRNA extension (tREX), a rapid and scalable method to determine the aminoacylation status of user-defined tRNAs from cells19. In this approach total tRNA is isolated from cells and the 2’,3’ diol on the ribose at the 3’ end of non-acylated tRNAs is selectively oxidized to a dialdehyde, while acylated tRNAs are protected from oxidation of the diol. A DNA probe bearing a fluorophore is then annealed to the 3’ end of the tRNA of interest, under conditions that facilitate deacylation of acylated tRNAs, to reveal the free diol at their 3’ ends. This enables the polymerase-mediated extension of non-oxidized tRNAs (that were acylated). The resulting difference in mass between oxidized, non-extended, and non-oxidized, extended tRNAs, is resolved by gel electrophoresis, allowing acylated and free tRNAs to be distinguished. Another previous method is disclosed in Saito et al. (the EMBO Journal, Vol.20, No.7, pp 1797-1806, 2001). This method relies on a biotinylated substrate and is performed in vitro. 
SUMMARY OF THE INVENTION In a first aspect, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising mutations corresponding to: M300A, M300C, M300D, M300M, or M300S relative to SEQ ID NO: 1; A302A, A302C, A302D, A302G, A302H, A302L, A302N, or A302Y relative to SEQ ID NO: 1; and
N346A, N346C, N346G, N346S, N346T, or N346V relative to SEQ ID NO: 1. In an embodiment, the acyl-tRNA synthetase comprises an M300D mutation relative to SEQ ID NO: 1, a A302H, A302Y, or A302C mutation relative to SEQ ID NO: 1, and an N346G, N346A, or N346S mutation relative to SEQ ID NO: 1. In a second aspect, there is provided use of an acyl-tRNA synthetase in a method of generating a polymer comprising a beta amino acid. In an embodiment, there is provided the use of an acyl-tRNA synthetase as disclosed herein in a method of generating a polymer comprising a beta amino acid. Also provided is a method of making a polymer comprising a beta amino acid, wherein the method comprises: i) use of an acyl-tRNA synthetase as disclosed herein to acylate a tRNA with the beta amino acid, and ii) incorporation of the beta amino acid into a polymer chain. In a third aspect, there is provided a nucleic acid encoding an acyl-tRNA synthetase of the first aspect. In a fourth aspect, there is provided a cell comprising an acyl-tRNA synthetase of the first aspect, a nucleic acid of the third aspect, or a vector comprising a nucleic acid of the third aspect. In a fifth aspect, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α- disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: A302C, A302G, A302H, or A302S relative to SEQ ID NO: 1; and N346A, N346C, N346E, N346G, N346T, or N346V relative to SEQ ID NO: 1. In a sixth aspect, there is provided the use of an acyl-tRNA synthetase in a method of generating a polymer comprising an α,α-disubstituted-amino acid. Also provided is a method of making a polymer comprising an α,α-disubstituted-amino acid, wherein the method comprises: i) use of an acyl-tRNA synthetase as disclosed herein to acylate a tRNA with the α,α-disubstituted- amino acid, and ii) incorporation of the α,α-disubstituted-amino acid into a polymer chain. 
In a seventh aspect, there is provided a nucleic acid encoding an acyl-tRNA synthetase of the fifth aspect. In an eighth aspect, there is provided a cell comprising an acyl-tRNA synthetase of the fifth aspect, a nucleic acid of the seventh aspect, or a vector comprising a nucleic acid of the seventh aspect. In a ninth aspect, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta- hydroxy acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300A, M300D, M300M, M300N, or M300S relative to SEQ ID NO: 1; and A302D, A302G, A302H, or A302N relative to SEQ ID NO: 1. In a tenth aspect, there is provided the use of an acyl-tRNA synthetase in a method of generating a polymer comprising a beta-hydroxy acid. Also provided is a method of making a polymer comprising a beta-hydroxy acid, wherein the method comprises: i) use of an acyl-tRNA synthetase as disclosed herein to acylate a tRNA with the beta-hydroxy acid, and ii) incorporation of the beta-hydroxy acid into a polymer chain.
In an eleventh aspect, there is provided a nucleic acid encoding an acyl-tRNA synthetase of the ninth aspect. In a twelfth aspect, there is provided a cell comprising an acyl-tRNA synthetase of the ninth aspect, a nucleic acid of the eleventh aspect, or a vector comprising a nucleic acid of the eleventh aspect. Also provided is use of an acyl-tRNA synthetase as disclosed herein in a method of genetically incorporating a monomer into a polymer. Also provided is a method of making a polymer, wherein the method comprises: i) use of an acyl-tRNA synthetase as disclosed herein to acylate a tRNA with a monomer, and ii) incorporation of the monomer into a polymer chain. BRIEF DESCRIPTION OF THE DRAWINGS Fig.1: Encoded cellular incorporation of non-canonical monomers into proteins and into non-canonical polymers requires both tRNA acylation and ribosomal polymerization. The encoded, site specific, incorporation of a non-canonical monomer (ncM, yellow star) via cellular translation requires both the acylation of an orthogonal tRNA with the ncM by an orthogonal synthetase, and ribosomal polymerization of the ncM into a polymer chain. Current methods for engineering aminoacyl-tRNA synthetases that acylate new monomers rely on translational readouts and therefore require the monomers to be ribosomal substrates. For ncMs that are poor ribosomal substrates this co-dependence creates an evolutionary deadlock in cells; an orthogonal synthetase cannot be evolved to acylate an orthogonal tRNA with ncMs that are poor ribosomal substrates, and ribosomes cannot be evolved to polymerize ncMs that cannot be acylated onto orthogonal tRNAs. To break this deadlock, we develop direct selections for orthogonal synthetases to aminoacylate their cognate orthogonal tRNAs with ncMs, independent of whether the ncMs are ribosomal substrates. Fig.2. tRNA display enables the direct selection of orthogonal aminoacyl-tRNA synthetases that aminoacylate their cognate orthogonal tRNAs with ncAAs. 
a, Schematic representation of tRNA display, a translation-independent strategy for selecting aaRS enzymes that aminoacylate a specific tRNA with a desired monomer. In tRNA display a library of stmRNAs is transformed into cells and grown in the presence and absence of non-canonical monomers of interest (represented by the yellow star) in multiple replicates. The library contains PylRS variants that are active and selective for the ncM (yellow), PylRS variants that are active with neither the ncMs nor the canonical amino acids in the cell (light blue) and PylRS variants that are active with one or more canonical amino acids in the cell and are not selective for the ncM (dark blue). Bio-mREX is performed for each replicate and the cDNA is submitted for next generation sequencing (NGS). The results are analysed by plotting the selectivity (the relative abundance of a particular sequence in the positive samples (+ncM), divided by its relative abundance in the negative sample (-ncM)) against the enrichment (the relative abundance of the same sequence in the positive samples, divided by its relative abundance in the input library) for all observed sequences; this results in a spindle-shaped plot (henceforth referred to as a spindle plot). Active and selective PylRS variants are expected to be highly selective and enriched (yellow dot – upper right quadrant), active but nonselective variants are expected to be non-selective but highly enriched (dark blue dot), and inactive PylRS variants are expected to be non-selective and non-enriched (light blue dot). b, The structures of non-canonical α-amino acids used in this study.
N6-((benzyloxy)carbonyl)-L-lysine (CbzK) (2), N6-((prop-2-yn-1-yloxy)carbonyl)-L-lysine (AlkyneK) (3), N6-benzoyl-L-lysine (BenzK) (4), 3-([2,2'-bipyridin]-5-yl)-2-aminopropanoic acid (BiPyA) (5), Nτ-methyl-L-histidine (NτmH) (6), (S)-2-amino-3-(thiophen-3-yl)propanoic acid (3-ThiA) (7), (S)-2-amino-3-(pyridin-3-yl)propanoic acid (PyA) (8), (S)-2-amino-3-(4-iodophenyl)propanoic acid (pIF) (9), (S)-2-amino-3-(4-bromothiophen-2-yl)propanoic acid (BrThiA) (10), (2S)-2-amino-3-(((2-((1-(6-nitrobenzo[d][1,3]dioxol-5-yl)ethyl)thio)ethoxy)carbonyl)amino)propanoic acid (pcDAP) (11). c, tRNA display with PylRS library (stmRNAvol2-lib1); this library mutates three positions to all other canonical amino acids (Y306X, L309X, and N346X). The spindle plot shows the result of the tRNA display selection against CbzK (2) using one step of parallel selection as outlined in (a). Samples were run in triplicate and data processed as described in the Methods. Red dots indicate 65 clones that were further characterized. d, ln(Enrichment, +2) of PylRS mutants
derived from tRNA display (red dots in (c)) vs. the GFP fluorescence measured in cells containing the corresponding PylRS mutant/tRNAPylCUA pair, GFP(150TAG)His6 and CbzK (2). The dotted line represents the linear regression for the displayed data points; R-squared = 0.6611, p < 0.0001. e-j, (left) white bar: GFP fluorescence from cells containing GFP(150TAG)His6, the indicated PylRS variant/tRNAPylCUA pair, and the indicated ncAA. Grey bar: the wt PylRS/tRNAPylCUA pair with the same ncAA. Fluorescence is shown as a fraction of the fluorescence generated by the wt PylRS/tRNAPylCUA pair with 2 mM BocK (1) and GFP(150TAG)His6. (right) ESI-MS of GFP(150X)His6, where X is the indicated ncAA. f, found mass: 27922.0 Da, expected mass 27923.3 Da; g, found mass: 27944.8 Da, expected mass 27945.5 Da; h, found mass: 27867.6 Da, expected mass 27866.4 Da; i, found mass: 27862.0 Da, expected mass 27861.4 Da; j, found mass: 27986.4 Da, expected mass 27986.2 Da; k, found mass: 27945.6 Da, expected mass 27944.3 Da. Fig.3. tRNA display enables selection of orthogonal aminoacyl-tRNA synthetases that charge non-canonical monomers. a, Structures of non-canonical monomers used in this study. (S)-3-amino-3-(3-bromophenyl)propanoic acid ((S)β3mBrF) (12), (S)-3-amino-6-(((benzyloxy)carbonyl)amino)hexanoic acid ((S)-β3CbzK) (13), (S)-6-acetamido-3-aminohexanoic acid ((S)-β3AcK) (14), BocAhx (15), 6-(((benzyloxy)carbonyl)amino)hexanoic acid (CbzAhx) (16), 3-amino-2-((1-ethyl-1H-imidazol-5-yl)methyl)propanoic acid (β2NeH) (17), 3-amino-4-(4-bromophenyl)butanoic acid (β3pBrhF) (18), 2-benzyl-3-hydroxypropanoic acid (β2OH-F) (19), 3-amino-3-phenylpropanoic acid (β3F) (20). b, Fluoro-tREX for the indicated PylRS variants. Experiments were performed in triplicate, using tRNAs extracted from cells harboring a pMB1 plasmid encoding each PylRS variant and tRNAPylCUA in the presence and absence of 4 mM 15.
c-d, Fluoro-tREX for the indicated PylRS variants from the primary selection (panel c) and after further evolution (panel d). Experiments were performed in triplicate, using tRNA extracted from cells harboring a pMB1 plasmid encoding each PylRS variant and tRNAPylCUA in the presence and absence of 4 mM 12. e, Selected PylRS variants acylate tRNAPylCUA with 12. LC-MS traces (scanning ion mode on AQC adduct of substrate 12) on AQC-derivatized eluates from tRNA pulldowns from cells expressing the indicated PylRS variants. Cells harbouring a pMB1 plasmid encoding the corresponding PylRS variant and tRNAPylCUA (or only tRNA (-) as a control) were grown in the presence of substrate 12, and tRNA pulldowns were performed using a biotinylated probe against tRNAPyl. f, Quantification of relative acylation of tRNAPylCUA by selected PylRS variants: the integrated area under the peak for the LC-MS traces shown in e. g, GFP fluorescence from cells containing GFP(150TAG)His6, PylRS(12_1) or PylRS(12_1evol1), and tRNAPylCUA, grown in the presence or absence of substrate 12. Fluorescence is shown as a fraction of the fluorescence generated by the wt PylRS/tRNAPylCUA pair with 2 mM BocK (1) and GFP(150TAG)His6. h, Intact ESI-MS of GFP(150(S)β3mBrF)His6 purified from cells harbouring PylRS(12_1evol1), tRNAPylCUA, and GFP(150TAG)His6 grown in the presence of 4 mM 12. Found mass: 27939.0 Da, predicted mass: 27938.2 Da. i, Close-up on residue 150 of GFP(150(S)β3mBrF)His6 from a crystal structure determined at 1.5 Å. The 2Fo-Fc map is shown at a contour level of sigma = 2 (PDB code 8OVY). The electron density (blue) confirms the incorporation of 12 at position 150, clearly demonstrating the extension of the peptide backbone by one methylene group and the stereochemistry of the β-amino acid in the protein. Fig.4.
Following oxidation, the deacylation of tRNAs under alkaline conditions increases the acylation signal in fluoro-tREX and thereby permits the robust detection of acylation by hydroxy acids and carboxylic acids. a, Chemical structures of N6-(tert-butoxycarbonyl)-L-lysine (BocK) 1, (S)-6-((tert-butoxycarbonyl)amino)-2-hydroxyhexanoic acid (OH-BocK) 21 and 6-((tert-butoxycarbonyl)amino)hexanoic acid (BocAhx) 15. MmPylRS is highly active with BocK 1 and OH-BocK 21 (ref. 1), and shows acylation activity with BocAhx 15 (ref. 2). b, The free acids of 1, 21 and 15 span a range of estimated pKas. The rate constant for the alkaline hydrolysis of esters, to give a fixed alcohol and a variable carboxylic acid, increases as the pKa of the resulting carboxylic acid decreases. We therefore expect the rate of hydrolysis for acylated tRNAs to be slower when the acylating monomers are α-hydroxy acids or simple carboxylic acids (or β-amino acids) than when the acylating monomers are α-amino acids. c, MmPylRS acylates MmtRNAPyl in cells with BocK 1, OH-BocK 21, and BocAhx 15 respectively. Northern blot of MmtRNAPyl from tRNAs isolated from cells harboring a pMB1 plasmid encoding the MmPylRS/MmtRNAPyl pair in the presence and absence of BocK, OH-BocK, or BocAhx. The experiments were carried out in three biological replicates producing similar results. d, After oxidation, a deacylation step under
alkaline conditions is necessary to robustly detect acylation activity of MmPylRS by fluoro-tREX with non-alpha amino acid substrates. Fluoro-tREX was performed with the tRNA samples described in panel c with and without an incubation of the tRNAs for 45 minutes with 50 mM bicine at pH 9.6 post oxidation. The acylation signals from OH-BocK as well as BocAhx were dependent on the deacylation of the tRNAs before the Klenow (exo-) extension step of the protocol. The experiments were carried out in three biological replicates producing similar results. Fig.5. Relationship between the acylation signal measured, by bio-mREX, for stmRNAs and the GFP fluorescence signal measured for intact, translation-competent tRNAs. For stmRNAs, active aminoacyl-tRNA synthetases (aaRS) lead to the acylation of their encoding stmRNAs, which in bio-mREX are extended, separated and ultimately reverse transcribed. This results in the cDNA of the active synthetase, which can be quantified by qPCR. In the case of an inactive aaRS the stmRNA is not acylated and no cDNA is produced in bio-mREX experiments. Therefore, the activity of a synthetase in bio-mREX correlates with the number of cDNA molecules measured by qPCR. In canonical translation an active aaRS enzyme leads to an acylated, intact, cognate tRNACUA which is used in protein translation. Inactive aaRS enzymes lead to non-acylated tRNAs, which are not used in protein translation. The production of GFP protein from GFP150TAGHis6, as measured by GFP fluorescence, reports on the acylation of tRNACUA, as well as the other steps in the production of protein. Fig.6. PylRS libraries used in this work. a, Overview of the seven libraries designed and created. These libraries target a total of 11 amino acid residues in the PylRS active site and employ several types of degenerate codons. NNK codons are depicted as dark red, DBK codons (+ lysine codon) as blue, NDT codons as dark green, NRT codons as yellow.
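The degenerate codons named in this legend can be expanded with the standard IUPAC nucleotide codes and the standard genetic code. The following is a minimal illustrative sketch, not part of the described method; the function names `expand` and `encoded_residues` are hypothetical, and the custom residue mixes mentioned in the legend are not modelled.

```python
from itertools import product

# IUPAC degenerate-base codes relevant to the codons in the legend.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "D": "AGT", "B": "CGT", "R": "AG"}

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expand(degenerate_codon):
    """Return every concrete codon covered by a degenerate codon (hypothetical helper)."""
    pools = [IUPAC[base] for base in degenerate_codon]
    return ["".join(codon) for codon in product(*pools)]


def encoded_residues(degenerate_codon):
    """Return the sorted set of residues ('*' = stop) encoded by a degenerate codon."""
    return sorted({CODON_TABLE[codon] for codon in expand(degenerate_codon)})


if __name__ == "__main__":
    for name in ("NNK", "DBK", "NDT", "NRT"):
        print(name, len(expand(name)), "".join(encoded_residues(name)))
```

Under these assumptions DBK covers 18 codons that encode no stop codon and no lysine, which is consistent with the legend listing a lysine codon in addition to DBK.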
For certain sites, custom residue mixes encompassing the most commonly observed mutations were used (1-7 mixes, depicted as grey spheres). All libraries were created with at least 10^9 independent transformants. N = A, T, G, C; K = G, T; D = G, A, T; B = G, T, C; R = G, A. The custom mixes are described in the Methods. b, The eleven amino acid residues targeted for mutagenesis in the PylRS active site are shown in red. The image was rendered using PyMOL, based on the PDB structure 2ZIN. Fig.7. tRNA display identifies active and selective orthogonal aaRS variants from an stmRNA library. a, Spindle plot from the tRNA display selection using stmRNA library 1 and ncAA 1. b, Identifying the region of the spindle plot enriched in active and selective clones. We expect the top right quadrant of the spindle plot to be enriched in active and selective clones. Since selectivity is derived from the ratio of sequence counts +ncAA and -ncAA, enriched clones with negative selectivity values would correspond to specific enrichment of a clone in the -ncAA condition with respect to the +ncAA condition. We postulated that most apparent enrichments of this type were spurious, and therefore that regions of the spindle plot where positive selectivity values were mirrored by negative selectivity values of the same magnitude may contain substantial noise. Based on this postulate we expected the active and ncAA-selective clones to be most enriched in the region of the plot where, for a given positive enrichment value, the selectivity becomes asymmetric. To enrich for this asymmetric population we binned mean selectivity values: 5350 points in the spindle plot (Fig.2c) were divided into 500 equal bins along the enrichment+1 dimension (163 bins contained data). The mean selectivity of each bin was plotted against the natural logarithm of enrichment+1.
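The binning step described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis code used for the figure: the bins are taken to be of equal width along enrichment+1 itself (the legend does not say whether equal width refers to the linear or the logarithmic scale), each non-empty bin is plotted at the natural logarithm of its centre, and the function name `binned_mean_selectivity` is hypothetical.

```python
import math
from collections import defaultdict


def binned_mean_selectivity(points, n_bins=500):
    """Bin (enrichment, selectivity) pairs along enrichment + 1.

    Returns (ln(bin centre), mean selectivity) for every non-empty bin.
    Assumption: bins have equal width on the linear enrichment + 1 axis.
    """
    points = list(points)
    xs = [enrichment + 1.0 for enrichment, _ in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range
    bins = defaultdict(list)
    for x, (_, selectivity) in zip(xs, points):
        index = min(int((x - lo) / width), n_bins - 1)  # top edge joins last bin
        bins[index].append(selectivity)
    result = []
    for index in sorted(bins):
        centre = lo + (index + 0.5) * width
        values = bins[index]
        result.append((math.log(centre), sum(values) / len(values)))
    return result


if __name__ == "__main__":
    # Hypothetical toy data: two mirrored low-enrichment points and one
    # highly enriched, selective point.
    toy = [(0.0, 1.0), (0.0, -1.0), (9.0, 4.0)]
    for log_x, mean_sel in binned_mean_selectivity(toy, n_bins=3):
        print(round(log_x, 3), mean_sel)
```

In the toy data the mirrored points average to zero selectivity in their bin, while the enriched point keeps a positive mean, which is the asymmetry the binning is meant to expose.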
From this, a threshold of ca. 7.4 (corresponding to a logarithmic score value of 2) was used to define the enrichment value at which the spindle plot is asymmetric along the selectivity axis. c, Experimental GFP fluorescence values for 100 clones plotted against the natural logarithm of enrichment score in the presence of 1. GFP was expressed from GFP150TAGHis6 in the presence of the MmPylRS variant clone, the cognate MmtRNAPylCUA and the ncAA (1). Points above the symmetry threshold are colored in red (65 points), points below are colored in blue (21 points). There is a strong, significant positive correlation between the tRNA display sequence data and experimental expression data for the red points (R2 value=0.6611, p value<0.0001), but no significant correlation for the blue points (R2 value=0.0392, p value=0.397); this is consistent with our postulate. The red points are also shown in Fig.2c. In subsequent selections, on the basis of this analysis, we primarily focused on identifying clones on the right-hand side of the spindle plot where, for a given enrichment value, the magnitude of the positive selectivity value for a clone is greater than that of the negative selectivity values for clones with the same enrichment value. d, tRNA display identifies ncAA-specific PylRS variants. Plot shows the experimental selectivity vs selectivity from the spindle plot for the clones shown in
panel b, color coding as in panel b. The experimental selectivity is derived from GFP expression experiments, as in panel b, but +/- ncAA. Fig.8. Schematic representation of the tRNA display based strategy for selecting PylRS variants that direct the incorporation of ncAAs into proteins. In the first round naïve stmRNAvol2 libraries 2, 13, 14, 3D, 4D, and 5D were transformed into BL21 cells and grown overnight. Libraries 3D and 4D were combined to generate library 3D4D. The five libraries were grown to OD600 of 0.3-0.4 and 2.6 mL of the cell culture from each library was added to a stock solution of each ncAA; this resulted in 50 samples (five libraries x ten ncAAs). Cells were grown for 40 min, stmRNAs induced, and cells grown for another 20 min. Bio-mREX was performed on the isolated RNA for each of the 50 samples. For each reaction, cDNA was amplified with primers suitable for Golden Gate assembly. Then all amplicons of the libraries selected for the same ncAA were combined at equimolar ratios (resulting in ten combined libraries in total) and cloned into a fresh ColE1 vector backbone. This created ten pre-selected libraries. The ten pre-selected libraries were transformed into BL21 cells and grown overnight. The pre-selected libraries for ncAAs 2 – 6 were combined to create a single cluster library. Similarly, the pre-selected libraries for ncAAs 7 – 11 were combined to create a second cluster library. 2.6 mL of the first cluster library was added to solutions of ncAAs 2 – 6. 2.6 mL of the first cluster library was also added to a sample without ncAA, as a control. Cells were grown for 40 min, stmRNAs induced and cells grown for another 20 min, and the RNA isolated. Three RNA samples were converted to cDNA as bio-mREX input control. Bio-mREX was performed on the isolated RNA. This generated seven samples (bio-mREX input, -ncAA control for bio-mREX, and five bio-mREX samples for ncAAs 2 – 6). The experiment was performed in triplicate, generating 21 samples.
The cDNA of each sample was sequenced by NGS and analyzed to generate spindle plots and sequence tables. The second cluster was treated analogously to the first cluster, using ncAAs 7 – 11 in place of 2 – 6. Fig.9. Selection of PylRS variants for CbzK 2 by tRNA display. a, Chemical structure of CbzK 2. b, Spindle plot resulting from the tRNA display selection as described in Fig.8. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 2), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and minimally one control sample. Red dots are sequences only observed in the positive samples. c, Top 25 sequences ordered by selectivity resulting from the selection for substrate 2 with selectivities of ≥ 5, and enrichments of ≥ 5. Fig.10. Selection of PylRS variants for N6-((prop-2-yn-1-yloxy)carbonyl)-L-lysine (AlkyneK) 3 by tRNA display. a, Chemical structure of AlkyneK 3. b, Spindle plot resulting from the tRNA display selection as described in Fig. 8. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 3), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. 
Red dots are sequences only observed in the positive samples. c, Top 25 sequences, ordered by selectivity, resulting from the selection for substrate 3 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Amber suppressor activity data for selected PylRS variants measured by the production of GFP150AlkyneKHis6 from GFP150TAGHis6 from cells harboring a pMB1 plasmid encoding either wild type (wt) MmPylRS or the indicated MmPylRS variants, and a p15A plasmid encoding GFP150TAGHis6 in the presence and absence of 4 mM 3. The fluorescence is shown relative to cells harboring wt MmPylRS, MmtRNAPylCUA expressing GFP from GFP150TAGHis6 in the presence of 2 mM substrate 1. e, Raw mass spectrum for purified GFP150AlkyneKHis6 produced with PylRS 3_1. f, Deconvoluted mass spectrum resulting from the primary spectrum shown in panel e. Mass predicted 27923.3 Da, mass found 27922.0 Da. The minor peak labeled -met corresponds to cleavage of the N-terminal methionine residue. g-h, same as e and f but for GFP150AlkyneKHis6 produced with PylRS 3_2. Mass predicted 27923.3 Da, mass found 27923.6 Da. i-j, same as e and f but for GFP150AlkyneKHis6 produced with PylRS 3_3. Mass predicted 27923.3 Da, mass found 27923.2 Da.
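The selectivity, enrichment and dot-color definitions used in these spindle-plot legends can be written out as a short worked example. This is a hedged sketch only: the function names are hypothetical, replicates are summarized by their mean relative abundance (an assumption), a sequence absent from the control or the input is given an infinite ratio (also an assumption, since the legends do not state how zero denominators are handled), and "green" is read as present in at least one but not all control samples.

```python
import math


def mean(values):
    return sum(values) / len(values)


def selectivity_and_enrichment(pos, neg, inp):
    """pos, neg, inp: per-replicate relative abundances of one sequence in the
    +ncAA samples, the -ncAA control samples and the input library."""
    p, n, i = mean(pos), mean(neg), mean(inp)
    selectivity = p / n if n > 0 else math.inf  # assumption for zero control
    enrichment = p / i if i > 0 else math.inf   # assumption for zero input
    return selectivity, enrichment


def dot_color(pos, neg):
    """Classify a sequence by where it was observed, following the legend."""
    if not all(x > 0 for x in pos):
        return None  # not observed in all positive samples; not classified here
    if all(x > 0 for x in neg):
        return "black"
    if any(x > 0 for x in neg):
        return "green"
    return "red"


def in_boxed_region(selectivity, enrichment, cutoff=5.0):
    """Boxed region of the plots: selectivity >= 5 and enrichment >= 5."""
    return selectivity >= cutoff and enrichment >= cutoff


if __name__ == "__main__":
    # Hypothetical relative abundances for a single sequence, three replicates.
    pos, neg, inp = [0.5] * 3, [0.0625] * 3, [0.03125] * 3
    s, e = selectivity_and_enrichment(pos, neg, inp)
    print(s, e, dot_color(pos, neg), in_boxed_region(s, e))
```

The toy values are chosen only so that the ratios are easy to follow; real relative abundances in a library would be far smaller.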
Fig.11. Selection of PylRS variants for N6-benzoyl-L-lysine (BenzK) 4 by tRNA display. a, Chemical structure of BenzK 4. b, Spindle plot resulting from the tRNA display selection as described in Fig.8. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 4), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, Top 25 sequences, ordered by selectivity, resulting from the selection for substrate 4 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Amber suppressor activity data for selected PylRS variants measured by the production of GFP150BenzKHis6 from GFP150TAGHis6 from cells harboring a pMB1 plasmid encoding either wild type (wt) MmPylRS or the indicated MmPylRS variants, and a p15A plasmid encoding GFP150TAGHis6 in the presence and absence of 4 mM 4. The fluorescence is shown relative to cells harboring wt MmPylRS, MmtRNAPyl CUA expressing GFP from GFP150TAGHis6 in the presence of 2 mM substrate 1. e, Raw mass spectrum for purified GFP150BenzKHis6 produced with PylRS 4_1. f, Deconvoluted mass spectrum resulting from the primary spectrum shown in panel e. Mass predicted 27945.5 Da, mass found 27944.8 Da. The minor peak labeled -met corresponds to cleavage of the N-terminal methionine residue. g-h, same as e and f but for GFP150BenzKHis6 produced with PylRS 4_2. Mass predicted 27945.5 Da, mass found 27944.8 Da. i-j, same as e and f but for GFP150BenzKHis6 produced with PylRS 4_3. 
Mass predicted 27945.5 Da, mass found 27944.4 Da. k-l, same as e and f but for GFP150BenzKHis6 produced with PylRS 4_4. Mass predicted 27945.5 Da, mass found 27944.4 Da. Fig.12. Selection of PylRS variants for 3-([2,2'-bipyridin]-5-yl)-2-aminopropanoic acid (BiPyA) 5 by tRNA display. a, Chemical structure of BiPyA 5. b, Spindle plot resulting from the tRNA display selection as described in Fig.8. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 5), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, Top 25 sequences, ordered by selectivity, resulting from the selection for substrate 5 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Amber suppressor activity data for selected PylRS variants measured by the production of GFP150BiPyA His6 from GFP150TAGHis6 from cells harboring a pMB1 plasmid encoding either wild type (wt) MmPylRS or the indicated MmPylRS variants, and a p15A plasmid encoding GFP150TAGHis6 in the presence and absence of 4 mM 5. The fluorescence is shown relative to cells harboring wt MmPylRS, MmtRNAPylCUA expressing GFP from GFP150TAGHis6 in the presence of 2 mM substrate 1. Fig.13. Selection of PylRS variants for Nτ-methyl-L-histidine (NτmH) 6 by tRNA display. a, Chemical structure of NτmH 6. b, Spindle plot resulting from the tRNA display selection as described in Fig.8. 
The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 6), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 16 sequences, ordered by selectivity, resulting from the selection for substrate 6 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Amber suppressor activity data for selected PylRS variants measured by the production of GFP150NτmH His6 from GFP150TAGHis6 from cells harboring a pMB1 plasmid encoding either wild type (wt) MmPylRS or the indicated MmPylRS variants, and a p15A plasmid encoding GFP150TAGHis6 in the presence and absence of 4 mM 6. The fluorescence is shown relative to cells harboring wt MmPylRS, MmtRNAPyl CUA expressing GFP from GFP150TAGHis6 in the presence of 2 mM substrate 1. Fig.14. Selection of PylRS variants for (S)-2-amino-3-(thiophen-3-yl)propanoic acid (3-ThiA) 7 by tRNA display. a, Chemical structure of 3-ThiA 7. b, Spindle plot resulting from the tRNA display selection as described
in Fig.8. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 7), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, Top 25 sequences, ordered by selectivity, resulting from the selection for substrate 7 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Amber suppressor activity data for selected PylRS variants measured by the production of GFP150-3-ThiA His6 from GFP150TAGHis6 from cells harboring a pMB1 plasmid encoding either wild type (wt) MmPylRS or the indicated MmPylRS variants, and a p15A plasmid encoding GFP150TAGHis6 in the presence and absence of 4 mM 7. The fluorescence is shown relative to cells harboring wt MmPylRS, MmtRNAPylCUA expressing GFP from GFP150TAGHis6 in the presence of 2 mM substrate 1. e, Raw mass spectrum for purified GFP150-3-ThiA KHis6 produced with PylRS 7_1. f, Deconvoluted mass spectrum resulting from the primary spectrum shown in panel e. Mass predicted 27866.4 Da, mass found 27867.2 Da. The minor peak labeled -met corresponds to cleavage of the N-terminal methionine residue. g-h, same as e and f but for GFP150-3-ThiAHis6 produced with PylRS 7_2. Mass predicted 27866.4 Da, mass found 27866.8 Da. i-j, same as e and f but for GFP150-3-ThiAHis6 produced with PylRS 7_3. Mass predicted 27866.4 Da, mass found 27866.8 Da. k-l, same as e and f but for GFP150-3-ThiAHis6 produced with PylRS 7_4. Mass predicted 27866.4 Da, mass found 27866.4 Da. 
m-n, same as e and f but for GFP150-3-ThiAHis6 produced with PylRS 7_5. Mass predicted 27866.4 Da, mass found 27867.6 Da. o-p, same as e and f but for GFP150-3-ThiAHis6 produced with PylRS 7_6. Mass predicted 27866.4 Da, mass found 27867.6 Da. q-r, same as e and f but for GFP150-3-ThiAHis6 produced with PylRS 7_7. Mass predicted 27866.4 Da, mass found 27867.6 Da. s-t, same as e and f but for GFP150-3-ThiAHis6 produced with PylRS 7_8. Mass predicted 27866.4 Da, mass found 27868.0 Da. Fig.15. Selection of PylRS variants for (S)-2-amino-3-(pyridin-3-yl)propanoic acid (PyA) 8 by tRNA display. a, Chemical structure of PyA 8. b, Spindle plot resulting from the tRNA display selection as described in Fig.8. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 8), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The five sequences, ordered by selectivity, resulting from the selection for substrate 8 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Amber suppressor activity data for selected PylRS variants measured by the production of GFP150PyAHis6 from GFP150TAGHis6 from cells harboring a pMB1 plasmid encoding either wild type (wt) MmPylRS or the indicated MmPylRS variants, and a p15A plasmid encoding GFP150TAGHis6 in the presence and absence of 4 mM 8. The fluorescence is shown relative to cells harboring wt MmPylRS, MmtRNAPylCUA expressing GFP from GFP150TAGHis6 in the presence of 2 mM substrate 1. 
e, Raw mass spectrum for purified GFP150PyAHis6 produced with PylRS 8_1. f, Deconvoluted mass spectrum resulting from the primary spectrum shown in panel e. Mass predicted 27861.4 Da, mass found 27860.4 Da. The minor peak labeled -met corresponds to cleavage of the N-terminal methionine residue. g-h, same as e and f but for GFP150PyAHis6 produced with PylRS 8_2. Mass predicted 27861.4 Da, mass found 27862.0 Da. i-j, same as e and f but for GFP150PyAHis6 produced with PylRS 8_3. Mass predicted 27861.4 Da, mass found 27859.6 Da. k-l, same as e and f but for GFP150PyAHis6 produced with PylRS 8_4. Mass predicted 27861.4 Da, mass found 27860.8 Da. Fig.16. Selection of PylRS variants for (S)-2-amino-3-(4-iodophenyl)propanoic acid (pIF) 9 by tRNA display. a, Chemical structure of pIF 9. b, Spindle plot resulting from the tRNA display selection as described in Fig.8. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 9), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least
one control sample. Red dots are sequences only observed in the positive samples. c, Top 25 sequences, ordered by selectivity, resulting from the selection for substrate 9 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Amber suppressor activity data for selected PylRS variants measured by the production of GFP150pIFHis6 from GFP150TAGHis6 from cells harboring a pMB1 plasmid encoding either wild type (wt) MmPylRS or the indicated MmPylRS variants, and a p15A plasmid encoding GFP150TAGHis6 in the presence and absence of 4 mM 9. The fluorescence is shown relative to cells harboring wt MmPylRS, MmtRNAPylCUA expressing GFP from GFP150TAGHis6 in the presence of 2 mM substrate 1. e, Raw mass spectrum for purified GFP150pIFKHis6 produced with PylRS 9_1. f, Deconvoluted mass spectrum resulting from the primary spectrum shown in panel e. Mass predicted 27986.2 Da, mass found 27987.6 Da. The minor peak labeled -met corresponds to cleavage of the N-terminal methionine residue. g-h, same as e and f but for GFP150pIFKHis6 produced with PylRS 9_2. Mass predicted 27986.2 Da, mass found 27986.0 Da. i-j, same as e and f but for GFP150pIFKHis6 produced with PylRS 9_3. Mass predicted 27986.2 Da, mass found 27986.4 Da. k-l, same as e and f but for GFP150pIFKHis6 produced with PylRS 9_4. Mass predicted 27986.2 Da, mass found 27986.4 Da. m-n, same as e and f but for GFP150pIFKHis6 produced with PylRS 9_5. Mass predicted 27986.2 Da, mass found 27986.4 Da. o-p, same as e and f but for GFP150pIFKHis6 produced with PylRS 9_6. Mass predicted 27986.2 Da, mass found 27987.2 Da. Fig.17. Selection of PylRS variants for (S)-2-amino-3-(4-bromothiophen-2-yl)propanoic acid (BrThiA) 10 by tRNA display. a, Chemical structure of BrThiA 10. b, Spindle plot resulting from the tRNA display selection as described in Fig.8. 
The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 10), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, Top 25 sequences, ordered by selectivity, resulting from the selection for substrate 10 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Amber suppressor activity data for selected PylRS variants measured by the production of GFP150BrThiAHis6 from GFP150TAGHis6 from cells harboring a pMB1 plasmid encoding either wild type (wt) MmPylRS or the indicated MmPylRS variants, and a p15A plasmid encoding GFP150TAGHis6 in the presence and absence of 4 mM 10. The fluorescence is shown relative to cells harboring wt MmPylRS, MmtRNAPylCUA expressing GFP from GFP150TAGHis6 in the presence of 2 mM substrate 1. e, Raw mass spectrum for purified GFP150BrThiAHis6 produced with PylRS 10_1. f, Deconvoluted mass spectrum resulting from the primary spectrum shown in panel e. Mass predicted 27944.3 Da, mass found 27945.6 Da. The minor peak labeled -met corresponds to cleavage of the N-terminal methionine residue. g-h, same as e and f but for GFP150BrThiAHis6 produced with PylRS 10_2. Mass predicted 27944.3 Da, mass found 27945.2 Da. i-j, same as e and f but for GFP150BrThiAHis6 produced with PylRS 10_3. Mass predicted 27944.3 Da, mass found 27945.6 Da. k-l, same as e and f but for GFP150BrThiAHis6 produced with PylRS 10_4. Mass predicted 27944.3 Da, mass found 27945.2 Da. 
m-n, same as e and f but for GFP150BrThiAHis6 produced with PylRS 10_5. Mass predicted 27944.3 Da, mass found 27944.4 Da. o-p, same as e and f but for GFP150BrThiAHis6 produced with PylRS 10_6. Mass predicted 27944.3 Da, mass found 27944.8 Da. Fig.18. Selection of PylRS variants for (2S)-2-amino-3-(((2-((1-(6-nitrobenzo[d][1,3]dioxol-5-yl)ethyl)thio)ethoxy)carbonyl)amino)propanoic acid (pcDAP) 11 by tRNA display. a, Chemical structure of pcDAP 11. b, Spindle plot resulting from the tRNA display selection as described in Fig.8. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 11), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, Top 25 sequences, ordered by selectivity, resulting from the selection for substrate 11 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Amber suppressor activity data for selected PylRS variants measured by the production of GFP150pcDAPHis6 from GFP150TAGHis6 from cells harboring a pMB1 plasmid encoding either wild type (wt) MmPylRS or the indicated
MmPylRS variants, and a p15A plasmid encoding GFP150TAGHis6 in the presence and absence of 4 mM 11. The fluorescence is shown relative to cells harboring wt MmPylRS, MmtRNAPylCUA expressing GFP from GFP150TAGHis6 in the presence of 2 mM substrate 1. e, Raw mass spectrum for purified GFP150pcDAPHis6 produced with PylRS 11_1. f, Deconvoluted mass spectrum resulting from the primary spectrum shown in panel e. Mass predicted 28096.4 Da, mass found 28093.2 Da. The minor peak labeled -met corresponds to cleavage of the N-terminal methionine residue. g-h, same as e and f but for GFP150pcDAPHis6 produced with PylRS 11_2. Mass predicted 28096.4 Da, mass found 28096.4 Da. i-j, same as e and f but for GFP150pcDAPHis6 produced with PylRS 11_3. Mass predicted 28096.4 Da, mass found 28097.2 Da. k-l, same as e and f but for GFP150pcDAPHis6 produced with PylRS 11_4. Mass predicted 28096.4 Da, mass found 28093.6 Da. Fig.19. Schematic representation of selection strategy for non-canonical monomers. Library 14 was transformed into BL21 cells and grown overnight. Cells were grown to OD600 of 0.3-0.4. 4 mL of the library culture was added into stock solutions of each ncM. 4 mL of the library culture was also added to a well without ncM. The cells were grown for 40 min, stmRNAs induced, and cells grown for another 20 min and the RNA was isolated. Bio-mREX was performed on the isolated RNA for each sample. The experiment was performed in 4 replicates, leading to 40 cDNA samples. An additional 6 cDNA samples were generated for 6 of the RNA inputs to bio-mREX. The resulting 46 cDNA samples were sequenced by NGS and analyzed to generate spindle plots and sequence tables. Fig.20. Selection of PylRS variants for (S)-3-amino-3-(3-bromophenyl)propanoic acid ((S)β3mBrF) 12 by tRNA display. a, Chemical structure of (S)β3mBrF 12. b, Spindle plot resulting from the tRNA display selection as described in Fig.19. 
Selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 12), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 14 sequences, ordered by selectivity, resulting from the selection for substrate 12 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Fluoro-tREX for PylRS variants 12_1 to 12_6. Experiments were performed using tRNA extracted from cells harboring a pMB1 plasmid encoding each MmPylRS and MmtRNAPyl in presence and absence of 4 mM 12. Experiments were performed in triplicate. Fig.21. Selection of PylRS variants for (S)-3-amino-6-(((benzyloxy)carbonyl)amino)hexanoic acid ((S)-β3CbzK) 13 by tRNA display. a, Chemical structure of (S)-β3CbzK 13. b, Spindle plot resulting from the tRNA display selection as described in Fig.19. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 13), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. 
c, Top 25 sequences, ordered by selectivity, resulting from the selection for substrate 13 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Fluoro-tREX for PylRS variants 13_1 to 13_6. Experiments were performed using tRNA extracted from cells harboring a pMB1 plasmid encoding each MmPylRS and MmtRNAPyl in presence and absence of 4 mM 13. Experiments were performed in triplicate. Fig.22. Selection of PylRS variants for (S)-6-acetamido-3-aminohexanoic acid ((S)-β3AcK) 14 by tRNA display. a, Chemical structure of (S)-β3AcK 14. b, Spindle plot resulting from the tRNA display selection as described in Fig.19. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 14), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 8 sequences, ordered by selectivity, resulting from the selection for substrate 14 with selectivities of ≥ 5, and enrichments of ≥ 5.
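By way of illustration only, the selectivity and enrichment defined in the spindle plot legends may be computed from NGS read counts along the following lines. This is a minimal sketch and not the analysis pipeline actually used: the function names, the use of per-sample read-count dictionaries, the averaging of relative abundance across replicate samples, and the treatment of a zero denominator as infinite are all assumptions made for the example. The dot colors follow one reading of the legends, in which a red dot is any sequence seen in at least one positive sample and in no control sample.

```python
from statistics import mean


def rel_abundance(counts, seq):
    """Fraction of reads in one sample that belong to `seq` (0.0 if absent)."""
    total = sum(counts.values())
    return counts.get(seq, 0) / total if total else 0.0


def score(seq, positives, controls, input_library):
    """Return (selectivity, enrichment) for one sequence.

    Assumption: replicate samples are combined by averaging their
    relative abundances; the legends do not state how this was done.
    """
    pos = mean(rel_abundance(c, seq) for c in positives)
    ctrl = mean(rel_abundance(c, seq) for c in controls)
    inp = rel_abundance(input_library, seq)
    # Assumption: a sequence absent from the denominator gets an infinite ratio.
    selectivity = pos / ctrl if ctrl > 0 else float("inf")
    enrichment = pos / inp if inp > 0 else float("inf")
    return selectivity, enrichment


def in_boxed_region(selectivity, enrichment, threshold=5.0):
    """True when both ratios meet the threshold (5 in most legends, 3 in Fig.31)."""
    return selectivity >= threshold and enrichment >= threshold


def dot_color(seq, positives, controls):
    """Classify a sequence as 'black', 'green', 'red', or None (not plotted)."""
    n_pos = sum(1 for c in positives if c.get(seq, 0) > 0)
    n_ctrl = sum(1 for c in controls if c.get(seq, 0) > 0)
    if n_pos > 0 and n_ctrl == 0:
        return "red"      # only observed in positive samples
    if n_pos == len(positives) and n_ctrl == len(controls):
        return "black"    # observed in all positive and all control samples
    if n_pos == len(positives) and n_ctrl >= 1:
        return "green"    # all positive samples, at least one control sample
    return None


# Hypothetical read counts for two sequences across replicate samples.
positives = [{"A": 50, "B": 50}, {"A": 50, "B": 50}]
controls = [{"A": 10, "B": 90}, {"A": 10, "B": 90}]
input_library = {"A": 5, "B": 95}

sel, enr = score("A", positives, controls, input_library)
print(sel, enr, in_boxed_region(sel, enr), dot_color("A", positives, controls))
```

The threshold is left as a parameter because the legends use different cut-offs for different selections.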
Fig.23. Selection of PylRS variants for BocAhx 15 by tRNA display. a, Chemical structure of BocAhx 15. b, Spindle plot resulting from the tRNA display selection as described in Fig.19. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 15), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, Top 25 sequences, ordered by selectivity, resulting from the selection for substrate 15 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Fluoro-tREX for PylRS variants 15_1 to 15_4 as well as wt PylRS. Experiments were performed using tRNA extracted from cells harboring a pMB1 plasmid encoding each MmPylRS and MmtRNAPyl in presence and absence of 4 mM 15. Control samples for wt PylRS were performed in the presence and absence of 4 mM 15 and 2 mM 1. Experiments were performed in triplicate. Fig.24. Selection of PylRS variants for 6-(((benzyloxy)carbonyl)amino)hexanoic acid (CbzAhx) 16 by tRNA display. a, Chemical structure of CbzAhx 16. b, Spindle plot resulting from the tRNA display selection as described in Fig.19. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 16), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. 
Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 5 sequences, ordered by selectivity, resulting from the selection for substrate 16 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Fluoro-tREX for PylRS variant 16_1. Experiments were performed using tRNA extracted from cells harboring a pMB1 plasmid encoding each MmPylRS and MmtRNAPyl in presence and absence of 4 mM 16. Experiments were performed in triplicate. Fig.25. Selection of PylRS variants for 3-amino-2-((1-ethyl-1H-imidazol-5-yl)methyl)propanoic acid (β2NeH) 17 by tRNA display. a, Chemical structure of β2NeH 17. b, Spindle plot resulting from the tRNA display selection as described in Fig.19. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 17), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 4 sequences, ordered by selectivity, resulting from the selection for substrate 17 with selectivities of ≥ 5, and enrichments of ≥ 5. Fig.26. Selection of PylRS variants for 3-amino-4-(4-bromophenyl)butanoic acid (β3pBrhF) 18 by tRNA display. a, Chemical structure of β3pBrhF 18. b, Spindle plot resulting from the tRNA display selection as described in Fig.19. 
The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 18), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 5 sequences, ordered by selectivity, resulting from the selection for substrate 18 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Fluoro-tREX for PylRS variants 18_1 to 18_2. Experiments were performed using tRNA extracted from cells harboring a pMB1 plasmid encoding each MmPylRS and MmtRNAPyl in presence and absence of 2 mM of either the S or R enantiomer of 18. Experiments were performed in duplicate. Fig.27. Selection of PylRS variants for 2-benzyl-3-hydroxypropanoic acid (β2OH-F) 19 by tRNA display. a, Chemical structure of β2OH-F 19. b, Spindle plot resulting from the tRNA display selection as described in Fig.19. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples
(+ 19), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 3 sequences, ordered by selectivity, resulting from the selection for substrate 19 with selectivities of ≥ 5, and enrichments of ≥ 5. Fig.28. Selection of PylRS variants for 3-amino-3-phenylpropanoic acid (β3F) 20 by tRNA display. a, Chemical structure of β3F 20. b, Spindle plot resulting from the tRNA display selection as described in Fig.19. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 20), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 5 sequences, ordered by selectivity, resulting from the selection for substrate 20 with selectivities of ≥ 5, and enrichments of ≥ 5. Fig.29. Schematic of tRNA pulldown followed by LC-MS analysis to determine the identity of the monomer on the target tRNA. tRNAs are extracted from cells expressing the tRNA of interest and the cognate orthogonal aaRS, grown in the presence of the ncM. 
A biotinylated probe is annealed, and the targeted tRNA is pulled down. After washing, the ncM is eluted by alkaline deacylation, derivatised with AQC, and detected using LC-MS. Fig.30. Schematic representation of selection strategy for random mutagenesis selections by tRNA display. We performed an error-prone PCR reaction across the active site sequence of PylRS variants 12_1 and 12_2 using the GeneMorph II (Agilent) kit. The diversified PCR amplicons were cloned into a fresh ColE1 plasmid backbone by Golden Gate assembly and the error-prone stmRNA libraries transformed into BL21 cells and grown overnight. Cells were grown to OD600 of 0.3-0.4. 2.6 mL of the library culture was added into a stock solution of 12 to a concentration of 4 mM. 2.6 mL of the library culture was also added to a well without ncM. The cells were grown for 40 min, stmRNAs induced, and cells grown for another 20 min and the RNA was isolated. Bio-mREX was performed on the isolated RNA for each sample. The experiment was performed in four replicates, leading to eight cDNA samples. An additional four cDNA samples were generated for four of the RNA inputs to Bio-mREX. The resulting twelve cDNA samples were sequenced by NGS and analyzed to generate spindle plots and sequence tables. Fig.31. Selection of improved PylRS variants for ((S)β3mBrF) 12 using random mutagenesis libraries by tRNA display. a, Parental active site sequences of (S)β3mBrF PylRS 12_1 and 12_2 used as templates for random mutagenesis. b, Spindle plot resulting from the tRNA display selection as described in Fig.30. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 12), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 3, and enrichments of ≥ 3. 
Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. Spindle plot resulting from the tRNA display selection as described in Fig.30. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ 6), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 3, and enrichments of ≥ 3. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, Table with non-programmed mutations of evolved PylRS variants ordered by the calculated selectivity from the NGS analysis. We characterised the most selective, the most highly enriched, and the most abundant variants among the sequences with selectivities of ≥ 3, and enrichments of ≥ 3. All sequences were based on the parental PylRS variant 12_1. d, Fluoro-tREX data of PylRS variants 12_1evol to
12_1evol8. tRNAs were isolated from DH10β cells harboring a pMB1 plasmid encoding each MmPylRS and MmtRNAPyl in presence and absence of 4 mM 12. We performed fluoro-tREX on the isolated tRNAs. All experiments were carried out in triplicate with similar results. Fig.32. LC-MS assay detecting derivatized (S)β3mBrF 12 on MmtRNAPyl in a PylRS variant-dependent manner. A, Full LC-MS spectrum of data presented in Fig.3 e and f. An additional control in which only the derivatisation agent 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC) was characterised is shown in black. B, Overlay of LC-MS traces of authentic, derivatised standard corresponding to 24 nM of 12 together with 12 isolated from the acylated tRNA of the most active PylRS 12_1evol1. Fig.33. Characterization of protein production by amber readthrough with the PylRS(12_1evol1)/tRNAPylCUA pair. A, Attempted production of GFP3(S)β3mBrFHis6
from cells harboring a pMB1 plasmid encoding the PylRS(12_1evol1)/tRNAPylCUA pair and a p15A plasmid encoding GFP3TAGHis6 in the presence and absence of 4 mM 12. No modified protein could be produced by amber read through at position 3 in GFP. Dots represent the mean of two biological replicates, error bars show ± s.d. b, Raw mass spectrum for purified GFP150(S)β3mBrFHis6 produced with PylRS 12_1. The deconvoluted mass spectrum is shown in Fig.3 h. c, MS/MS spectra of ncAA-containing peptides obtained following tryptic digest of GFP150(S)β3mBrFHis6 produced with PylRS 12_1evol1. The precursor ions confirm the incorporation of 12 at position 150 of GFP. Fragmentation of each peptide is predicted to yield a series of b ions (red) and a series of y ions (blue), as well as ions corresponding to the full length peptide (green). D, Raw mass spectrum for purified GFP150(S)β3mBrFHis6 produced with PylRS 12_1evol1. E, intact ESI-MS of GFP(150(S)β3mBrF)His6 purified from cells harbouring PylRS(12_1evol1), MmtRNAPylCUA, and GFP(150TAG)His6 grown in the presence of 4 mM 12. Found mass: 27940.0 Da, predicted mass: 27,938.2 Da. E, MS/MS spectra of ncAA-containing peptides obtained following tryptic digest of GFP150(S)β3mBrFHis6 produced with PylRS 12_1. The precursor ions confirm the incorporation of 12 at position 150 of GFP. Fragmentation of each peptide is predicted to yield a series of b ions (red) and a series of y ions (blue), as well as ions corresponding to the full length peptide (green). Fig.34. Details of structure GFP150(S)β3mBrFHis6 (PDB code 8OVY). A, Detail of protein chain at position 150. The acquired structure for GFP150(S)β3mBrF-His6 (in yellow) is superimposed with the structure of wt GFP used for refinement (PDB: 2B3P), showing the kink in the backbone induced by the β3-amino acid. B, Detail of hydrogen bond network in the beta-barrel at position 150. The GFP150(S)β3mBrF-His6 structure (yellow) is superimposed to the structure of wt GFP (PDB: 2B3P). 
The kink induced by incorporation of (S)β3mBrF (12) affects the local hydrogen bond at that position; however, the remaining contacts of the corresponding beta-strand remain intact. Fig.35. Provides a sequence alignment of various aminoacyl-tRNA synthetases (aaRSs). “*” indicates residues found in all aligned aaRS sequences, “:” indicates residues with strong similarity between the aligned sequences, and “.” indicates residues with some similarity between the aligned sequences. Mm is Methanosarcina mazei; Mb is Methanosarcina barkeri; 1R26 is Methanomethylophilus sp.1R26; Lum 1 is Methanomassiliicoccus luminyensis 1; Nitro is Nitrososphaeria archaeon; Tron is Methanonatronarchaeum thermophilum; Gemm is Gemmatimonadetes bacterium; PGA8 is Peptostreptococcaceae bacterium pGA-8; I2 is Desulfosporosinus sp. I2; Clos is Clostridiales bacterium; D121 is a Deltaproteobacteria bacterium; and D416 is another Deltaproteobacteria bacterium. Residues that are not conserved may lie in regions that can tolerate variability. Accordingly, in embodiments of acyl-tRNA synthetases disclosed herein with a percentage identity to the backbone, these regions are more tolerant of variation from the backbone. Fig.36. tRNA display selection of O-synthetases that charge ncMs. a, ncMs for which selective PylRS mutants were discovered. b, Fluoro-tREX, representative gel for each PylRS variant. Experiments performed in independent triplicates with similar results. c, Selected PylRS variants acylate tRNAPylCUA with 12. LC-MS traces (scanning ion mode on 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC) adduct of 12) of AQC-derivative eluted from tRNA pulldowns. Cells harbouring the corresponding PylRS variant and tRNAPylCUA (or only tRNAPylCUA (-)) were grown with 12; pulldowns used a biotinylated probe against tRNAPylCUA, representative traces are shown. d, f, h, j, l, n, p, as in b, but with indicated PylRS variants and ncMs. 
e, g, i, k, m, o, q, as in c, but with indicated PylRS variants and ncMs; e, g, i were performed in duplicate, c, k, m, o, q were performed in triplicate, all replicates yielded similar results. r, Fluorescence from cells containing GFP(150TAG)His6, indicated
PylRS variant and tRNAPylCUA, and grown in the presence or absence of ncM (12, A1, A2, A3, A4, A5, A6, A7, 4 mM). Fluorescence is shown as a fraction of that generated by the wt PylRS/tRNAPylCUA pair with 4 mM BocK (1) and GFP(150TAG)His6. The bar graphs represent mean of three independent measurements, individual data points shown as dots, the error bar is +/- s.d. s, ESI-MS of GFP(150(S)α-Me-pIF)His6 purified from cells harbouring PylRS(A6_1), tRNAPylCUA, and GFP(150TAG)His6 grown with A6. Found mass: 28000.5 Da, predicted mass: 28000.2 Da. Spectra were acquired once. Fig.37. LC-MS assay detecting 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC) derivatised ncMs 12 and A1-A7 eluted from acylated MmtRNAPyl in a PylRS variant-dependent manner (see Fig.29 for schematic of the assay). a, Full LC-MS spectrum of data presented in Fig.36c. An authentic, derivatised standard of ncM 12 is shown in gray. The experiments were carried out in triplicate with similar results. b, Zoomed LC-MS spectrum shown in a. c, Full LC-MS spectrum of data presented in Fig.36e. An authentic, derivatised standard of ncM A1 is shown in gray. The experiments were carried out in duplicate with similar results. d, Zoomed LC-MS spectrum shown in c. d, Full LC-MS spectrum of data presented in Fig.36g. An authentic, derivatised standard of ncM A2 is shown in gray. The experiments were carried out in duplicate with similar results. e, Zoomed LC-MS spectrum shown in d. f, Full LC-MS spectrum of data presented in Fig.36i. An authentic, derivatised standard of ncM A3 is shown in gray. The experiments were carried out in duplicate with similar results. g, Zoomed LC-MS spectrum shown in f. h, Full LC-MS spectrum of data presented in Fig.36k. An authentic, derivatised standard of ncM A4 is shown in gray. The experiments were carried out in triplicate with similar results. i, Zoomed LC-MS spectrum shown in h. j, Full LC-MS spectrum of data presented in Fig.36m. 
An authentic, derivatised standard of ncM A5 is shown in gray. The experiments were carried out in triplicates with similar results. k, Zoomed LC-MS spectrum shown in j. l, Full LC-MS spectrum of data presented in Fig.36o. An authentic, derivatised standard of ncM A6 is shown in gray. The experiments were carried out in triplicates with similar results. m, Zoomed LC-MS spectrum shown in l. n, Full LC-MS spectrum of data presented in Fig.36q. An authentic, derivatised standard of ncM A7 is shown in gray. The experiments were carried out in triplicates with similar results. Fig.38. Schematic representation of the two-step selection strategy for non-canonical monomers. Library 14 was transformed into BL21 cells and grown overnight. Cells were grown to OD600 of 0.3-0.4. 4 mL of the library culture was added into stock solutions of each ncM. The cells were grown for 40 min, stmRNAs induced, and cells grown for another 20 min and the RNA was isolated. Bio-mREX was performed on the isolated RNA for each sample. The experiment was performed in four replicates. For each replicate the cDNA was amplified with primers suitable for Golden Gate assembly. Then all amplicons of the libraries selected for the same ncM were combined at equimolar ratios and cloned into a fresh ColE1 vector backbone. This created one preselected library for each ncM. The preselected libraries were transformed into BL21 cells and grown overnight. Cells were grown to OD600 of 0.3-0.4. For each ncM, 4 mL of the respective preselected library culture was added into stock solutions of the ncM. For each preselected library 4 mL of the library culture was also added to a well without ncM. The cells were grown for 40 min, stmRNAs induced, and cells grown for another 20 min and the RNA was isolated. Bio-mREX was performed on the isolated RNA for each sample. The experiment was performed in three replicates, leading to 6 cDNA samples per ncM. 
An additional three cDNA samples were generated for each ncM using three RNA inputs to bio-mREX of the respective preselected libraries. For each ncM the resulting nine cDNA samples were sequenced by NGS and analyzed to generate spindle plots and sequence tables. Fig.39. Selection of PylRS variants for (S)-3-amino-3-(benzo[d][1,3]dioxol-5-yl)propanoic acid ((S)β3MDF) (A1) by tRNA display. a, Chemical structure of (S)β3MDF A1. b, Spindle plot resulting from the tRNA display selection as described in Figure 38. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ A1), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 25 sequences, ordered by selectivity, resulting from the selection for substrate A1 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Fluoro-tREX for PylRS variants A1_1 to A1_12. Experiments were
performed using tRNA extracted from cells harboring a pMB1 plasmid encoding each MmPylRS and MmtRNAPyl in presence and absence of 4 mM A1. Experiments for A1_1 were performed in triplicates. Fig.40. Selection of PylRS variants for (S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF) (A2) by tRNA display. a, Chemical structure of (S)β3pBrF A2. b, Spindle plot resulting from the tRNA display selection as described in Figure 38. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ A2), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 25 sequences, ordered by selectivity, resulting from the selection for substrate A2 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Fluoro-tREX for PylRS variants A2_1 to A2_12. Experiments were performed using tRNA extracted from cells harboring a pMB1 plasmid encoding each MmPylRS and MmtRNAPyl in presence and absence of 4 mM A2. Experiments for A2_1 were performed in triplicates. Fig.41. Selection of PylRS variants for (S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF) (A3) by tRNA display. a, Chemical structure of (S)β3pmFF A3. b, Spindle plot resulting from the tRNA display selection as described in Figure 38. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ A3), divided by the relative abundance in the control sample (-ncAA). 
The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 25 sequences, ordered by selectivity, resulting from the selection for substrate A3 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Fluoro-tREX for PylRS variants A3_1 to A3_12. Experiments were performed using tRNA extracted from cells harboring a pMB1 plasmid encoding each MmPylRS and MmtRNAPyl in presence and absence of 4 mM A3. Experiments for A3_1 were performed in triplicates. Fig.42. Selection of PylRS variants for (S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF) (A4) by tRNA display. a, Chemical structure of (S)β3oBrF A4. b, Spindle plot resulting from the tRNA display selection as described in Figure 38. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ A4), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 12 sequences, ordered by selectivity, resulting from the selection for substrate A4 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Fluoro-tREX for PylRS variants A4_1 to A4_7. 
Experiments were performed using tRNA extracted from cells harboring a pMB1 plasmid encoding each MmPylRS and MmtRNAPyl in presence and absence of 4 mM A4. Experiments for A4_1 were performed in triplicates. Fig.43. Selection of PylRS variants for (S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid (A5) ((S)β3mCF3F) by tRNA display. a, Chemical structure of (S)β3mCF3F A5. b, Spindle plot resulting from the tRNA display selection as described in Figure 38. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ A5), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 25 sequences, ordered by selectivity, resulting from the selection for substrate A5 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Fluoro-tREX for PylRS variants A5_1 to A5_8. Experiments were performed using tRNA extracted from cells harboring a pMB1 plasmid encoding each
MmPylRS and MmtRNAPyl in presence and absence of 4 mM A5. Experiments for A5_1 were performed in triplicates. Fig.44. Selection of PylRS variants for (S)-2-amino-3-(4-iodophenyl)-2-methylpropanoic acid (A6) ((S)α-Me-pIF) by tRNA display. a, Chemical structure of (S)α-Me-pIF A6. b, Spindle plot resulting from the tRNA display selection as described in Figure 38. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ A6), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 25 sequences, ordered by selectivity, resulting from the selection for substrate A6 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Fluoro-tREX for PylRS variants A6_1 to A6_9. Experiments were performed using tRNA extracted from cells harboring a pMB1 plasmid encoding each MmPylRS and MmtRNAPyl in presence and absence of 4 mM A6. Experiments for A6_1 were performed in triplicates. Fig.45. Selection of PylRS variants for (S)-3-(3-chlorophenyl)-3-hydroxypropanoic acid (A7) (OH-(S)β3mClF) by tRNA display. a, Chemical structure of OH-(S)β3mClF A7. b, Spindle plot resulting from the tRNA display selection as described in Figure 38. The selectivity is defined as the ratio of the relative abundance of a particular sequence in the positive samples (+ A7), divided by the relative abundance in the control sample (-ncAA). The enrichment is defined as the ratio of the same sequence in the positive samples, divided by the relative abundance in the input library. 
The boxed region indicates sequences with selectivities of ≥ 5, and enrichments of ≥ 5. Black dots are sequences observed in all positive samples and all control samples. Green dots are sequences observed in all positive samples, and at least one control sample. Red dots are sequences only observed in the positive samples. c, The 14 sequences, ordered by selectivity, resulting from the selection for substrate A7 with selectivities of ≥ 5, and enrichments of ≥ 5. d, Fluoro-tREX for PylRS variants A7_1 to A7_7. Experiments were performed using tRNA extracted from cells harboring a pMB1 plasmid encoding each MmPylRS and MmtRNAPyl in presence and absence of 4 mM A7. Experiments for A7_1 were performed in triplicates. Fig.46. Characterisation of in vivo acylation activity of evolved, ncM-specific PylRS variants by tREX; we have previously shown that this assay quantitatively reports on in vivo acylation. a, tREX for evolved PylRS variants. tREX was performed on MmtRNAPyl extracted from cells harboring a pMB1 plasmid encoding each MmPylRS variant and MmtRNAPyl grown in the presence of 0, 1, 2 or 4 mM of the indicated ncM. Experiments were performed in four replicates. b, Quantification of in vivo acylation activity of evolved PylRS variants. The acylation levels were quantified by taking the ratio of the signal intensity of the upper band (acylated tRNA) divided by the sum of the signal intensities of the upper and lower bands (acylated and non-acylated tRNA) in the tREX gels shown in a. The data are shown as a fraction of the acylation activity of wt PylRS with 4 mM BocK (55 ± 1 %), which was set to 1. The dots represent the individual data points, the bars represent the mean and the error bars represent the s.d. We note that this assay is performed under conditions where the acyl-tRNA complexes are not actively consumed by translation in the cell and can accumulate over time. 
Under these conditions, enzymes with less activity in the cell might lead to comparable levels of measured acylation to enzymes with more activity in the cell. We note that 12, A2, A5, A6 are ribosomally incorporated so must be acylated at a level to support ribosomal translation; this suggests the other monomers may well be acylated at a level that would support translation if they were efficient substrates for other parts of the translational machinery. In future work it may be useful to measure the time-dependent acylation; this is challenging to assess in vivo because the acylation of a tRNA in vivo is likely dominated by the rate of ncM uptake rather than the rate of acylation. Published in vitro acylation assays for kinetic measurements currently require access to either radioactive monomers or radiolabelled tRNAs; the reagents for generating these tRNAs are not currently available, purifying active soluble synthetase for in vitro measurements is challenging and tRNAs for in vitro measurements typically lack post-transcriptional modifications. As a result, in vitro measurements of acylation with this system may not provide insight into in vivo behaviour. Fig.47. Mass spectrometry of ncM-containing GFP150XHis6. a, Raw mass spectrum for purified GFP150(S)α-Me-pIFHis6 produced from cells containing PylRS A6_1, MmtRNAPylCUA, GFP(150TAG)His6 and 4 mM A6. The deconvoluted mass spectrum is shown in Fig.36s. b, Raw mass spectrum for purified GFP150(S)β3pBrFHis6
produced from cells containing PylRS A2_1, MmtRNAPylCUA, GFP(150TAG)His6 and 4 mM A2. c, Intact ESI-MS of GFP(150(S)β3pBrF)His6 purified from cells harbouring PylRS(A2_1), MmtRNAPylCUA, and GFP(150TAG)His6 grown in the presence of 4 mM A2. Found mass: 27,939.0 Da, predicted mass: 27,938.2 Da. d, Raw mass spectrum for purified GFP150((S)β3mCF3F)His6 produced with PylRS A5_1. e, Intact ESI-MS of GFP150((S)β3mCF3F)His6 purified from cells harbouring PylRS(A5_1), MmtRNAPylCUA, and GFP(150TAG)His6 grown in the presence of 4 mM A5. Found mass: 27,927.0 Da, predicted mass: 27,928.3 Da. f, MS/MS spectra of ncAA-containing peptides obtained following tryptic digest of GFP150(S)β3mBrFHis6 produced with PylRS A5_1, or GFP150(S)α-Me-pIFHis6 produced with PylRS A6_1, respectively. The precursor ions confirm the incorporation of A5 or A6 at position 150 of GFP. Fragmentation of each peptide is predicted to yield a series of b ions (red) and a series of y ions (blue), as well as ions corresponding to the full-length peptide. Fig.48. GFP production. a, Production of GFPHis6 (or GFPHis6 incorporating 1 at position 150) was measured by GFP fluorescence. Cells contained a pMB1 plasmid encoding the wt PylRS/tRNAPylCUA pair and a p15A plasmid encoding GFP3TAGHis6 or GFPHis6 in the presence and absence of 4 mM of BocK (1). The yield of GFPHis6 incorporating 1, produced from GFP3TAGHis6, the wt PylRS/tRNAPylCUA pair and 1, was 80% of wtGFPHis6 protein, produced from wtGFPHis6. The fluorescence values shown are normalized by OD. The bars show the mean of three biological replicates, individual replicates are indicated by dots, error bars show ± s.d. 
The fluorescence-based yield of GFPHis6 generated by each ncM incorporation is reported relative to the GFPHis6 fluorescence generated by incorporation of 1 in Fig.36r; the percentage yields are: 6.6% (for PylRS variant 12_1 and ncM 12), 14.4% (for PylRS variant 12_1evol1 and ncM 12), 3.7% (for PylRS variant A2_1 and ncM A2), 4.1% (for PylRS variant A5_1 and ncM A5), and 37.0% (for PylRS variant A6_1 and ncM A6). For reference, the Ni-NTA purified yield of GFPHis6, expressed from cells containing a pMB1 plasmid encoding the wt PylRS/tRNAPylCUA pair and a p15A plasmid encoding GFP3TAGHis6 in the presence of 4 mM of BocK (1), was 105 ± 8 mg/l culture. The purified yields of the GFP150XHis6 with the relevant ncMs were: 8.4 ± 0.2 mg/l culture for PylRS12_1; 16.2 ± 1.2 mg/l culture for PylRS12_1evo1; 3.6 ± 0.4 mg/l culture for PylRSA2_1; 3.1 ± 0.7 mg/l culture for PylRSA5_1; 35.5 ± 0.9 mg/l culture for PylRSA6_1. b, Attempted production of GFP3(12, A1, A2, A3, A4, A5, A6, A7)His6 from GFP3TAGHis6 in cells harboring a pMB1 plasmid encoding the indicated PylRS variant/tRNAPylCUA pair and a p15A plasmid encoding GFP3TAGHis6 in the presence and absence of 4 mM of the respective ncM. Fluorescence is shown as a fraction of the fluorescence generated by the wt PylRS/tRNAPylCUA pair with 4 mM BocK (1) and GFP(3TAG)His6. The bars show the mean of three biological replicates, individual replicates are indicated by dots, error bars show ± s.d. DETAILED DESCRIPTION The inventors provide herein methods of measuring the extent of acylation of a tRNA. These methods allow the effect of an agent on the acylation status of a tRNA to be measured. 
For instance, the methods may be used to: i) determine whether an agent can charge a tRNA, ii) determine the extent or efficiency with which an agent can charge a tRNA, iii) determine whether an agent has an indirect effect on the charging of a tRNA, or iv) whether an agent can chemically alter a substrate in a manner that affects charging or deacylation. The agent may be any polypeptide, nucleic acid, or condition that has or is suspected of having an effect on the acylation of a tRNA. The methods disclosed herein do not require the charged tRNA to be active in a ribosome. Hence, the methods allow the measurement of tRNA acylation with substrates that are not compatible with ribosomal translation, with substrates for which compatible ribosomes have not yet been identified, or with substrates for which compatible ribosomes have not yet been developed. The inventors also provide molecules, tRNAs, nucleic acid constructs, and labelled tRNAs for use with such methods. Method comprising split tRNAs One aspect of the present disclosure relates to the inventors’ demonstration that tRNAs may be split into two or more portions, may be functionally expressed in this format, and may be acylated in this split format. One
advantage of splitting tRNAs in this manner is that the inventors demonstrate that additional sequences may be covalently linked to at least one of the tRNA portions. This then enables methods where the identity of an individual agent capable of effecting charging of a tRNA may be measured, even in large parallel libraries. As an example, the inventors fuse acyl-tRNA synthetase genes to a portion of a tRNA. Under conditions where the tRNA is expressed and the acyl-tRNA synthetase is expressed as a polypeptide, the acylation of the tRNA can then be measured and a direct link between the acylation status and the acyl-tRNA synthetase in question can be made. These methods are exemplified in the Examples section herein. Further details are provided in the documents to which priority is claimed (EP2306393.6, filed 28 April 2023 and EP2400299.0, filed 9 January 2024, each of which is incorporated herein by reference). These methods enabled the development of acyl-tRNA synthetases that could not previously have been developed. Split tRNAs The inventors have developed split tRNAs that find utility in methods of determining the acylation status of a tRNA or efficiency of acylation of a tRNA. These products are exemplified in the Examples section herein. Further details are provided in the documents to which priority is claimed (EP2306393.6, filed 28 April 2023 and EP2400299.0, filed 9 January 2024, each of which is incorporated herein by reference). These products contributed to the development of acyl-tRNA synthetases that could not previously have been developed. RNA fusion molecules The inventors have developed split tRNAs that can tolerate additional fused sequence and still be capable of being charged. For instance, the split tRNAs may comprise additional nucleic acid sequence encoding proteins or nucleic acids. These split RNAs may comprise an RNA molecule that includes a portion of a tRNA and a sequence encoding a polypeptide-of-interest/nucleic-acid-of-interest. 
These products are exemplified in the Examples section herein. Further details are provided in the documents to which priority is claimed (EP2306393.6, filed 28 April 2023 and EP2400299.0, filed 9 January 2024, each of which is incorporated herein by reference). These products contributed to the development of acyl-tRNA synthetases that could not previously have been developed. Methods of determining the acylation status of a tRNA or efficiency of acylation of a tRNA The inventors have further developed methods of labelling tRNAs that have been acylated. These methods enable the identification of tRNAs that have been charged and are particularly sensitive. An embodiment of these methods is referred to herein as fluoro-tREX. This method enables a specific acylated tRNA, from a pool of tRNAs isolated from cells, to be labelled by primer extension with fluorescent dNTPs. Another embodiment of these methods is referred to herein as bio-tREX. Bio-tREX enables the selective isolation of a specific acylated tRNA, by using primer extension with biotinylated dNTPs and streptavidin pulldown. These methods are exemplified in the Examples section herein. Further details are provided in the documents to which priority is claimed (EP2306393.6, filed 28 April 2023 and EP2400299.0, filed 9 January 2024, each of which is incorporated herein by reference). Nucleic acid constructs The inventors demonstrate herein that split tRNAs can be produced from a single gene, in which the two halves of the tRNA are circularly permuted and linked by an intervening sequence. The primary transcript of this gene is processed in cells to yield a functional split tRNA, which is acylated by an acyl-tRNA synthetase. These constructs are exemplified in the Examples section herein. Further details are provided in the documents to which priority is claimed (EP2306393.6, filed 28 April 2023 and EP2400299.0, filed 9 January 2024, each of which is incorporated herein by reference). 
These constructs contributed to the development of acyl-tRNA synthetases that could not previously have been developed.
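The selectivity and enrichment criteria used in the tRNA display selections (Figs.39-45) can be illustrated with a minimal Python sketch. It assumes that read counts per sequence have already been pooled for the positive samples, the control samples and the input library; the names `Counts`, `score` and `select_hits`, and the treatment of a sequence with zero control reads as having infinite selectivity, are illustrative assumptions and are not taken from the inventors' analysis pipeline.

```python
import math
from typing import Dict, List, NamedTuple, Tuple


class Counts(NamedTuple):
    positive: int  # reads pooled over the +ncM (positive) samples
    control: int   # reads pooled over the -ncM (control) samples
    library: int   # reads in the input library


def ratio(numerator: float, denominator: float) -> float:
    """Divide, returning infinity for a zero denominator (an assumption)."""
    if denominator == 0:
        return math.inf if numerator > 0 else 0.0
    return numerator / denominator


def score(counts: Counts, totals: Counts) -> Tuple[float, float]:
    """Return (selectivity, enrichment) for one sequence.

    totals holds the total read count of each pool and must be non-zero.
    """
    pos = counts.positive / totals.positive  # relative abundance, positive
    ctl = counts.control / totals.control    # relative abundance, control
    lib = counts.library / totals.library    # relative abundance, input
    return ratio(pos, ctl), ratio(pos, lib)


def select_hits(
    table: Dict[str, Counts],
    totals: Counts,
    min_selectivity: float = 5.0,
    min_enrichment: float = 5.0,
) -> List[Tuple[str, float, float]]:
    """Keep sequences inside the boxed region, ordered by selectivity."""
    hits = []
    for name, counts in table.items():
        selectivity, enrichment = score(counts, totals)
        if selectivity >= min_selectivity and enrichment >= min_enrichment:
            hits.append((name, selectivity, enrichment))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    totals = Counts(positive=1000, control=1000, library=1000)
    table = {
        "seqA": Counts(500, 10, 20),
        "seqB": Counts(100, 100, 100),
        "seqC": Counts(50, 0, 5),
    }
    for name, selectivity, enrichment in select_hits(table, totals):
        print(name, selectivity, enrichment)
```

In this sketch a sequence seen only in the positive samples (a red dot in the spindle plots) sorts to the top of the list; a real analysis may instead apply a pseudocount or handle such sequences separately.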
Beta-amino-acid-acyl-tRNA synthetases The inventors have made use of the methods and tools disclosed herein and have identified acyl-tRNA synthetases that are capable of acylating tRNAs with beta amino acids. Thus, in a first aspect, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300A, M300C, M300D, M300M, or M300S relative to SEQ ID NO: 1; A302A, A302C, A302D, A302G, A302H, A302L, A302N, or A302Y relative to SEQ ID NO: 1; and N346A, N346C, N346G, N346S, N346T, or N346V relative to SEQ ID NO: 1. In an embodiment, the acyl-tRNA synthetase comprises mutations corresponding to: M300A, M300C, M300D, or M300S mutation relative to SEQ ID NO: 1; A302C, A302D, A302G, A302H, A302L, A302N, or A302Y mutation relative to SEQ ID NO: 1; and N346A, N346C, N346G, N346S, N346T, or N346V mutation relative to SEQ ID NO: 1. An acyl-tRNA synthetase is capable of “specifically” acylating a tRNA with a particular substrate when the synthetase preferentially acylates the tRNA with said substrate in the presence of the canonical amino acids. In particular embodiments, the acyl-tRNA synthetase may be expressed in a cell in the presence of the substrate and the target tRNA and, of the acylated target tRNAs within the cell, at least 60%, 70%, 80%, 90%, 95%, or 99% are acylated with the particular substrate. As an illustrative example, the cell may be a bacterial cell such as an E. coli cell. The beta amino acid may be an analogue of S beta 3 phenylalanine. In particular, the beta amino acid may be (S)- 3-amino-3-(3-bromophenyl)propanoic acid. The beta amino acid may be (S)-3-amino-3-(benzo[d][1,3]dioxol-5- yl)propanoic acid ((S)β3MDF). The beta amino acid may be (S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF). The beta amino acid may be (S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF). 
The beta amino acid may be (S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF). The beta amino acid may be (S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid ((S)β3mCF3F). In a particular embodiment, the acyl-tRNA synthetase comprises mutations corresponding to: M300D and A302H. M300D and A302H were the most common mutations found for screens of any of the beta amino acids tested. In a particular embodiment, the acyl-tRNA synthetase comprises mutations corresponding to: M300D; A302H; and N346G, N346A, or N346S. In a particular embodiment, the acyl-tRNA synthetase comprises mutations corresponding to: M300D; A302H; and N346G or N346A. In a particular embodiment, the M300 mutation is M300D, the A302 mutation is A302H and the N346 mutation is N346G. In an embodiment, particularly where the beta amino acid is (S)-3-amino-3-(3-bromophenyl)propanoic acid, the acyl-tRNA synthetase comprises an amino acid sequence comprising mutations corresponding to: an M300D mutation relative to SEQ ID NO: 1; an A302H, A302Y, or A302C mutation relative to SEQ ID NO: 1; and an N346G, N346A, or N346S mutation relative to SEQ ID NO: 1. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-bromophenyl)propanoic acid, the M300 mutation is M300D. In examples, particularly where the beta amino acid is (S)-3-amino-3-(benzo[d][1,3]dioxol-5-yl)propanoic acid ((S)β3MDF), the M300 mutation is M300A, M300C, M300D, or is the wild type residue (M300M) – M300D was the most common in the screen. In examples, particularly where the beta amino acid is (S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF), the M300 mutation is M300C, M300D, M300S, or is the wild type residue (M300M) – M300D was the most common in the screen. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF), the M300 residue is M300A, M300C, M300D, or wild-type M300M - M300D was the most common in the screen. In examples, particularly where the
beta amino acid is (S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF), the M300 residue is M300D, M300S, or wild-type M300M - M300D was the most common in the screen. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid ((S)β3mCF3F), the mutation is M300D. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-bromophenyl)propanoic acid, the A302 mutation is A302C, A302H, or A302Y. In examples, particularly where the beta amino acid is (S)-3-amino-3-(benzo[d][1,3]dioxol-5-yl)propanoic acid ((S)β3MDF), the A302 mutation is wild type (A302A), A302C, or A302H – A302H was the most common in the screen. In examples, particularly where the beta amino acid is (S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF), the A302 mutation is A302G or A302H - A302H was the most common in the screen. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF), the A302 mutation is A302D, A302H, A302L, or A302N - A302H was the most common in the screen. In examples, particularly where the beta amino acid is (S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF), the A302 mutation is A302D or A302H - A302H was the most common in the screen. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid ((S)β3mCF3F), the A302 mutation is A302H, A302N, or A302Y; or, in particular, A302H or A302Y. The amino acid sequence may comprise a mutation at a position corresponding to L305 of SEQ ID NO: 1. In examples, the mutation is L305C, L305F, L305G, L305H, L305I, L305N, or L305V, or a conservative substitution of said residues or the wild-type residue. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-bromophenyl)propanoic acid, the mutation is L305H or L305C, or a conservative substitution of said residues or the wild-type residue. 
In examples, particularly where the beta amino acid is (S)-3-amino-3-(benzo[d][1,3]dioxol-5-yl)propanoic acid ((S)β3MDF), the mutation is L305C, L305G, L305H, L305I, or L305N, or the wild-type residue; in particular, the mutation is L305C, L305I, or L305L. In examples, particularly where the beta amino acid is (S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF), the mutation is L305C, L305F, L305I, L305N, L305V, or wild-type (L305L); or, in particular, L305F, L305I, L305L, or L305N. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF), the mutation is L305C, L305F, L305H, L305I, L305V, or wild-type (L305L); or, in particular, L305C, L305I, L305L, or L305V. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid ((S)β3mCF3F), the mutation is L305F, L305I, or L305V, or wild-type (L305L). The inventors have identified functional beta-amino-acid-tRNA synthetases that do not comprise an L305 mutation, which may be referred to as comprising L305L, and so this mutation is not present in all embodiments. L305L is particularly relevant to examples where the beta amino acid is (S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF). The amino acid sequence may comprise a mutation at a position corresponding to Y306 of SEQ ID NO: 1. In examples, the mutation is Y306C, Y306F, Y306H, Y306I, Y306L, Y306N, Y306R, or Y306V, or a conservative substitution of said residues or the wild-type residue. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-bromophenyl)propanoic acid, the mutation is Y306L, Y306F, Y306R, or Y306C, or a conservative substitution of said residues or the wild-type residue. In examples, particularly where the beta amino acid is (S)-3-amino-3-(benzo[d][1,3]dioxol-5-yl)propanoic acid ((S)β3MDF), the mutation is Y306H, Y306I, or Y306L, or the wild-type residue. 
In examples, particularly where the beta amino acid is (S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF), the mutation is Y306C or Y306I, or is wild-type (Y306Y); or, in particular, Y306I or Y306Y. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF), the mutation is Y306F, Y306I, Y306L, Y306R, or wild-type (Y306Y); or, in particular, Y306I, Y306L, or Y306Y. In examples, particularly where the beta amino acid is (S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF), the mutation is Y306H, Y306I, Y306L, Y306R, or wild-type (Y306Y); or, in particular, Y306H, Y306L, Y306R, or Y306Y. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid ((S)β3mCF3F), the mutation is Y306F, Y306I, Y306L, Y306N, Y306R, or Y306V, or wild-type (Y306Y); or, in particular, Y306F, Y306L, or Y306Y. The inventors have identified functional beta-amino-acid-tRNA synthetases
that do not comprise a Y306 mutation, which may be referred to as comprising Y306Y, and so this mutation is not present in all embodiments. The amino acid sequence may comprise a mutation at a position corresponding to L309 of SEQ ID NO: 1. In examples, the mutation is L309C, L309F, L309G, L309H, L309I, L309N, L309S, L309V, or L309Y, or a conservative substitution of said residues or the wild-type residue. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-bromophenyl)propanoic acid, the mutation is L309V or L309C, or a conservative substitution of said residues or the wild-type residue. In examples, particularly where the beta amino acid is (S)-3-amino-3-(benzo[d][1,3]dioxol-5-yl)propanoic acid ((S)β3MDF), the mutation is L309C, L309F, L309H, L309N, or L309S, or the wild-type residue; or, in particular, L309C, L309F, L309H, L309N, or the wild-type residue. In examples, particularly where the beta amino acid is (S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF), the mutation is L309C, L309F, L309G, L309I, L309N, L309S, L309V, or L309Y, or is wild-type (L309L); or, in particular, L309C, L309F, L309L, or L309N. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF), the mutation is L309C, L309G, L309H, L309N, L309S, L309V, or wild-type (L309L); or, in particular, L309G, L309H, L309L, L309N, L309S, or L309V. In examples, particularly where the beta amino acid is (S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF), the mutation is L309C, L309G, L309H, or wild-type (L309L); or, in particular, L309G, L309H, or L309L. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid ((S)β3mCF3F), the mutation is L309C, L309G, L309H, L309N, L309V, or wild-type (L309L); or, in particular, L309C, L309L, L309N, or L309V. 
The inventors have identified functional beta-amino-acid-tRNA synthetases that do not comprise a L309 mutation, which may be referred to as comprising L309L, and so this mutation is not present in all embodiments. The amino acid sequence may comprise a mutation at a position corresponding to M344 of SEQ ID NO: 1. In examples, the mutation is M344E, M344G, or M344Q, or a conservative substitution of said residues or the wild-type residue. In examples, particularly where the beta amino acid is (S)-3-amino-3-(benzo[d][1,3]dioxol-5-yl)propanoic acid ((S)β3MDF), the mutation is M344E or M344Q, or the wild-type residue. In examples, particularly where the beta amino acid is (S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF), the mutation is M344Q or is wild-type. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF), the mutation is M344Q or wild-type (M344M); M344Q was the most common in the screen. In examples, particularly where the beta amino acid is (S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF), the mutation is M344G, M344Q, or wild-type (M344M); M344Q was the most common in the screen. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid ((S)β3mCF3F), the mutation is M344Q or wild-type (M344M); M344M was the most common in the screen. The inventors have identified functional beta-amino-acid-tRNA synthetases that comprise the wild-type residue at a position corresponding to M344 of SEQ ID NO: 1. This may be referred to as comprising M344M. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-bromophenyl)propanoic acid, the sequence comprises M344M. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-bromophenyl)propanoic acid, the N346 mutation is N346G.
In examples, particularly where the beta amino acid is (S)-3-amino-3-(benzo[d][1,3]dioxol-5-yl)propanoic acid ((S)β3MDF), the N346 mutation is N346A, N346C, N346G, N346S, N346T, or N346V; or, in particular, N346G or N346A. In examples, particularly where the beta amino acid is (S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF), the N346 mutation is N346A, N346C, N346G, N346S, N346T, or N346V; or, in particular, N346A or N346G. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF), the N346 mutation is N346A, N346C, N346G, N346S, or N346T; N346G was the most common in the screen. In examples, particularly where the beta amino acid is (S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF), the N346 mutation is N346A, N346G, N346S, or N346T; or, in particular, N346A, N346G, or N346S; or, more particularly, N346G or N346S. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid ((S)β3mCF3F), the N346 mutation is N346G.
The amino acid sequence may comprise a mutation at a position corresponding to C348 of SEQ ID NO: 1. In examples, the mutation is C348A, C348F, C348G, C348H, C348I, C348L, C348M, C348N, C348R, C348S, C348T, C348V, or C348Y, or a conservative substitution of said residues or the wild-type residue. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-bromophenyl)propanoic acid, the mutation is C348I, C348L, C348V, or C348F, or a conservative substitution of said residues or the wild-type residue. In particular examples, particularly where the beta amino acid is (S)-3-amino-3-(3-bromophenyl)propanoic acid, the mutation is C348F, C348I, or C348L. In examples, particularly where the beta amino acid is (S)-3-amino-3-(benzo[d][1,3]dioxol-5-yl)propanoic acid ((S)β3MDF), the mutation is C348G, C348I, C348L, C348M, C348N, C348S, C348V, or C348Y; or, in particular, C348G, C348N, C348S, or C348V. In examples, particularly where the beta amino acid is (S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF), the mutation is C348F, C348G, C348H, C348I, C348N, or C348S, or is wild-type (C348C); or, in particular, C348C, C348G, C348I, or C348S. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF), the mutation is C348A, C348F, C348H, C348I, C348L, C348R, C348S, C348T, C348V, or wild-type (C348C); or, in particular, C348F, C348L, C348S, or C348V. In examples, particularly where the beta amino acid is (S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF), the mutation is C348G, C348I, C348S, C348V, C348Y, or wild-type (C348C); or, in particular, the mutation is C348C, C348I, C348S, or C348Y. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid ((S)β3mCF3F), the mutation is C348F, C348H, C348I, C348L, C348S, C348V, or C348Y, or wild-type (C348C); or, in particular, C348F, C348H, or C348V.
The inventors have identified functional beta-amino-acid-tRNA synthetases that do not comprise a C348 mutation, which may be referred to as comprising C348C, and so this mutation is not present in all embodiments. The amino acid sequence may comprise the wild-type residue at a position corresponding to Y384 of SEQ ID NO: 1. This may be referred to as comprising Y384Y. The amino acid sequence may comprise the wild-type residue at a position corresponding to S399 of SEQ ID NO: 1. This may be referred to as comprising S399S. The amino acid sequence may comprise a mutation at a position corresponding to V401 of SEQ ID NO: 1. In examples, this residue is any of the 20 canonical amino acids (i.e. V401X). In examples, the mutation is V401A, V401C, V401F, V401K, V401L, V401S, or V401T, or a conservative substitution of said residues or the wild-type residue. In examples, the mutation is V401A, V401C, V401K, V401L, V401S, or V401T, or a conservative substitution of said residues or the wild-type residue. In particular examples, particularly where the beta amino acid is (S)-3-amino-3-(3-bromophenyl)propanoic acid, the mutation is V401C, V401S, V401L, V401K, or V401A, or a conservative substitution of said residues or the wild-type residue. In examples, particularly where the beta amino acid is (S)-3-amino-3-(benzo[d][1,3]dioxol-5-yl)propanoic acid ((S)β3MDF), the mutation is V401A, V401C, V401K, or V401L, or the wild-type residue (V401V); or, in particular, V401C, V401L, or V401V. In examples, particularly where the beta amino acid is (S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF), the mutation is V401A, V401C, V401K, or V401L, or is wild-type (V401V); or, in particular, V401C, V401L, or V401V. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF), the mutation is V401C, V401F, V401K, V401L, V401T, or wild-type (V401V); or, in particular, V401C, V401L, or V401V. 
In examples, particularly where the beta amino acid is ((S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF)) the mutation is V401A, V401C, V401K, V401L, or wild-type (V401V); or, in particular, V401C or V401V. In examples, particularly where the beta amino acid is (S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid ((S)β3mCF3F), the mutation is V401C, V401K, V401L, V401S, or V401T, or the wild-type residue; or, in particular, V401S, V401T, or V401V. The inventors have identified functional beta-amino-acid-tRNA synthetases that do not comprise a V401 mutation, which may be referred to as comprising V401V, and so this mutation is not present in all embodiments.
In a particular embodiment, the acyl-tRNA synthetase comprises residues corresponding to: M300D, A302H, M344M/M344Q, and N346G/N346A, and optionally a C348 mutation. The C348 mutation may, in some embodiments, be any of C348A, C348F, C348G, C348H, C348I, C348L, C348M, C348N, C348R, C348S, C348T, C348V, or C348Y, or more particularly, any of C348F, C348G, C348H, C348I, C348L, C348N, C348S, C348V, or C348Y. The inventors provide particular embodiments in Table 1. These embodiments are defined relative to SEQ ID NO: 1 and, for instance, 12_1 comprises M300D, A302H, L305H, etc. The inventors have tested each of the below sets of mutations when applied to the MmPylRS backbone (SEQ ID NO: 1) and have found that each is functional with regards to charging a tRNA with a beta amino acid. In particular, these embodiments are capable of charging a tRNA with (S)-3-amino-3-(3-bromophenyl)propanoic acid.

Table 1 - Mutations from exemplary functional beta-amino-acid-tRNA synthetases ((S)-3-amino-3-(3-bromophenyl)propanoic acid)

hit name  M300  A302  L305  Y306  L309  M344  N346  C348  Y384  S399  V401
12_1      D     H     H     Y     L     M     G     F     Y     S     C
12_2      D     H     L     L     L     M     G     L     Y     S     S
12_3      D     H     L     L     L     M     G     L     Y     S     L
12_4      D     C     C     F     V     M     G     L     Y     S     C
12_A      D     Y     L     R     L     M     G     I     Y     S     V
12_B      D     H     L     R     L     M     G     L     Y     S     V
12_C      D     Y     L     C     C     M     G     I     Y     S     V
12_D      D     H     C     F     L     M     G     C     Y     S     K
12_E      D     H     C     F     L     M     G     L     Y     S     A

The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations in Table 1. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and wild-type residues in Table 1. These mutations are particularly suitable for beta-amino-acid-tRNA synthetases that are capable of incorporating the beta-amino acid listed in the table heading.
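As an illustration of how a row of Table 1 reads as a set of mutations and wild-type residues, the following minimal Python sketch compares each residue in a row with the wild-type residue given in the column heading. The names COLUMNS, TABLE_1 and mutations_from_row are hypothetical helpers introduced only for this illustration; the row strings are transcribed from Table 1.

```python
# Column headings of Table 1: wild-type residue plus position in SEQ ID NO: 1.
COLUMNS = ["M300", "A302", "L305", "Y306", "L309", "M344",
           "N346", "C348", "Y384", "S399", "V401"]

# Rows of Table 1, one letter per column, in column order.
TABLE_1 = {
    "12_1": "DHHYLMGFYSC",
    "12_2": "DHLLLMGLYSS",
    "12_3": "DHLLLMGLYSL",
    "12_4": "DCCFVMGLYSC",
    "12_A": "DYLRLMGIYSV",
    "12_B": "DHLRLMGLYSV",
    "12_C": "DYLCCMGIYSV",
    "12_D": "DHCFLMGCYSK",
    "12_E": "DHCFLMGLYSA",
}


def mutations_from_row(row):
    """Return the entries of a Table 1 row that differ from wild type.

    Positions where the row carries the wild-type residue (for example
    Y306Y) are omitted, so the result lists true mutations only.
    """
    if len(row) != len(COLUMNS):
        raise ValueError("row must have one residue per column")
    return [f"{column}{residue}"
            for column, residue in zip(COLUMNS, row)
            if column[0] != residue]


if __name__ == "__main__":
    for hit, row in TABLE_1.items():
        print(hit, ", ".join(mutations_from_row(row)))
```

Read this way, row 12_1 gives M300D, A302H, L305H, N346G, C348F, and V401C, with the wild-type residue retained at the remaining listed positions.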
In an embodiment there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising mutations corresponding to: M300D, A302H, L305H, N346G, C348F, and V401C. In an embodiment there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D, A302H, L305H, Y306Y, L309L, M344M, N346G, C348F, Y384Y, S399S, and V401C. In an embodiment there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising mutations corresponding to: M300D, A302H, Y306L, N346G, C348L, and V401S. In an embodiment there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D, A302H, L305L, Y306L, L309L, M344M, N346G, C348L, Y384Y, S399S, and V401S. In an embodiment there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising mutations corresponding to: M300D, A302H, Y306L, N346G, C348L, and V401L. In an embodiment there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D, A302H, L305L, Y306L, L309L, M344M, N346G, C348L, Y384Y, S399S, and V401L.
In an embodiment there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising mutations corresponding to: M300D, A302C, L305C, Y306F, L309V, N346G, C348L, and V401C. In an embodiment there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D, A302C, L305C, Y306F, L309V, M344M, N346G, C348L, Y384Y, S399S, and V401C. In addition to the sets of mutations and optionally wild-type residues discussed above, the acyl-tRNA synthetase may further comprise a mutation or any combination of mutations at positions corresponding to the following positions in SEQ ID NO: 1: F295, N307, T364, F366, K375, T387, D414, and G421. In a particular embodiment, the mutation may be any one or any combination of: F295L/F295I, N307K, T364A, F366L, K375R, T387I/T387S, D414V, and G421A, or a conservative substitution of said residues or the wild-type residue. Table 2 - Mutations from exemplary functional beta-amino-acid-tRNA synthetases ((S)-3-amino-3-(benzo[d][1,3]dioxol-5-yl)propanoic acid ((S)β3MDF))
The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations in Table 2. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and wild-type residues in Table 2. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and optionally wild-type residues that are associated with an identifier (e.g. A1_2) in Table 2. These mutations are particularly suitable for beta-amino-acid-tRNA synthetases that are capable of incorporating the beta-amino acid listed in the table heading.
Table 3 - Mutations from exemplary functional beta-amino-acid-tRNA synthetases ((S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF))
The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations in Table 3. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and wild-type residues in Table 3. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and optionally wild-type residues that are associated with an identifier (e.g. A2_2) in Table 3. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and optionally wild-type residues that are associated with any one of A2_1, A2_2, A2_3, A2_5, A2_6, A2_7, A2_8, A2_9, A2_10, A2_11, or A2_12 in Table 3. These mutations are particularly suitable for beta-amino-acid-tRNA synthetases that are capable of incorporating the beta-amino acid listed in the table heading.
Table 4 - Mutations from exemplary functional beta-amino-acid-tRNA synthetases ((S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF))
The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations in Table 4. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and wild-type residues in Table 4. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and optionally wild-type residues that are associated with an identifier (e.g. A3_2) in Table 4. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and optionally wild-type residues that are associated with any one of A3_1, A3_2, A3_3, A3_4, A3_5, A3_7, A3_8, A3_9, A3_10, or A3_11 in Table 4. These mutations are particularly suitable for beta-amino-acid-tRNA synthetases that are capable of incorporating the beta-amino acid listed in the table heading.
Table 5 - Mutations from exemplary functional beta-amino-acid-tRNA synthetases ((S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF))
The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations in Table 5. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and wild-type residues in Table 5. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and optionally wild-type residues that are associated with an identifier (e.g. A4_3) in Table 5. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and optionally wild-type residues that are associated with any one of A4_1, A4_3, A4_6, or A4_7 in Table 5. These mutations are particularly suitable for beta-amino-acid-tRNA synthetases that are capable of incorporating the beta-amino acid listed in the table heading.
Table 6 - Mutations from exemplary functional beta-amino-acid-tRNA synthetases ((S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid ((S)β3mCF3F))
The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations in Table 6. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and wild-type residues in Table 6. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and optionally wild-type residues that are associated with an identifier (e.g. A5_2) in Table 6. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and optionally wild-type residues that are associated with any one of A5_1, A5_2, A5_3, A5_4, A5_5, A5_6, or A5_8 in Table 6. These mutations are particularly suitable for beta-amino-acid-tRNA synthetases that are capable of incorporating the beta-amino acid listed in the table heading. In some embodiments, the acyl-tRNA synthetase may further comprise any of the following sets of mutations: a) N307K and F366L b) F295I and T387I c) G421A d) T364A and T387I e) T387S and D414V f) F295L and K375R.
In the experimental data, the mutations referred to as “evol1” are set “a)”, the mutations referred to as “evol2” are set “b)”, the mutations referred to as “evol3” are set “c)”, the mutations referred to as “evol5” are set “d)”, the mutations referred to as “evol7” are set “e)”, and the mutations referred to as “evol8” are set “f)”. For instance, “12_1evol1” has the mutations of “12_1” in Table 1 and the mutations of set “a)” above. In a particular embodiment, any of set a), b), c), d), e), or f) may be present in combination with any set of mutations and optionally wild-type residues in any of Tables 1, 2, 3, 4, 5, or 6. In a preferred embodiment, any of set a), b), c), d), e), or f) is present in combination with the mutations and optionally wild-type residues of any set associated with an identifier (e.g.12_1) in any of Tables 1, 2, 3, 4, 5, or 6. The mutations of the first aspect are provided with reference to SEQ ID NO: 1, which is the sequence of wild type Methanosarcina mazei pyrrolysyl-tRNA synthetase (MmPylRS) and provided below. MDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSD EDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASV STSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRK KDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPN LYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDS CMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 1) To aid alignment, a version of the sequence of MmPylRS where an “X” is marked at positions 300, 302, 305, 306, 309, 346, 348, 384, 401, is provided below. 
MDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSD EDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASV STSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRK KDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPXLXPN XXNYXRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLXFXQMGSGCTRENLESIITDFLNHLGIDFKIVGDS CMVXGDTLDVMHGDLELSSAXVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 2) Thus, in an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, said synthetase comprising an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2, and said amino acid sequence comprising any of the mutations or sets of mutations disclosed in relation to the first aspect. In particular, the amino acid sequence may have at least 80%, 85%, 90%, 95%, or 99% identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2. In an embodiment, the amino acid sequence may be identical to SEQ ID NO: 1 or SEQ ID NO: 2 apart from the recited mutations. The acyl-tRNA synthetase may comprise an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2 and comprising an M300D mutation, a A302H, A302Y, or A302C mutation, and an N346G, N346A, or N346S mutation. The amino acid sequence may comprise mutations corresponding to the mutations of any one of 12_1, 12_2, 12_3, 12_4, 12_A, 12_B, 12_C, or 12_D in Table 1. The amino acid sequence may comprise residues corresponding to the mutations and wild-type residues of any one of 12_1, 12_2, 12_3, 12_4, 12_A, 12_B, 12_C, or 12_D in Table 1. The amino acid sequence may further comprise mutations corresponding to the mutations listed as sets a), b), c), d), e), or f) herein.
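The naming convention used in the experimental data, in which a hit name is followed by an "evol" label, can be sketched as follows. This is a minimal illustration only: the dictionary and function names are hypothetical, only two Table 1 hits are written out (using the mutations recited for 12_1 and 12_2 above), and the "evol" labels follow the assignment of sets a) to f) given in the text.

```python
import re

# Mutations of two Table 1 hits, as recited in the text relative to SEQ ID NO: 1.
TABLE_1_MUTATIONS = {
    "12_1": ["M300D", "A302H", "L305H", "N346G", "C348F", "V401C"],
    "12_2": ["M300D", "A302H", "Y306L", "N346G", "C348L", "V401S"],
}

# Sets a) to f), keyed by the labels used in the experimental data.
EVOL_SETS = {
    "evol1": ["N307K", "F366L"],   # set a)
    "evol2": ["F295I", "T387I"],   # set b)
    "evol3": ["G421A"],            # set c)
    "evol5": ["T364A", "T387I"],   # set d)
    "evol7": ["T387S", "D414V"],   # set e)
    "evol8": ["F295L", "K375R"],   # set f)
}


def position(mutation):
    """Extract the residue number from a mutation such as 'M300D'."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", mutation)
    if match is None:
        raise ValueError(f"not a substitution: {mutation!r}")
    return int(match.group(1))


def expand_variant(name):
    """Expand a name such as '12_1evol1' into a position-sorted mutation list."""
    base, separator, number = name.partition("evol")
    mutations = list(TABLE_1_MUTATIONS[base])
    if separator:
        extra = EVOL_SETS["evol" + number]
        taken = {position(m) for m in mutations}
        for mutation in extra:
            if position(mutation) in taken:
                raise ValueError(f"position already mutated: {mutation}")
        mutations.extend(extra)
    return sorted(mutations, key=position)


if __name__ == "__main__":
    print(expand_variant("12_1evol1"))
```

Under these assumptions, "12_1evol1" expands to M300D, A302H, L305H, N307K, N346G, C348F, F366L, and V401C.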
The acyl-tRNA synthetase may comprise an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2 and comprising any of sets of mutations in any of Tables 1 to 6. The acyl-tRNA synthetase may comprise an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2 and comprising any of sets of mutations and wild-type residues in any of Tables 1 to 6. The acyl-tRNA synthetase may comprise an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2 and comprising any of sets of mutations, or sets of mutations and wild-type residues, labelled as A1_1, A1_2, A1_3, A1_4, A1_5, A1_6, A1_7, A1_8, A1_9, A1_10, A1_11, A1_12, A2_1, A2_2, A2_3, A2_5, A2_6, A2_7, A2_8, A2_9, A2_10, A2_11, A2_12, A3_1, A3_2, A3_3, A3_4, A3_5, A3_7, A3_8, A3_9, A3_10, A3_11, A4_1, A4_3, A4_6, A4_7, A5_1, A5_2, A5_3, A5_4, A5_5, A5_6, or A5_8 in Tables 2 to 6. The amino acid sequence may further comprise mutations corresponding to the mutations listed as sets a), b), c), d), e), or f) herein. In a preferred embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising mutations corresponding to: M300D, A302H, L305H, N307K, N346G, C348F, F366L, and V401C relative to SEQ ID NO: 1. The acyl-tRNA synthetase may comprise an amino acid sequence comprising residues corresponding to: M300D, A302H, L305H, Y306Y, N307K, L309L, M344M, N346G, C348F, F366L, Y384Y, S399S, and V401C relative to SEQ ID NO: 1. The amino acid sequence may have at least 80%, 85%, 90%, 95%, or 99% identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2. 
In an embodiment, the amino acid sequence may be identical to SEQ ID NO: 1 or SEQ ID NO: 2 apart from the recited mutations. The skilled person would also be able to apply the identified mutations to other PylRS backbones, including backbones not disclosed herein. This is despite the fact that other PylRSs may have a low sequence identity to MmPylRS. In order to apply the mutations recited with reference to MmPylRS to another synthetase, the sequences should be aligned to identify corresponding residues. In particular, the sequences representing the catalytic sites of the synthetases may be aligned. As such, the region of SEQ ID NO: 1 extending from residue 296 to residue 428 may be aligned to the sequence of another PylRS, and the corresponding mutations may be made. Further information on the transfer of mutations from one PylRS backbone to another is provided in WO2013/171485 A1 (herein incorporated by reference). An exemplary alignment is provided in Figure 35. As such, in an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising any mutations and optionally wild-type residues disclosed in relation to the first aspect, or any set of mutations and optionally wild-type residues disclosed in relation to the first aspect, and wherein the acyl-tRNA synthetase backbone is a PylRS. The mutations may be applied to the PylRS from Nitrososphaeria archaeon (Nitra). This sequence is provided below.
MSKIRFTRGQIHRLIELGAEPTELERDFETEAERDKEFNKIAENLARKNLKNIKDFLEQRRKPLVRVIEEKLRTTA LRLGFSEVVTPIIIPRLFIKRMGIDEGDPLWKQVMLIDDKRALRPMLAPNLYVLMAKLSNIVRPVKIFEIGPCFRR ETGGRYHLEEFTMFNMVELAPEGDPKERLLDYIDTIMRDIGLNYTISVEPSNVYGETLDVVVNGIEVASAAIGPKP IDANWGVREPWIGVGFGVERLAMLVGGYNSIARIAKSLSYLDGSTLSVIKLRW (SEQ ID NO: 3) To aid alignment, a version of the sequence of NitraPylRS is provided below, wherein an “X” is marked at positions corresponding to 300, 302, 305, 306, 309, 346, 348, 384, 401 in the MmPylRS sequence. These are positions to which the respective mutations of the present invention may be applied. Further corresponding positions may be identified in the same manner. MSKIRFTRGQIHRLIELGAEPTELERDFETEAERDKEFNKIAENLARKNLKNIKDFLEQRRKPLVRVIEEKLRTTA LRLGFSEVVTPIIIPRLFIKRMGIDEGDPLWKQVMLIDDKRALRPXLXPNXXVLXAKLSNIVRPVKIFEIGPCFRR
ETGGRYHLEEFTMFXMXELAPEGDPKERLLDYIDTIMRDIGLNYTISVEPSNVXGETLDVVVNGIEVASAXIGPKP IDANWGVREPWIGVGFGVERLAMLVGGYNSIARIAKSLSYLDGSTLSVIKLRW (SEQ ID NO: 4) The mutations may be applied to the PylRS from Clostridiales bacterium (Clos). This sequence is provided below. MENFTITQTERLKQLNCENDVLELEFEDSEARNSKFREIEIGRVKKGKENIKNLLKEKHITISDEVGNKLSDWLMS KDYTKVLTPTIISKDQLKAMTIDEENHLFSQVFWIDNNKCLRPMLAPNLYIVMRELKRITNEPVKIFEIGSCFRKE SQGARHMNEFTMLNMVELASVEDGKQLDTLKALAHEAMESLGVESYELVIEESAVYGSTLDIEIDGIEVASGSYGP HELDANWDIFDTWVGIGFGIERLAMAINGGSTIKKYGRSINFIDGETMKL (SEQ ID NO: 5) To aid alignment, a version of the sequence of ClosPylRS is provided below, wherein an “X” is marked at positions corresponding to 300, 302, 305, 306, 309, 346, 348, 384, 401 in the MmPylRS sequence. Thus, these are positions to which the respective mutations of the present invention may be applied. Further corresponding positions may be identified in the same manner. MENFTITQTERLKQLNCENDVLELEFEDSEARNSKFREIEIGRVKKGKENIKNLLKEKHITISDEVGNKLSDWLMS KDYTKVLTPTIISKDQLKAMTIDEENHLFSQVFWIDNNKCLRPXLXPNXXIVXRELKRITNEPVKIFEIGSCFRKE SQGARHMNEFTMLXMXELASVEDGKQLDTLKALAHEAMESLGVESYELVIEESAVXGSTLDIEIDGIEVASGXYGP HELDANWDIFDTWVGIGFGIERLAMAINGGSTIKKYGRSINFIDGETMKL (SEQ ID NO: 6) The mutations may be applied to the PylRS from Methanomethylophilus sp.1R26 (1R26). This sequence is provided below. MAEHFTDAQIQRLREYGNGTYKDMEFADVSAREKAFTKLMSDASRDNESALKGMIAHPARQGLSRLMNDIADALVA DGFIEVRTPIIISKDALAKMTITPDKPLFKQVFWIDDKRALRPMLAPSLYTVMRSLRDHTDGPVKIFEMGSCFRKE SHSGMHLEEFTMLNLVDMGPAGDATESLKKYIGIVMKAAGLPDYQLVHEESDVYKETIDVEINGQEVCSAAVGPHY LDAAHDVHEPWAGAGFGLERLLTIRQGYSTVMKGGASTTYLNGAKMD (SEQ ID NO: 7) To aid alignment, a version of the sequence of 1R26PylRS is provided below, wherein an “X” is marked at positions corresponding to 300, 302, 305, 306, 309, 346, 348, 384, 401 in the MmPylRS sequence. Thus, these are positions to which the respective mutations of the present invention may be applied. 
Further corresponding positions may be identified in the same manner. MAEHFTDAQIQRLREYGNGTYKDMEFADVSAREKAFTKLMSDASRDNESALKGMIAHPARQGLSRLMNDIADALVA DGFIEVRTPIIISKDALAKMTITPDKPLFKQVFWIDDKRALRPXMXAPXXYTXMRSLRDHTDGPVKIFEMGSCFRK ESHSGMHLEEFTMLXLXDMGPAGDATESLKKYIGIVMKAAGLPDYQLVHEESDVXKETIDVEINGQEVCSAXVGPH YLDAAHDVHEPWAGAGFGLERLLTIRQGYSTVMKGGASTTYLNGAKMD (SEQ ID NO: 8) The mutations may be applied to the PylRS from Methanomassiliicoccus luminyensis 1 (Lum1). This sequence is provided below. MDTRLTPAQAQRIREMGGTVDPSLAFSSEAERESAFQRISADLQGANLAKIRRCAEAPERHPIGSLENTLACALAA KGFIEVKTPMMIPADGLVKMGIDESHPLWNQVFWVGPKKALRPMLAPNLYFLMRHLRRSVPAPLLLFEIGPCFRKE RGSNHLEEFTMLNLVELAPQADATERLKEHIATVMNAVGLPYELVVEGSEVYGTTIDVEVDGVELASGAVGPLPMD PHGITEPWAGVGFGLERIALMRTKEQNIKKVGRSLVYVNGARIDI (SEQ ID NO: 9) To aid alignment, a version of the sequence of Lum1PylRS is provided below, wherein an “X” is marked at positions corresponding to 300, 302, 305, 306, 309, 346, 348, 384, 401 in the MmPylRS sequences. Thus, these are positions to which the respective mutations of the present invention may be applied. Further corresponding positions may be identified in the same manner. MDTRLTPAQAQRIREMGGTVDPSLAFSSEAERESAFQRISADLQGANLAKIRRCAEAPERHPIGSLENTLACALAA KGFIEVKTPMMIPADGLVKMGIDESHPLWNQVFWVGPKKALRPXLXPNXXFLXRHLRRSVPAPLLLFEIGPCFRKE RGSNHLEEFTMLXLXELAPQADATERLKEHIATVMNAVGLPYELVVEGSEVXGTTIDVEVDGVELASGXVGPLPMD PHGITEPWAGVGFGLERIALMRTKEQNIKKVGRSLVYVNGARIDI (SEQ ID NO: 10)
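The way the "X"-marked sequences identify corresponding positions can be sketched in Python as follows. This is an illustration under stated assumptions rather than a prescribed procedure: the function names are hypothetical, and the example uses only a short fragment of NitraPylRS (SEQ ID NO: 3) and of its marked version (SEQ ID NO: 4), which covers the positions corresponding to 300, 302, 305, 306 and 309 of MmPylRS.

```python
def marked_residues(wild_type, marked):
    """Return (index, wild-type residue) for every 'X' in the marked sequence."""
    if len(wild_type) != len(marked):
        raise ValueError("sequences must have the same length")
    sites = []
    for index, (wt_residue, marked_residue) in enumerate(zip(wild_type, marked)):
        if marked_residue == "X":
            sites.append((index, wt_residue))
        elif marked_residue != wt_residue:
            raise ValueError(f"sequences disagree at index {index}")
    return sites


def map_positions(mm_positions, wild_type, marked):
    """Pair MmPylRS position numbers with the marked sites of another backbone."""
    sites = marked_residues(wild_type, marked)
    if len(sites) != len(mm_positions):
        raise ValueError("number of marked sites does not match")
    return dict(zip(mm_positions, sites))


def apply_mutation(wild_type, site_map, mutation):
    """Apply a mutation given in MmPylRS numbering (e.g. 'M300D') to a backbone.

    The first letter of the mutation names the MmPylRS wild-type residue,
    which may differ from the residue found in the other backbone.
    """
    mm_position = int(mutation[1:-1])
    new_residue = mutation[-1]
    index, _ = site_map[mm_position]
    return wild_type[:index] + new_residue + wild_type[index + 1:]


# Fragments taken from SEQ ID NO: 3 and SEQ ID NO: 4 (NitraPylRS).
NITRA_FRAGMENT = "RPMLAPNLYVLMAK"
NITRA_MARKED = "RPXLXPNXXVLXAK"

if __name__ == "__main__":
    site_map = map_positions([300, 302, 305, 306, 309],
                             NITRA_FRAGMENT, NITRA_MARKED)
    print(site_map)
    print(apply_mutation(NITRA_FRAGMENT, site_map, "M300D"))
```

In this fragment the position corresponding to L309 of MmPylRS carries a methionine in NitraPylRS, which illustrates why the corresponding position, rather than the identity of the wild-type residue, determines where a mutation is made.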
Other backbones to which the mutations of the present invention may be applied include MbPylRS (SEQ ID NO: 11), Lum1PylRS (SEQ ID NO: 12), TronPylRS (SEQ ID NO: 13), GemmPylRS (SEQ ID NO: 14), PGA8PylRS (SEQ ID NO: 15), I2PylRS (SEQ ID NO: 16), D121PylRS (SEQ ID NO: 17), and D416PylRS (SEQ ID NO: 18). Thus, in an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, said synthetase comprising an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 18 and said amino acid sequence comprising any of the mutations and optionally wild-type residues, or sets of mutations and optionally wild-type residues, disclosed in relation to the first aspect. In particular, the amino acid sequence may have at least 80%, 85%, 90%, 95%, or 99% identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 18. In an embodiment, the amino acid sequence may be identical to any one of SEQ ID NOs: 1 to 18 apart from the recited mutations. In a particular embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, said synthetase comprising an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 10 and said amino acid sequence comprising any of the mutations and optionally wild-type residues, or sets of mutations and optionally wild-type residues, disclosed in relation to the first aspect. In particular, the amino acid sequence may have at least 80%, 85%, 90%, 95%, or 99% identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 10. In an embodiment, the amino acid sequence may be identical to any one of SEQ ID NOs: 1 to 10 apart from the recited mutations.
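By way of illustration only, a percentage identity between a candidate sequence and a reference sequence could be computed from a pairwise alignment along the following lines. The convention used here (identical residues divided by aligned columns, ignoring columns that are gaps in both sequences) is an assumption made for this sketch; this passage does not specify a particular calculation, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned columns holding the same residue in both rows.

    Both inputs must already be aligned (equal length, gaps as '-').
    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch. This convention is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap and residue_b == gap:
            continue
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a, aligned_b, threshold):
    """True if the identity is at least the given percentage (e.g. 80)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy sequences, not taken from the sequence listing.
    print(percent_identity("ACDE", "ACDF"))
    print(meets_threshold("ACDE", "ACDF", 80))
```

Other conventions, for example dividing by the length of the shorter sequence, give different values, so the convention should be stated alongside any reported percentage.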
Thus, the acyl-tRNA synthetase may comprise an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 18 and comprising mutations corresponding to an M300D mutation, a A302H, A302Y, or A302C mutation, and an N346G, N346A, or N346S mutation in SEQ ID NO: 1 (as indicated by the relevant “X” in SEQ ID NOs: 2, 4, 6, 8, or 10). The amino acid sequence may comprise mutations corresponding to the mutations of any one of 12_1, 12_2, 12_3, 12_4, 12_A, 12_B, 12_C, or 12_D in Table 1. The amino acid sequence may comprise residues corresponding to the mutations and wild-type residues of any one of 12_1, 12_2, 12_3, 12_4, 12_A, 12_B, 12_C, or 12_D in Table 1. The amino acid sequence may comprise residues corresponding to the sets of mutations of any one of Tables 1 to 6. The amino acid sequence may comprise residues corresponding to the sets of mutations and wild-type residues of any one of Tables 1 to 6. The amino acid sequence may comprise residues corresponding to any of sets of mutations, or sets of mutations and wild-type residues, labelled as A1_1, A1_2, A1_3, A1_4, A1_5, A1_6, A1_7, A1_8, A1_9, A1_10, A1_11, A1_12, A2_1, A2_2, A2_3, A2_5, A2_6, A2_7, A2_8, A2_9, A2_10, A2_11, A2_12, A3_1, A3_2, A3_3, A3_4, A3_5, A3_7, A3_8, A3_9, A3_10, A3_11, A4_1, A4_3, A4_6, A4_7, A5_1, A5_2, A5_3, A5_4, A5_5, A5_6, or A5_8 in Tables 2 to 6. The amino acid sequence may further comprise mutations corresponding to the mutations listed as sets a), b), c), d), e), or f) herein. In a preferred embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising mutations corresponding to: M300D, A302H, L305H, N307K, N346G, C348F, F366L, and V401C relative to SEQ ID NO: 1. 
The acyl-tRNA synthetase may comprise an amino acid sequence comprising residues corresponding to: M300D, A302H, L305H, Y306Y, N307K, L309L, M344M, N346G, C348F, F366L, Y384Y, S399S, and V401C relative to SEQ ID NO: 1. The amino acid sequence may have at least 80%, 85%, 90%, 95%, or 99% identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 18. The amino acid sequence may have at least 80%, 85%, 90%, 95%, or 99% identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 10. The synthetases of the first aspect may comprise sequence changes relative to the wild-type sequence in addition to the mutations described in more detail herein. Specifically, the synthetases may comprise sequence changes at
sites which do not significantly compromise the function or operation of the synthetase as described herein. Synthetase function may be tested by operating the synthetase as described, such as in the examples section, in order to verify that function has not been abrogated or significantly altered. Thus, provided that the synthetase retains its function, which can be tested as set out herein, sequence variations may be made in the synthetase relative to the wild-type reference sequence. Fig.35 provides a sequence alignment where “*” indicates residues found in all aligned aaRS sequences, “:” indicates residues with strong similarity between the aligned sequences, and “.” indicates residues with some similarity between the aligned sequences. The synthetase may be at least 50%, 60%, 70%, 80%, 90%, 95%, 99%, or 100% identical to any one of SEQ ID NOs: 1 to 18, wherein the variation is found only in residues not marked with: i) “*”; ii) “*” or “:”; or iii) “*”, “:”, or “.”, apart from specifically recited mutations discussed herein. Conservative substitutions may be made, for example according to the table below. Amino acids in the same block in the second column and preferably in the same line in the third column may be substituted for each other:

ALIPHATIC   Non-polar            G A P I L V
            Polar - uncharged    C S T M N Q
            Polar - charged      D E K R
AROMATIC                         H F W Y

The acyl-tRNA synthetase of the first aspect may be isolated or purified. The acyl-tRNA synthetase of the first aspect may be a non-natural acyl-tRNA synthetase. In a second aspect, there is provided the use of an acyl-tRNA synthetase in a method of generating a polymer comprising a beta amino acid. 
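By way of illustration only, the conservative substitution table given above in relation to the first aspect can be expressed as a simple lookup. The following is a minimal sketch in Python, not part of the disclosed methods. The names `BLOCKS`, `block_of`, and `is_conservative` are hypothetical. Because the line-level grouping of the third column is not reproduced here, the sketch tests only the block-level rule (same block in the second column), and treats the aromatic residues as a single block.

```python
# Blocks of the conservative substitution table, keyed by a descriptive label.
# Only the block-level grouping is encoded; the finer "same line" preference
# mentioned in the text is not modelled in this sketch.
BLOCKS = {
    "aliphatic, non-polar": "GAPILV",
    "aliphatic, polar uncharged": "CSTMNQ",
    "aliphatic, polar charged": "DEKR",
    "aromatic": "HFWY",
}


def block_of(residue: str) -> str:
    """Return the table block containing a one-letter amino acid code."""
    if len(residue) != 1:
        raise ValueError(f"expected a one-letter code, got {residue!r}")
    for name, members in BLOCKS.items():
        if residue in members:
            return name
    raise ValueError(f"residue {residue!r} is not in the table")


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same block of the table."""
    return block_of(original) == block_of(replacement)


if __name__ == "__main__":
    print(is_conservative("L", "V"))  # both aliphatic, non-polar
    print(is_conservative("D", "F"))  # charged versus aromatic
```

Such a check could be used, for example, to flag proposed sequence variations outside the recited mutation positions that are not conservative according to the table.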
In an embodiment, there is provided a method of making a polymer comprising a beta amino acid, wherein the method comprises: i) use of an acyl-tRNA synthetase to acylate a tRNA with the beta amino acid, wherein the acyl-tRNA synthetase comprises mutations to enable the acylation of a tRNA with a beta amino acid, and ii) incorporation of the beta amino acid into a polymer. In an embodiment, there is provided the use of an acyl-tRNA synthetase of the first aspect in a method of generating a polymer comprising a beta amino acid. In an embodiment, there is provided a method of making a polymer comprising a beta amino acid, wherein the method comprises: i) use of an acyl-tRNA synthetase of the first aspect to acylate a tRNA with the beta amino acid, and ii) incorporation of the beta amino acid into a polymer. The beta amino acid may be an analogue of S-beta-3-phenylalanine. In particular, the beta amino acid may be (S)-3-amino-3-(3-bromophenyl)propanoic acid, (S)-3-amino-3-(benzo[d][1,3]dioxol-5-yl)propanoic acid ((S)β3MDF), (S)-3-amino-3-(4-bromophenyl)propanoic acid ((S)β3pBrF), (S)-3-amino-3-(3,4-difluorophenyl)propanoic acid ((S)β3pmFF), (S)-3-amino-3-(2-bromophenyl)propanoic acid ((S)β3oBrF), or (S)-3-amino-3-(3-(trifluoromethyl)phenyl)propanoic acid ((S)β3mCF3F).
The polymer may comprise one or more canonical amino acids or one or more naturally occurring amino acids. The polymer may comprise one or more unnatural amino acids and/or hydroxy acids. The unnatural amino acid may be an α-amino acid and/or the hydroxy acid may be an α-hydroxy acid. The polymer may be formed by genetic incorporation of the monomers using a ribosome. The polymer may be formed by translation of a nucleic acid sequence by a ribosome. The translation may comprise the binding of a tRNA charged with the beta amino acid to a ribosome, and the formation of a bond between the beta amino acid and a preceding and/or subsequent monomer. Thus, the method may comprise the provision of a nucleic acid sequence encoding a polymer, and the use of charged tRNAs and a ribosome to translate said sequence to form the polymer. The use or method may be performed in a cell, for instance a prokaryotic cell, a bacterial cell, or an E. coli cell. In a third aspect, there is provided a nucleic acid encoding an acyl-tRNA synthetase of the first aspect. The nucleic acid may be DNA. A vector may comprise the nucleic acid. In a fourth aspect, there is provided a cell comprising an acyl-tRNA synthetase of the first aspect, a nucleic acid of the third aspect, or a vector comprising a nucleic acid of the third aspect. In a particular embodiment, the cell comprises or expresses an acyl-tRNA synthetase of the first aspect and the acyl-tRNA synthetase is orthogonal to the endogenous tRNAs. Thus, the acyl-tRNA synthetase does not acylate the endogenous tRNAs to an extent that would render the cell non-viable. In an embodiment, the fitness of the cell, for instance measured by proliferation, is reduced by less than 50%, less than 25%, less than 10%, less than 5%, or is not reduced when the acyl-tRNA synthetase is expressed. 
α,α-disubstituted-amino-acid-acyl-tRNA synthetases
The inventors have made use of the methods and tools disclosed herein and have identified acyl-tRNA synthetases that are capable of acylating tRNAs with α,α-disubstituted-amino acids. Thus, in a fifth aspect, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: A302C, A302G, A302H, or A302S relative to SEQ ID NO: 1; and N346A, N346C, N346E, N346G, N346T, or N346V relative to SEQ ID NO: 1. The definition of “specifically” acylating is provided in the section relating to the first aspect. In a particular embodiment, the amino acid sequence comprises A302G or A302H and/or N346A, N346C, or N346G. The amino acid sequence may comprise a mutation at a position corresponding to M300 relative to SEQ ID NO: 1. The mutation may be M300A, M300C, M300D, M300E, M300S, or M300T. In some embodiments, the residue is wild-type (M300M). The most common mutation in the screen was M300D. In particular embodiments, the mutation is M300D. The amino acid sequence may comprise a mutation at a position corresponding to L305 relative to SEQ ID NO: 1. The mutation may be L305C, L305F, L305H, L305I, L305N, or L305S. More particularly, the mutation may be L305C, L305F, or L305S. In some embodiments, the residue is wild-type (L305L). The most common mutation in the screen was L305C.
The amino acid sequence may comprise a mutation at a position corresponding to Y306 relative to SEQ ID NO: 1. The mutation may be Y306D, Y306F, Y306L, or Y306N. In some embodiments, the residue is wild-type (Y306Y). In particular embodiments, the amino acid sequence comprises Y306F or Y306Y. The amino acid sequence may comprise a mutation at a position corresponding to L309 relative to SEQ ID NO: 1. The mutation may be L309C, L309F, L309G, L309H, L309N, or L309V. In some embodiments, the residue is wild-type (L309L). In particular embodiments, the amino acid sequence comprises L309F or L309L. The amino acid sequence may comprise a mutation at a position corresponding to M344 relative to SEQ ID NO: 1. The mutation may be M344H or M344Q. In some embodiments, the residue is wild-type (M344M) and the wild-type residue was the most common in the screen. The amino acid sequence may comprise a mutation at a position corresponding to C348 relative to SEQ ID NO: 1. The mutation may be C348F, C348G, C348H, C348I, C348L, C348S, or C348V. In some embodiments, the residue is wild-type (C348C). In particular embodiments, the amino acid sequence comprises C348G or C348C. The most common mutation in the screen was C348G. The amino acid sequence may comprise the wild-type residue at a position corresponding to Y384 of SEQ ID NO: 1. This may be referred to as comprising Y384Y. The amino acid sequence may comprise the wild-type residue at a position corresponding to S399 of SEQ ID NO: 1. This may be referred to as comprising S399S. The amino acid sequence may comprise a mutation at a position corresponding to V401 of SEQ ID NO: 1. In examples, the mutation is V401A, V401C, V401K, V401L, or a conservative substitution of said residues. More particularly, the mutation may be V401C or V401L. In some embodiments, the residue is wild-type (V401V). In particular embodiments, the amino acid sequence comprises V401C, V401L, or V401V. 
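The residue notation used above (for example M300D for a mutation, or Y384Y for a retained wild-type residue) lends itself to a mechanical check. The following is a minimal illustrative sketch in Python, assuming 1-based residue numbering relative to a reference sequence. The function names and the five-residue toy sequence are hypothetical and are not taken from SEQ ID NO: 1.

```python
import re

# Matches specifications such as "M300D" or "Y384Y":
# reference residue, 1-based position, residue present in the variant.
_SPEC = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_spec(spec: str) -> tuple[str, int, str]:
    """Split a specification like 'M300D' into ('M', 300, 'D')."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"not a residue specification: {spec!r}")
    ref, pos, new = match.groups()
    return ref, int(pos), new


def is_wild_type(spec: str) -> bool:
    """True for specifications such as 'Y384Y' that retain the reference residue."""
    ref, _, new = parse_spec(spec)
    return ref == new


def apply_specs(reference: str, specs: list[str]) -> str:
    """Return a variant of `reference` carrying every residue in `specs`."""
    variant = list(reference)
    for spec in specs:
        ref, pos, new = parse_spec(spec)
        if not 1 <= pos <= len(reference):
            raise ValueError(f"{spec}: position outside the reference sequence")
        if reference[pos - 1] != ref:
            raise ValueError(
                f"{spec}: reference has {reference[pos - 1]!r} at position {pos}"
            )
        variant[pos - 1] = new
    return "".join(variant)


if __name__ == "__main__":
    toy_reference = "MAYLC"  # hypothetical toy sequence, for illustration only
    print(apply_specs(toy_reference, ["M1D", "Y3Y", "L4F"]))
```

Checking the reference residue before substitution guards against applying a set of mutations to a misnumbered or misaligned backbone.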
In an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D; A302C, A302G, A302H, or A302S; or more particularly A302G or A302H; and N346A, N346C, N346E, N346G, N346T, or N346V; or more particularly N346A, N346C, or N346G. In a particular embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D; A302G or A302H; and N346A, N346C, or N346G. There is provided herein an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D; A302G or A302H; an L305 mutation; and an N346 mutation (to any residue).
In a particular embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D; A302G or A302H; an L305 mutation, a Y306 mutation, an L309 mutation, an N346 mutation, and a C348 mutation; and optionally a V401 mutation. In a particular embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D; A302H; N346A or N346G; and C348G. In a particular embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D; A302G or A302H; L305C, L305F, or L305S; N346A, N346C, or N346G; and C348C or C348G. In a particular embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D; A302G or A302H; L305C, L305F, or L305S; Y306F or Y306Y; L309F or L309L; N346A, N346C, or N346G; C348C or C348G; and V401C, V401L, or V401V. In a particular embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D; A302G or A302H; L305C; Y306F; L309F; N346A or N346G; and C348G.
In a particular embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300D; A302G or A302H; L305C, L305F, or L305S; Y306F or Y306Y; L309F or L309L; M344M; N346A, N346C, or N346G; C348C or C348G; Y384Y; S399S; and V401C, V401L, or V401V. The α,α-disubstituted-amino acid may be (S)-2-amino-3-(4-iodophenyl)-2-methylpropanoic acid ((S)α-Me-pIF). The inventors provide particular embodiments in Table 7. These embodiments are defined relative to SEQ ID NO: 1. For instance, A6_2 comprises M300D, A302H, L305S, etc.
Table 7 - Mutations from exemplary α,α-disubstituted-amino-acid-tRNA synthetases
The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations in Table 7. For instance, the acyl-tRNA synthetase may comprise an amino acid sequence comprising residues corresponding to M300D, A302H, L305S, L309F, N346G, and C348G. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and wild-type residues in Table 7. For instance, the acyl-tRNA synthetase may comprise an amino acid sequence comprising residues corresponding to M300D, A302H, L305S, Y306Y, L309F, M344M, N346G, C348G, Y384Y, S399S, and V401V. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations labelled as A6_1, A6_2, A6_4, A6_5, A6_6, A6_7, A6_8, or A6_9 in Table 7 (e.g. M300D, A302H, L305S, L309F, N346G, and C348G for A6_2). The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and wild-type residues labelled as A6_1, A6_2, A6_4, A6_5, A6_6, A6_7, A6_8, or A6_9 in Table 7 (e.g. M300D, A302H, L305S, Y306Y, L309F, M344M, N346G, C348G, Y384Y, S399S, and V401V for A6_2). The mutations of the fifth aspect are provided with reference to SEQ ID NO: 1, which is the sequence of wild type Methanosarcina mazei pyrrolysyl-tRNA synthetase (MmPylRS) and provided herein. To aid alignment, a version of the sequence of MmPylRS where an “X” is marked at positions 300, 302, 305, 306, 309, 346, 348, 384, 401, is provided as SEQ ID NO: 2. Thus, in an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, said synthetase comprising an amino acid sequence having at least 50%, 60%,
70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2, and said amino acid sequence comprising any of the mutations or sets of mutations disclosed in relation to the fifth aspect. In particular, the amino acid sequence may have at least 80%, 85%, 90%, 95%, or 99% identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2. In an embodiment, the amino acid sequence may be identical to SEQ ID NO: 1 or SEQ ID NO: 2 apart from the recited mutations. The acyl-tRNA synthetase may comprise an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2 and comprising M300D; A302G or A302H; and N346A, N346C, or N346G. The acyl-tRNA synthetase may comprise an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2 and comprising M300D; A302G or A302H; L305C, L305F, or L305S; Y306F or Y306Y; L309F or L309L; M344M; N346A, N346C, or N346G; C348C or C348G; Y384Y; S399S; and V401C, V401L, or V401V. The amino acid sequence may comprise mutations corresponding to the mutations of any one of the sets of mutations in Table 7. The amino acid sequence may comprise mutations corresponding to the mutations of any one of the sets of mutations labelled as A6_1, A6_2, A6_4, A6_5, A6_6, A6_7, A6_8, or A6_9 in Table 7. The amino acid sequence may comprise residues corresponding to the mutations and wild-type residues of any one of A6_1, A6_2, A6_4, A6_5, A6_6, A6_7, A6_8, or A6_9 in Table 7. The skilled person would also be able to apply the identified mutations to other PylRS backbones, including backbones not disclosed herein. This is despite the fact that other PylRSs may have a low sequence identity to MmPylRS. 
In order to apply the mutations recited with reference to MmPylRS to other synthetases, the sequences should be aligned to identify corresponding residues. In particular, the sequences representing the catalytic sites of the synthetases may be aligned. As such, the region of SEQ ID NO: 1 extending from residue 296 to residue 428 may be aligned to the sequence of another PylRS, and the corresponding mutations may be made. Further information on the transfer of mutations from one PylRS backbone to another is provided in WO2013/171485 A1 (herein incorporated by reference). An exemplary alignment is provided in Figure 35. As such, in an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising any mutations and optionally wild-type residues disclosed in relation to the fifth aspect, or any set of mutations and optionally wild-type residues disclosed in relation to the fifth aspect, and wherein the acyl-tRNA synthetase backbone is a PylRS. The mutations may be applied to the PylRS from Nitrososphaeria archaeon (Nitra). This sequence is provided herein as SEQ ID NO: 3. To aid alignment, a version of the sequence of NitraPylRS is provided herein (SEQ ID NO: 4), wherein an “X” is marked at positions corresponding to 300, 302, 305, 306, 309, 346, 348, 384, 401 in the MmPylRS sequence. These are positions to which the respective mutations of the present invention may be applied. Further corresponding positions may be identified in the same manner. The mutations may be applied to the PylRS from Clostridiales bacterium (Clos). This sequence is provided as SEQ ID NO: 5. To aid alignment, a version of the sequence of ClosPylRS is provided herein (SEQ ID NO: 6), wherein an “X” is marked at positions corresponding to 300, 302, 305, 306, 309, 346, 348, 384, 401 in the MmPylRS sequence. 
These are positions to which the respective mutations of the present invention may be applied. Further corresponding positions may be identified in the same manner. The mutations may be applied to the PylRS from Methanomethylophilus sp.1R26 (1R26). This sequence is provided herein (SEQ ID NO: 7). To aid alignment, a version of the sequence of 1R26PylRS is provided herein (SEQ ID NO: 8), wherein an “X” is marked at positions corresponding to 300, 302, 305, 306, 309, 346, 348, 384, 401 in the MmPylRS sequence. These are positions to which the respective mutations of the present invention may be applied. Further corresponding positions may be identified in the same manner.
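The identification of corresponding positions described above can be sketched as follows. This is a minimal illustrative example in Python, assuming that a pairwise alignment between the MmPylRS reference and another PylRS backbone has already been produced by an alignment tool and is supplied as two gapped strings of equal length, with "-" denoting a gap. The function name `map_positions` and the toy aligned strings are hypothetical; they are not taken from Figure 35 or from any SEQ ID NO.

```python
def map_positions(aligned_ref, aligned_target, ref_positions):
    """Map 1-based reference positions to (target position, target residue).

    Both inputs are rows of the same pairwise alignment. A reference position
    that falls opposite a gap in the target maps to None.
    """
    if len(aligned_ref) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    wanted = set(ref_positions)
    mapping = {}
    ref_index = 0     # residues of the reference consumed so far
    target_index = 0  # residues of the target consumed so far
    for ref_char, target_char in zip(aligned_ref, aligned_target):
        if ref_char != "-":
            ref_index += 1
        if target_char != "-":
            target_index += 1
        if ref_char != "-" and ref_index in wanted:
            if target_char == "-":
                mapping[ref_index] = None
            else:
                mapping[ref_index] = (target_index, target_char)
    return mapping


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy_ref = "MK-AYL"
    toy_target = "M-QAFL"
    print(map_positions(toy_ref, toy_target, [2, 4]))
```

In practice the reference positions supplied would be those recited herein (for example 300, 302, 305, 306, 309, 346, 348, 384, and 401 of the MmPylRS sequence), and the alignment may be restricted to the catalytic region, in which case an offset would need to be added to recover full-length numbering.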
The mutations may be applied to the PylRS from Methanomassiliicoccus luminyensis 1 (Lum1). This sequence is provided herein (SEQ ID NO: 9). To aid alignment, a version of the sequence of Lum1PylRS is provided herein (SEQ ID NO: 10), wherein an “X” is marked at positions corresponding to 300, 302, 305, 306, 309, 346, 348, 384, 401 in the MmPylRS sequence. These are positions to which the respective mutations of the present invention may be applied. Further corresponding positions may be identified in the same manner. Other backbones to which the mutations of the present invention may be applied include MbPylRS (SEQ ID NO: 11), Lum1PylRS (SEQ ID NO: 12), TronPylRS (SEQ ID NO: 13), GemmPylRS (SEQ ID NO: 14), PGA8PylRS (SEQ ID NO: 15), I2PylRS (SEQ ID NO: 16), D121PylRS (SEQ ID NO: 17), and D416PylRS (SEQ ID NO: 18). Thus, in an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, said synthetase comprising an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 18 and said amino acid sequence comprising any of the mutations and optionally wild-type residues, or sets of mutations and optionally wild-type residues, disclosed in relation to the fifth aspect. In particular, the amino acid sequence may have at least 80%, 85%, 90%, 95%, or 99% identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 18. In an embodiment, the amino acid sequence may be identical to any one of SEQ ID NOs: 1 to 18 apart from the recited mutations. 
In a particular embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, said synthetase comprising an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 10 and said amino acid sequence comprising any of the mutations and optionally wild-type residues, or sets of mutations and optionally wild-type residues, disclosed in relation to the fifth aspect. In particular, the amino acid sequence may have at least 80%, 85%, 90%, 95%, or 99% identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 10. In an embodiment, the amino acid sequence may be identical to any one of SEQ ID NOs: 1 to 10 apart from the recited mutations. Thus, in examples, the acyl-tRNA synthetase may comprise an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 18 and comprising any one of the sets of mutations in Table 7 or any one of the sets of mutations labelled as A6_1, A6_2, A6_4, A6_5, A6_6, A6_7, A6_8, or A6_9 in Table 7. The amino acid sequence may comprise residues corresponding to the mutations and wild-type residues of any one of the sets in Table 7 or any one of the sets of mutations labelled as A6_1, A6_2, A6_4, A6_5, A6_6, A6_7, A6_8, or A6_9 in Table 7. 
In an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with an α,α-disubstituted-amino acid, said synthetase comprising an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 18 and said amino acid sequence comprising: M300D relative to SEQ ID NO: 1; A302C, A302G, A302H, or A302S relative to SEQ ID NO: 1; or more particularly A302G or A302H; and N346A, N346C, N346E, N346G, N346T, or N346V relative to SEQ ID NO: 1; or more particularly N346A, N346C, or N346G. In an embodiment, the amino acid sequence comprises M300D; A302G or A302H; L305C, L305F, or L305S; Y306F or Y306Y; L309F or L309L; M344M; N346A, N346C, or N346G; C348C or C348G; Y384Y; S399S; and V401C, V401L, or V401V. The synthetases of the fifth aspect may comprise sequence changes relative to the wild-type sequence in addition to the mutations described in more detail herein. Specifically, the synthetases may comprise sequence changes at sites which do not significantly compromise the function or operation of the synthetase as described herein. Synthetase function may be tested by operating the synthetase as described, such as in the examples section, in
order to verify that function has not been abrogated or significantly altered. Thus, provided that the synthetase retains its function, which can be tested as set out herein, sequence variations may be made in the synthetase relative to the wild-type reference sequence. As discussed in relation to the first aspect, Fig.35 provides a sequence alignment. The synthetase may be at least 50%, 60%, 70%, 80%, 90%, 95%, 99%, or 100% identical to any one of SEQ ID NOs: 1 to 18, wherein the variation is found only in residues not marked with: i) “*”; ii) “*” or “:”; or iii) “*”, “:”, or “.”, apart from specifically recited mutations discussed herein. Conservative substitutions may be made, for example according to the table disclosed in relation to the first aspect. The acyl-tRNA synthetase of the fifth aspect may be isolated or purified. The acyl-tRNA synthetase of the fifth aspect may be a non-natural acyl-tRNA synthetase. In a sixth aspect, there is provided the use of an acyl-tRNA synthetase in a method of generating a polymer comprising an α,α-disubstituted-amino acid. In an embodiment, there is provided a method of making a polymer comprising an α,α-disubstituted-amino acid, wherein the method comprises: i) use of an acyl-tRNA synthetase to acylate a tRNA with the α,α-disubstituted-amino acid, wherein the acyl-tRNA synthetase comprises mutations to enable the acylation of a tRNA with the α,α-disubstituted-amino acid, and ii) incorporation of the α,α-disubstituted-amino acid into a polymer. In an embodiment, there is provided the use of an acyl-tRNA synthetase of the fifth aspect in a method of generating a polymer comprising an α,α-disubstituted-amino acid. 
In an embodiment, there is provided a method of making a polymer comprising an α,α-disubstituted-amino acid, wherein the method comprises: i) use of an acyl-tRNA synthetase of the fifth aspect to acylate a tRNA with the α,α-disubstituted-amino acid, and ii) incorporation of the α,α-disubstituted-amino acid into a polymer. The α,α-disubstituted-amino acid may be (S)-2-amino-3-(4-iodophenyl)-2-methylpropanoic acid ((S)α-Me-pIF). The polymer may comprise one or more canonical amino acids or one or more naturally occurring amino acids. The polymer may comprise one or more unnatural amino acids and/or hydroxy acids. The unnatural amino acid may be an α-amino acid and/or the hydroxy acid may be an α-hydroxy acid. The polymer may be formed by genetic incorporation of the monomers using a ribosome. The polymer may be formed by translation of a nucleic acid sequence by a ribosome. The translation may comprise the binding of a tRNA charged with the α,α-disubstituted-amino acid to a ribosome, and the formation of a bond between the α,α-disubstituted-amino acid and a preceding and/or subsequent monomer. Thus, the method may comprise the provision of a nucleic acid sequence encoding a polymer, and the use of charged tRNAs and a ribosome to translate said sequence to form the polymer. Thus, there is provided a method of making a polymer comprising the genetic incorporation of at least one α,α-disubstituted-amino acid, which may be (S)-2-amino-3-(4-iodophenyl)-2-methylpropanoic acid ((S)α-Me-pIF).
The use or method may be performed in a cell, for instance a prokaryotic cell, a bacterial cell, or an E. coli cell. In a seventh aspect, there is provided a nucleic acid encoding an acyl-tRNA synthetase of the fifth aspect. The nucleic acid may be DNA. A vector may comprise the nucleic acid. In an eighth aspect, there is provided a cell comprising an acyl-tRNA synthetase of the fifth aspect, a nucleic acid of the seventh aspect, or a vector comprising a nucleic acid of the seventh aspect. In a particular embodiment, the cell comprises or expresses an acyl-tRNA synthetase of the fifth aspect and the acyl-tRNA synthetase is orthogonal to the endogenous tRNAs. Thus, the acyl-tRNA synthetase does not acylate the endogenous tRNAs to an extent that would render the cell non-viable. In an embodiment, the fitness of the cell, for instance measured by proliferation, is reduced by less than 50%, less than 25%, less than 10%, less than 5%, or is not reduced when the acyl-tRNA synthetase is expressed.
Beta-hydroxy-acid-acyl-tRNA synthetases
The inventors have made use of the methods and tools disclosed herein and have identified acyl-tRNA synthetases that are capable of acylating tRNAs with beta-hydroxy acids. Thus, in a ninth aspect, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300A, M300D, M300M, M300N, or M300S relative to SEQ ID NO: 1; and A302D, A302G, A302H, or A302N relative to SEQ ID NO: 1. The definition of “specifically” acylating is provided in the section relating to the first aspect. In particular, the acyl-tRNA synthetase may comprise a residue corresponding to M300N or M300S. M300N was the most common in the screen. The acyl-tRNA synthetase may comprise a residue corresponding to A302H. A302H was the most common in the screen. 
The acyl-tRNA synthetase may comprise residues corresponding to M300N and A302H. The amino acid sequence may comprise a mutation at a position corresponding to N346 relative to SEQ ID NO: 1. The mutation may be N346A, N346C, N346E, N346G, N346S, or N346V. In some embodiments, the residue is wild-type (N346N). In particular embodiments, the mutation is N346A, N346C, or N346G. The amino acid sequence may comprise a mutation at a position corresponding to L305 relative to SEQ ID NO: 1. The mutation may be L305C, L305N, L305S, or L305V. In some embodiments, the residue is wild-type (L305L). In embodiments, the amino acid sequence may comprise L305L or L305V. The most common residue in the screen was wild-type. The amino acid sequence may comprise a mutation at a position corresponding to Y306 relative to SEQ ID NO: 1. The mutation may be Y306D, Y306F, Y306I, Y306N, Y306R, or Y306S. In some embodiments, the residue is wild-type (Y306Y). The most common residue in the screen was wild-type but a notably high selectivity was achieved by a variant including Y306N. The amino acid sequence may comprise a mutation at a position corresponding to L309 relative to SEQ ID NO: 1. The mutation may be L309D, L309H, L309I, L309R, or L309S. More particularly, the mutation may be L309I or L309S. In some embodiments, the residue is wild-type (L309L). The most common residue in the screen was wild-type but a notably high selectivity was achieved by a variant including L309I.
The amino acid sequence may comprise a mutation at a position corresponding to M344 relative to SEQ ID NO: 1. The mutation may be M344E or M344Q. In some embodiments, the residue is wild-type (M344M) and the wild-type residue was the most common in the screen. The amino acid sequence may comprise a mutation at a position corresponding to C348 relative to SEQ ID NO: 1. The mutation may be C348F, C348I, C348L, C348T, or C348V. More particularly, the mutation may be C348I, C348L, C348T, or C348V. In some embodiments, the residue is wild-type (C348C). This residue was mutated in the majority of variants in the screen. The amino acid sequence may comprise the wild-type residue at a position corresponding to Y384 of SEQ ID NO: 1. This may be referred to as comprising Y384Y. The amino acid sequence may comprise the wild-type residue at a position corresponding to S399 of SEQ ID NO: 1. This may be referred to as comprising S399S. The amino acid sequence may comprise a mutation at a position corresponding to V401 of SEQ ID NO: 1. In examples, the mutation is V401A, V401C, V401K, V401L, V401S, or V401T. More particularly, the mutation may be V401C, V401K, or V401L. In some embodiments, the residue is wild-type (V401V). In an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to M300N and A302H, and optionally mutation(s) at position(s) N346 and/or C348. In an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300A, M300D, M300M, M300N, or M300S; more particularly M300N or M300S; A302D, A302G, A302H, or A302N; more particularly A302H; and N346A, N346C, N346E, N346G, N346S, or N346V; more particularly N346A, N346C, or N346G. 
In an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300N or M300S; more particularly M300N; A302H; and N346A, N346C, or N346G; more particularly N346G. In an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to M300N, A302H, and N346G; optionally further comprising a C348 mutation and/or a V401 mutation. In an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300N or M300S; A302H; N346A, N346C, or N346G; and a C348 mutation and/or a V401 mutation.
In an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300N or M300S; A302H; N346A, N346C, or N346G; and a C348 mutation, optionally which is C348I, C348L, C348T, or C348V. In an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300N; A302H; and N346G; and a C348 mutation and/or a V401 mutation; optionally wherein the C348 mutation is C348I, C348L, C348T, or C348V; and/or wherein the V401 mutation is V401C, V401K, V401L, or V401V. In an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300N or M300S; A302H; N346A, N346C, or N346G; and C348I, C348L, C348T, or C348V; and optionally V401C, V401K, V401L, or V401V. In an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising residues corresponding to: M300N; A302H; N346A, N346C, or N346G; and C348I, C348L, or C348V. The beta-hydroxy acid may be (S)-3-(3-chlorophenyl)-3-hydroxypropanoic acid (OH-(S)β3mClF). The inventors provide particular embodiments in Table 8. These embodiments are defined relative to SEQ ID NO: 1. For instance, A7_2 comprises M300N, A302H, L305L, etc.
Table 8 - Mutations from exemplary beta-hydroxy-acid-tRNA synthetases
The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations in Table 8. For instance, the acyl-tRNA synthetase may comprise an amino acid sequence comprising residues corresponding to M300N, A302H, Y306N, N346G, C348L, and V401L. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and wild-type residues in Table 8. For instance, the acyl-tRNA synthetase may comprise an amino acid sequence comprising residues corresponding to M300N, A302H, L305L, Y306N, L309L, M344M, N346G, C348L, Y384Y, S399S, and V401L. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations labelled as A7_1, A7_2, A7_3, A7_4, A7_5, or A7_7 in Table 8. The acyl-tRNA synthetase may comprise an amino acid sequence comprising any of the sets of mutations and wild-type residues labelled as A7_1, A7_2, A7_3, A7_4, A7_5, or A7_7 in Table 8. The mutations of the ninth aspect are provided with reference to SEQ ID NO: 1, which is the sequence of wild-type Methanosarcina mazei pyrrolysyl-tRNA synthetase (MmPylRS) and is provided herein. To aid alignment, a version of the sequence of MmPylRS where an “X” is marked at positions 300, 302, 305, 306, 309, 346, 348, 384, and 401 is provided as SEQ ID NO: 2. Thus, in an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, said synthetase comprising an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2, and said amino acid sequence comprising any of the mutations or sets of mutations disclosed in relation to the ninth aspect. In particular, the amino acid sequence may have at least 80%, 85%, 90%, 95%, or 99% identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2. 
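The residue notation used above (for example M300N for a substitution and Y384Y for a retained wild-type residue) can be handled programmatically. The following is a minimal illustrative sketch only, assuming one-letter residue codes and 1-based positions; the function names and the five-residue toy reference are hypothetical and are not SEQ ID NO: 1.

```python
import re
from typing import Iterable, NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # residue expected in the reference sequence
    position: int   # 1-based position in the reference sequence
    residue: str    # residue present in the variant


# Labels of the form <reference residue><position><variant residue>, e.g. "M300N".
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_label(label: str) -> Substitution:
    """Split a label such as 'M300N' into its three parts."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def is_wild_type(sub: Substitution) -> bool:
    """True for labels such as 'Y384Y' that denote the unchanged residue."""
    return sub.wild_type == sub.residue


def apply_labels(reference: str, labels: Iterable[str]) -> str:
    """Return the reference sequence with each labelled residue applied."""
    variant = list(reference)
    for label in labels:
        sub = parse_label(label)
        found = reference[sub.position - 1]
        if found != sub.wild_type:
            raise ValueError(f"{label}: reference has {found} at {sub.position}")
        variant[sub.position - 1] = sub.residue
    return "".join(variant)


# Hypothetical toy reference, used only to show the notation in action.
toy_reference = "MACDY"
toy_variant = apply_labels(toy_reference, ["M1N", "C3C", "Y5L"])
```

Checking each label against the reference before applying it guards against applying a set of mutations to a sequence whose numbering differs from the one the labels were defined on.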
In an embodiment, the amino acid sequence may be identical to SEQ ID NO: 1 or SEQ ID NO: 2 apart from the recited mutations. The acyl-tRNA synthetase may comprise an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2 and comprising M300A, M300D, M300M, M300N, or M300S; more particularly M300N, or M300S; A302D, A302G, A302H, or A302N; more particularly A302H; and N346A, N346C, N346E, N346G, N346S, or N346V; more particularly N346A, N346C, or N346G. The acyl-tRNA synthetase may comprise an amino acid sequence having
at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2 and comprising M300N or M300S; more particularly M300N; A302H; and N346A, N346C, or N346G; more particularly N346G. The amino acid sequence may comprise mutations corresponding to the mutations of any one of the sets of mutations in Table 8. The amino acid sequence may comprise mutations corresponding to the mutations of any one of the sets of mutations labelled as A7_1, A7_2, A7_3, A7_4, A7_5, and A7_7 in Table 8. The amino acid sequence may comprise residues corresponding to the mutations and wild-type residues of any one of A7_1, A7_2, A7_3, A7_4, A7_5, and A7_7 in Table 8. The skilled person would also be able to apply the identified mutations to other PylRS backbones, including backbones not disclosed herein. This is despite the fact that other PylRSs may have a low sequence identity to MmPylRS. In order to apply the mutations recited with reference to MmPylRS to another synthetase, the sequences should be aligned to identify corresponding residues. In particular, the sequences representing the catalytic sites of the synthetases may be aligned. As such, the region of SEQ ID NO: 1 extending from residue 296 to residue 428 may be aligned to the sequence of another PylRS, and the corresponding mutations may be made. Further information on the transfer of mutations from one PylRS backbone to another is provided in WO2013/171485 A1 (herein incorporated by reference). An exemplary alignment is provided in Figure 35. 
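The identification of corresponding residues from an alignment, as described above, can be sketched in code. This is an illustrative sketch only, assuming the two sequences have already been aligned by a separate tool and are supplied as equal-length strings with "-" marking gaps; the function name and the toy aligned strings are hypothetical and do not represent any SEQ ID NO.

```python
from typing import Dict, Iterable, Optional, Tuple


def map_positions(
    aligned_ref: str,
    aligned_target: str,
    ref_positions: Iterable[int],
) -> Dict[int, Optional[Tuple[int, str]]]:
    """Map 1-based reference positions to (target position, target residue).

    A reference position that is aligned to a gap in the target maps to None.
    """
    if len(aligned_ref) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(ref_positions)
    mapping: Dict[int, Optional[Tuple[int, str]]] = {}
    ref_index = 0     # residues of the reference consumed so far
    target_index = 0  # residues of the target consumed so far
    for ref_char, target_char in zip(aligned_ref, aligned_target):
        if ref_char != "-":
            ref_index += 1
        if target_char != "-":
            target_index += 1
        if ref_char != "-" and ref_index in wanted:
            mapping[ref_index] = (
                (target_index, target_char) if target_char != "-" else None
            )
    return mapping


# Hypothetical toy alignment: reference MKACY against target KLAC.
toy_mapping = map_positions("MK-ACY", "-KLAC-", [1, 2, 4, 5])
```

In this toy case reference position 4 corresponds to target position 4, whereas reference positions 1 and 5 have no counterpart in the target, so no corresponding mutation could be made there.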
As such, in an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, wherein the acyl-tRNA synthetase comprises an amino acid sequence comprising any mutations and optionally wild-type residues disclosed in relation to the ninth aspect, or any set of mutations and optionally wild-type residues disclosed in relation to the ninth aspect, and wherein the acyl-tRNA synthetase backbone is a PylRS. The mutations may be applied to the PylRS from Nitrososphaeria archaeon (Nitra). This sequence is provided herein as SEQ ID NO: 3. To aid alignment, a version of the sequence of NitraPylRS is provided herein (SEQ ID NO: 4), wherein an “X” is marked at positions corresponding to 300, 302, 305, 306, 309, 346, 348, 384, 401 in the MmPylRS sequence. These are positions to which the respective mutations of the present invention may be applied. Further corresponding positions may be identified in the same manner. The mutations may be applied to the PylRS from Clostridiales bacterium (Clos). This sequence is provided as SEQ ID NO: 5. To aid alignment, a version of the sequence of ClosPylRS is provided herein (SEQ ID NO: 6), wherein an “X” is marked at positions corresponding to 300, 302, 305, 306, 309, 346, 348, 384, 401 in the MmPylRS sequence. These are positions to which the respective mutations of the present invention may be applied. Further corresponding positions may be identified in the same manner. The mutations may be applied to the PylRS from Methanomethylophilus sp.1R26 (1R26). This sequence is provided herein (SEQ ID NO: 7). To aid alignment, a version of the sequence of 1R26PylRS is provided herein (SEQ ID NO: 8), wherein an “X” is marked at positions corresponding to 300, 302, 305, 306, 309, 346, 348, 384, 401 in the MmPylRS sequence. These are positions to which the respective mutations of the present invention may be applied. Further corresponding positions may be identified in the same manner. 
The mutations may be applied to the PylRS from Methanomassiliicoccus luminyensis 1 (Lum1). This sequence is provided herein (SEQ ID NO: 9). To aid alignment, a version of the sequence of Lum1PylRS is provided herein (SEQ ID NO: 10), wherein an “X” is marked at positions corresponding to 300, 302, 305, 306, 309, 346, 348, 384, 401 in the MmPylRS sequence. These are positions to which the respective mutations of the present invention may be applied. Further corresponding positions may be identified in the same manner. Other backbones to which the mutations of the present invention may be applied include MbPylRS (SEQ ID NO: 11), Lum1PylRS (SEQ ID NO: 12), TronPylRS (SEQ ID NO: 13), GemmPylRS (SEQ ID NO: 14), PGA8PylRS (SEQ ID NO: 15), I2PylRS (SEQ ID NO: 16), D121PylRS (SEQ ID NO: 17), and D416PylRS (SEQ ID NO: 18).
Thus, in an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, said synthetase comprising an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 18 and said amino acid sequence comprising any of the mutations and optionally wild-type residues, or sets of mutations and optionally wild-type residues, disclosed in relation to the ninth aspect. In particular, the amino acid sequence may have at least 80%, 85%, 90%, 95%, or 99% identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 18. In an embodiment, the amino acid sequence may be identical to any one of SEQ ID NOs: 1 to 18 apart from the recited mutations. In a particular embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, said synthetase comprising an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 10 and said amino acid sequence comprising any of the mutations and optionally wild-type residues, or sets of mutations and optionally wild-type residues, disclosed in relation to the ninth aspect. In particular, the amino acid sequence may have at least 80%, 85%, 90%, 95%, or 99% identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 10. In an embodiment, the amino acid sequence may be identical to any one of SEQ ID NOs: 1 to 10 apart from the recited mutations. Thus, in examples, the acyl-tRNA synthetase may comprise an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 18 and comprising any one of the sets of mutations in Table 8 or any one of the sets of mutations labelled as A7_1, A7_2, A7_3, A7_4, A7_5, and A7_7 in Table 8. 
The amino acid sequence may comprise residues corresponding to the mutations and wild-type residues of any one of the sets in Table 8 or any one of the sets of mutations labelled as A7_1, A7_2, A7_3, A7_4, A7_5, and A7_7 in Table 8. In an embodiment, there is provided an acyl-tRNA synthetase capable of specifically acylating a tRNA with a beta-hydroxy acid, said synthetase comprising an amino acid sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, or 99% similarity or identity to the amino acid sequence of any one of SEQ ID NOs: 1 to 18 and said amino acid sequence comprising mutations corresponding to: M300A, M300D, M300M, M300N, or M300S; more particularly M300N, or M300S; A302D, A302G, A302H, or A302N; more particularly A302H; and N346A, N346C, N346E, N346G, N346S, or N346V; more particularly N346A, N346C, or N346G (all relative to SEQ ID NO: 1). The amino acid sequence may comprise mutations corresponding to: M300N or M300S; more particularly M300N; A302H; and N346A, N346C, or N346G; more particularly N346G. The amino acid sequence may comprise M300N, A302H, and optionally N346 and/or C348 mutations. The synthetases of the ninth aspect may comprise sequence changes relative to the wild-type sequence in addition to the mutations described in more detail herein. Specifically, the synthetases may comprise sequence changes at sites which do not significantly compromise the function or operation of the synthetase as described herein. Synthetase function may be tested by operating the synthetase as described, such as in the examples section, in order to verify that function has not been abrogated or significantly altered. Thus, provided that the synthetase retains its function which can be tested as set out herein, sequence variations may be made in the synthetase relative to the wild-type reference sequence. As discussed in relation to the first aspect, Fig.35 provides a sequence alignment. 
The synthetase may be at least 50%, 60%, 70%, 80%, 90%, 95%, 99%, or 100% identical to any one of SEQ ID NOs: 1 to 18, wherein the variation is found only in residues not marked with: i) “*”; ii) “*” or “:”; or iii) “*”, “:”, or “.”, apart from specifically recited mutations discussed herein.
Conservative substitutions may be made, for example according to the table disclosed in relation to the first aspect. The acyl-tRNA synthetase of the ninth aspect may be isolated or purified. The acyl-tRNA synthetase of the ninth aspect may be a non-natural acyl-tRNA synthetase. In a tenth aspect, there is provided the use of an acyl-tRNA synthetase in a method of generating a polymer comprising a beta-hydroxy acid. In an embodiment, there is provided a method of making a polymer comprising a beta-hydroxy acid, wherein the method comprises: i) use of an acyl-tRNA synthetase to acylate a tRNA with the beta-hydroxy acid, wherein the acyl-tRNA synthetase comprises mutations to enable the acylation of a tRNA with the beta-hydroxy acid, and ii) incorporation of the beta-hydroxy acid into a polymer. In an embodiment, there is provided the use of an acyl-tRNA synthetase of the ninth aspect in a method of generating a polymer comprising a beta-hydroxy acid. In an embodiment, there is provided a method of making a polymer comprising a beta-hydroxy acid, wherein the method comprises: i) use of an acyl-tRNA synthetase of the ninth aspect to acylate a tRNA with the beta-hydroxy acid, and ii) incorporation of the beta-hydroxy acid into a polymer. The beta-hydroxy acid may be (S)-3-(3-chlorophenyl)-3-hydroxypropanoic acid (OH-(S)β3mClF). The polymer may comprise one or more canonical amino acids or one or more naturally occurring amino acids. The polymer may comprise one or more unnatural amino acids and/or additional hydroxy acids. The unnatural amino acid may be an α-amino acid and/or the additional hydroxy acid may be an α-hydroxy acid. The polymer may be formed by genetic incorporation of the monomers using a ribosome. The polymer may be formed by translation of a nucleic acid sequence by a ribosome. 
The translation may comprise the binding of a tRNA charged with the beta-hydroxy acid to a ribosome, and the formation of a bond between the beta-hydroxy acid and a preceding and/or subsequent monomer. Thus, the method may comprise the provision of a nucleic acid sequence encoding a polymer, and the use of charged tRNAs and a ribosome to translate said sequence to form the polymer. Thus, there is provided a method of making a polymer comprising the genetic incorporation of at least one beta-hydroxy acid, which may be (S)-3-(3-chlorophenyl)-3-hydroxypropanoic acid (OH-(S)β3mClF). The use or method may be performed in a cell. The cell may be, for instance, a prokaryotic cell, a bacterial cell, or an E. coli cell. In an eleventh aspect, there is provided a nucleic acid encoding an acyl-tRNA synthetase of the ninth aspect. The nucleic acid may be DNA. A vector may comprise the nucleic acid. In a twelfth aspect, there is provided a cell comprising an acyl-tRNA synthetase of the ninth aspect, a nucleic acid of the eleventh aspect, or a vector comprising a nucleic acid of the eleventh aspect. In a particular embodiment, the cell comprises or expresses an acyl-tRNA synthetase of the ninth aspect and the acyl-tRNA synthetase is orthogonal to the endogenous tRNAs. Thus, the acyl-tRNA synthetase does not acylate
the endogenous tRNAs to an extent that would render the cell non-viable. In an embodiment, the fitness of the cell, for instance measured by proliferation, is reduced by less than 50%, less than 25%, less than 10%, less than 5%, or is not reduced when the acyl-tRNA synthetase is expressed. Sequence comparisons can be conducted with the aid of readily available sequence comparison programs. These publicly and commercially available computer programs can calculate sequence identity between two or more sequences. The skilled technician will appreciate how to calculate the percentage identity between two nucleic acid sequences. In order to calculate the percentage identity between two nucleic acid sequences or two amino acid sequences, an alignment of the two sequences must first be prepared, followed by calculation of the sequence identity value. The percentage identity for two sequences may take different values depending on: (i) the method used to align the sequences, for example, the Needleman-Wunsch algorithm (e.g. as applied by Needle(EMBOSS) or Stretcher(EMBOSS)), the Smith-Waterman algorithm (e.g. as applied by Water(EMBOSS)), or the LALIGN application (e.g. as applied by Matcher(EMBOSS)); and (ii) the parameters used by the alignment method, for example, local versus global alignment, the matrix used, and the parameters applied to gaps. In a particular embodiment, the sequence identities disclosed herein may be calculated based on a global alignment of the relevant feature, for instance the comparison of complete lengths of sequences encoding a PylRS catalytic site, the complete lengths of sequences encoding PylRS C-terminal domains, or the complete lengths of sequences encoding PylRSs. In particular, the proteins defined herein by reference to a degree of identity to a SEQ ID NO may have that degree of identity in comparison to the full length of the SEQ ID NO. 
Having made the alignment, there are many different ways of calculating percentage identity between the two sequences. For example, one may divide the number of identities by: (i) the length of the shortest sequence; (ii) the length of the alignment; (iii) the mean length of the sequences; (iv) the number of non-gap positions; or (v) the number of equivalenced positions excluding overhangs. Furthermore, it will be appreciated that percentage identity is also strongly length-dependent. Therefore, the shorter a pair of sequences is, the higher the sequence identity one may expect to occur by chance. The percentage identity between two nucleic acid sequences may then be calculated from such an alignment as (N/T)*100, where N is the number of positions at which the sequences share an identical residue, and T is the total number of positions compared including gaps but excluding overhangs. The sequence alignment may be a pairwise sequence alignment. Suitable services include Needle (EMBOSS), Stretcher (EMBOSS), Water (EMBOSS), Matcher (EMBOSS), LALIGN, or GeneWise. In an example, the identity between two amino acid sequences may be calculated using the service Needle(EMBOSS) set to the default parameters, e.g. matrix (BLOSUM62), gap open (10), gap extend (0.5), end gap penalty (false), end gap open (10), and end gap extend (0.5). In another example, the identity between two amino acid sequences may be calculated using the service Matcher (EMBOSS) set to the default parameters, e.g. matrix (BLOSUM62), gap open (14), gap extend (4), alternative matches (1). In an example, the identity between two nucleic acid sequences may be calculated using the service Needle(EMBOSS) set to the default parameters, e.g. matrix (DNAfull), gap open (10), gap extend (0.5), end gap penalty (false), end gap open (10), and end gap extend (0.5). 
In another example, the identity between two nucleic acid sequences may be calculated using the service Matcher (EMBOSS) set to the default parameters, e.g. matrix (DNAfull), gap open (16), gap extend (4), alternative matches (1). All of the features described herein (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined with any of the above aspects in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.
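By way of illustration only, the (N/T)*100 identity calculation set out earlier, with T counting compared positions including internal gaps but excluding overhangs, can be sketched as follows. The sketch assumes a pairwise alignment has already been produced by a separate tool and is supplied as two equal-length strings with "-" marking gaps; treating leading and trailing gap-containing columns as the overhangs is an assumption of this sketch, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return (N / T) * 100 for two aligned, equal-length sequences.

    N: columns in which both sequences carry the same residue.
    T: columns compared, including internal gaps but excluding overhangs
       (leading or trailing columns in which either sequence has a gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a, aligned_b))
    start, end = 0, len(columns)
    # Trim overhanging columns from each end of the alignment.
    while start < end and "-" in columns[start]:
        start += 1
    while end > start and "-" in columns[end - 1]:
        end -= 1
    compared = columns[start:end]
    if not compared:
        raise ValueError("no overlapping positions to compare")
    identical = sum(1 for a, b in compared if a == b and a != "-")
    return identical / len(compared) * 100


# Hypothetical toy alignment: two leading and two trailing overhang columns,
# one internal gap, and six identical columns out of seven compared.
toy_identity = percent_identity("--ACGT-ACGT", "TTACGTTAC--")
```

Choosing a different denominator from the list given above (for instance the length of the shortest sequence) would only require changing the value used in place of `len(compared)`.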
For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made to the Examples, which are not intended to limit the invention in any way. EXAMPLES Summary of Examples The genetic code of living cells has been reprogrammed to enable the site-specific incorporation of hundreds of non-canonical amino acids (ncAAs) into proteins, and the encoded synthesis of non-canonical polymers and macrocyclic peptides and depsipeptides. Current methods for engineering orthogonal (O)-aminoacyl-tRNA synthetases to acylate new monomers rely on translational readouts and therefore require the monomers to be ribosomal substrates. O-synthetases cannot be evolved to acylate O-tRNAs with non-canonical monomers (ncMs) that are poor ribosomal substrates (and ribosomes cannot be evolved to polymerize ncMs that cannot be acylated onto O-tRNAs); this co-dependence creates an evolutionary deadlock that has restricted the scope of translation in living cells to alpha-L-amino acids and closely related hydroxy acids. Here, we break this deadlock by developing a direct selection for O-synthetases that acylate cognate O-tRNAs with ncMs, independent of whether the ncMs are ribosomal substrates. We develop split tmRNAs, composed of a non-covalent assembly between the 5’ half of an O-tRNA and a fusion between the 3’ half of the O-tRNA and the mRNA that encodes the cognate synthetase; this couples the synthetase genotype to split tmRNA acylation in cells. We also develop an approach to specifically isolate and enrich acylated split tmRNAs. We combine these advances into tRNA display, which enables the direct, rapid and scalable selection for O-synthetases that acylate their cognate O-tRNA; this approach uses 50-times less ncM than translation-based selections and we parallelize tRNA display to select efficient synthetases for 8 ncAAs. 
Using tRNA display, we directly select O-synthetases that acylate their cognate O-tRNA with a β-amino acid. We build on this advance to demonstrate the genetically encoded, site-specific cellular incorporation of a β-amino acid into a protein, and thereby expand the chemical scope of the E. coli genetic code. Introduction Here we develop derivatives of tREX in which specific acylated tRNAs, isolated from cells, are labelled with dNTP analogs by primer extension; this enables fluorescent imaging (fluoro-tREX) or capture (bio-tREX) of acylated tRNAs. We create Methanosarcina mazei pyrrolysine tRNA Pyl CUA (MmtRNA CUA, henceforth referred to as tRNAPyl CUA) genes that are circularly permuted (which we refer to as cis split (s)tRNAPyl genes) by joining the 5’ and 3’ ends through an intervening sequence and creating new 5’ and 3’ ends at the anticodon. The intervening sequence is processed out of the transcript to generate a split stRNA, composed of a 5’ half and a 3’ half, which is acylated by Mmpyrrolysyl-tRNA synthetase (henceforth referred to as PylRS), the cognate aminoacyl-tRNA synthetase of the parent tRNA. We connect the genotype responsible for acylation to the acylation itself, by fusing the gene for PylRS to the cis stRNAPyl gene, creating cis stmRNAPyl. We demonstrate that we can selectively enrich – by more than 300-fold – stmRNAs, encoding active PylRS variants, with respect to attenuated activity variants, using bio-mREX (a variation of bio-tREX applied to the stmRNA). We increase the dynamic range of PylRS enrichment by maximizing the transcription of stmRNAs, while minimizing the translation of the PylRS mRNA encoded within the stmRNA. We generate stmRNA libraries with combinations of mutations in the PylRS gene and use parallel bio-mREX based enrichments, in the presence and absence of ncAAs, followed by reverse transcription and NGS to rapidly and scalably define active and selective PylRS variants, through a process we term tRNA display. 
We use tRNA display to select orthogonal aminoacyl-tRNA synthetases (aaRSs) that specifically acylate their cognate, orthogonal tRNAs with a carboxylic acid, that cannot function in translation, and with a β-amino acid, a class of ncMs generally considered to be poor ribosomal substrates10,11,14-18. Moreover, using the β-amino acid pair we demonstrate the site-specific co-translational incorporation of a β-amino acid in a recombinant protein. Example 1 - Sensitive detection & efficient isolation of acylated tRNAs
We demonstrated that we could determine the aminoacylation status of a specific tRNA isolated from cells by periodate oxidation followed by selective, primer mediated, extension of non-oxidized tRNAs with nucleotide derivatives bearing a fluorophore (Cy5) or biotin. These experiments demonstrate that our approach, which we named fluorescent tREX (fluoro-tREX), allows the acylation of a tRNA to be followed through the generation of a fluorescent signal. Next, we replaced Cy5-dCTP with biotinylated dCTP (bio-dCTP) in the extension step of fluoro-tREX, thereby creating biotin-tREX (bio-tREX). We selectively captured the biotinylated extension product (resulting from tRNA molecules that were protected from periodate oxidation by their aminoacylation) on magnetic streptavidin beads, washed the beads, and eluted bound tRNA extension products. The presence of the tRNAPyl CUA extension product in the eluate was dependent on the presence of PylRS, BocK (1) in cells, and the addition of bio-dCTP to the extension reaction. We conclude that biotin-tREX enabled the selective capture of tRNA extension products from tRNAs that were aminoacylated. Further information is provided in the priority documents (EP2306393.6, filed 28 April 2023 and EP2400299.0, filed 9 January 2024; herein incorporated by reference). Example 2 - Split tRNAs: in vivo assembly, maturation, and acylation. Next, we asked whether we could split the tRNAPylCUA gene at the anticodon to create a split tRNAPyl (stRNAPyl). We envisioned generating a system in which the 5’ and 3’ halves of a split tRNA gene were transcribed, assembled in vivo via non-covalent interactions (including base-pairing), matured by the cellular tRNA processing machinery, and recognized and efficiently acylated by PylRS. We first designed a series of constructs in which we split the tRNA gene into two at the anticodon. 
Some constructs resulted in robust aminoacylation, which was dependent on the presence of both tRNA halves, PylRS, and BocK. To the best of our knowledge this data constitutes the first example of a split tRNA being assembled, matured and acylated in cells. Further information is provided in the priority documents. Next, we turned our attention to designing an optimal expression system for stRNAPyls, which would: (1) ensure equimolar stoichiometry of both tRNA halves, and (2) facilitate spatial proximity, and thereby the assembly, of both tRNA halves into a tRNA body that can be acylated. To achieve this, we showed that tRNAPyl CUA can be split, expressed from a single transcript in cis, and that the split tRNA functions as an efficient substrate for acylation by PylRS. Further information is provided in the priority documents. Example 3 - Covalently linking acylation phenotype and genotype Next, we fused the PylRS coding sequence and a linker sequence to the 5’ end of the 3’ half of the cis stRNAPyl expression cassette, creating the stmRNAPyl cassette. Experiments were performed and we conclude that our stmRNA construct is functional. The construct is transcribed and processed to generate a split tRNA in which the 3’ half is fused to the synthetase mRNA. The synthetase gene is translated, and the resulting protein catalyzes the acylation of the 3’ end of the 3’ half of the stmRNA. In fluoro-mREX the acylation is converted into a fluorescent ‘phenotype’ via the addition of fluorescent nucleotides to the RNA strand that contains the mRNA of the synthetase. This creates a physical linkage between the mRNA sequence of the synthetase, its genotype, and the fluorescent ‘phenotype’, generated as a result of the activity of the synthetase. Further information is provided in the priority documents. Example 4 - Efficient acylation-specific enrichment of PylRS genotypes. 
Next, we aimed to selectively isolate acylated stmRNAs with respect to non-acylated stmRNAs, and to reverse transcribe the isolated PylRS mRNA within stmRNAs to directly yield the cDNA of the PylRS gene responsible for acylation. We envisioned that the resulting cDNA could then be used as a direct readout in a quantitative (q)PCR, converted to DNA for further selection and directed evolution, or sequenced. To selectively isolate
acylated stmRNAs with respect to non-acylated stmRNAs we created bio-mREX, an adaptation of bio-tREX for stmRNAs. This method was tested experimentally and we conclude that bio-mREX enables the efficient and selective recovery of stmRNAs, and the genes for synthetases that acylate the split tRNAs within them. Further information is provided in the priority documents. In conclusion, by combining the stmRNAvol2 construct with bio-mREX we developed a pulldown with which we could selectively isolate cDNA of active PylRS variants over inactive variants over a large dynamic range. Example 5 - Direct selection for cellular acylation by tRNA display. Next, we set up the first method for directly selecting synthetase enzymes on the basis of their tRNA acylation activity. We envisaged that we could efficiently detect active and selective PylRS variants by running parallel bio-mREX-based selections coupled with deep sequencing by next generation sequencing (NGS) (Fig.2a). In such an experiment, PylRS active site libraries are cloned into stmRNAvol2 constructs and transformed into E. coli cells. Overnight cultures are then diluted into media in the presence (positive samples) and absence (negative samples) of the monomer of choice and bio-mREX is performed. The experiments are performed in multiple replicates, and cDNA of the positive and negative samples, as well as the cDNA reverse transcribed from the input library, are subsequently barcoded and sequenced by NGS (Fig.2a). We termed this selection approach tRNA display. NGS data from the selections of the samples enables the calculation of two key parameters: (1) the enrichment, defined as the average abundance of a particular sequence in the positive samples over the abundance of the same sequence in the input RNA; and (2) the selectivity, defined as the ratio of the abundance of a sequence in the positive sample over the abundance of the same sequence in the negative sample. 
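The two parameters defined above can be illustrated with a short sketch. It is offered as an illustration under stated assumptions rather than as the analysis pipeline actually used: "abundance" is assumed here to mean the fraction of reads in a sample that match a given sequence, replicate abundances are assumed to be averaged before the selectivity ratio is taken, and all function names and read counts are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def abundance(read_count: int, total_reads: int) -> float:
    """Fraction of reads in one sample matching a sequence (assumed definition)."""
    return read_count / total_reads


def enrichment(positive: Sequence[float], input_abundance: float) -> float:
    """Average abundance in the positive samples over abundance in the input."""
    return mean(positive) / input_abundance


def selectivity(positive: Sequence[float], negative: Sequence[float]) -> float:
    """Abundance in the positive samples over abundance in the negative samples.

    Averaging over replicates before taking the ratio is an assumption.
    """
    return mean(positive) / mean(negative)


def log_scores(
    positive: Sequence[float],
    negative: Sequence[float],
    input_abundance: float,
) -> Tuple[float, float]:
    """Natural logarithms of (enrichment, selectivity), convenient for plotting."""
    return (
        math.log(enrichment(positive, input_abundance)),
        math.log(selectivity(positive, negative)),
    )


# Hypothetical read counts for one variant, 100000 reads per sample.
positive = [abundance(2000, 100000), abundance(4000, 100000)]
negative = [abundance(100, 100000), abundance(200, 100000)]
input_abundance = abundance(100, 100000)

variant_enrichment = enrichment(positive, input_abundance)
variant_selectivity = selectivity(positive, negative)
variant_log_scores = log_scores(positive, negative, input_abundance)
```

A sequence absent from the input or from the negative samples would give a division by zero in this sketch; how such cases are handled (for example with a pseudocount) is a design choice that is not specified here.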
Desired PylRS variants correspond to sequences that are highly enriched as well as highly selective. Plotting the natural logarithm of the enrichment against the natural logarithm of the selectivity results in a spindle-shaped distribution. We expect any desired sequences to be in the top right quadrant of the spindle plots (Fig.2a) (highly enriched, and highly selective). To test tRNA display we generated a small PylRS library, in which we expected many sequences to be active, as a stmRNAvol2 construct. The library targeted positions Y306, L309, and N346 in PylRS, where mutations had previously been identified that enable the efficient incorporation of CbzK (2)26,27 (Fig.2b and Fig.6). We transformed the library into E. coli cells and performed a single round of tRNA display. We observed a large population of highly enriched and selective variant PylRS sequences in the spindle plot derived from the sequencing of this experiment (Fig.2c). To assess the predictive power of tRNA display, we measured the in vivo production of GFP(150CbzK)His6 from GFP(150TAG)His6 in the presence of tRNAPyl CUA and 65 PylRS variants, which appeared to be potential hits on the basis of their position on the spindle plot (Fig.7). We observed a positive correlation between the enrichment derived from NGS data and the translation activity derived from GFP production for these hits (Fig.2d), and the vast majority of these hits were selective (Fig.7). We concluded that tRNA display permits the direct, translation independent, identification of active and selective PylRS enzymes, from a library of PylRS sequences. Example 6 - High throughput selection for ncAAs by tRNA display. To further validate the utility of tRNA display, we ran parallel selections (Fig.8) using six independent, highly diverse, PylRS active site libraries (Fig.6), and ten ncAAs, 2-11 (Fig.2b). 
After two rounds of selection, we analyzed the spindle plots derived from the NGS data and identified individual PylRS mutants for ncAAs 2,3,4,7,8,9,10,11 that were enriched and selective; the enriched and selective mutants for each of these ncAAs showed convergent sequence motifs (Figs.8-18).
We demonstrated the incorporation of ncAAs 3,4,7,8,9,10,11 in response to the amber codon in GFP(150TAG)His6, in cells containing tRNAPyl CUA and the corresponding PylRS mutants identified by tRNA display. GFP production was ncAA dependent and ESI-MS confirmed the incorporation of each ncAA in GFP (Fig.2e-i, Figs.9-18). We note that our results include an aminoacyl-tRNA synthetase for 7, which enables the first incorporation of this thiophene-containing ncAA into a protein. Out of the 30 characterized variants with a selectivity score greater than or equal to ten and an enrichment score greater than or equal to five, 27 were active with their cognate ncAA in protein expression (20 variants had activities at least 50% of the activity of PylRS with BocK (1)). All 27 variants selectively incorporated their ncAA substrate, as judged by mass spectrometry (Fig.2e-i, Figs.9-18). In summary, we demonstrated the parallel, scalable and rapid selection of PylRS variants, using diverse libraries and ncAAs, through tRNA display. We note that the amount of ncAA used in each tRNA display selection is 50-100 times lower than that used in current methods for synthetase selection.
Example 7 - Discovery of a β-amino acid specific PylRS variant by tRNA display
Next, we challenged tRNA display to discover synthetases for classes of ncMs which either cannot be translated or are known – from in vitro studies – to be poor ribosomal substrates (Fig.3)10,11,14-18. We focused on nine ncMs (12-20, Fig.3a); these monomers include β2 and β3 amino acids with variable stereochemistry, β-hydroxy acids, and carboxylic acids. 6-((tert-butoxycarbonyl)amino)hexanoic acid (BocAhx, 15) is a known substrate for wt PylRS21, and provided a control for our ncM selections.
To select PylRS variants for these monomers we ran parallel tRNA display selections (Fig.19) using a highly diverse library, which mutated residues 300, 302, 305, 306, 309, 344, 346, 348, and 401 in the PylRS active site (Fig.6), and each ncM. We analyzed the spindle plots derived from the NGS data for each ncM to identify PylRS variants which were enriched and selective (Fig.20-28). For 12 and 15 we identified enriched and selective sequences, and the most selective hits converged on a distinct sequence pattern for each of these ncM. We identified sequences similar to the wild-type PylRS sequence for ncM 15, 6-((tert- butoxycarbonyl)amino)hexanoic acid (BocAhx), indicating that the selection had converged on active synthetases for this monomer. The four most selective and active hits from the selection, PylRS(15_1) to PylRS(15_4), differ from the wild-type sequence by point mutations of V401; we showed, by fluoro-tREX, that all of these selected sequences were active and selective for 15 (Fig.3b, Fig.23). As 15 is expected to be more challenging to cleave from the tRNA than other monomers, including β-amino acids (Fig.4), this result provided confidence that tRNA display would enable the selection of synthetases for a wide range of other monomers, including β-amino acids. Remarkably, two PylRS variants, PylRS(12_1) and PylRS(12_2), identified by tRNA display selection, directed (S)-3-amino-3-(3-bromophenyl)propanoic acid (β3-mBrF, 12)-dependent acylation of tRNAPyl CUA, as judged by fluoro-tREX (Fig.3c, Fig.20). To verify the identity of the monomer attached to the tRNA by PylRS(12_1) and PylRS(12_2) we captured the acylated tRNAPyl CUA on streptavidin beads via a biotinylated probe for tRNAPylCUA, washed the beads, and eluted the ncM by heating under alkaline conditions; we then derivatized the free ncM and analyzed the sample by LC-MS (Fig.29). Using this approach, we confirmed that both PylRS variants charged tRNAPyl CUA with the ncM 12 (Fig.3e,f). 
To the best of our knowledge PylRS(12_1)/tRNAPylCUA and PylRS(12_2)/tRNAPylCUA are the first β-amino acid specific orthogonal aminoacyl-tRNA synthetase/tRNA pairs described to date. Next, we increased the activity of PylRS(12_1) and PylRS(12_2) by random mutagenesis of the active site region of the PylRS gene within the stmRNA construct, followed by tRNA display-based selection (Figs.30, 31). From the resulting spindle plot, we identified sequences, carrying one to three additional mutations with respect to the parental clones, which were enriched and selective. The three most enriched and selective hits, PylRS(12_1evol1-3), were derivatives of PylRS(12_1). We confirmed the specificity of PylRS(12_1evol1-3) for acylating tRNAPyl CUA
with 12, by both fluoro-tREX and our LC-MS based assay (Fig.3d,f,g, Fig.29 and 32). PylRS(12_1evol1-3) were notably more active in acylating tRNAPyl CUA with 12 than PylRS(12_1) (Fig.3c,d,e,f). Next, we combined PylRS(12_1) or PylRS(12_1evol1) with tRNAPyl CUA and GFP150TAGHis6 in cells. While β-amino acids are expected to be poor ribosomal substrates10,11,14-18, we found that production of GFP was dependent upon addition of 12 to cells. GFP fluorescence was 3-fold higher with PylRS(12_1evol1) than with PylRS(12_1), consistent with the higher acylation activity of PylRS(12_1evol1). We confirmed the incorporation of 12 at position 150 in GFPHis6 purified from cells with PylRS(12_1), or PylRS(12_1evol1), by ESI-MS and MS/MS. In the absence of 12 we observed some GFP production resulting from the incorporation of Phe. However, in the presence of 12 (Fig.3g,h, Fig.33), we produced more GFP, and we only detected incorporation of 12 by intact MS and MS/MS. We conclude that in the presence of 12 the background incorporation of Phe observed in the absence of 12 is effectively outcompeted. Similar observations have previously been made for efficient and selective ncAA incorporation systems28 and the fidelity of the natural code is also known to rely on competition29. We note that we did not observe incorporation of 12 at position 3 of GFP in experiments using GFP3TAGHis6 (Fig.33), consistent with β-amino acids being poor ribosomal substrates10,11,14-18 and the fact that β-amino acids may not be tolerated at all positions in a protein. Next, we performed two rounds of selection with lib14 and ncMs A1, A2, A3, A4, A5 (β-amino acids with varied side chains), A6 (an α,α-disubstituted-amino acid) and A7 (a β-hydroxy acid) (Fig.36a, Fig.38). From the resulting spindle plots (Fig.39-45), we identified enriched and selective sequences, and the most selective hits converged on a distinct sequence pattern for each ncM.
We note that the synthetases selected for all six β-amino acids differ in sequence, but contain common mutations M300D and A302H (Figs.20, 39-43). The sequence pattern observed for the β-hydroxy acid A7 is similar to the one observed for β-amino acids. However, the residue at position 300 – which may be in direct proximity to the amine-/hydroxy- group – is changed from aspartic acid to asparagine (Fig.45). The PylRS variants, identified by tRNA display selection, directed the specific acylation of tRNAPyl CUA by their cognate monomer, as judged by fluoro-tREX and our LC-MS based assay (Fig.36d-q). We quantified the fraction of acylation as a function of ncM concentration (Fig.46). To the best of our knowledge, we have discovered the first specific aminoacyl-tRNA synthetase/tRNA pairs for three distinct classes of ncM: β-amino acids, α,α-disubstituted-amino acids, and β-hydroxy acids.
Example 8 - Encoding ncMs in proteins
To investigate the incorporation of the β-amino acids, α,α-disubstituted-amino acids, and β-hydroxy acids into proteins we combined the orthogonal synthetase/orthogonal tRNA pairs we have discovered for eight ncMs, 12, A1, A2, A3, A4, A5, A6, A7, with GFP150TAGHis6 in cells, and added the cognate ncM. We observed ncM dependent GFP production for 12, A2, A5 and A6, with isolated yields ranging from 3 to 35 mg per litre of culture, and mass spectrometry confirmed the incorporation of these β-amino acids and α,α-disubstituted-amino acids in GFP (Figs.3h, 33, 36r, 36s, 47, and 48). In the absence of 12 we observed some GFP production resulting from the incorporation of natural amino acids (Fig.36r). However, in the presence of 12, we produced more GFP, and we only detected incorporation of 12 by intact MS and MS/MS (Fig.3h, 33, 47). We conclude that in the presence of 12 the background incorporation, observed in the absence of 12, was effectively outcompeted; we made similar observations for A2, A5 and A6 (Fig.3h, 36s, 33, 47).
Similar observations have previously been made for efficient and selective ncAA incorporation systems and the fidelity of the natural code is also known to rely on competition. We conclude that 12, A2, A5 and A6 are site-specifically incorporated with high-fidelity. We did not observe incorporation of ncMs 12, A2, A5 or A6 at position 3 of GFP (from GFP3TAGHis6) (Fig.48), indicating that these ncMs are not tolerated at all positions in a protein; similar site-dependent incorporation efficiency has previously been observed for ncAAs. We note that we did not observe ncM-dependent increase in production of GFP from GFP150TAGHis6 or GFP3TAGHis6 with A1, A3, A4 or A7 when cells were provided with these ncMs and their cognate orthogonal synthetase/ orthogonal tRNA pairs (Figs.36r, 48). These observations are consistent with these ncMs being poor
substrates for ribosomal polymerization. For A4 and A7 we observed a decrease in GFP production upon addition of ncM (Fig.36r); this is consistent with these ncMs, once acylated onto the orthogonal tRNA, inhibiting read-through of the amber codon. The discovery of orthogonal synthetases that are specific for these ncMs provides a starting point for selecting ribosomes that polymerize them.
Example 9 - Structure of β-amino acid protein
To further characterize the incorporation of 12 at position 150 of GFP, we solved the structure of GFP(150(S)β3mBrF)His6 at 1.5 Å by X-ray crystallography (the protein was purified from cells harboring PylRS(12_1) and tRNAPylCUA) (Fig.3i). The electron density shows two consecutive carbon atoms (C2 and C3) in the protein backbone at position 150; the meta-bromo phenyl substituent is attached to C3 of the β-amino acid, and the stereochemistry at C3 corresponds to the expected (S) stereoisomer. Our structure confirms the site-specific incorporation of the expected (S)β3mBrF-amino acid in the protein. The introduction of (S)β3mBrF leads to a notable kink in the beta barrel of GFP, when compared to the wildtype protein (Fig.34). Interestingly, the hydrogen bonding networks of the residues immediately preceding and following the β-amino acid in the polypeptide chain remain essentially unperturbed; this indicates that this beta barrel can accommodate the β3-amino acid at this position (Fig.34). To the best of our knowledge, this represents the first structure for a protein produced in vivo that contains a β-amino acid. Taken together, our data demonstrate the site-specific incorporation of a β-amino acid in a protein produced in cells.
Example 10 - Discussion
The in vivo incorporation of backbone modified monomers is a long-standing, unaddressed challenge in expanding the scope of encoded cellular polymer synthesis beyond α-L amino acids and their close analogs4,30.
The co-dependence of tRNA acylation and ribosomal polymerization31 in the translational readouts commonly used to select for tRNA acylation12,13 or ribosomal polymerization creates an evolutionary deadlock; this deadlock has hitherto limited the range of monomers that can be used to acylate specific orthogonal tRNAs in vivo. tRNA display breaks this co-dependence-based deadlock. It achieves this by creating a selectable acylation phenotype and coupling this phenotype to the sequence of the corresponding synthetase gene; this enables the direct selection for orthogonal aminoacyl-tRNA synthetases that acylate their cognate orthogonal tRNAs with ncMs, without any requirement for the ncM to be a ribosomal substrate or function in translation. Using tRNA display we have rapidly and scalably selected efficient aaRS systems for eight ncAAs; these selections consume orders of magnitude less compound than previous approaches. Moreover, we have used tRNA display to select, and improve, tRNA acylation systems for ncMs; these systems could not have been discovered using previous approaches. Having established that tRNAPylCUA was acylated with the ncM 12 in cells, we were able to identify a site in GFP where incorporation of this ncM was tolerated to realize the genetically encoded incorporation of a β-amino acid in a protein. In future work we will leverage the cellular acylation of tRNAs with ncMs to enable translation-based selections for orthogonal ribosomes28,31,33,34 that can polymerize ncMs at a wider range of sites. Translation-based selections31 may also generate orthogonal ribosomes that facilitate the encoded cellular synthesis of polymers composed of more diverse ncMs. 
We anticipate that the repertoire of ncMs that can be used to construct modified proteins and non-canonical polymers may be further expanded by combining approaches for directly encoding ncMs with elegant approaches for increasing the chemical scope of peptides and proteins via post-translational modification and protein ligation35-39. The genetic encoding of β-amino acids may enable the creation of protease-resistant proteins40-42 in cells. Future developments may combine strategies for encoding non-canonical polymers in cells5 with an expansion in the range of monomers that can be encoded to enable the encoded biosynthesis of foldamers43, composed of β-amino acids and other ncMs, which can form secondary, tertiary and quaternary structures44,45. We anticipate that it may
be possible to complement and augment the canonical functions of cells through the design and directed evolution of genetically encoded foldamers. In addition to ncM 12, using tRNA display we have discovered orthogonal synthetase/orthogonal tRNA pairs that are selective for eight new ncMs, including β-amino acids, α,α-disubstituted-amino acids, and β-hydroxy acids, and thereby directly facilitated the genetic encoding, and site-specific incorporation, of β-amino acids and α,α-disubstituted-amino acids into proteins in a living organism.
Example 9 – Methods
Buffers
Resuspension buffer (RB): 100 mM NaOAc, 50 mM NaCl, 0.1 mM EDTA, pH 5.0
Deacylation buffer (DB): 50 mM Bicine pH 9.6, 1 mM EDTA
Buffer D (D): 50 NaOAc pH 5, 150 NaCl, 10 mM MgCl2, 0.1 mM EDTA
Hybridization buffer (HB): 10 mM Tris-HCl, 25 mM NaCl, pH 7.4
Klenow (exo-) master mix with Cy5-11-dCTP (KMM-Cy5): 17 μL water, 5 μL 10x NEBuffer 2.0, 1 μL Klenow exo(-), 1 dNTPS-dCTP (10 mM), 1 μL Cy5-11-dCTP (20 μM)
Orange loading dye (OLD): 8 M urea, Orange G
Klenow (exo-) master mix with Cy5-11-dCTP mini (KMM-Cy5-mini): 1.1 μL water, 1.2 μL 10x NEBuffer 2.0, 0.2 μL Klenow exo(-), 0.5 dNTPS-dCTP (5 mM), 1 μL Cy5-11-dCTP (5 μM)
Klenow (exo-) master mix with Bio-11-dCTP (KMM-Bio): 17 μL water, 5 μL 10x NEBuffer 2.0, 1 μL Klenow exo(-), 1 dNTPS-dCTP (10 mM), 1 μL Bio-11-dCTP (20 μM)
Washing buffer (WB): 10 mM Tris-HCl, 150 mM LiCl, 1 mM EDTA, 0.05% v/v Tween20, pH 7.5
Binding buffer (BB): 20 mM Tris-HCl, 1 M LiCl, 2 mM EDTA, 0.05% v/v Tween20, pH 7.5
Formamide loading buffer (FMB): 90% formamide
SDS Lysis Buffer (SLB): 100 mM NaOAc, 50 mM NaCl, 0.1 mM EDTA, pH 5.0, 1% (m/w) SDS
Alkaline washing buffer (AWB): 25 mM NaOH, 4 mM EDTA, 0.05% Tween20
Reverse transcription hybridization mix (RHM): 1 μL DNA primer (2 μM), 1 μL 10 mM dNTPs, 1 μL 10x HB, 10 μL water
RT master mix (RMM): 4 μL SSIV buffer, 1 μL RNAse Out, 1 μL SSIV RT, 10.1 M dithiothreitol (DTT)
Acidic washing buffer 1 (aWB1): 100 mM NaOAc pH 5
Acidic washing buffer 1 plus Tween20 (aWB1-T): 100 mM NaOAc pH 5, 0.01% (v/v) Tween-20
Acidic washing buffer 2 (aWB2): 20 mM NaOAc pH 5
Media
SOC: Super Optimal Broth plus glucose
2xYT-s: Yeast Extract Tryptone supplemented with 75 μg/mL spectinomycin
2xYT-s-t: Yeast Extract Tryptone supplemented with 75 μg/mL spectinomycin and 10 μg/mL tetracycline
2xYT-s-ap: Yeast Extract Tryptone supplemented with 75 μg/mL spectinomycin and 50 μg/mL apramycin
2xYT-am: Yeast Extract Tryptone supplemented with 75 μg/mL ampicillin
Chemicals
NcMs 1 and 2 were purchased from Bachem. NcM 4 was purchased from Fluorochem. NcM 5 was purchased from Ambeed. NcMs 6, 8, 13, 17 and 18 were purchased from Enamine. NcM 7 was purchased from aaBlocks. NcMs 9, 10, 15 and 16 were purchased from Merck. NcMs 12 and 20 were purchased from BLD. NcM 14 was purchased from Advanced ChemBlock. NcM 19 was purchased from AstaTech. NcM 3 was synthesized as previously described (Spinck, M. et al. Nature Chemistry 15, 61-69 (2023)) and ncM 11 was custom synthesized as previously described (Tang, S. et al. Nature 602, 701-707 (2022)). NcMs 13 and 18 were Boc deprotected in concentrated HCl in dioxane.
DNA constructs cloning
Standard cloning was performed by Gibson assembly using NEBuilder® HiFi DNA Assembly Master Mix (NEB) according to the manufacturer's guidelines.
Libraries were generated by enzymatic inverse PCR, as previously described. Briefly, a template plasmid was amplified by PCR using two primers (see primer list) containing degenerate codons at the desired mutagenesis sites and a BsaI cleavage site. In the case of custom mixes, primers containing different codons were manually mixed and used for PCR reactions. PCR products were gel purified and digested using BsaI and DpnI. Subsequently, samples were purified, ligated using T4 Ligase, and transformed into electrocompetent E. coli DH10β cells, ensuring a minimal transformation efficiency of 10^9. Individual colonies (>10) were evaluated using Sanger sequencing for quality control of the library assembly. Total plasmid DNA was prepared from the resulting culture, sequenced in bulk using Sanger sequencing and used for subsequent experiments.
General Protocols
NGS data analysis
NGS was performed on a MiSeq system (in the case of the test evolution with library 1 and substrate 1) or a NextSeq2000 system (in all other cases). The resulting cDNA from tRNA display was amplified via PCR using oligos NGS_A(1-8) and NGS_B(1-8) containing different combinations of Nextera sequencing barcodes. Samples were purified, quantified, and combined in equimolar amounts. Paired end reads were first paired using PEAR7, and aligned to a reference sequence of MmPylRS using Bowtie28. The relevant library positions were extracted and translated to amino acids, and the resulting variants were counted using an R script. Subsequent operations were performed using the frequency of each variant in each library, which was computed as the count value divided by the total number of counts of that library. Using an R script, enrichment and selectivity scores were calculated for all variants as follows. First, only variants that were present in all positive replicates were considered (tables were merged using an AND operator). Because highly enriched sequences might not be covered in the negative and the input samples but may still be of interest, the negative and the naïve replicates were merged to the positive table using an OR operator. A placeholder value of 0.95 counts was adopted in cases in which a replicate did not cover a specific variant. The resulting data set was used to calculate mean enrichments in the presence and in the absence of the respective substrate, computed as the quotient of the mean frequencies in one condition and the input condition. The resulting positive and negative enrichments were used to calculate the selectivity value for each variant (equivalent to the quotient of positive and negative frequencies). For further analysis, variants were filtered using an empirically determined threshold value for the normalized standard deviation of the positive frequency (dispersion error in the plus substrate condition).
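The scoring steps described above can be sketched as follows. This is a minimal illustration in Python, not the R script used for the analysis: the function names, the dictionary-based data layout, the use of the population standard deviation, and the default cutoff `max_dispersion=0.5` are all assumptions made here for illustration (the text states only that the threshold was determined empirically).

```python
from statistics import mean, pstdev

# Count assigned when a replicate does not cover a variant (value from the text).
PLACEHOLDER_COUNT = 0.95


def frequency(counts, variant):
    """Frequency of a variant: its count divided by the total counts of that library."""
    return counts.get(variant, PLACEHOLDER_COUNT) / sum(counts.values())


def score_variants(positive, negative, naive, max_dispersion=0.5):
    """Return {variant: (enrichment, selectivity)} for variants passing the filter.

    positive, negative, naive: lists of {variant: read count} dicts, one per replicate.
    max_dispersion: hypothetical cutoff for the normalized standard deviation.
    """
    # AND merge: keep only variants observed in every positive replicate.
    variants = set.intersection(*(set(rep) for rep in positive))

    scores = {}
    for v in variants:
        pos = [frequency(rep, v) for rep in positive]
        # OR merge: negative and naive replicates may lack the variant, in which
        # case frequency() falls back to the placeholder count.
        neg = [frequency(rep, v) for rep in negative]
        inp = [frequency(rep, v) for rep in naive]

        enrichment_pos = mean(pos) / mean(inp)
        enrichment_neg = mean(neg) / mean(inp)
        # Equivalent to the quotient of mean positive and mean negative frequencies.
        selectivity = enrichment_pos / enrichment_neg

        # Normalized standard deviation of the positive frequencies
        # (population standard deviation is an assumption).
        dispersion = pstdev(pos) / mean(pos)
        if dispersion <= max_dispersion:
            scores[v] = (enrichment_pos, selectivity)
    return scores
```

In this sketch a variant that is missing from a negative or naïve replicate still receives a finite selectivity or enrichment, because the placeholder count stands in for the missing value, whereas a variant missing from any positive replicate is dropped before scoring.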
tRNA pulldown and ncM identification by LC-MS
tRNAs were isolated from 8 mL of cells following the general protocol B, omitting the oxidation by NaIO4. The RNA pellet was resuspended in 90 µL buffer D and RNA concentrations were adjusted to match the lowest concentration in the samples being compared. 0.5 µL of biotinylated DNA probe (100 µM) was added to the RNA and the DNA probe was hybridized at 65 °C for 5 min. 40 µL Streptavidin Dynabeads MyOne C1 (Invitrogen) were washed twice with buffer D-T, and resuspended in 10 µL buffer D. The beads were added to the hybridization reaction and the probe was bound to the beads for at least 30 min at 4 °C with head over tail rotation. The samples were washed three times with 200 µL of aWB1-T, twice with 200 µL of aWB1, three times with 200 µL of aWB2 and once with 200 µL water, all on a magnetic stand. 24 µL of DB was added to the beads, which were incubated at 42 °C for one hour. 12 µL of the deacylation mix was added to 3 µL 6-Aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC - 3 mg/mL in acetonitrile) and the reaction incubated at 55 °C for 15 min. Samples were analyzed on an Agilent Technologies 6130 Quadrupole LC/MS using single ion monitoring.
GFP(150X)His6 and GFP(3X)His6, where X stands for any ncM, activity assay
Chemically competent DH10β cells harboring a p15A plasmid encoding GFP150TAGHis6 or GFP3TAGHis6 (two versions of the p15A plasmid with either a tetracycline or apramycin resistance cassette, which led to similar results, were used interchangeably) were transformed with a pMB1 plasmid encoding MmPylRS, or a mutant thereof, and MmtRNAPyl CUA, rescued in SOC and grown overnight in 2xYTs-t or 2xYTs-ap. 20 µL of the overnight culture was diluted into 480 µL 2xYTs-t or 2xYTs-ap containing 0.2% L-arabinose in the presence and absence of 2 to 4 mM of the respective ncM in a 96-well-plate format. Cells were grown for 16-20 h at 37 °C at 700 r.p.m.
The plates were centrifuged for 12 min, 4200 rcf at 4 °C and the cells resuspended in 150 µL PBS. 100 µL of the resuspended cells were transferred into a Costar 96-well flat bottom plate and the OD600 and GFP fluorescence were measured using a PHERAstar FS plate reader.
GFP(150X)His6 isolation for MS analysis
Three replicates of the protein produced as described above were combined in a 1.5 mL Eppendorf tube, centrifuged at 4200 rcf for 3 min, frozen at -20 °C, thawed and resuspended in 150 µL BugBuster (Millipore). Cells were lysed for 1 h with head over tail rotation. Lysed cells were centrifuged for 20 min, 20000 rcf, at 4 °C and the lysate was added to 20 µL of NiNTA beads. GFP(150X)His6 was bound to the beads for 20 min at room temperature with head over tail rotation. The beads were washed six times with 60 µL 30 mM imidazole in PBS, and the protein was eluted with five times 30 µL 300 mM imidazole in PBS. For low activity mutants of ncM 12, 5-15 mL of cell culture was used for protein production. The volumes of BugBuster were adjusted proportionally; all other volumes were kept the same.
Mass spectrometry
ESI-MS as well as MS/MS were obtained as previously described24, 27.
Protein expression, purification, and crystallization
Chemically competent DH10β cells harboring a p15A plasmid encoding GFP150TAGHis6 and a pMB1 plasmid encoding MmPylRS(12_1) and MmtRNAPyl were transformed, rescued in SOC and grown overnight in 2xYTs-ap. 10 mL of the overnight culture was diluted into 1 L 2xYTs-ap containing 0.2% L-arabinose in the presence of 5 mM 12. The bacterial pellet from 1 L of expression culture of GFP150(S)β3mBrF-His6 was lysed by sonication and centrifuged at 142,000 rcf for 30 minutes, and the supernatant was bound to Ni-NTA beads (Qiagen). Beads were washed three times before the protein was eluted and further purified by gel filtration using a Superdex 75 HiLoad 26/60 pg column (GE Healthcare) in 25 mM Tris pH 7.4, 200 mM NaCl and 0.06 % NaN3.
The purified protein was concentrated using Vivaspin 20, 10,000 MWCO (Sartorius) to 6 mg/mL. Sample was Trypsin digested with Sequencing Grade
Modified Trypsin (Promega) in a 50:1 ratio. Sample was incubated for 1 hour at 37 °C and centrifuged at 21,000 rcf for 10 minutes before plating in crystal trays. Crystallization trials with multiple commercial crystallization kits were performed in 96-well sitting-drop vapor diffusion plates (Molecular Dimensions) at 18 °C and set up with a Mosquito HTS robot (TTP Labtech). Drop ratios of 0.2 μL protein solution plus 0.2 μL reservoir solution were used for screening. The only useful dataset was collected from a crystal harvested from the Fusion screen (Molecular Dimensions) with the following composition: 37.5 % PEG 3350/PEG 1K/MPD (1:1:1), 0.1 M Bicine/Trizma pH 8.5, 0.8 % (w/v) Morpheus III Alkaloids and 0.12 M Morpheus Alcohols. Crystals were harvested and flash frozen in liquid nitrogen.
Diffraction data collection, processing, and structure solution
Diffraction data were collected at the ESRF on beamline ID23-2 at an energy of 14.2 keV. Data were processed with XDS via the pipeline autoProc (Global Phasing Ltd.). The structure was solved by molecular replacement with MolRep using the homologue model PDB 2B3P. Interactive building was performed with Coot, refinement with REFMAC5, and validation with Molprobity. Figures of the structure were prepared with PyMOL (PyMOL Molecular Graphics System, Schrödinger, LLC).
Selection for ncAAs
The selection was performed as described in Figure 8. RNA was isolated and oxidized as described in general procedure A. Bio-mREX was performed as specified in the general procedure. After the first round of selection the new libraries were assembled from the amplified cDNA as described above. After the second round of selection the NGS samples were prepared from the isolated cDNA as described above, the NGS run was performed using a P2 600 cycles cartridge, and the data was analyzed as specified above.
Selection for ncMs
The selection was performed as described in Figure 19. RNA was isolated and oxidized as described in general procedure A.
Bio-mREX was performed as specified in the general procedure. The NGS samples were prepared from the isolated cDNA as described above, the NGS run was performed using a P1 600 cycles cartridge, and the data was analyzed as specified above.
Selection for substrate 12 using a random mutagenesis library
The concentrations of the pMB1 plasmids encoding PylRS hits 12_1 and 12_2 were measured using a Qubit 2 Fluorometer (Life Technologies) and the Qubit 1x dsDNA HS Assay Kit (Invitrogen), and the plasmids were combined in equimolar amounts. The combined plasmids were used for an error prone PCR of the active site of PylRS using golden gate primers and the GeneMorph II kit (Agilent) at conditions leading to the maximal number of random mutations. The amplicons were cloned into a new pColE1 backbone by two-piece Golden Gate assembly according to NEB (New England Biolabs) guidelines. The selection was performed as outlined in Figure 30. RNA was isolated and oxidized as described in general procedure A. Bio-mREX was performed as specified in the general procedure. The NGS samples were prepared from the isolated cDNA as described above, the NGS run was performed using a P2 600 cycles cartridge, and the data was analyzed as specified above.
Code availability
The code for analysing tRNA display NGS data will be available upon publication at https://github.com/JWChin-Lab/tRNA_display.
References
1 Dumas, A., Lercher, L., Spicer, C. D. & Davis, B. G. Designing logical codon reassignment - Expanding the chemistry in biology. Chem Sci 6, 50-69, doi:10.1039/c4sc01534g (2015).
2 Young, D. D. & Schultz, P. G. Playing with the molecules of life. ACS chemical biology 13, 854-870 (2018).
3 Chin, J. W. Expanding and reprogramming the genetic code. Nature 550, 53-60, doi:10.1038/nature24031 (2017).
4 De La Torre, D. & Chin, J. W. Reprogramming the genetic code. Nature Reviews Genetics 22, 169-184, doi:10.1038/s41576-020-00307-7 (2021).
5 Robertson, W. E. et al. Sense codon reassignment enables viral resistance and encoded polymer synthesis. Science 372, 1057-1062 (2021).
6 Spinck, M. et al. Genetically programmed cell-based synthesis of non-natural peptide and depsipeptide macrocycles. Nature Chemistry 15, 61-69 (2023).
7 Ellman, J. A., Mendel, D. & Schultz, P. G. Site-specific incorporation of novel backbone structures into proteins. Science 255, 197-200 (1992).
8 Mendel, D., Cornish, V. W. & Schultz, P. G. Site-directed mutagenesis with an expanded genetic code. Annual review of biophysics and biomolecular structure 24, 435-462 (1995).
9 Hecht, S. M. Expansion of the genetic code through the use of modified bacterial ribosomes. Journal of molecular biology 434, 167211 (2022).
10 Katoh, T. & Suga, H. In vitro genetic code reprogramming for the expansion of usable noncanonical amino acids. Annual Review of Biochemistry 91, 221-243 (2022).
11 Melo Czekster, C., Robertson, W. E., Walker, A. S., Soll, D. & Schepartz, A. In Vivo Biosynthesis of a beta-Amino Acid-Containing Protein. J Am Chem Soc 138, 5194-5197, doi:10.1021/jacs.6b01023 (2016).
12 Santoro, S. W., Wang, L., Herberich, B., King, D. S. & Schultz, P. G. An efficient system for the evolution of aminoacyl-tRNA synthetase specificity. Nature biotechnology 20, 1044-1048 (2002).
13 Chin, J. W., Martin, A. B., King, D. S., Wang, L. & Schultz, P. G. Addition of a photocrosslinking amino acid to the genetic code of Escherichia coli. Proceedings of the National Academy of Sciences 99, 11020-11024 (2002).
14 Tan, Z., Forster, A. C., Blacklow, S. C. & Cornish, V. W. Amino Acid Backbone Specificity of the Escherichia coli Translation Machinery. Journal of the American Chemical Society 126, 12752-12753 (2004).
15 Pavlov, M. Y. et al. Slow peptide bond formation by proline and other N-alkylamino acids in translation. Proceedings of the National Academy of Sciences 106, 50-54 (2009).
16 Katoh, T., Tajima, K. & Suga, H. Consecutive Elongation of D-Amino Acids in Translation. Cell Chem Biol 24, 46-54, doi:10.1016/j.chembiol.2016.11.012 (2017).
17 Katoh, T. & Suga, H. Ribosomal Incorporation of Consecutive beta-Amino Acids. J Am Chem Soc 140, 12159-12167, doi:10.1021/jacs.8b07247 (2018).
18 Dedkova, L. M. et al. beta-Puromycin selection of modified ribosomes for in vitro incorporation of beta-amino acids. Biochemistry 51, 401-415, doi:10.1021/bi2016124 (2012).
19 Cervettini, D. et al. Rapid discovery and evolution of orthogonal aminoacyl-tRNA synthetase-tRNA pairs. Nat Biotechnol, doi:10.1038/s41587-020-0479-2 (2020).
20 Barton, P., Laws, A. P. & Page, M. I. Structure–activity relationships in the esterase-catalysed hydrolysis and transesterification of esters and lactones. Journal of the Chemical Society, Perkin Transactions 2, 2021-2029 (1994).
21 Kobayashi, T., Yanagisawa, T., Sakamoto, K. & Yokoyama, S. Recognition of non-alpha-amino substrates by pyrrolysyl-tRNA synthetase. J Mol Biol 385, 1352-1360, doi:10.1016/j.jmb.2008.11.059 (2009).
22 Soma, A. et al. Permuted tRNA genes expressed via a circular RNA intermediate in Cyanidioschyzon merolae. Science 318, 450-453 (2007).
23 El Yacoubi, B., Bailly, M. & de Crécy-Lagard, V. Biosynthesis and function of posttranscriptional modifications of transfer RNAs. Annual review of genetics 46, 69-95 (2012).
24 Dunkelmann, D. L., Oehm, S. B., Beattie, A. T. & Chin, J. W. A 68-codon genetic code to incorporate four distinct non-canonical amino acids enabled by automated orthogonal mRNA design. Nat Chem, doi:10.1038/s41557-021-00764-5 (2021).
25 Salis, H. M., Mirsky, E. A. & Voigt, C. A. Automated design of synthetic ribosome binding sites to control protein expression. Nature biotechnology 27, 946-950 (2009).
26 Yanagisawa, T. et al. Structural Basis for Genetic-Code Expansion with Bulky Lysine Derivatives by an Engineered Pyrrolysyl-tRNA Synthetase. Cell Chem Biol 26, 936-949 e913, doi:10.1016/j.chembiol.2019.03.008 (2019).
27 Dunkelmann, D. L., Willis, J. C. W., Beattie, A. T. & Chin, J. W. Engineered triply orthogonal pyrrolysyl-tRNA synthetase/tRNA pairs enable the genetic encoding of three distinct non-canonical amino acids. Nat Chem 12, 535-544, doi:10.1038/s41557-020-0472-x (2020).
28 Wang, K., Neumann, H., Peak-Chew, S. Y. & Chin, J. W. Evolved orthogonal ribosomes enhance the efficiency of synthetic genetic code expansion. Nat Biotechnol 25, 770-777, doi:10.1038/nbt1314 (2007).
29 Fredens, J. et al. Total synthesis of Escherichia coli with a recoded genome. Nature 569, 514-518, doi:10.1038/s41586-019-1192-5 (2019).
30 Arranz-Gibert, P., Vanderschuren, K. & Isaacs, F. J. Next-generation genetic code expansion. Current Opinion in Chemical Biology 46, 203-211 (2018).
31 Schmied, W. H. et al. Controlling orthogonal ribosome subunit interactions enables evolution of new function. Nature 564, 444-448, doi:10.1038/s41586-018-0773-z (2018).
32 Beattie, A. T., Dunkelmann, D. L., Chin, J. W. Quintuply orthogonal pyrrolysyl-tRNA synthetase/tRNAPyl pairs. Nature Chemistry In press (accepted) (2023).
33 Rackham, O. & Chin, J. W. A network of orthogonal ribosome x mRNA pairs. Nat Chem Biol 1, 159-166, doi:10.1038/nchembio719 (2005).
34 Neumann, H., Wang, K., Davis, L., Garcia-Alai, M. & Chin, J. W. Encoding multiple unnatural amino acids via evolution of a quadruplet-decoding ribosome. Nature 464, 441-444, doi:10.1038/nature08817 (2010).
35 Morinaka, B. I. et al. Natural noncanonical protein splicing yields products with diverse β-amino acid residues. Science 359, 779-782 (2018).
36 Lakis, E., Magyari, S. & Piel, J. In Vivo Production of Diverse β-Amino Acid-Containing Proteins. Angewandte Chemie 134, e202202695 (2022).
37 Camarero, J. A. & Muir, T. W. Native chemical ligation of polypeptides. Current Protocols in Protein Science 15, 18.14.11-18.14.21 (1999).
38 Niquille, D. L. et al. Nonribosomal biosynthesis of backbone-modified peptides. Nature chemistry 10, 282-287 (2018).
39 Arnison, P. G. et al. Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Natural product reports 30, 108-160 (2013).
40 Seebach, D. et al. β-Peptides: Synthesis by Arndt-Eistert homologation with concomitant peptide coupling. Structure determination by NMR and CD spectroscopy and by X-ray crystallography. Helical secondary structure of a β-hexapeptide in solution and its stability towards pepsin. Helvetica Chimica Acta 79, 913-941 (1996).
41 Hintermann, T. & Seebach, D. The Biological Stability of β-Peptides: No Interactions between α- and β-Peptidic Structures? Chimia 51, 244-244 (1997).
42 Seebach, D. et al. Biological and pharmacokinetic studies with β-peptides. Chimia 52, 734-734 (1998).
43 Gellman, S. H. Foldamers: a manifesto. Accounts of chemical research 31, 173-180 (1998).
44 Horne, W. S., Price, J. L. & Gellman, S. H. Interplay among side chain sequence, backbone composition, and residue rigidification in polypeptide folding and assembly. Proceedings of the National Academy of Sciences 105, 9151-9156 (2008).
45 Wang, P. S. & Schepartz, A. β-Peptide bundles: Design. Build. Analyze. Biosynthesize. Chemical Communications 52, 7420-7432 (2016).