IN-CELL CONTINUOUS TARGET-GENE EVOLUTION, SCREENING AND SELECTION
FIELD OF THE INVENTION
The present invention relates to methods and means for evolution of a target sequence of interest.
BACKGROUND OF THE INVENTION
Current molecular evolution methods, mainly devoted to binder engineering, such as display technologies, impose a series of constraints such as: 1) the high cost and time per optimization cycle related to library construction using purified reagents, including molecular biology products, target protein production (expression, purification and labeling), biopanning method development and man-hours; 2) limited diversity imposed by the cell transformation bottleneck; 3) experimenter bias; and 4) due to the mentioned constraints, these methods frequently require focusing the diversity on small regions of the evolving molecule, thus requiring previous structure-function knowledge and making it difficult to implement multiple evolution rounds, to scale up and to parallelize the assays.
State-of-the-art technologies that comply with the continuous evolution paradigm, such as PACE (Esvelt et al., 2011) and MAGE (Wang et al., 2009), can partially address some of these constraints by using specially conceived electronic apparatus that are not commercially available and impose evident hurdles to assay parallelization and scale-up.
The present invention aims to provide improved methods overcoming the mentioned drawbacks.
SUMMARY OF THE INVENTION
The present invention provides methods and means for implementing evolution inside cells. It should make it possible to address some major concerns in protein engineering projects such as: a) the limitations regarding diversity up-scaling, b) the requirement of highly optimized in vitro reactions using purified products by experts in the field of molecular biology and molecular display, c) the associated costs, d) the long time-to-results and e) the relatively low convenience of display-based methods.
In other words, the invention concerns methods and means that implement an intracellular continuous evolution program focused on one (or multiple) target gene(s) and that may encompass the required evolutionary steps: diversity generation, variant production and, optionally, screening of protein variants and stopping the generation of diversity if a good variant is found. This new technology should then make it possible to:
1. simplify molecular evolution by suppressing several steps requiring the experimenter's intervention.
2. reduce cost, time and experimenter bias, since no in vitro reaction would be required after cell transformation.
3. overcome the current diversity size limitations associated with the cell transformation bottleneck, since the diversity should be generated inside cells. As a consequence, the number of independent clones can be modulated simply by adjusting the culture volume.
4. increase the diversity of solutions (good variants), since every cell involved in a continuous evolution process could generate variants resulting from different evolutionary pathways (theoretically, each cell is converted into an independent gene evolution machine).
5. avoid specific device-related constraints for execution, thereby limiting the investment required for its use and easing technology application, scale-up and parallelization.
6. obviate the need for purified target protein and library construction required for most display-based strategies.
In a particular aspect, the present invention relates to a method for generating diversity in a gene L comprising:
- providing a bacterial cell comprising a molecular complex formed by the association of:
- a scaffold protein (SP),
- a template RNA (tpRNA) comprising from 5’ to 3’: the gene L, an RTtag sequence operably linked to the gene L and a scaffold protein binding module 1 (SPBM1) sequence capable of binding to the SP at a first specific binding site (SPS1),
- a primer RNA (prRNA) comprising: an RTprimer sequence positioned at the 3’ end of the prRNA and capable of complementary pairing to the RTtag sequence, a scaffold protein binding module 2 (SPBM2) sequence capable of binding to the SP at a second specific binding site (SPS2) and a reverse transcriptase binding module (RBM) sequence, and
- a fusion protein (RBD-RT) comprising a reverse transcriptase (RT) and an RBM binding domain (RBD) capable of binding to the RBM of the prRNA; and
- placing the bacterial cell in conditions that allow the reverse transcription of the gene L, thereby generating altered copies of said gene L of the tpRNA.
Optionally, the RT of the fusion protein is TF1 or the HIV or MMLV reverse transcriptase.
Optionally, the SP is the Hfq protein or a fragment or variant thereof.
Optionally, the prRNA further comprises a transfer RNA (tRNA) sequence contiguously positioned 3’ (downstream) of the RTprimer sequence, said tRNA sequence comprising a specific site that can be cleaved by a bacterial cell RNAse, preferably by RNAse P, thereby producing a well-defined 3’ prRNA end and a tRNA.
Preferably, the bacterial cell further expresses a homologous recombination (HR) factor capable of integrating the altered copies of the gene L into a DNA vector or into a genome of the bacterial cell, said vector or genome comprising a copy of the gene L, thereby preserving the altered copies of the gene L from degradation and allowing them to be expressed or to be iteratively altered in new cycles. Optionally, the HR factor is a lambda phage beta protein (λBet).
Optionally, the bacterial cell further expresses a preservative effector capable of inhibiting an RNAse, thereby preserving the tpRNA, the prRNA and the altered copies of the gene L from degradation by the RNAse. Optionally, the preservative effector is RNA helicase rhlB or a fragment 711-844 of RNAse E.
Alternatively, the bacterial cell further expresses a preservative effector capable of impairing the mismatch repair system (MMR) function. Optionally, the preservative effector is a deoxyadenosine methylase (dam), preferably a dam over-expressed by transient methods, or mutL and/or mutS dominant negative mutants.
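As an illustration only, the diversity generation step described above (error-prone reverse transcription of the gene L followed by optional integration of the altered copy by the HR factor) can be mimicked by a toy simulation. This is a minimal sketch and not the method of the invention: the function names, the example sequence, the per-base error rate and the integration probability are all hypothetical values chosen for the example.

```python
import random

BASES = "ACGT"

def error_prone_copy(gene, error_rate, rng):
    """Copy a sequence; each base is substituted with probability error_rate
    (a hypothetical stand-in for low-fidelity reverse transcription)."""
    copied = []
    for base in gene:
        if rng.random() < error_rate:
            # Substitute with one of the three other bases.
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)

def evolve_cycle(gene, error_rate, integration_prob, rng):
    """One cycle: make an altered copy, then keep it only if the (hypothetical)
    HR step integrates it; otherwise the resident copy is unchanged."""
    altered = error_prone_copy(gene, error_rate, rng)
    if rng.random() < integration_prob:
        return altered
    return gene

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    rng = random.Random(0)              # fixed seed for reproducibility
    gene_l = "ATGGCTAGCAAAGGAGAAGAA"    # arbitrary example sequence
    current = gene_l
    for _ in range(50):                 # 50 hypothetical cycles
        current = evolve_cycle(current, 0.01, 0.2, rng)
    print(current, hamming(gene_l, current))
```

In this sketch every simulated cell follows its own mutational path, which is the sense in which each cell can be regarded as an independent gene evolution machine.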
The present invention also relates to a method for screening a ligand molecule capable of binding a target molecule from variants encoded by altered copies of a gene L prepared by the method according to the present invention, wherein the bacterial cell further comprises a bacterial two-hybrid system (B2H) comprising:
- a promoter (P), a sequence defining a ribosome binding site (RBS) and a reporter gene, the P sequence being operably linked to the RBS sequence and the reporter gene,
- a fusion protein (FPR) comprising the target molecule and a DNA binding domain (DBD), said DBD being capable of binding to a site located in proximity to the promoter P so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, and
- a fusion protein (FPL) comprising a variant encoded by an altered copy of the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase, or
- a fusion protein (FPL) comprising a variant encoded by an altered copy of the gene L and a DNA binding domain (DBD), said DBD being capable of binding to a site located in proximity to the promoter P so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, and
- a fusion protein (FPR) comprising the target molecule and transcription subunits (TrSu) capable of recruiting an RNA polymerase,
and the method comprises the selection of the variant encoded by an altered copy of the gene L when the reporter is expressed, optionally at least at a predetermined level.
Optionally, the B2H further comprises a DNA invertase gene operably linked to the promoter P, said DNA invertase being capable of targeting DNA invertase sites that flank DNA sequences encoding the RT and/or the HR factor, thereby stopping the method for generating diversity in a gene L once the binding between the target molecule and the ligand molecule occurs.
Alternatively, the DNA invertase could be replaced by a highly specific restriction enzyme (such as SceI), the invertase sites being replaced by the corresponding restriction sites. In this aspect, the B2H further comprises a gene encoding a highly specific restriction enzyme (such as SceI) operably linked to the promoter P, said restriction enzyme being capable of introducing a double-stranded break at restriction sites that flank DNA sequences encoding the RT and/or the HR factor, thereby stopping the method for generating diversity in a gene L once the binding between the target molecule and the ligand molecule occurs, in particular by removal of the DNA sequences encoding the RT and/or the HR factor.
In another alternative, the method for generating diversity in the gene L can be stopped by using a transcription repressor. In this aspect, the B2H further comprises a gene encoding a transcription repressor operably linked to the promoter P or P’, said transcription repressor being capable of stopping the expression of the DNA sequences encoding the RT and/or the HR factor, thereby stopping the method for generating diversity in a gene L once the binding between the target molecule and the ligand molecule occurs.
Optionally, the expression of the FPR and/or FPL component, for instance the component comprising the DBD, is controlled by the association of a strong promoter and a weak RBS.
The present invention further relates to a method for screening a ligand molecule that loses the capacity of binding a target molecule from variants encoded by altered copies of a gene L prepared by the method according to the present invention, wherein the bacterial cell further comprises a B2H system comprising:
- a first promoter P, a sequence defining a first ribosome binding site (RBS) and a reporter gene, the first promoter P being operably linked to the first RBS sequence and the reporter gene and allowing a stable basal level of expression of the reporter gene, and
- a second promoter P’, a sequence defining a second RBS and a repressor gene, the second promoter P’ being operably linked to the second RBS sequence and the repressor gene, said repressor being capable of targeting the first promoter P to block the transcription of the reporter gene,
- a fusion protein (FPR) and a fusion protein (FPL), wherein the fusion protein (FPR) comprises the target molecule and a DNA binding domain (DBD), said DBD being capable of binding to a site located in proximity to the promoter P’ so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, and the fusion protein (FPL) comprises a variant encoded by an altered copy of the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase; or wherein the fusion protein (FPR) comprises the target molecule and transcription subunits (TrSu) capable of recruiting an RNA polymerase, and the fusion protein (FPL) comprises a variant encoded by an altered copy of the gene L and a DNA binding domain (DBD), said DBD being capable of binding to a site located in proximity to the promoter P’ so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L;
and the method comprises the selection of the variant encoded by an altered copy of the gene L when the expression of the reporter is increased, optionally at least at a predetermined level.
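The difference between the two screening methods (reporter switched on by binding in the first method, reporter released from repression when binding is lost in the second) can be summarized with a small logical model. The sketch below is only an illustration under simplifying assumptions: binding is reduced to a hypothetical normalized strength between 0 and 1, the reporter response is taken as linear, and the names `direct_b2h`, `inverse_b2h`, `basal` and `threshold` are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Readout:
    reporter_level: float  # arbitrary units
    selected: bool         # True if the reporter reaches the predetermined level

def direct_b2h(binding, threshold):
    """Direct system: the reporter under promoter P rises with binding
    (hypothetical linear response)."""
    reporter = binding
    return Readout(reporter, reporter >= threshold)

def inverse_b2h(binding, basal, threshold):
    """Inverse system: binding drives the repressor from promoter P',
    which lowers the basal reporter expression from promoter P."""
    repressor = binding                    # repressor follows binding
    reporter = basal * (1.0 - repressor)   # repression reduces the basal level
    return Readout(reporter, reporter >= threshold)

if __name__ == "__main__":
    for strength in (0.0, 0.5, 1.0):
        print(strength,
              direct_b2h(strength, threshold=0.5),
              inverse_b2h(strength, basal=1.0, threshold=0.5))
```

With these assumptions a strong binder is selected by the direct system, whereas a variant that has lost binding is selected by the inverse system.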
Optionally, the B2H further comprises a DNA invertase gene operably linked to the second promoter P’, said DNA invertase being capable of targeting DNA invertase sites that flank DNA sequences encoding the RT and/or the HR factor, thereby stopping the method for generating diversity in a gene L once the binding between the target molecule and the ligand molecule is lost.
Optionally, the B2H further comprises a gene encoding a highly specific restriction enzyme operably linked to the promoter P’, said restriction enzyme being capable of introducing a double-stranded break at restriction sites that flank DNA sequences encoding the RT and/or the HR factor, thereby stopping the method for generating diversity in a gene L once the binding between the target molecule and the ligand molecule is lost, in particular by removal of the DNA sequences encoding the RT and/or the HR factor.
Optionally, the B2H further comprises a gene encoding a transcription repressor operably linked to the promoter P’, said transcription repressor being capable of stopping the expression of the DNA sequences encoding the RT and/or the HR factor, thereby stopping the method for generating diversity in a gene L once the binding between the target molecule and the ligand molecule is lost. Optionally, the repressor under the control of the second promoter P’ is capable of stopping the expression of the DNA sequences encoding the RT and/or the HR factor.
Optionally, the expression of the FPR and/or FPL component, for instance the component comprising the DBD, is controlled by the association of a strong promoter and a weak RBS.
In addition, the present invention relates to a single vector or a set of vectors that can be transformed in a bacterial cell, comprising:
- a transcription cassette (tC1) comprising a sequence encoding a pre-tpRNA operably linked to a promoter (P1), said pre-tpRNA comprising from 5’ to 3’: an insertion site suitable for the insertion of a gene L, an RTtag sequence operably linked to the gene L to be inserted and an SPBM1 sequence, wherein said tC1 is suitable for allowing, in the bacterial cell, the transcription of a tpRNA including an inserted gene L, wherein the SPBM1 is capable of binding to an SP present in the bacterial cell at a first specific binding site (SPS1),
- a transcription cassette (tC2) comprising a sequence encoding a prRNA operably linked to a promoter (P2), said prRNA comprising: an RBM sequence positioned at the 5’ end, an SPBM2 sequence and an RTprimer, wherein said tC2 is suitable for allowing, in the bacterial cell, the transcription of a prRNA, wherein the RTprimer is capable of complementary pairing to the RTtag and the SPBM2 is capable of binding to the SP at a second specific binding site (SPS2), and
- an expression cassette (eC1) comprising a sequence encoding an RBD-RT fusion protein operably linked to a promoter (P3), said RBD-RT comprising a reverse transcriptase (RT) sequence and an RBD sequence, wherein said eC1 is suitable for allowing, in the bacterial cell, the expression of the RBD-RT fusion protein, wherein the RBD is capable of binding to the RBM of the prRNA.
Optionally, the single vector or the set of vectors further comprises an expression cassette (eC2) comprising a sequence encoding the SP operably linked to a promoter (P4), preferably said SP being the Hfq protein, wherein eC2 is suitable for allowing, in the bacterial cell, the expression of the SP, preferably the Hfq protein.
Optionally, in the single vector or the set of vectors, the sequence encoding the prRNA further comprises a sequence encoding a tRNA sequence contiguously positioned downstream of the RTprimer sequence, and a site cleavable by an RNAse of the bacterial cell is present between said tRNA sequence and said RTprimer, thereby allowing the production of a well-defined 3’ prRNA end.
Optionally, the single vector or the set of vectors further comprises an expression cassette (eC3) comprising an HR factor gene operably linked to a promoter (P5), wherein said eC3 is suitable for allowing, in the bacterial cell, the expression of an HR factor capable of integrating the altered copies of the gene L into a DNA vector or into the genome of the bacterial cell, said vector or genome comprising a copy of the gene L, thereby preserving the altered copies of the gene L from degradation and allowing them to be expressed or to be iteratively altered in new cycles.
Optionally, the single vector or the set of vectors further comprises:
- an expression cassette (eC4) comprising a sequence encoding a reporter gene operably linked to a promoter (P6),
- an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and a DBD sequence, said DBD being capable of binding to a site located in proximity to the promoter P6 so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and
- an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising an insertion site suitable for the insertion of the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L;
or
- an expression cassette (eC4) comprising a sequence encoding a reporter gene operably linked to a promoter (P6),
- an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and
- an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising an insertion site suitable for the insertion of the gene L and a DBD sequence, said DBD being capable of binding to a site located in proximity to the promoter P6 so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L;
or
- an expression cassette (eC4) comprising a sequence encoding a repressor gene operably linked to a promoter (P6),
- an expression cassette (eC4’) comprising a sequence encoding a reporter gene operably linked to a promoter (P6’), the expression of the reporter gene being negatively controlled by the repressor encoded by (eC4),
- an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and a DBD sequence, said DBD being capable of binding to a site located in proximity to the promoter P6 so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and
- an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising an insertion site suitable for the insertion of the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L;
or
- an expression cassette (eC4) comprising a sequence encoding a repressor gene operably linked to a promoter (P6),
- an expression cassette (eC4’) comprising a sequence encoding a reporter gene operably linked to a promoter (P6’), the expression of the reporter gene being negatively controlled by the repressor encoded by (eC4),
- an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and
- an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising an insertion site suitable for the insertion of the gene L and a DBD sequence, said DBD being capable of binding to a site located in proximity to the promoter P6 so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L.
Optionally, the eC1 further comprises DNA invertase sites flanking the sequence encoding RBD-RT and/or the eC3 further comprises DNA invertase sites flanking the sequence encoding the HR factor gene, and the eC4 further comprises a sequence encoding a DNA invertase gene operably linked to P6. Optionally, the eC1 further comprises restriction sites flanking the sequence encoding RBD-RT and/or the eC3 further comprises restriction sites flanking the sequence encoding the HR factor gene, and the eC4 further comprises a sequence encoding a restriction enzyme gene operably linked to P6. Optionally, the eC1 further comprises a sequence encoding a transcription repressor gene operably linked to P6, and the expression of the sequence encoding RBD-RT of the eC1 and/or the sequence encoding the HR factor gene of the eC3 can be stopped by said transcription repressor gene.
Optionally, the tC1 and eC6 comprise a gene L instead of the insertion sites.
Optionally, said vectors are low copy vectors.
Finally, the present invention relates to a bacterial cell comprising said single vector or set of vectors and the use thereof for implementing evolution of a gene of interest.
The present invention further relates to an improved B2H system and its uses.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1: Schematic representation of the basic concepts behind one implementation of the intracellular system for targeted and continuous gene evolution. RNAs transcribed from the evolving gene are reverse transcribed and mutations are randomly incorporated. The mutated DNA replaces the original copy of the gene by homologous recombination.
Dynamically, different protein variant fusions are expressed and one of them interacts conveniently with the target fusion, hence triggering the expression of reporter, marker and evolution arrest genes, which signal that a good binder was produced and stop continuous evolution.
Figure 2: RT (1), HR (2), two-hybrid (3) and system arrest (4) modules interaction. With the intent to facilitate the comprehension of the proposed artificial biological circuit, one possible embodiment is schematized and corresponds to the evolution of protein ligands against a target protein.
(A) Semantic connection among modules. The reverse transcription module (1) converts the RNA of an evolving binder into mutated ssDNAs or dsDNAs. The homologous recombination module (2) replaces the original gene (or part of the gene) by the mutated version encoded in ssDNAs or dsDNAs, thereby allowing the variant to be expressed. The two-hybrid module (3) screens the produced variants and, if a strong enough binder is found, a signal is triggered in order to arrest modules 1 and 2 (module 4), as well as a signal allowing the isolation of the corresponding cell. Therefore, diversity generation stops but not the expression of the selected variant and its detection by module 3, thus allowing the isolation of the corresponding cell, the identification of the evolving variant and, therefore, its characterization by current techniques.
(B) Detailed molecular connections (DNA, RNA and protein levels) of one possible evolutionary strategy for protein binders. The target gene (gene T) fused to a DNA binding domain (DBD) coding region is transcribed and translated. The protein fusion T-DBD recognizes a specific motif on the DNA. The ligand gene to be evolved (gene L) can be transcribed from a fusion with a sequence that should allow reverse transcription, here named RTtag. Low-fidelity conversion of the RNA into DNA generates gene variants (module 1) that replace (module 2) the original copy of gene L. Gene L (or its variants) fused to transcription subunits or a transcription activator (TrSu) are expressed and, if one of them interacts with the target in a stable enough way, it triggers (module 3) the expression of interaction signals (for instance but not limited to: luminescent/fluorescent proteins, enzymes, auxotrophic markers, antibiotic resistance markers), as well as signals to arrest modules 1 and 2 (for instance but not limited to: restriction enzymes, recombinases, transposases, repressors, etc.). DNA is represented by double lines, RNA by single lines, protein domains by distinct geometric forms.
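The interplay of the four modules of Figure 2 can be pictured as a loop that runs in each cell until the arrest signal is triggered. The following sketch is only a schematic model under stated assumptions: the variant is a short string, binding is scored as the fraction of positions matching a hypothetical ideal motif, every mutated copy is assumed to replace the resident gene, and all names and numerical values are invented for the example.

```python
import random

ALPHABET = "ACGT"

def binding_score(variant, ideal):
    """Hypothetical module 3 readout: fraction of positions matching 'ideal'."""
    return sum(a == b for a, b in zip(variant, ideal)) / len(ideal)

def rt_module(gene, error_rate, rng):
    """Module 1: low-fidelity copy; each base is redrawn at random with
    probability error_rate (the redrawn base may equal the original)."""
    return "".join(rng.choice(ALPHABET) if rng.random() < error_rate else b
                   for b in gene)

def hr_module(resident_gene, mutated_copy):
    """Module 2: here the mutated copy always replaces the resident gene."""
    return mutated_copy

def run_cell(gene, ideal, threshold, error_rate, max_cycles, rng):
    """Cycle modules 1 and 2 until module 3 reports a score >= threshold,
    which triggers arrest (module 4), or until max_cycles is reached."""
    cycles = 0
    while binding_score(gene, ideal) < threshold:
        if cycles == max_cycles:
            return gene, cycles, False   # no arrest within the allowed cycles
        gene = hr_module(gene, rt_module(gene, error_rate, rng))
        cycles += 1
    return gene, cycles, True            # arrest: variant is frozen

if __name__ == "__main__":
    rng = random.Random(1)
    print(run_cell("AAAAAA", "ACGTAC", threshold=0.5, error_rate=0.2,
                   max_cycles=1000, rng=rng))
```

Once the arrest condition is met the variant no longer changes, which mirrors the statement that diversity generation stops while the selected variant remains expressed and detectable.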
Figure 3: Scheme of the genetic system designed to demonstrate the feasibility of coupling reverse transcription (RT) with homologous recombination (HR).
(A) The reverse transcription enzyme (RT) and the recombination factor (λ Bet) are expressed from one plasmid (up, left; VN575). The KanOn RNA precursor containing an intron is transcribed from the same plasmid (bottom, left) and spontaneously gives rise to the self-spliced KanOn RNA. The latter RNA form is recognized by an intracellular oligonucleotide (RT primer) and the hybridized oligonucleotides are used by the RT enzyme to synthesize KanOn cDNA which, in turn, associates with the λ Bet protein to patch the internal stop codon region of the KanOff gene in the other plasmid (up, right; VN591) by homologous recombination. Thus, the initial KanOff gene is converted to a functional version (KanOn gene), the cells become resistant to kanamycin and can be conveniently isolated and sequenced. DNA is represented by double lines, RNA by pointed single lines, the RT primer oligonucleotide by a gray pointed line and cDNA by a full line. Stop codons are indicated by “Stop” symbols. Transcription promoters are represented by arrows to the right and transcription terminators as “T”.
(B) Plasmid harboring the KanOff gene (VN591), a non-functional kanamycin resistance gene generated by the introduction of a stop codon at the 5’ coding region between td exon bases.
(C) Plasmid containing an RT enzyme, λ bet protein and KanOn gene with td intron insertion (VN575). The constitutive expression of tetR allows the regulation of expression from the pLtetO promoter and, consequently, the intracellular amount of the bicistronic RNA that codes for RT and λ Bet.
Figure 4: Generalization of the improved RT module by co-localization of the RNA corresponding to the evolving gene, the RNA primer and the reverse transcriptase enzyme.
(A) The RNA corresponding to the gene to be evolved (gene L) is transcribed in fusion with an RTtag (region complementary to the RT primer) followed by a region that interacts with the scaffold (in some embodiments SPBM1 being the Hfq proximal surface binding module).
(B) Protein corresponding to an RNA binding domain or peptide (RBD) fused to a reverse transcriptase enzyme (RT) via a linker peptide (line). The RBD is used to tether the RT enzyme to one of the annealing RNAs (in this embodiment, the RT primer).
(C) The transcribed primer RNA consists of a fusion of an RNA sequence motif that is recognized by the RBD (RBM, RNA binding module), a region that recognizes the scaffold (in some embodiments SPBM2, the Hfq distal surface binding module), a region that is the reverse complement of the RTtag (RT primer) and a region that will be released (tRNA in some embodiments) after cleavage by an RNAse (RNAse P in some embodiments).
(D) All molecular elements required for reverse transcription (A, B and processed C) are recruited to the scaffold surface, thus increasing the likelihood of RNA-dependent DNA polymerization (RDDP).
Figure 5: Embodiment concerning an improved RT + HR system. The system designed to demonstrate the coupling between RT and HR modules (Figure 3) was adapted to the improved reverse transcription (Figure 4). (A) Main modifications include the removal of the intron sequence of the KanOn gene, and the design of fusions of KanOn and RT primer to allow recruitment on the scaffold protein in order to improve the likelihood of reverse transcription. Same abbreviations are used.
(B) Detail of the modified plasmid region compared to the system described in Figure 3C (plasmid VN575). RBD: RNA binding domain; HPBM: Hfq proximal surface binding module, corresponding to the SPBM1 in the implementation; RBM: RNA binding module recognized by the RNA binding domain (RBD); HDBM: Hfq distal surface binding module, corresponding to the SPBM2 in the implementation.
Figure 6: Benchmark of different B2H systems tested over a range of affinities from 3 to thousands of nanomolars.
(A) The enhanced B2H system (eB2H, module 3) performs better regarding the direct correlation between affinities and fluorescence signals and the signal/noise ratios. Mean fluorescence intensities (MFI) of peptides with varying affinities (8000, 560, 84 and 3 nM) were evaluated using two-hybrid responsive promoters previously described by Ann Hochschild (dotted line), Rama Ranganathan (dashed line) and, finally, by this work (2 plasmids direct system: VN550 + VN515 to VN520; 1 plasmid direct system: VN750 to VN754; and 2 plasmids reverse or inverse system: VN572 + VN577 to VN581).
(B) Annotated sequence of the enhanced two-hybrid responsive promoter. OL2-62: lambda phage binding site; -35 and -10 boxes for Escherichia coli RNA polymerase sigma factor binding; RBS: ribosome binding site; eGFP: first ATG codon of eGFP is indicated. The predicted transcription start site is indicated.
Figure 7: Dispersion of enrichment values of silent mutations coding for the wild type protein. Enrichment values were calculated as the ratio of the frequency of a variant after selection to the frequency of the same variant before selection. The data were collected for the interaction between 1B variants and IP3. (A) Former version of the B2H corresponding to VN1197 tested in Acella. (B) Current version of the enhanced B2H, corresponding to VN1296 tested in SB33.
Figure 8: Tunable switch for continuous evolution arrest (module 4) when a strong enough binder variant is produced.
(A) Schematic representation of the B2H responsive cassette constructed in vector VN419. The promoter that triggers the transcription following complex formation (B2H promoter) can be regulated using a repressor protein that can be released from its recognized DNA element (in some embodiments, tetO) using a range of inducer molecule concentrations, thereby tuning the expression of downstream genes and allowing the selection of stronger binders by applying weaker inducer concentrations. If the expression of the downstream genes exceeds a given threshold, the arrest
e (Bbx1) activity will be sufficient to irreversibly block reverse transcription (Figure 2, module nd homologous recombination (Figure 2, module 2). Consequently, the continuous evolution cess stops and a stable binder variant can be identified and characterized for each cell (Figure . The genes related to reverse transcription (module 1) and homologous recombination (module can be flanked with DNA sequences (Bxb1 attB and Bxb1 attP) that are recognized by the lution arrest protein (Bxb1 resolvase ∕ DNA invertase) and consequently their expression can drastically affected by the latter. In the plasmid VN376, for instance, a bicistronic cassette resenting RT gene and l bet gene (Bet) are transcribed from a promoter (Bba_J23105 promoter). wnstream, a reporter/marker gene can be coded in the reverse complementary strand (KanR) is not expressed because it has no associated promoter. If a strong enough binder is produced, the sense of the genes is inverted (in other words, the A fragments between Bxb1_attB and attP sites is inverted) therefore, evolution is stopped and corresponding cells can be identified and isolated (for instance, in the presence of kanamycin). ure 9: Whole autonomous evolution system implemented in two plasmids. Zoom in on the ligand hybrid gene comprised in VN1238 plasmid. The gene expression is trolled by a pLPPlacUV5 promoter and a lacO operator (IPTG induced) and codes for a hybrid tein (rpoa-Shble*-SpyTag_D7A) that should be truncated at the N-terminus of Shble domain ocin resistance) because of the presence of a stop codon and a frame shift (Shble*). Only if the p codon is reverted and the frame shift corrected as expected by the coupling between RT and modules the full hybrid construction is expressed (rpoA-Shble-SpyTag_D7A), therefore, the become zeocin resistant and fluorescent. Diversity generation plasmid (VN1228) scheme. 
The plasmid contains the genetic elements required for generation of diversity, including: 1) the gene comprising the RT and HR modules. This gene is composed, in order, of: i) a transcription promoter (pLtetO*) harboring operator regions (TetO) that are recognized by a repressor protein; ii) an attB recognition site for an integrase (Bxb1); iii) an open reading frame (ORF) coding for an error-prone reverse transcriptase enzyme 1) whose N-terminus is fused to an RNA binding domain (RBD; in this implementation, it corresponds to residues 1-22 of the lambda N-peptide); iv) a ribosome binding site (RBS) that allows expression of the downstream ORF; v) an ORF that codes for a single-stranded DNA annealing protein (SSAP, lambda bet); vi) a transcription terminator (spy_term); 2) an antibiotic resistance gene (aaDA, streptomycin/spectinomycin resistance) coded in the complementary DNA
strand; 3) an attP recognition site for an integrase (Bxb1) in the complementary strand; 4) a transcription terminator in the complementary strand (L3S2P56_term); 5) a transcription promoter 3119tetO) harboring operator regions (TetO) that are recognized by a repressor protein (TetR); 6) the region of the evolving gene that should be diversified, which contains in its 3’ region an RTtag_AS (i.e., the reverse complement of an RTtag_S) in order to allow targeted reverse transcription; 7) a transcription terminator that functions as an Hfq proximal surface binding module (HPBM, SgrS_term – the SPBM1 in this implementation) followed by a spacer and a strong transcription terminator (L3S2P21_term); 8) a transcription promoter (proK_promoter) harboring operator regions (TetO) that are recognized by a repressor protein. The promoter should allow the transcription of an RNA composed, in order, of an RNA binding module (RBM) recognized by the RBD ((nutL_box-B)x2), an Hfq distal surface binding module (HDBM, (AAC)x6, the SPBM2 in this implementation), an RTtag_S region, a pre-tRNA (proK tRNA, including its leader sequence in 5’) and a transcription terminator (proK_term); 9) a replication origin (PBR322, ) and; 10) a bicistronic gene corresponding to an antibiotic resistance gene (AmpR) for selection of transformed cells and a repressor (TetR). The recognition of operator sequences (TetO) on DNA by the repressor (TetR) can be antagonized by an inducer (anhydrotetracycline, ), thereby releasing transcription from the repressed promoters.
(C) Enhanced bacterial two-hybrid (eB2H) scheme (VN1238).
The plasmid contains the elements required for sensing protein-protein interactions inside cells and for arresting the generation of diversity that is encoded in the first plasmid (VN1228, Figure 9B), including: 1) an antibiotic resistance gene (CmR, chloramphenicol) for selection of cells transformed by the plasmid; 2) a gene coded in the complementary DNA strand including a promoter (lacUV5), an operator (lacO) recognized by a repressor (lacI) and the ORF coding for a hybrid protein (cI-SpyCatcher) corresponding to a DNA binding domain (DBD, cI) and an interaction partner (SpyCatcher); 3) a terminator (bi-directional terminator, Bba_B1007); 4) a gene whose expression correlates with the two-hybrid protein interaction, comprising a promoter (B2H_prom), a multicistronic region containing ORFs for reporters, markers and system arrest (fluorescent reporter, eGFP; an antibiotic resistance marker, KanR; and a DNA invertase enzyme, Bxb1) and a terminator (bi-directional terminator, Bba_B0014); 5) a replication origin (p15A) and; 6) a gene comprising a promoter (pLPPlacUV5), an operator sequence (lacO), an ORF coding for a second hybrid protein (rpoA-Shble*-SpyTag_D7A) corresponding to a bacterial RNA polymerase subunit (rpoA) and an interaction partner (Shble*-SpyTag_D7A), and a terminator (L3S2P21_term).
Figure 10: Observed frequencies of the expected phenotype for different genetic editing implementations. The different implementations are numbered. (1) Coupling test between RT and HR
modules, exclusively, corresponding to the “naive” implementation (without co-localization; plasmids VN575 and VN591). (2) Coupling test between RT and HR modules, exclusively, corresponding to the implementation of the co-localization approach (plasmids VN591 and 669). (3) Coupling of RT, HR and eB2H modules, exclusively, corresponding to the implementation of the co-localization approach and to the selection of edited/fluorescent cells in the presence of zeocin (plasmids VN1228 and VN1237). (4) Same system as described in “3” but replacing the bicistronic expression of the λ-Bet protein by rhlB (a domain that can improve RNA half-life by inhibiting RNAse E; plasmids VN1229 and VN1237). (5) Same system as described in “3” but replacing the bicistronic expression of the λ-Bet protein by Dam (a DNA methylase that can improve homologous recombination; plasmids VN1230 and VN1237). (6) Coupling of all modules (RT, HR, eB2H and Stop), corresponding to the implementation with the co-localization approach, selection of edited/fluorescent cells in the presence of zeocin and system arrest by DNA inversion (plasmids VN1228 and VN1238). (7) Same system as described in “6” but replacing the bicistronic expression of the λ-Bet protein by rhlB (VN1229 and VN1238). (8) Same system as described in “6” but replacing the bicistronic expression of the λ-Bet protein by Dam (plasmids VN1230 and VN1238). (9) Same system as described in “3”, but the frequency of edited cells was estimated by the ratio of the number of fluorescent colonies to non-fluorescent colonies in the absence of zeocin.
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to methods for generating diversity in a selected gene (gene L) in a bacterial cell, preferably based on an innovative strategy of co-localization.
The strategy of co-localization implies the assembly of a molecular complex in a bacterial cell in order to promote an editing process directed to the gene L.
The gene editing process implemented in the methods of the invention is based on the inherent error rate of any reverse transcriptase (RT), which is responsible for the generation of altered complementary DNA (cDNA) copies from a template RNA comprising the sequence of the gene L. A molecular complex (RTC) may be required for carrying out some methods of the invention and corresponds to the assembly, on a scaffold protein (SP), of an RT-containing fusion protein (RBD-RT), a template RNA (tpRNA) comprising the sequence of the gene L and a tag sequence complementary to the primer RNA, and a primer RNA (prRNA) suitable for initiating retro-transcription. According to a preferred aspect of the invention, the RTC assembled on an SP advantageously promotes the reverse transcription of the gene L, thereby enhancing the rate of gene L editing. In particular, the co-localization strategy over an SP developed by the inventors increases the half-life of the involved RNAs, promotes the double-stranded RNA annealing between the prRNA and tpRNA (i.e., between the
tag sequence of the tpRNA and the primer sequence required for initiating retro-transcription), and further increases the local concentration of the three partners required for the reverse transcription (RBD-RT, tpRNA and prRNA), which therefore improves the efficiency of cDNA synthesis.
The methods of the invention are particularly useful for evolution purposes in bacteria and, especially, can be used to increase the frequency of occurrence of phenotypes of interest. For instance, the molecular system of the invention can be used for ligand screening or metabolic engineering strategies.
In a first aspect, the invention provides a method for generating diversity in a gene L, using a bacterial cell as a host organism. In a second aspect of the invention, the method is supplemented by the addition of optional effectors that enhance the editing process directed to the gene L. In a third aspect of the invention, the method is adapted and complemented for the specific purpose of ligand screening. In a fourth aspect of the invention, the method adapted for ligand screening is improved to trigger the termination of the gene L editing process when an effective ligand is generated by the method. Further, an additional aspect of the invention relates to DNA vectors comprising all the exogenous genetic elements required for the implementation of the methods of the invention in a bacterial cell.
In a first aspect of the disclosure, a first module is provided for generating diversity in a gene of interest. In this aspect, the method comprises a step of providing a bacterial cell which comprises an RT protein, a template RNA including a priming sequence and a sequence encoding the gene of interest, and a primer initiating the reverse transcription of the gene of interest by the RT upon annealing of the priming sequence with the primer. In a specific aspect, the method comprises a step of providing a bacterial cell which comprises the four interacting partners of the RTC, i.e., an RBD-RT fusion protein, a tpRNA, a prRNA and an SP.
Accordingly, one of the simplest methods of the invention only requires the implementation of the RTC. In addition, as the function of the assembled molecular complex is to synthesize cDNA copies from the tpRNA, the methods of the invention necessarily comprise a second step consisting in placing the bacterial cell in environmental conditions allowing an efficient reverse transcription. These conditions may vary according to the bacterial species and strain in which the method is applied. Classically, these conditions may correspond to the optimal growth conditions that are known to the person skilled in the art and defined by several environmental factors, such as temperature, nutrient types and levels, and aerobic or non-aerobic conditions.
Optionally, the first module for generating diversity can be supplemented by other modular elements expressed by the bacterial cell. In a second aspect of the disclosure, a second module is
provided, aiming to stably implement mutated cDNA into replicating DNA molecules by the expression of homologous recombination (HR) factors. Functional improvement of the first module can be obtained by protecting the oligonucleotides involved (template RNA and primer RNA, especially tpRNA and prRNA) or generated (cDNA copies) from intracellular degradation, thereby improving cDNA synthesis or stability. These optional elements may be called preservative effectors. For instance, the bacterial cell homeostasis can be modified in order to decrease RNA and/or DNA degradation, and the cDNA can be stably implemented into the genome or a plasmid. This stable implementation by the second module can be further improved by impairing the function of the methyl-directed mismatch repair (MMR) system.
In a third aspect of the disclosure, a third module is provided allowing the selection of a modified ligand of a target molecule. This third aspect of the invention provides methods that are specifically adapted for ligand screening purposes. Such methods imply that the gene L to be edited encodes a potential ligand. In a first aspect, a potential ligand corresponds to a peptide or a protein that must be mutated in order to be converted into an effective ligand capable of binding to a target molecule. In a second alternative aspect, a potential ligand corresponds to a peptide or a protein that must be modified in order to be converted into an ineffective ligand with impaired binding to a target molecule. The methods for ligand screening according to the third aspect of the invention require that the bacterial cell further comprises a bacterial double hybrid system (B2H) that expresses both the target molecule and a potential ligand. Alternatively, protein fragment complementation (PCA) can also be used instead of B2H (for instance, DHFR complementation or P fluorescence complementation).
Importantly, the B2H module must be functionally coupled to an HR factor so as to allow the integration of neosynthesized cDNA copies of the gene L in a B2H expression cassette that comprises a copy of the gene L. The additional B2H module then allows the detection of binding occurrences between an effective ligand and a given target molecule, via the expression of a reporter in the bacterial cell. According to the design of the B2H elements, a binding occurrence is detected by the reporter signal.
In a fourth aspect of the present disclosure, a fourth module is provided to functionally impair the RT function once an effective ligand has been generated from altered copies of gene L, thereby resulting in the arrest of cDNA synthesis from tpRNA. Therefore, the bacterial cell may further comprise a diversity generation arrest (DGA) module functionally coupled to the B2H system module. According to the design of the DGA module, the HR sequence can also be targeted, resulting in the additional impairment of the HR function.
An additional aspect of the invention relates to DNA vectors that encompass all the exogenous genetic elements required for the implementation of the methods of the invention, or to bacterial cells comprising these DNA vectors.
Definitions
As used herein, a “retro-transcription complex” (RTC) refers to a functional molecular complex comprising a tpRNA, a prRNA, an RBD-RT and an SP, the assembled complex being capable of performing the retro-transcription of the gene L sequence included in the tpRNA.
As used herein, a “template RNA” (tpRNA) refers to an oligoribonucleotide capable of binding to a specific domain of an SP and comprising, from 5’ to 3’: a selected gene or gene of interest (gene L); an RTtag sequence operably linked to the gene L coding sequence, the RTtag being substantially complementary to the primer required for initiating the retro-transcription (RTprimer) of the gene L by the RT; and optionally a SPBM1 sequence capable of binding to a specific domain of an SP. According to the disclosure, the template RNA is a transcript of an exogenous DNA sequence introduced in the bacterial cell. The role of the template RNA in the molecular system is to provide a transcript of the gene L to be retro-transcribed into cDNA copies by the reverse transcriptase (e.g., RBD-RT).
A “selected gene” or “gene of interest” (gene L) of the tpRNA refers to a sequence of any protein or nucleic acid of interest that should be submitted to the targeted molecular evolution approach of the invention. According to a particular aspect of the disclosure, the gene L codes for a potential ligand whose sequence must be edited by the method of the invention in order to modulate (increase or decrease) its binding to a target molecule. In alternative embodiments, the gene L codes for an enzyme directly or indirectly related to the generation of a molecule of interest.
The “RTtag” of the tpRNA refers to an oligoribonucleotide sequence corresponding to the substantially complementary sequence of another oligoribonucleotide that functions as a primer for reverse transcription (RTprimer). According to the disclosure, the RTtag constitutes the substantially complementary sequence of the RTprimer sequence, thereby allowing a partial double-stranded annealing between the prRNA and the tpRNA, more specifically between the RTprimer of the prRNA and the RTtag of the tpRNA, hence enabling the reverse transcription of the gene L by a reverse transcriptase.
The “Scaffold Protein Binding Module 1” (SPBM1) of the tpRNA refers to an oligoribonucleotide sequence capable of binding to the SP at a specific site (SPS1). In a preferred aspect, the SPBM1 has a secondary structure portion that allows a specific binding to the SP.
As used herein, a “primer RNA” (prRNA) refers to an oligoribonucleotide comprising an RTprimer sequence positioned at the 3’ end, and optionally a SPBM2 sequence capable of binding to a specific domain of an SP and an RT binding module (RBM) sequence capable of binding to the RBD fused to a reverse transcriptase RT (RBD-RT).
The “RTprimer” of the prRNA refers to an oligoribonucleotide sequence that functions as an efficient primer for the RT, in particular in the context of the RBD-RT fusion protein, thus allowing the initiation of the reverse transcription of the gene L of the tpRNA. According to the disclosure, the RTprimer constitutes the sequence that is substantially complementary to the RTtag sequence, thereby allowing a partial double-stranded annealing between the prRNA and the tpRNA, more specifically between the RTprimer of the prRNA and the RTtag of the tpRNA, capable of enabling the reverse transcription of the gene L by a reverse transcriptase.
The “Scaffold Protein Binding Module 2” (SPBM2) of the prRNA refers to an oligoribonucleotide sequence capable of binding to the SP at a specific site (SPS2). In a preferred aspect, the SPBM2 has a secondary structure portion that allows a specific binding to the scaffold protein SP. Importantly, the SPBM2 of the prRNA sequence is sufficiently distinct from the SPBM1 of the tpRNA so as to avoid a binding competition for the same SP binding site, i.e. SPS1 or SPS2.
The “RT binding module” (RBM) of the prRNA refers to an oligoribonucleotide sequence capable of binding to the RBM binding domain (RBD) of the RBD-RT fusion protein. In a preferred aspect, the RBM has a secondary structure portion that is involved in the binding to the RBD of the RBD-RT fusion. This sequence thus allows the prRNA to recruit the RBD-RT in the context of module 1.
As used herein, an “RT-containing fusion protein” (RBD-RT) refers to a fusion protein comprising an RT domain fused to an RBD capable of binding to the prRNA and responsible for the recruitment of the RT fusion protein by the RBM of the prRNA.
The RBD of the RBD-RT refers to a domain capable of binding to the RBM of the prRNA.
The reverse transcriptase domain (RT), optionally of the RBD-RT, refers to an error-prone RT, i.e., an enzyme capable of generating altered copies of cDNA from an RNA template. Accordingly, the role of the RT used in the methods of the disclosure is to generate altered cDNA copies from the gene L sequence of the tpRNA. Besides, as the error rate of any RT is theoretically > 0, it follows that any RT is an error-prone RT and is therefore compatible with the methods of the disclosure. The RT can be a natural or engineered RT.
As used herein, a “scaffold protein” (SP) refers to a protein expressed by the bacterial cell and capable of binding both to the SPBM1 of the tpRNA via a first specific binding site (SPS1) and to the SPBM2 of the prRNA via a second binding site (SPS2). In some aspects, the SP is an endogenous protein constitutively expressed by the bacterial cell. In alternative embodiments, the SP is an exogenous or modified protein expressed by the bacterial cell.
As used herein, a “preservative effector” refers to a protein or peptide that is expressed by the bacterial cell and protects the oligonucleotides from intracellular degradation, in particular the oligoribonucleotides tpRNA and prRNA or the oligodeoxyribonucleotides generated (cDNA copies) by the RT.
As used herein, a single-strand annealing protein (SSAP) intended for “homologous recombination” (HR) refers to a protein capable of exchanging identical or similar DNA sequences from distinct DNA strands. Accordingly, the role of the HR used in the methods of the disclosure is to integrate altered cDNA copies of gene L into a DNA vector comprising a copy of the gene L.
As used herein, “MMR” refers to the Methyl Directed Mismatch Repair system. MMR is a highly conserved molecular mechanism that plays an essential role in bacteria by identifying and repairing DNA mismatches. Classically, mismatch repair occurs on the non-methylated strand of hemi-methylated DNA, which is the newly synthesized DNA strand. MMR consists of three important protein components: MutS, MutL, and MutH. MutS is responsible for the recognition of the mismatched base pairs, which initiates the mismatch repair; MutL recognizes the MutS-DNA heteroduplex complex, and the assembly of the MutS-MutL-DNA heteroduplex ternary complex then activates MutH; MutH is responsible for an incision of the neosynthesized unmethylated strand at a hemi-methylated DNA site.
According to the methods of the disclosure, the MMR system is impaired by certain preservative effectors in order to prevent neosynthesized cDNA strands of the gene L from being removed by the system.
As used herein, the “DNA methylase” (Dam) refers to an enzyme capable of adding methyl groups to neosynthesized DNA. According to the methods of the disclosure, Dam can be expressed or overexpressed in the bacterial cell in order to prevent neosynthesized copies of gene L from being targeted by the MMR system.
As used herein, a “ribonuclease” (RNAse) refers to an enzyme that catalyzes the degradation of RNA strands, such as the RNAse E, the RNAse R or the polynucleotide phosphorylase (PnPase). In bacteria such as Escherichia coli, RNAses are involved in the fast turnover of RNAs that reduces the probability of retro-transcription complex formation, and thus reduce the retro-transcription
efficiency of the first module in the context of the disclosure. According to the methods of the disclosure, an RNAse can be mutated in order to impair its degradation function, thereby increasing the RNA stability in the bacterial cell.
As used herein, a “single-strand DNA exonuclease” (ssDNA exonuclease) refers to an enzyme capable of fragmenting ssDNA strands in the bacterial cell by cleaving nucleotides at the 5’ or 3’ end of the ssDNA strand. For instance, xonA, xseA, exoX and recJ are known ssDNA exonucleases. According to the methods of the disclosure, an ssDNA exonuclease can be mutated or invalidated in order to increase the stability of neosynthesized cDNA copies of the gene L.
As used herein, a “bacterial two hybrid” (B2H) system refers to a molecular system designed to detect protein-protein interactions between a ligand (L) and a target molecule (T). The B2H system expresses two fusion proteins, a fusion protein being a potential ligand (FPL) and a fusion protein acting as a receptor (FPR) for the FPL. The B2H system further comprises a DNA sequence, or expression cassette, comprising a reporter gene sequence and a ribosome binding site (RBS), both operably linked to a specific promoter (P). The interest of such a B2H system is to trigger the expression of a reporter protein only when the binding between FPR and FPL occurs.
The “fusion protein Ligand” (FPL) of the B2H system refers to a protein expressed in the bacterial cell that comprises a ligand domain (L), either fused to transcription subunits (e.g., TrSu) capable of recruiting an RNA polymerase or to a DNA binding domain (DBD) capable of binding to a specific DNA site, the other partner, i.e., transcription subunits or DBD, not fused to the ligand domain (L), being fused to a target molecule (T) capable of binding to the ligand (L) domain of the FPL when the L domain corresponds to an effective ligand. The L domain of FPL is derived from the expression of a copy of the gene L.
The gene L can be both mutated by the RT and integrated into the DNA vector coding the FPL of the B2H system via an HR. As a result, the gene that encodes the L domain of FPL corresponds to the original version of the gene L or to a modified version of the gene L. Since the L domain of FPL corresponds either to an effective ligand or to an ineffective ligand, the L domain of FPL is considered as a potential ligand.
The “fusion protein Receptor” (FPR) of the B2H system refers to a protein expressed in the bacterial cell that comprises a target molecule (T) capable of binding to the ligand (L) domain of the FPL when the L domain corresponds to an effective ligand, and either a DBD capable of binding to a specific DNA site or transcription subunits (e.g., TrSu) capable of recruiting an RNA polymerase.
The DBD allows the FPR or FPL to bind to a specific DNA site positioned in proximity to the promoter P, so as to promote the recruitment of an RNA polymerase near the promoter P when binding between FPR and FPL occurs, thus allowing the expression of a reporter gene.
As used herein, an “effective ligand” refers to an L domain of FPL capable of binding to the target molecule of FPR, and reciprocally an “ineffective ligand” refers to an L domain that cannot bind to the target molecule. In addition, an “improved ligand” refers to an effective ligand whose binding affinity to the target molecule has been improved compared to that of the original ligand expressed from the original gene L. In contrast, a “debased ligand” refers to an effective ligand whose binding affinity to the target molecule has been decreased compared to that of the original ligand expressed from the original gene L.
As used herein, a “DNA invertase” refers to an enzyme capable of catalysing the inversion of a DNA segment that is flanked by a pair of DNA invertase sites. In a DNA strand, such an inversion results in the replacement of the 5’ end of the targeted sequence by its 3’ complementary end, and vice versa. Accordingly, the role of the DNA invertase used in some methods of the disclosure is to target and invert specific DNA sequences that are flanked by invertase sites. Then, once inverted, the targeted sequence is no longer transcribed as the original DNA sequence but as a completely different sequence. As a result, in case the original DNA sequence codes for a protein, the inversion by a DNA invertase prevents the expression of this protein.
The term “gene” designates any nucleic acid encoding a protein. The term gene encompasses DNA, such as cDNA or gDNA, as well as RNA. The gene may be first prepared by, e.g., recombinant, enzymatic and/or chemical techniques, and subsequently replicated in a host cell or an in vitro system.
The gene typically comprises an open reading frame (ORF) encoding a desired protein but could also be reduced to a fragment thereof. The gene may contain additional sequences such as a transcription terminator or a signal peptide.
The term "vector" includes plasmids, cosmids or phages. Preferred vectors are those capable of autonomous replication. In the present specification, "plasmid" and "vector" are used interchangeably, as the plasmid is the most commonly used form of vector. In general, vectors comprise an origin of replication, a multicloning site and a selectable marker.
A nucleic acid is said to be "operably linked" when it is placed into a functional relationship with another nucleic acid sequence. The term "operably linked" means a configuration in which a control sequence is placed at an appropriate position relative to a coding sequence, in such a way that the control sequence directs expression of the coding sequence. In particular, for the purposes
of the present invention, a promoter or enhancer is operably linked to a coding sequence if it drives transcription of the sequence. Generally, "operably linked" means that the DNA sequences being linked are contiguous.
As used herein, an “expression cassette” refers to a construct, whether integrated into a host genome or present on an extra-chromosomal element, which has sufficient elements to permit the expression of the RNA and its translation into a protein when in the proper cell type or under inductive conditions. More particularly, the expression cassette may comprise a promoter (P) capable of recruiting a partner, such as RNA polymerase, that initiates the transcription of the 5’ downstream DNA sequence; an operably linked RBS capable of recruiting ribosomes allowing the translation of the 3’ downstream RNA sequence of the transcribed RNA; an operably linked DNA sequence of interest to be transcribed and translated; and a terminator sequence that causes the arrest of the transcription. According to the disclosure, when a first coding sequence of interest of the expression cassette, e.g., the gene L, is operably linked to the second coding sequence of interest (e.g., TrSu), a protein fusion can be expressed.
As used herein, a “transcription cassette” refers to a construct, whether integrated into a host genome or present on an extra-chromosomal element, which has sufficient elements to permit the expression of the RNA when in the proper cell type or under inductive conditions. More particularly, the transcription cassette may comprise a promoter (P) capable of recruiting a partner, such as RNA polymerase, that initiates the transcription of the 5’ downstream DNA sequence; an operably linked DNA sequence of interest to be transcribed; and a terminator sequence that causes the arrest of the transcription.
The term "control sequences" means nucleic acid sequences necessary for expression of a gene. Control sequences may be native, homologous or heterologous.
Well-known control sequences currently used by the person skilled in the art will be preferred. Such control sequences include, but are not limited to, a leader, polyadenylation sequence, propeptide sequence, promoter, signal peptide sequence, and transcription terminator. Preferably, the control sequences include a promoter and a transcription terminator.
The “reporter” of the B2H system refers to a protein expressed by the bacterial cell that generates a signal. The signal can be a luminescence or fluorescence signal. Alternatively, the reporter can be an enzyme producing a product that generates a signal. According to the classical principle of B2H systems, the reporter is expressed when an interaction occurs between two partners, i.e. FPR and FPL in the context of the invention, and the generated signal allows this interaction to be detected. For instance, the reporter may be a luminescent or a fluorescent protein such as GFP and its derivatives,
in particular the protein eGFP. Alternatively, the signal can also be any antibiotic resistance or any auxotrophic factor.
As used herein, a “promoter” (P) refers to a DNA sequence capable of recruiting an RNA polymerase in order to initiate the transcription of DNA sequences that are operably linked to said promoter and positioned downstream in the DNA strand. In addition, according to its sequence, a promoter can strongly promote transcription events (strong promoter) or promote them more moderately (moderate or weak promoter).
As used herein, a “ribosome binding site” (RBS) refers to an RNA sequence capable of recruiting ribosomes, thus allowing the translation of the 3’ downstream RNA sequence. In addition, according to its sequence, an RBS can strongly promote translation events (strong RBS) or promote them more moderately (moderate or weak RBS).
“Heterologous”, as used herein, is understood to mean that a gene or encoding sequence has been introduced into the cell by genetic engineering. It can be present in episomal or chromosomal form. The gene or encoding sequence can originate from a source different from the host cell in which it is introduced. However, it can also come from the same species as the host cell in which it is introduced but be considered heterologous due to its environment, which is not natural. For example, the gene or encoding sequence is referred to as heterologous because it is under the control of a promoter which is not its natural promoter, or because it is introduced at a location which differs from its natural location. The host cell may contain an endogenous copy of the gene prior to introduction of the heterologous gene or it may not contain an endogenous copy.
As used herein, the term “complementary” refers to complementarity properties of nucleobases that define interactions occurring between specific nucleobase pairs, i.e. between adenine (A)/thymine (T) pairs for DNA, between adenine (A)/uracil (U) pairs for RNA, or between guanine (G)/cytosine (C) pairs for both DNA and RNA molecules.
Accordingly, a “complementary pairing” refers to the ability of distinct oligonucleotides, or distinct regions of a single oligonucleotide, to bind each other through a sum of A/T, A/U or G/C pairings. In addition, as used herein, the term “substantially complementary” refers to a level of complementarity between two oligonucleotide sequences that is enough to ensure a functional interaction. For instance, the nucleotides are complementary at 70, 75, 80, 85, 90, 95, 99 or 100% when two sequences are substantially complementary. Optionally, 1, 2 or 3 mismatches can be present when two sequences are substantially complementary.
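By way of illustration only, the percentage-based reading of “substantially complementary” can be sketched as a short calculation. The sketch below is not part of the disclosed method: the function names, the example sequences and the default 70% threshold (the lowest value listed above) are assumptions chosen for the example, and it considers only gap-free A/U and G/C pairing between two RNA sequences of equal length.

```python
# Watson-Crick partners for RNA (A/U and G/C pairings only; wobble pairs are ignored).
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(primer: str, tag: str) -> float:
    """Percentage of positions at which `primer` pairs with `tag`.

    Both sequences are written 5'->3' and must have the same length.
    The tag is read 3'->5' so that the two strands are antiparallel.
    """
    if len(primer) != len(tag):
        raise ValueError("this sketch only handles sequences of equal length")
    tag_3_to_5 = tag[::-1]
    paired = sum(RNA_PAIRS.get(p) == t for p, t in zip(primer, tag_3_to_5))
    return 100.0 * paired / len(primer)


def is_substantially_complementary(primer: str, tag: str, threshold: float = 70.0) -> bool:
    """True when the pairing percentage reaches the chosen (hypothetical) threshold."""
    return percent_complementarity(primer, tag) >= threshold


# Hypothetical 10-nt primer, its exact reverse complement, and a one-mismatch variant.
example_primer = "AUGGCUAGCU"
perfect_tag = "AGCUAGCCAU"
one_mismatch_tag = "CGCUAGCCAU"
```

With these hypothetical sequences, the perfect tag pairs at every position and the variant pairs at nine positions out of ten, so both would satisfy a 70% threshold, while only the first satisfies a 100% threshold.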
The term "recombinant bacterium", “recombinant bacterial cell”, “genetically modified bacterium” or “genetically modified bacterial cell” designates a bacterium that is not found in nature and which contains a modified genome as a result of either a deletion, insertion or modification of genetic elements, or which contains a vector or a set of vectors. A "recombinant nucleic acid" therefore designates a nucleic acid which has been engineered and is not found as such in wild type bacteria.
The term “about” means more or less 5% of a number. For instance, about 100 means between 95 and 105.
Module 1: Diversity generation
The first module comprises means for generating diversity from a gene of interest in a bacterial cell.
“Gene” is intended to refer to any nucleic acid of interest, not only a nucleic acid of interest encoded by a gene. The gene of interest may code for a protein, a nucleic acid (DNA or RNA) or enzymes (protein, DNA or RNA based) such as an antisense nucleotide, DNAzyme, ribozyme, DNA modifying enzymes, RNA modifying enzymes, metabolic enzymes and pathways, RBSs, DNA binding proteins, RNA binding proteins, RNA motifs recognized by proteins, RNA/RNA interaction modules and partners of protein complexes. Roughly, every nucleotide sequence that can be transcribed, retrotranscribed and used as a substrate for HR can potentially be diversified and evolved at the DNA, RNA and protein levels. In a particular aspect, the gene of interest encodes a binding partner of a complex comprising at least a ligand molecule and a target molecule. Optionally, the gene of interest is intronless.
The diversity is created by the reverse transcription, by a reverse transcriptase RT, of an RNA comprising the gene of interest, leading to the error-prone generation of cDNA in a bacterial cell. Indeed, the RT is responsible for the retro-transcription of the gene L of the tpRNA, thereby generating diversity with neosynthesized altered copies of the gene L. This generation of
This generation of diversity thus allows the emergence of new variants from gene L, i.e. new nucleic acid sequences or new protein variants. These new variants may reveal new biological properties including properties of interest.
The RT, optionally of the RBD-RT, is a low-fidelity RT and/or an RT with a high initiation rate/processivity. A low-fidelity RT is characterized by a relatively high error rate that favors the synthesis of altered cDNA copies from gene L, i.e. an error rate ranging from about [...] to about 10⁻⁴, preferably from about 10⁻⁵ to about 10⁻⁴ errors per nucleotide, and more preferably an error rate of about 10⁻⁴ errors per nucleotide. In addition, a high initiation rate/processivity RT increases the number of retro-transcriptions performed by a single enzyme.
The RT can be an engineered RT from any source. In a more preferred aspect, the RT is a low-fidelity RT from sources such as retroviruses, retrotransposons, retrons or diversity generating elements. RTs are well known to the person skilled in the art and some RTs are disclosed for instance in Jamburuthugoda et al (J Mol Biol. 2011, (5):661-72), Menéndez-Arias et al (Viruses. 2009, 1(3):1137-65) or Kirshenboim et al (Virology. 2007, 366(2):263-76). In an even more preferred aspect, the RT is selected from the group consisting of: the RT of the Long Terminal Repeat (LTR) retrotransposon Tf1, the human immunodeficiency virus type 1 (HIV-1) RT, the simian immunodeficiency virus (SIV) RT, the feline immunodeficiency virus (FIV) RT, the Moloney murine leukemia virus (MMLV) RT (SEQ ID NO: 3), the feline leukemia virus (FeLV) RT, the avian myeloblastosis virus (AMV) RT, or the prototype foamy virus (PFV) RT.
In a particular aspect, the RT sequence is the sequence of the Tf1 RT corresponding to SEQ ID NO: 1. In an alternative particular aspect, the RT sequence is the sequence of the HIV-1 RT corresponding to SEQ ID NO: 2 and SEQ ID NO: 57. In another alternative particular aspect, the RT sequence is the sequence of the MMLV RT corresponding to SEQ ID NO: 3.
Optionally, the RT is fused with a domain binding the prRNA (RBD). The RT can be fused either at its N terminal end or at its C terminal end with the binding domain (RBD), optionally through a linker. As used herein, the term "linker" refers to a sequence of at least one amino acid that links the RT and the RBD. Such a linker may be useful to prevent steric hindrances. The linker is usually 4 amino acid residues in length. Preferably, the linker has 3-30 amino acid residues. In some embodiments, the linker has 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 amino acid residues.
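To give a feel for what the error rates quoted above for a low-fidelity RT imply, the following minimal sketch computes the expected number of errors per cDNA copy and the fraction of copies carrying at least one error. It assumes that errors occur independently at each nucleotide, which is a simplification, and the 1000-nucleotide gene length used in the example is hypothetical.

```python
def expected_errors(error_rate: float, gene_length_nt: int) -> float:
    """Mean number of errors per reverse-transcribed copy of the gene."""
    return error_rate * gene_length_nt


def fraction_altered(error_rate: float, gene_length_nt: int) -> float:
    """Probability that one cDNA copy carries at least one error,
    assuming independent errors at each position (a simplification)."""
    return 1.0 - (1.0 - error_rate) ** gene_length_nt


if __name__ == "__main__":
    # Hypothetical 1000-nt gene L at the preferred rate of about 1e-4.
    print(expected_errors(1e-4, 1000))
    print(fraction_altered(1e-4, 1000))
```

Under these assumptions, roughly one cDNA copy in ten of a 1000-nucleotide gene would differ from the template at an error rate of about 10⁻⁴ per nucleotide.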
Examples of linker sequences are Gly/Ser linkers of different lengths including (Gly4Ser)4, (Gly4Ser)3, (Gly4Ser)2, Gly4Ser, Gly3Ser, Gly3, Gly2Ser and (Gly3Ser2)3, in particular (Gly4Ser)3.
In a preferred aspect, the prRNA further comprises a transfer RNA (tRNA) sequence contiguously positioned downstream of the RTprimer sequence. The optional tRNA sequence comprises a specific site between the RTprimer and the tRNA that can be cleaved by an RNAse expressed in the bacterial cell, thereby producing a tRNA and a well-defined 3’ end of the prRNA corresponding to the RTprimer. Using tRNA specific sites that can be cleaved off from the prRNA yields the free 3'-OH required at the RTprimer for retro-transcription, thereby enhancing the efficacy of the module 1. For instance, the specific site of the optional tRNA sequence is cleaved by an RNAse P expressed by the bacterial cell. Any tRNA sequence could be implemented here and, for instance, the tRNA sequence corresponds to SEQ ID NO:4.
In order to improve the generation of diversity, a strategy of co-localization of the RT, prRNA and tpRNA forming the RTC has been developed. This strategy is based on the binding of these three elements on a scaffold protein SP. Indeed, the co-localization strategy significantly enhances the retro-transcription rate and thereby leads to an enhanced frequency of occurrence of new variants from gene L. For instance, prRNA and tpRNA each comprise a sequence capable of binding the SP while the RT is fused to a domain capable of binding the prRNA or the tpRNA, preferably the prRNA.
According to this preferred aspect, the tpRNA and prRNA respectively comprise an SPBM1 and an SPBM2 sequence, the prRNA further comprises an RBM sequence and the RT is fused with a domain binding the RBM (RBD) into an RBD-RT fusion protein.
A number of peptide-RNA pairs have been disclosed in the art (Keryer-Bibens et al, 2008, Biol. Cell, 100, 125-38; Lunde et al, 2007, Nat Rev Mol Cell Biol, 8, 479-90; Fujimori et al, 2012, Bioinformation, 8, 729-30; Cook et al, 2011, Nucleic Acids Res, 39, D301-8; Chao et al, 2008, Nat Struct Mol Biol, 15, 103-5; Delebecque et al, 2012, Nat Protoc, 7, 1797-807; Kappel et al, 2019, Proc Natl Acad Sci USA, 116, 8336-8341; Kappel et al, 2019, 27, 140-151, the disclosure thereof being incorporated herein by reference; databases: rbpdb.ccbr.utoronto.ca and pri.hgc.jp). Based on this knowledge, the person skilled in the art is able to design these co-localization (tethering) elements, in particular the SP, SPBM1 and SPBM2 on one side and the RBM and RBD on the other side.
In a particular aspect, the RBM of the prRNA comprises a secondary structure, preferably a stem-loop RNA secondary structure, wherein the stem consists of 10 to 20 paired complementary nucleotides and the loop is composed of 4 to 6 unpaired nucleotides. Also, the stem can comprise an unpaired nucleotide that breaks the homogeneity of nucleotide pairing in the stem portion.
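The stem and loop dimensions given above can be checked on a candidate RBM sequence before it is built into a prRNA. The sketch below is a deliberately simplified check: it only recognizes a perfect hairpin (stem, loop, reverse-complementary stem) and does not handle the optional unpaired nucleotide in the stem or G/U wobble pairs. The function name and the example sequences in the usage note are hypothetical and are not the RBMs of the sequence listing.

```python
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_simple_stem_loop(rna: str,
                        stem_range: tuple = (10, 20),
                        loop_range: tuple = (4, 6)) -> bool:
    """True if `rna` folds as stem + loop + stem with a perfectly paired stem
    of 10-20 nt and a loop of 4-6 nt (bulges are not modeled here)."""
    rna = rna.upper().replace("T", "U")
    for loop_len in range(loop_range[0], loop_range[1] + 1):
        stem_len, remainder = divmod(len(rna) - loop_len, 2)
        if remainder or not stem_range[0] <= stem_len <= stem_range[1]:
            continue
        five_prime = rna[:stem_len]
        # Read the 3' arm backwards so that position i faces position i.
        three_prime = rna[stem_len + loop_len:][::-1]
        if all((x, y) in RNA_PAIRS for x, y in zip(five_prime, three_prime)):
            return True
    return False
```

For instance, a 12-nucleotide stem closed by a GAAA loop passes the check, whereas a hairpin with a 4-nucleotide stem does not. For real designs, a dedicated RNA folding tool would be the appropriate way to confirm the structure.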
In a particular aspect, the sequence of the RBM of the prRNA corresponds to Lambda BoxB from nutL (SEQ ID NO:7) and the associated RBD of RBD-RT corresponds to the Lambda phage N protein sequence (SEQ ID NO:5, SEQ ID NO:6). In an alternative particular aspect, the sequence of the RBM of the prRNA corresponds to a wild type MS2 binding motif (SEQ ID NO:9) or to a high affinity variant of the MS2 binding motif (SEQ ID NO:10) and the associated RBD of RBD-RT corresponds to the MS2 phage coat protein sequence (SEQ ID NO:8). In another alternative aspect, the sequence of the RBM of the prRNA corresponds to the PP7 binding motif (SEQ ID NO:12) and the associated RBD of RBD-RT corresponds to the PP7 phage coat protein sequence (SEQ ID NO:11).
Optionally, the RBM may bind to the RBD with a relatively high affinity, i.e. an affinity characterized by a dissociation constant (Kd) lower than 1×10⁻⁷ M, preferably between 1×10⁻⁸ and 1×10⁻⁹ M.
In a preferred aspect, the SPBM1 and the SPBM2 have at least a secondary structure portion that is involved in a specific binding to the SP, respectively to SPS1 and SPS2.
Optionally, the SPBM1 and/or SPBM2 may bind to the SP with a relatively high affinity, i.e. an affinity characterized by a dissociation constant (Kd) lower than 1×10⁻⁷ M, preferably between 1×10⁻⁸ and 1×10⁻⁹ M.
The RTprimer and RTtag sequences are selected in order to have complementary sequences and to be suitable for initiating reverse transcription of the gene L by the RT, especially the RBD-RT. In a particular aspect, the sequence of the RTprimer corresponds to SEQ ID NO:13 and the sequence of the RTtag corresponds to SEQ ID NO:14.
In a particular aspect, the SP is the Host factor required for replication of the RNA phage Qβ (Hfq) protein or a fragment or variant thereof. Any bacterial Hfq is suitable. Preferably, the Hfq endogenous to the bacterial cell can be used. Alternatively, the Hfq is from another bacterium. In a particular embodiment, the Hfq is from Escherichia coli. According to this particular aspect, the sequence of the SP can correspond to SEQ ID NO:15. The Hfq presents an advantageous quaternary arrangement that provides multiple binding sites for RNA motifs such as SPBM1 and SPBM2. In addition, the native Hfq protein comprises binding sites that allow interactions with RNAse E, a relatively well-conserved RNAse in bacteria that is capable of cleaving RNA such as the tpRNA and prRNA partners. To avoid disadvantageous cleavages and thus favor RNA stability, the Hfq may be modified with a C-terminus deletion (HfqΔC-term) in order to hamper its membrane localization in proximity to RNAse E. Accordingly, in a more preferred aspect, the SP is a modified HfqΔC-term, which advantageously reduces the interactions between RNAse E and the SP. As disclosed in Vecerek et al (Nucleic Acids Research, 2008, 36, 133–143), the essential part of Hfq, e.g. from E coli, for the hexamer core is the 65 N terminal residues of the protein. Therefore, the fragment of Hfq preferably comprises a fragment corresponding to residues 7-65 of SEQ ID NO: 15. Several HfqΔC-term variants have been disclosed such as Hfq (with deletion of residues 84-102), and Hfq 65 (with deletion of residues 66-102).
According to this alternative aspect, the sequence of the modified SP can correspond to SEQ ID NO:16.
Alternatively, the SP can be modular and can be a fusion protein of different RNA binding proteins, such as different phage coat proteins, for instance a fusion protein of MS2 phage coat protein and PP7 phage coat protein. Accordingly, the SPBM1 and SPBM2 could be the MS2 binding motif and the PP7 binding motif.
In a specific aspect, the SP is Hfq, a variant or a fragment thereof. In this specific aspect, SPBM1 and/or SPBM2 can be selected from the group consisting of SEQ ID NOs: 17 or 18. In a very particular aspect, SPBM1 has the sequence of SEQ ID NO: 17 and SPBM2 has the sequence of SEQ ID NO: 18.
In some aspects, the tpRNA further comprises a linker or spacer domain of variable size that is positioned between the RTtag sequence and the SPBM1 sequence. In other aspects, the prRNA further comprises a linker or spacer domain of variable size that is positioned between the RTprimer sequence and the SPBM2 sequence, the RTprimer sequence and the RBM sequence and/or the SPBM2 sequence and the RBM sequence. In addition, these domains may adjust the relative positioning of the three partners involved in the reverse transcription, namely tpRNA, prRNA and RBD-RT, in order to enhance the retro-transcription rate of the module 1.
In a specific aspect, the prRNA comprises, from 3’ end to 5’ end, the RTprimer sequence positioned at the 3’ end of the prRNA, the SPBM2 and the RBM. Alternatively, the prRNA may comprise, from 3’ to 5’, the RTprimer sequence positioned at the 3’ end of the prRNA, the RBM and the SPBM2.
During the design of the prRNA and tpRNA, the RNA secondary structure can be checked, for instance with available software that predicts RNA secondary structure, in order to avoid disturbing the secondary structures, in particular those of SPBM1, SPBM2 or RBM.
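The dissociation constants quoted above for the RBM/RBD and SPBM/SP interactions can be related to occupancy with the standard single-site binding expression. This is a minimal sketch, assuming simple 1:1 binding at equilibrium with the free partner in excess; the partner concentration used in the usage note is a hypothetical value chosen only for illustration.

```python
def fraction_bound(kd_molar: float, free_partner_molar: float) -> float:
    """Equilibrium fraction of a motif occupied by its binding partner,
    for simple 1:1 binding: [P] / (Kd + [P])."""
    if kd_molar <= 0 or free_partner_molar < 0:
        raise ValueError("Kd must be positive and concentration non-negative")
    return free_partner_molar / (kd_molar + free_partner_molar)
```

With a hypothetical free partner concentration of 1 µM, a Kd of 1×10⁻⁷ M gives about 91% occupancy, while a Kd of 1×10⁻⁹ M gives about 99.9% occupancy under the same assumptions.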
In a very specific aspect, the SP is an Hfq protein, in particular the Hfq of SEQ ID NO: 15, a variant or fragment thereof; the tpRNA comprises from 5’ to 3’: the gene L or an insertion site suitable for introducing the gene L, an RTtag sequence, preferably of SEQ ID NO: 14, operably linked to the gene L and the SPBM1 of SEQ ID NO: 17; the prRNA comprises from 3’ to 5’: an RTprimer sequence positioned at the 3’ end of the prRNA, preferably of SEQ ID NO: 13, the SPBM2 of SEQ ID NO: 18 and the RBM of SEQ ID NO: 7; and the RBD-RT comprises an RT, especially Tf1 RT (e.g., of SEQ ID NO: 1), MMLV RT (SEQ ID NO: 3) or HIV-1 RT (e.g., of SEQ ID NO: 2 or 57), fused to an RBD of SEQ ID NO: 5. If the RT is from HIV, one of the subunits is fused to the RBD and the other subunit is co-expressed. In a particular aspect, the fused subunit is p66 (SEQ ID NO: 2). In another particular aspect, the fused subunit is p51 (SEQ ID NO: 57).
The present invention relates to a bacterial cell comprising SP, tpRNA, prRNA and RBD-RT as detailed above in any aspect and the use thereof for generating diversity in a gene of interest.
The present invention relates to a method for generating diversity in a gene L, comprising:
- providing a bacterial cell comprising a molecular complex formed by the association of:
  - a tpRNA comprising from 5’ to 3’: the gene L, an RTtag sequence operably linked to the gene L;
  - a prRNA comprising: an RTprimer sequence positioned at the 3’ end of the prRNA,
  - a reverse transcriptase (RT), especially Tf1 RT (e.g., of SEQ ID NO: 1), MMLV RT (e.g., SEQ ID NO: 3) or HIV-1 RT (e.g., of SEQ ID NO: 2 and 57); and
- placing the bacterial cell in conditions that allow the reverse transcription of the gene L, thereby generating altered copies of said gene L of the tpRNA.
Preferably, the present invention relates to a method for generating diversity in a gene L, comprising:
- providing a bacterial cell comprising a molecular complex formed by the association of:
  - an SP, preferably Hfq or a variant or fragment thereof, optionally an Hfq of Escherichia coli such as the Hfq of SEQ ID NO: 15;
  - a tpRNA comprising from 5’ to 3’: the gene L, an RTtag sequence, preferably an RTtag of SEQ ID NO: 14, operably linked to the gene L and an SPBM1 sequence capable of binding to the SP, preferably an SPBM1 of SEQ ID NO: 17;
  - a prRNA comprising: an RTprimer sequence positioned at the 3’ end of the prRNA and capable of complementary pairing to the RTtag sequence, preferably an RTprimer of SEQ ID NO: 13, an SPBM2 sequence capable of binding to the SP, preferably the SPBM2 of SEQ ID NO: 18, and an RBM, preferably the RBM of SEQ ID NO:7,
  - a fusion protein (RBD-RT) comprising a reverse transcriptase (RT) and an RBD capable of binding to the RBM of the prRNA, preferably an RT, especially Tf1 RT (e.g., of SEQ ID NO: 1), MMLV RT (e.g., SEQ ID NO: 3) or HIV-1 RT (e.g., of SEQ ID NO: 2 or 57), fused to an RBD of SEQ ID NO: 5; and
- placing the bacterial cell in conditions that allow the reverse transcription of the gene L, thereby generating altered copies of said gene L of the tpRNA.
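The element order of the two RNAs described above can be summarized in a small data structure. The sketch below is illustrative only: the class and method names are hypothetical, the short sequences in the usage note are placeholders rather than the sequences of the sequence listing, and the RTprimer is modeled as the exact reverse complement of the RTtag, which is stricter than the "capable of complementary pairing" requirement of the text.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return rna.upper().translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Module1Design:
    gene_l: str
    rt_tag: str
    spbm1: str
    rt_primer: str
    spbm2: str
    rbm: str

    def tprna(self) -> str:
        """tpRNA, 5' to 3': gene L, RTtag, SPBM1."""
        return self.gene_l + self.rt_tag + self.spbm1

    def prrna(self) -> str:
        """prRNA, 5' to 3': RBM, SPBM2, RTprimer (RTprimer at the 3' end)."""
        return self.rbm + self.spbm2 + self.rt_primer

    def primer_pairs_with_tag(self) -> bool:
        """Simplified check: RTprimer is the exact reverse complement of RTtag."""
        return self.rt_primer.upper() == reverse_complement(self.rt_tag)
```

A design built with a placeholder RTtag such as GGAUCCGA and the matching primer UCGGAUCC passes `primer_pairs_with_tag()`, and `prrna()` ends with the primer, so the primer supplies the 3’ end used for reverse transcription. Linker or spacer domains, which the text allows between elements, are not modeled here.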
The present invention further relates to a vector or set of vectors, said vector or set of vectors comprising:
- a transcription cassette (tC1) comprising a sequence encoding a pre-tpRNA operably linked to a promoter (P1), said pre-tpRNA comprising from 5’ to 3’: an insertion site suitable for the insertion of a gene L, an RTtag sequence, preferably an RTtag of SEQ ID NO: 14, operably linked to the gene L to be inserted and an SPBM1 sequence, preferably an SPBM1 of SEQ ID NO: 17, wherein said tC1 is suitable for allowing, in the bacterial cell, the transcription of a tpRNA including an inserted gene L, wherein the SPBM1 is capable of binding to an SP present in the bacterial cell at a first specific binding site (SPS1);
- a transcription cassette (tC2) comprising a sequence encoding a prRNA operably linked to a promoter (P2), said prRNA comprising: an RBM sequence positioned at the 5’ end, preferably the RBM of SEQ ID NO: 7, an SPBM2 sequence, preferably the SPBM2 of SEQ ID NO: 18, and an RTprimer, preferably an RTprimer of SEQ ID NO: 13, wherein said tC2 is suitable for allowing, in the bacterial cell, the transcription of a prRNA, wherein the RTprimer is capable of complementary pairing to the RTtag and the SPBM2 is capable of binding to the SP, the sequence encoding the prRNA optionally further comprising a sequence encoding a tRNA sequence contiguously positioned downstream of the RTprimer sequence, a site cleavable by an RNAse of the bacterial cell being present between said tRNA sequence and said RTprimer, thereby allowing the production of a well-defined 3’ prRNA end;
- an expression cassette (eC1) comprising a sequence encoding an RBD-RT fusion protein operably linked to a promoter (P3), said RBD-RT comprising a reverse transcriptase (RT) sequence, especially Tf1 RT (e.g., of SEQ ID NO: 1), MMLV RT (e.g., SEQ ID NO: 3) or HIV-1 RT (e.g., of SEQ ID NO: 2), and an RBD sequence, preferably an RBD of SEQ ID NO: 5, wherein said eC1 is suitable for allowing, in the bacterial cell, the expression of the RBD-RT fusion protein, wherein the RBD is capable of binding to the RBM of the prRNA, and
- optionally, an expression cassette (eC2) comprising a sequence encoding the SP operably linked to a promoter (P4), preferably said SP being the Hfq protein, preferably the Hfq of SEQ ID NO: 15, wherein eC2 is suitable for allowing, in the bacterial cell, the expression of the SP, preferably the Hfq protein.
The present invention also relates to a vector or set of vectors comprising the elements as defined below and a bacterial cell comprising this vector or set of vectors or comprising the elements as defined below, the elements being:
- a transcription cassette (tC1) comprising a sequence encoding a tpRNA operably linked to a promoter (P1), said tpRNA comprising from 5’ to 3’: a gene L, an RTtag sequence operably linked to the gene L and an SPBM1 sequence, wherein said tC1 is suitable for allowing, in the bacterial cell, the transcription of a tpRNA, wherein the SPBM1 is capable of binding to an SP present in the bacterial cell at a first specific binding site (SPS1);
- a transcription cassette (tC2) comprising a sequence encoding a prRNA operably linked to a promoter (P2), said prRNA comprising: an RBM sequence positioned at the 5’ end, preferably the RBM of SEQ ID NO: 7, an SPBM2 sequence, preferably the SPBM2 of SEQ ID NO: 18, and an RTprimer, preferably an RTprimer of SEQ ID NO: 13, wherein
said tC2 is suitable for allowing, in the bacterial cell, the transcription of a prRNA, wherein the RTprimer is capable of complementary pairing to the RTtag and the SPBM2 is capable of binding to the SP, the sequence encoding the prRNA optionally further comprising a sequence encoding a tRNA sequence contiguously positioned downstream of the RTprimer sequence, a site cleavable by an RNAse of the bacterial cell being present between said tRNA sequence and said RTprimer, thereby allowing the production of a well-defined 3’ prRNA end;
- an expression cassette (eC1) comprising a sequence encoding an RBD-RT fusion protein operably linked to a promoter (P3), said RBD-RT comprising a reverse transcriptase (RT) sequence, especially Tf1 RT (e.g., of SEQ ID NO: 1), MMLV RT (e.g., SEQ ID NO: 3) or HIV-1 RT (e.g., of SEQ ID NO: 2), and an RBD sequence, preferably an RBD of SEQ ID NO: 5, wherein said eC1 is suitable for allowing, in the bacterial cell, the expression of the RBD-RT fusion protein, wherein the RBD is capable of binding to the RBM of the prRNA, and
- an expression cassette (eC2) comprising a sequence encoding the SP operably linked to a promoter (P4), preferably said SP being the Hfq protein, preferably the Hfq of SEQ ID NO: 15, wherein eC2 is suitable for allowing, in the bacterial cell, the expression of the SP, preferably the Hfq protein.
Preferably, the vector or the set of vectors consists of low copy vectors.
In a particular aspect, the diversity generation could be multiplexed in order to allow the co-evolution of several genes of interest, allowing for instance the evolution of biological pathways or multiprotein complexes. In the context of a multiplexed method, a couple of tpRNA and prRNA will be designed for each gene of interest to be evolved.
For instance, for a pathway or complex comprising two genes of interest, the method comprises providing a first couple of tpRNA and prRNA for the first gene of interest and a second couple of tpRNA and prRNA for the second gene of interest. If the module 1 is carried out with an SP, the same system of SP, SPBM1 and 2, SPS1 and 2, RBM and RBD can be used for the different couples of tpRNA and prRNA, or distinct systems can be used for each couple of tpRNA and prRNA. Alternatively, different tpRNAs with the same RTtag could share the same prRNA. The multiplexed version of the invention can be applied, for instance, for metabolic engineering or strain development.
It is believed that it is the first time that the use of an error-prone retroviral/retrotransposon reverse transcriptase in bacteria for evolution purposes is reported, as well as the strategy of using pre-tRNA fusions to obtain RNAs with the well-defined 3’ sequence that is required for efficient reverse transcription. Indeed, the inventors overcame a series of difficulties, such as the very short half-life of RNAs and linear DNA in bacteria, which result, respectively, in low reverse transcription efficiency and low cDNA amounts, in particular by combining the module 1 with the module 2.
Module 2: Preservative effectors
The second module comprises means for improving the stability of oligonucleotides in the bacterial cell.
The second module is an optional module that can be combined with the first module in order to enhance the retro-transcription efficiency of the RT.
In a preferred second aspect, the preservative effector corresponds to an HR factor that is expressed or overexpressed by the bacterial cell. Advantageously, the HR factor of the second module can integrate the neosynthesized cDNA copies of gene L into DNA vectors that comprise a copy of the gene L. Such an integration thus protects neosynthesized cDNA copies from degradation in the bacterial cell. Accordingly, the HR factor makes it possible to replace a copy of the gene L included in a vector introduced into the bacterial cell, e.g., vector(s) that encode exogenous required elements of the modules described herein, or a copy of the gene L present in the genome of the bacterial cell. Importantly, the capacity of the HR factor to integrate the cDNA copies of gene L generated by the module 1 into a DNA vector or set of vectors that codes for elements of the module 3 allows a functional coupling between the first and third modules.
The HR factor is a recombinase that mediates recombination-mediated genetic engineering using single-strand DNA, in particular the neosynthesized cDNA copies of the gene L. The HR factor is preferably a beta recombinase. Beta recombinase binds to ssDNA and anneals the ssDNA to complementary ssDNA such as, for example, complementary genomic DNA.
The beta recombinase can be a recombinase as disclosed in Datta et al (Proc Natl Acad Sci USA 105: 1626-1 (2008)) or a recombinase selected from the non-exhaustive group comprising bet of lambda phage of E coli, s065/s066 of SXT element of Vibrio cholerae, plu2935 of Photorhabdus luminescens, EF2132 of Enterococcus faecalis, recT of Rac prophage of E coli, orfC of Legionella pneumophila, gp35 of SPP1 phage of Bacillus subtilis, gp61 of Che9c phage of Mycobacterium smegmatis, orf48 of A118 phage of Listeria monocytogenes, orf245 of ul36.2 of Lactococcus lactis and gp20 of phiNM3 phage of Staphylococcus aureus. See also the recombinases disclosed in WO2017/184227, the disclosure thereof being incorporated herein by reference.
In a more preferred aspect, the HR factor of the second module corresponds to a beta recombinase such as the lambda phage recombination factor (λBet) whose sequence may correspond to SEQ ID NO: 19.
If the method includes the modules 3 and 4, then the HR factor is mandatory. Of course, in order to obtain the recombination, the bacterial cell comprises a copy of the gene L or a part thereof suitable for allowing the introduction of a neosynthesized copy of the gene L into the vector or genome by recombination. In a preferred aspect, the copy of the gene L or a part thereof is operably linked to a promoter, more preferably as part of an expression cassette. The expression cassette may further comprise elements of module 3.
The present invention relates to a bacterial cell comprising the above-mentioned components of the first module, preferably the tpRNA, the prRNA and the RT, more preferably the SP, the tpRNA, the prRNA and the RBD-RT, and further comprising an HR factor, preferably a beta recombinase such as λBet, and the use thereof for generating diversity in a gene of interest and for increasing the stability of oligonucleotides in the bacterial cell, thereby improving the generation of diversity of the gene L.
The present invention relates to a method for generating diversity in a gene L comprising any aspect of the two steps described for the module 1, wherein the bacterial cell further comprises an HR factor, preferably a beta recombinase such as λBet.
The present invention further relates to a vector or set of vectors as described for the module 1, i.e. comprising tC1, tC2, eC1 and optionally eC2, and further comprising:
- an expression cassette (eC3) comprising an HR factor gene operably linked to a promoter (P5), wherein said eC3 is suitable for allowing, in the bacterial cell, the expression of an HR factor capable of integrating the altered copies of the gene L into a DNA vector or into the genome of the bacterial cell, said vector or genome comprising a copy of the gene L, thereby preserving the altered copies of the gene L from degradation.
The present invention also relates to a vector or set of vectors as described for module 1 that further comprises the elements described below, and a bacterial cell comprising this vector or set of vectors or comprising the elements of the vector or set of vectors as described for module 1 and the elements as defined below, the elements being:
- an expression cassette (eC3) comprising an HR factor gene operably linked to a promoter (P5), wherein said eC3 is suitable for allowing, in the bacterial cell, the expression of an HR factor capable of integrating the altered copies of the gene L into a DNA vector or into
the genome of the bacterial cell, said vector or genome comprising a copy of the gene L, thereby preserving the altered copies of the gene L from degradation.
Preferably, the HR factor gene encodes a beta recombinase, especially λBet.
Preferably, the vector or the set of vectors consists of low copy vectors.
Module 3: Two hybrid system (B2H)
The module 3 can be added to the modules 1 and 2. This module is a bacterial two-hybrid system suitable for selecting variants of the gene L based on their binding capacity to a target molecule. In particular, the functional coupling between the first module and the third module requires the presence of a second module that necessarily comprises an HR factor. Alternatively, the module 3 in its improved and optimal aspects is also of interest even in the absence of the modules 1 and 2, as further discussed below.
Importantly, the addition of the third module makes it possible to adapt the methods disclosed herein for ligand screening purposes. Indeed, the third functional module comprises a B2H system whose components are expressed by the bacterial cell in order to detect interactions between FPR (a fusion protein comprising the target molecule) and FPL (a fusion protein comprising the ligand domain encoded by the variants of the gene L, generated by the diversity generation of module 1 and integrated into a vector/genome by the homologous recombination of module 2).
According to the third aspect of the disclosure, the FPL comprises a ligand domain that is derived from a copy of the gene L that is included in a DNA vector of the bacterial cell. Since the required HR factor integrates altered copies of the gene L into such a vector, the L domain of the FPL can be modified and ligand variants can thus be generated. Modifications of the original gene L coding for the ligand domain of FPL can convert an original ineffective ligand domain into an effective ligand domain. Conversely, an original effective ligand can be converted into an improved, debased or ineffective ligand domain.
Different ligand screening strategies can be implemented.
In case the original gene L encodes an ineffective ligand, some methods according to the third aspect of the disclosure make it possible to detect altered copies of the gene L that are responsible for the expression of an effective ligand. Alternatively, in case the original gene L encodes an effective ligand, methods according to the third aspect of the disclosure make it possible to detect altered copies of the gene L that are responsible for the expression of an improved, debased or ineffective ligand.
The B2H system of the third functional module makes it possible to positively couple the binding events between FPR and FPL with the expression of the reporter gene.
For instance, when the L domain of FPL corresponds to an effective ligand, the interaction between FPL and FPR recruits an RNA polymerase that interacts with a promoter operably linked to the reporter gene, so as to trigger the expression of the latter. The signal intensity provided by the reporter protein is thus directly correlated to the binding affinity of the ligand. In a consistent manner, when an effective ligand is converted into an improved ligand, the quantifiable reporter signal increases. Conversely, when an effective original ligand is converted into an ineffective ligand, the quantifiable reporter signal decreases.
The quantification of the reporter signal is particularly important in ligand screening methods, since it makes it possible to select a desired ligand variant, i.e. an effective, improved, debased or ineffective ligand, encoded by an altered copy of the gene L. More particularly, ligand screening methods implementing the third module of the disclosure allow the selection of the ligand variant encoded by an altered copy of the gene L when the reporter is expressed, optionally at least at a predetermined level.
In an alternative aspect, the B2H system of the third module makes it possible to negatively couple the binding events between FPR and FPL with the expression of the reporter gene.
Then, the present disclosure relates to a method for screening a ligand molecule capable of binding a target molecule from variants encoded by altered copies of a gene L, wherein the bacterial cell comprises a bacterial two-hybrid system (B2H) comprising a construct with a promoter (P), a sequence defining a ribosome binding site (RBS) and a reporter gene, the P sequence being operably linked to the RBS sequence and the reporter gene, and the expression from the promoter being controlled by the B2H system including FPR and FPL, and the method comprises the selection of the variant encoded by an altered copy of the gene L when the reporter is expressed, optionally at least at a predetermined level.
In this aspect, when the L domain of FPL corresponds to an effective ligand, the interaction between FPL and FPR recruits an RNA polymerase that interacts with a promoter operably linked to a repressor gene. The B2H-regulated repressor then inhibits the transcription from the promoter operably linked to the reporter gene, thereby decreasing the expression of said reporter gene. The signal intensity provided by the reporter protein is thus inversely correlated to the binding affinity of the ligand. Therefore, when an effective ligand is converted into an improved ligand, the quantifiable reporter signal decreases or disappears.
Conversely, when an effective original ligand is converted into an altered or ineffective ligand, the quantifiable reporter signal increases.
Then, the present disclosure relates to a method for screening a ligand molecule capable of binding a target molecule from variants encoded by altered copies of a gene L, wherein the bacterial cell comprises a bacterial two-hybrid system (B2H) comprising a first construct comprising a first promoter P, a first RBS and a reporter gene, the first promoter P allowing a stable basal level of expression of the reporter gene, and a second construct comprising a second promoter P’, a second RBS and a repressor gene, said repressor being capable of targeting the first promoter P to block the transcription of the reporter gene, and the expression from the promoter P’ being controlled by the B2H system including FPR and FPL, and the method comprises the selection of the variant encoded by an altered copy of the gene L when the expression of the reporter is decreased, optionally under a predetermined level.
Bacterial two-hybrid (B2H) systems are well known to the person skilled in the art. For instance, examples of B2H are disclosed in WO9825947, McLaughlin et al (2012, Nature, 491, 138-142), [...]gh et al (2016, PLOS Pathogen, DOI:10.1371) and Poelwijk et al (2019, Nature Communications, 10, 4213), the disclosure thereof being incorporated herein by reference. In particular, the B2H used in the present disclosure can be a B2H system as developed and described by Dove et al (Methods Mol Biol. 2004;261:231-46), with one of the fusion proteins acting as a transcription activator when its interaction partner is fused to a subunit of the bacterial RNA polymerase.
In a particular aspect, the first partner is a DNA binding domain (DBD) and the second partner is a transcription subunit (TrSu). For instance, the DBD can be the cI protein of bacteriophage lambda and may have a sequence of SEQ ID NO: 22, and the transcription activator can be the alpha subunit of the RNA polymerase and may have a sequence of SEQ ID NO: 23.
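The two selection rules described above (reporter expressed at or above a predetermined level under positive coupling, reporter expression at or below a predetermined level under negative coupling) can be sketched as a simple filter over measured reporter signals. The names `Coupling` and `select_variants`, the variant labels and the signal values in the usage note are hypothetical and serve only as an illustration.

```python
from enum import Enum
from typing import List, Mapping


class Coupling(Enum):
    POSITIVE = "positive"  # FPR/FPL binding drives the reporter directly
    NEGATIVE = "negative"  # FPR/FPL binding drives a repressor of the reporter


def select_variants(signals: Mapping[str, float],
                    threshold: float,
                    coupling: Coupling) -> List[str]:
    """Return the variant names whose reporter signal meets the selection rule.

    Positive coupling keeps signals >= threshold; negative coupling keeps
    signals <= threshold. Names are returned in sorted order.
    """
    if coupling is Coupling.POSITIVE:
        selected = [name for name, s in signals.items() if s >= threshold]
    else:
        selected = [name for name, s in signals.items() if s <= threshold]
    return sorted(selected)
```

With hypothetical signals `{"L1": 5.0, "L2": 50.0, "L3": 0.5}`, a positive-coupling screen with a threshold of 10 keeps only `L2`, whereas a negative-coupling screen with a threshold of 1 keeps only `L3`.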
Other DBDs and TrSus can be used in order to build two-hybrid systems. Theoretically, the great majority of the domains that bind to DNA could be used as DBD in a B2H set-up, especially, but not limited to, repressors from different families (such as cI, lacI and tetR), zinc-fingers, transcription activator-like effectors (TALE) and dead Cas9 (dCas9). Badran et al (2016, Nature, 533, 58-63) demonstrated the use of the DBD from 434 phage cI, while Joung et al (2000, PNAS, 97, 7382-7387) demonstrated the use of zinc-finger domains; Yurlova et al the use of lacI in a fluorescent two-hybrid assay (2014, Journal of Biomolecular Screening, 19, 516-525); Li et al the use of TALEs (2012, Scientific Reports, 2, 897); and Hass & Zappulla the use of dCas9 (DOI: 10.1101/139600). Concerning the use of other Escherichia coli RNA polymerase subunits as TrSus, Dove & Hochschild (1998, Genes & Development, 12, 745-754) and Badran et al (2016, Nature, 533, 58-63) used the omega subunit of Escherichia coli RNA polymerase (coded by gene rpoZ). Hennecke et al (2005, Protein Engineering, Design and Selection, 18, 477-486) also demonstrated the feasibility of a B2H system inspired from toxR that can probe membrane and periplasmic interactions and that employs a domain encompassing both the DBD and TrSu functions, without including a bacterial RNA polymerase subunit, thus acting as a transcription activator.
In one aspect, the DBD is linked to the target molecule and forms a fusion protein (FPR) while the transcription subunit is linked to the ligand domain encoded by the gene L and its variants and forms a fusion protein (FPL). In an alternative aspect, the transcription subunit is linked to the target molecule and forms a fusion protein (FPR) while the DBD is linked to the ligand domain encoded by the gene L and its variants and forms a fusion protein (FPL).
The DBD and the transcription subunit are selected in order to promote the expression of the reporter gene or the repressor gene when a binding between FPR and FPL occurs, more particularly when a binding of the ligand domain L and the target molecule occurs. The B2H system can be adjusted to be able to select a suitable affinity for the binding of the ligand domain L and the target molecule.
The inventors designed an optimal reporting system for the B2H based on at least three main features, which are: a) an improved signal-to-noise ratio; b) a good correlation between affinity and the genetic signal generated; and c) the reduction of signal stochasticity. The first is required to reliably distinguish interactions from the basal expression level (or background noise), the second for the trustworthy comparison of affinities and the third to allow the retrieval of reliable information from large scale experiments.
This optimized B2H differs from previously known B2H systems by these three properties, which are essential for simultaneous large scale analysis of protein-protein interactions.
A first element of this B2H system is the promoter controlling the expression of the reporter gene or the repressor gene. Then, in a more preferred aspect, the reporter gene or the repressor gene of the B2H system is associated with the promoter epB2H (SEQ ID NO: 24) or a derivative thereof as defined below. This particular promoter surprisingly provides an optimal balance between an advantageous strong genetic output, i.e. a stronger reporter signal intensity, and a good correlation between ligand affinity and signal intensity. Furthermore, the designed promoter also invalidates a methylation site that was associated with low-frequency spontaneous autoactivation, thereby providing more consistent outputs and making it more suitable for molecular evolution applications with large numbers of cells and for longer selection periods.
In particular, the methylation motif CC(A/T)GG, the methylated nucleotide being in bold, is mutated to invalidate the methylation site. In a particular aspect, CCAGG can be substituted by GGCGG. This modification allows more homogeneous transcription among different cells (decreased stochasticity) and a decreased frequency of interaction-independent transcription (undesirable transcription in absence of interaction between fusions).
The promoter comprises a -10 box and a -35 box, the distance between the boxes being between 15 and 19 bases. The sequence between the two boxes has a minor effect on promoter activity. Modifications have been carried out in the -10 and -35 boxes for improving recognition by the transcription sigma factor, thereby allowing a better signal-to-noise ratio in B2H systems. More particularly, the -10 box has a sequence of GATACT and the -35 box has a sequence of TTGACA.
Finally, the last element of the promoter is the operator, the sequence recognized by the DBD, for instance the cI protein. The operator can be selected among the OR1, OR2, OR3, OL1, OL2 and OL3 lambda operators. In a particular aspect, the operator is OL2. The centre of the operator is preferably placed 62 bases upstream of the transcription start.
Then, the promoter may comprise, from 5’ to 3’, an operator recognized by the DBD, an invalidated methylation site, a modified -35 box having a sequence of TTGACA, and a modified -10 box having a sequence of GATACT. More specifically, the promoter meets one or several of the following features:
- the centre of the operator is placed about 62 bases upstream of the transcription start;
- the invalidated methylation site has a sequence of GGCGG;
- the distance between the -35 and -10 boxes is between 15 and 19 bases; and
- the operator is selected among the OR1, OR2, OR3, OL1, OL2 and OL3 lambda operators.
In one aspect, the promoter has the following sequence/structure:
Operator – (N)₁₁-GGCGG-N-TTGACA-(N)₁₅₋₁₉-GATACT-(N)₆-Start,
(invalidated methylation site: GGCGG; -35 box: TTGACA; -10 box: GATACT)
with N being any base (A, T, C or G).
In a specific aspect, the promoter has an operator selected among the OR1, OR2, OR3, OL1, OL2 and OL3 lambda operators operably linked to a sequence CGGCTTGACACTTTATGCTTCCGGCTCGGATACTGTGTGGA (SEQ ID NO 68) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO 68 and no modification in the region with bold and underlined nucleotides.
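The promoter layout described above (invalidated methylation site, -35 box, a spacer of 15 to 19 bases, -10 box) can be checked on a candidate sequence with a short script. The following is only an illustrative sketch: the function name `check_core` and the regular expression are assumptions introduced here, not part of the disclosure, and the test fragment is copied from the epB2H sequence as printed in this text.

```python
import re

# Core layout taken from the description: GGCGG-N-TTGACA-(N)15-19-GATACT.
# The operator and the bases downstream of the -10 box are not checked here.
CORE = re.compile(r"GGCGG[ACGT](TTGACA)([ACGT]{15,19})(GATACT)")

def check_core(seq: str):
    """Return the -35/-10 spacer length if the core layout is found, else None."""
    match = CORE.search(seq.upper())
    return len(match.group(2)) if match else None

# Fragment of the epB2H promoter sequence as printed in the text.
fragment = "GGCACCGGCGGCTTGACACTTTATGCTTCCGGCTCGGATACTGTGGA"
spacer = check_core(fragment)
print(spacer)
```

A return value of `None` indicates that at least one element (methylation-site replacement, box sequence or spacer length) deviates from the described layout.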
In a more specific aspect, the promoter has the following sequence: ACACCGCCAGAGATACATTAGGCACCGGCGGCTTGACACTTTATGCTTCCGGCTCGGATACTGTGGA (SEQ ID NO 24) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO 24 and no modification in the region with bold and underlined nucleotides.
A transcription terminator has been placed upstream of the operator element of the epB2H promoter in order to prevent transcription from upstream elements from disturbing epB2H regulation. For instance, the terminator last base could be placed between 15 and 53 bases (about 1.5 to 5 DNA helix turns) upstream of the first operator base. More specifically, the terminator last base could be placed 26 bases upstream of the first operator base. The terminator can be selected among small and strong terminators, for instance those disclosed in Chen et al (2013, Nature Methods, 10, 659-666), the disclosure thereof being incorporated herein by reference, in particular the terminators specifically disclosed in Supplementary Tables 2–4 of Chen et al. In a particular aspect, the transcription terminator has the following sequence (SEQ ID NO: 69 = CAAAAAACCCCGCCCCTGACAGGGCGGGGTTTTTTCGC).
Then, the B2H system of the present invention comprises a promoter as disclosed above and a transcription terminator placed upstream of the first base of the operator.
Preferably, the expression cassette of the reporter gene is on a single and low copy number vector or is integrated into the bacterial genome.
In a more preferred aspect, the expression of the FPR and/or FPL component, optionally the component comprising the DBD, is controlled by the association of a strong promoter and a weak RBS. Accordingly, the sequences of the FPR and/or FPL component, optionally the component comprising the DBD, are operably linked both to a strong promoter and a weak RBS. Interestingly, the inventors show that this association of a strong promoter and a weak RBS decreases the stochastic behaviour, thereby further improving the B2H system.
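The terminator placement given above (15 to 53 bases, about 1.5 to 5 helix turns, with 26 bases as a specific value) can be related to helix turns with simple arithmetic. This sketch assumes a helical repeat of 10.5 base pairs per turn, a commonly used value for B-DNA that is not stated in the text; the function names are hypothetical.

```python
BP_PER_TURN = 10.5  # assumption: typical B-DNA helical repeat, not from the text

def helix_turns(bases: int) -> float:
    """Convert a distance in bases into an approximate number of helix turns."""
    return bases / BP_PER_TURN

def terminator_offset_ok(bases: int, low: int = 15, high: int = 53) -> bool:
    """Check that the terminator-to-operator distance lies in the stated range."""
    return low <= bases <= high

for distance in (15, 26, 53):
    print(distance, round(helix_turns(distance), 2), terminator_offset_ok(distance))
```

Under this assumption, 26 bases correspond to roughly 2.5 turns, which sits inside the stated range.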
In a particular aspect, the sequences of the FPR and/or FPL component of the B2H system are associated with the weak RBS named RBS7 (SEQ ID NO:20) and the strong promoter pLTetO (SEQ ID NO:21). In a particular aspect, the sequences of the FPR and/or FPL component of the B2H system are operably linked to a combination of the promoter pLTetO with the RBS7 that has the following sequence GACATCCCTATCAGTGATAGAGATACTGCTAGCACTTAAGTAGACCAGCTCGC GGTCATATA (SEQ ID NO 70) or a sequence having at least 95 % of identity with SEQ ID NO 70 and no modification in the region with bold and underlined nucleotides.
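The criterion "at least X % of identity and no modification in the protected region" can be illustrated as follows. This is a simplified sketch under stated assumptions: it compares two sequences of equal length position by position without alignment or gaps, and the protected coordinates are hypothetical because the bold and underlined region is not recoverable from this text. The function names are introduced here for illustration only.

```python
def percent_identity(ref: str, cand: str) -> float:
    """Ungapped, position-by-position identity for equal-length sequences."""
    if len(ref) != len(cand):
        raise ValueError("this sketch only handles equal-length sequences")
    matches = sum(a == b for a, b in zip(ref, cand))
    return 100.0 * matches / len(ref)

def acceptable(ref, cand, protected, min_identity=95.0):
    """True if protected (start, end) ranges are unchanged and identity is high enough."""
    if any(ref[s:e] != cand[s:e] for s, e in protected):
        return False
    return percent_identity(ref, cand) >= min_identity

# Short fragment taken from the printed SEQ ID NO 70, for illustration only.
ref = "GATACTGCTAGCACTTAAGT"
protected = [(0, 6)]          # hypothetical protected region
variant_ok = ref[:-1] + "A"   # one change outside the protected region
variant_bad = "C" + ref[1:]   # one change inside the protected region
print(acceptable(ref, variant_ok, protected), acceptable(ref, variant_bad, protected))
```

A real comparison would normally rely on a sequence alignment tool rather than this ungapped count.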
The present invention relates to a bacterial cell comprising the above-mentioned components of the first module and the second module comprising an HR factor as detailed above, and that further comprises the B2H components as detailed herein in any aspect, and uses thereof for detecting the interaction between a target molecule and a ligand variant generated from the altered copies of gene L and/or selecting an altered copy of gene L for its interacting abilities. The present invention also relates to a bacterial cell comprising the above-mentioned components of the third module, especially with its improved and optimal aspects.
In one aspect, the present invention relates to a method for screening a ligand molecule capable of binding a target molecule from variants encoded by altered copies of a gene L, comprising any aspects of the steps described for the module 1 and steps described for module 2 wherein the module 2 comprises an HR factor, wherein the provided bacterial cell further comprises a B2H system comprising:
- a promoter (P), a sequence defining a ribosome binding site (RBS) and a reporter gene, the P sequence being operably linked to the RBS sequence and the reporter gene,
- a fusion protein (FPR) comprising the target molecule and a DNA binding domain (DBD), said DBD being capable of binding to a site located at proximity of the promoter P so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, and
- a fusion protein (FPL) comprising a variant encoded by an altered copy of the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase, and
the method comprises the selection of the variant encoded by an altered copy of the gene L when the reporter is expressed, optionally at least at a predetermined level.
Preferably, the B2H comprises a strong promoter and a weak RBS operably linked to the FPR and/or FPL component, preferably FPR.
In a particular aspect, the sequences of the FPR and/or FPL component, preferably FPR, of the B2H system are associated with the weak RBS named RBS7 (SEQ ID NO:20) and the strong promoter pLTetO (SEQ ID NO:21). In a particular aspect, the sequences of the FPR and/or FPL component of the B2H system are operably linked to a combination of the promoter pLTetO with the RBS7 that has the following sequence: GACATCCCTATCAGTGATAGAGATACTGCTAGCACTTAAGTAGACCAGCTCGC GGTCATATA (SEQ ID NO 70) or a sequence having at least 95 % of identity with SEQ ID NO 70 and no modification in the region with bold and underlined nucleotides.
Preferably, the promoter P is the promoter epB2H (SEQ ID NO: 24) or a derivative thereof as detailed above. Accordingly, the promoter P has the following structure:
Operator – (N)₁₁-GGCGG-N-TTGACA-(N)₁₅₋₁₉-GATACT-(N)₆-Start,
with operator being the sequence recognized by the DBD, Start being the nucleotide where the transcription starts, and N being any base (A, T, C or G).
In a preferred aspect, a transcription terminator is placed upstream of the operator, preferably a transcription terminator having a sequence as shown in SEQ ID NO: 69.
In a more specific aspect, the DBD is a cI protein and the promoter P has an operator selected among the OR1, OR2, OR3, OL1, OL2 and OL3 lambda operators operably linked to a sequence GCGGCTTGACACTTTATGCTTCCGGCTCGGATACTGTGTGGA (SEQ ID NO: 68) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO: 68 and no modification in the region with bold and underlined nucleotides.
In an even more specific aspect, the DBD is a cI protein and the promoter P has the following sequence: ACACCGCCAGAGATACATTAGGCACCGGCGGCTTGACACTTTATGCTTCCGGCTCGGATACT GTGGA (SEQ ID NO: 24) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO: 24 and no modification in the region with bold and underlined nucleotides.
In an alternative aspect, the present invention relates to a method for screening a ligand molecule capable of binding a target molecule from variants encoded by altered copies of a gene L, comprising any aspects of the steps described for the module 1 and steps described for module 2 wherein the module 2 comprises an HR factor, wherein the provided bacterial cell further comprises a B2H system comprising:
- a promoter (P), a sequence defining a ribosome binding site (RBS) and a reporter gene, the P sequence being operably linked to the RBS sequence and the reporter gene,
- a fusion protein (FPR) comprising the target molecule and transcription subunits (TrSu) capable of recruiting an RNA polymerase, and
- a fusion protein (FPL) comprising a variant encoded by an altered copy of the gene L and a DNA binding domain (DBD), said DBD being capable of binding to a site located at proximity of the promoter P so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, and
the method comprises the selection of the variant encoded by an altered copy of the gene L when the reporter is expressed, optionally at least at a predetermined level.
Alternatively, when the method is for screening a ligand molecule that loses the binding capacity to a target molecule from variants encoded by altered copies of a gene L, the method comprises the selection of the variant encoded by an altered copy of the gene L when the reporter is decreased, optionally under a predetermined level.
Preferably, the B2H comprises a strong promoter and a weak RBS operably linked to the FPR and/or FPL component, preferably FPL. In a particular aspect, the sequences of the FPR and/or FPL component, preferably FPL, of the B2H system are associated with the weak RBS named RBS7 (SEQ ID NO: 20) and the strong promoter pLTetO (SEQ ID NO: 21). In an alternative aspect, the sequences of the FPR and/or FPL component of the B2H system are operably linked to a combination of the promoter pLTetO with the RBS7 that has the following sequence GACATCCCTATCAGTGATAGAGATACTGCTAGCACTTAAGTAGACCAGCTCGC GGTCATATA (SEQ ID NO 70) or a sequence having at least 95 % of identity with SEQ ID NO 70 and no modification in the region with bold and underlined nucleotides.
Preferably, the promoter P is the promoter epB2H (SEQ ID NO: 24) or an alternative thereof. Accordingly, the promoter P has the following structure:
Operator – (N)₁₁-GGCGG-N-TTGACA-(N)₁₅₋₁₉-GATACT-(N)₆-Start,
with operator being the sequence recognized by the DBD, Start being the nucleotide where the transcription starts, and N being any base (A, T, C or G).
In a preferred aspect, a transcription terminator is placed upstream of the operator, preferably a transcription terminator having a sequence as shown in SEQ ID NO: 69.
In a more specific aspect, the DBD is a cI protein and the promoter P has an operator selected among the OR1, OR2, OR3, OL1, OL2 and OL3 lambda operators operably linked to a sequence GCGGCTTGACACTTTATGCTTCCGGCTCGGATACTGTGTGGA (SEQ ID NO: 68) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO: 68 and no modification in the region with bold and underlined nucleotides.
In an even more specific aspect, the DBD is a cI protein and the promoter P has the following sequence
CACCGCCAGAGATACATTAGGCACCGGCGGCTTGACACTTTATGCTTCCGGCTCGGATACT GTGGA (SEQ ID NO: 24) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO: 24 and no modification in the region with bold and underlined nucleotides.
The present invention also relates to a method for screening a ligand molecule that loses the capacity of binding a target molecule from variants encoded by altered copies of a gene L, comprising any aspects of the steps described for the module 1 and steps described for module 2 wherein the module 2 comprises an HR factor as detailed above, wherein the provided bacterial cell further comprises a B2H system comprising:
- a first promoter P, a sequence defining a first ribosome binding site (RBS) and a reporter gene, the first promoter P being operably linked to the first RBS sequence and the reporter gene and allowing a stable basal level of expression of the reporter gene,
- a second promoter P’, a sequence defining a second RBS and a repressor gene, the second promoter P’ being operably linked to the second RBS sequence and the repressor gene, said repressor being capable of targeting the first promoter P to block the transcription of the reporter gene,
- a fusion protein (FPR) comprising the target molecule and a DNA binding domain (DBD), said DBD being capable of binding to a site located at proximity of the promoter P’ so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, and
- a fusion protein (FPL) comprising a variant encoded by an altered copy of the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase; and
the method comprises the selection of the variant encoded by an altered copy of the gene L when the expression of the reporter is increased, optionally at least to a predetermined level.
Preferably, the B2H comprises a strong promoter and a weak RBS operably linked to the FPR and/or FPL component, preferably FPR.
In a particular aspect, the sequences of the FPR and/or FPL component, preferably FPR, of the B2H system are associated with the weak RBS named RBS7 (SEQ ID NO: 20) and the strong promoter pLTetO (SEQ ID NO: 21). In an alternative aspect, the sequences of the FPR and/or FPL component of the B2H system are operably linked to a combination of the promoter pLTetO with the RBS7 that has the following sequence GACATCCCTATCAGTGATAGAGATACTGCTAGCACTTAAGTAGACCAGCTCGC GGTCATATA (SEQ ID NO 70) or a sequence having at least 95 % of identity with SEQ ID NO 70 and no modification in the region with bold and underlined nucleotides.
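The selection rules stated for the different B2H configurations can be summarised as a small decision function. This is a minimal sketch, not the claimed method: the names `Mode` and `select_variant`, and the use of a single numeric threshold for the "predetermined level", are assumptions made for illustration.

```python
from enum import Enum

class Mode(Enum):
    DIRECT = "direct"      # FPR/FPL interaction activates the reporter promoter
    INVERTED = "inverted"  # interaction activates a repressor of the reporter

def select_variant(signal: float, threshold: float, mode: Mode, want_binding: bool) -> bool:
    """Decide whether a variant is selected from its reporter signal.

    In the direct configuration binding raises the reporter signal; in the
    inverted (repressor) configuration binding lowers it.
    """
    binding_raises_signal = mode is Mode.DIRECT
    if want_binding == binding_raises_signal:
        return signal >= threshold   # select when the reporter is expressed/increased
    return signal < threshold        # select when the reporter is decreased

# Hypothetical reporter readings in arbitrary units.
print(select_variant(850.0, 500.0, Mode.DIRECT, want_binding=True))
print(select_variant(850.0, 500.0, Mode.INVERTED, want_binding=False))
```

The same reading therefore selects a binder in the direct configuration and a variant that lost binding in the inverted configuration.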
Preferably, the promoter P’ is the promoter epB2H (SEQ ID NO: 24) or a derivative thereof as defined above. For instance, the promoter P’ has the following sequence: ACACCGCCAGAGATACATTAGGCACCGGCGGCTTGACACTTTATGCTTCCGGCTCGGATACT GTGGA (SEQ ID NO 24) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO 24 and no modification in the region with bold and underlined nucleotides.
Optionally, the repressor could be SrpR and the promoter P could be T7-SprOx2.
Alternatively, the present invention also relates to a method for screening a ligand molecule that loses the capacity of binding a target molecule from variants encoded by altered copies of a gene L, comprising any aspects of the steps described for the module 1 and steps described for module 2 wherein the module 2 comprises an HR factor as detailed above, wherein the provided bacterial cell further comprises a B2H system comprising:
- a first promoter P, a sequence defining a first ribosome binding site (RBS) and a reporter gene, the first promoter P being operably linked to the first RBS sequence and the reporter gene and allowing a stable basal level of expression of the reporter gene,
- a second promoter P’, a sequence defining a second RBS and a repressor gene, the second promoter P’ being operably linked to the second RBS sequence and the repressor gene, said repressor being capable of targeting the first promoter P to block the transcription of the reporter gene,
- a fusion protein (FPR) comprising the target molecule and transcription subunits (TrSu) capable of recruiting an RNA polymerase, and
- a fusion protein (FPL) comprising a variant encoded by an altered copy of the gene L and a DNA binding domain (DBD), said DBD being capable of binding to a site located at proximity of the promoter P’ so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L; and
the method comprises the selection of the variant encoded by an
altered copy of the gene L when the expression of the reporter is increased, optionally at least to a predetermined level.
Preferably, the B2H comprises a strong promoter and a weak RBS operably linked to the FPR and/or FPL component, preferably FPL. In a particular aspect, the sequences of the FPR and/or FPL component, preferably FPL, of the B2H system are associated with the weak RBS named RBS7 (SEQ ID NO: 20) and the strong promoter pLTetO (SEQ ID NO: 21). In an alternative aspect, the sequences of the FPR and/or FPL component of the B2H system are operably linked to a combination of the promoter pLTetO with the RBS7 that has the following sequence
GACATCCCTATCAGTGATAGAGATACTGCTAGCACTTAAGTAGACCAGCTCGCGGTCATATA (SEQ ID NO 70) or a sequence having at least 95 % of identity with SEQ ID NO 70 and no modification in the region with bold and underlined nucleotides.
Preferably, the promoter P’ is the promoter epB2H (SEQ ID NO: 24) or a derivative thereof as disclosed above. For instance, the promoter P’ has the following sequence: ACACCGCCAGAGATACATTAGGCACCGGCGGCTTGACACTTTATGCTTCCGGCTCGGATACTGTGGA (SEQ ID NO 24) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO 24 and no modification in the region with bold and underlined nucleotides.
The present invention further relates to a vector or set of vectors as described above for module 1 and module 2 including HR, to a bacterial cell comprising said vector or set of vectors, and to the use of said vector or set of vectors or said bacterial cell, said vector or set of vectors further comprising: an expression cassette (eC4) comprising a sequence encoding a reporter gene operably linked to a promoter (P6), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising an insertion site suitable for the insertion of the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L; an
expression cassette (eC4) comprising a sequence encoding a reporter gene operably linked to a promoter (P6), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and
an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising an insertion site suitable for the insertion of the gene L and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L; an expression cassette (eC4) comprising a sequence encoding a reporter gene operably linked to a promoter (P6), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of a FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L; an expression cassette (eC4) comprising a sequence encoding a reporter gene operably linked to a promoter (P6), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and transcription subunits (TrSu) 
capable of recruiting an RNA polymerase, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising the gene L and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC6 is suitable for allowing, in the bacterial cell, the
expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L.
The present invention also relates to a vector or set of vectors as described above, that further comprises: an expression cassette (eC4) comprising a sequence encoding a repressor gene operably linked to a promoter (P6), an expression cassette (eC4’) comprising a sequence encoding a reporter gene operably linked to a promoter (P6’), the expression of the reporter gene being negatively controlled by the repressor encoded by (eC4), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising an insertion site suitable for the insertion of the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L; an expression cassette (eC4) comprising a sequence encoding a repressor gene operably linked to a promoter (P6), an expression cassette (eC4’) comprising a sequence encoding a reporter gene operably linked to a promoter (P6’), the expression of the reporter gene being negatively controlled by the repressor encoded by (eC4), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7),
said FPR comprising a target domain and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising an insertion site suitable for the insertion of the gene L and a DBD sequence, said DBD being capable of binding to a site located at proximity of
the promoter P6 so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L; an expression cassette (eC4) comprising a sequence encoding a repressor gene operably linked to a promoter (P6), an expression cassette (eC4’) comprising a sequence encoding a reporter gene operably linked to a promoter (P6’), the expression of the reporter gene being negatively controlled by the repressor encoded by (eC4), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L; an expression cassette (eC4) comprising a sequence encoding a repressor gene operably linked to a promoter (P6), an expression cassette (eC4’) comprising a sequence encoding a reporter gene operably linked to a promoter (P6’), the expression of the reporter gene being negatively controlled by the repressor encoded by (eC4), an expression cassette (eC5)
comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising the gene L and a DBD sequence, said DBD being
capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L.
Optionally, the promoter P7 and/or P8 comprises a strong promoter and a weak RBS, in particular a weak RBS named RBS7 (SEQ ID NO: 20) and a strong promoter such as pLTetO (SEQ ID NO: 21). In a particular aspect, the sequences of the FPR and/or FPL component of the B2H system are operably linked to a combination of the promoter pLTetO with the RBS7 that has the following sequence GACATCCCTATCAGTGATAGAGATACTGCTAGCACTTAAGTAGACCAGCTCGC GGTCATATA (SEQ ID NO 70) or a sequence having at least 95 % of identity with SEQ ID NO 70 and no modification in the region with bold and underlined nucleotides.
Optionally, the promoter P6 or P6’ is the promoter epB2H (SEQ ID NO: 24) or an alternative thereof. Accordingly, the promoter P6 or P6’ has the following structure:
Operator – (N)₁₁-GGCGG-N-TTGACA-(N)₁₅₋₁₉-GATACT-(N)₆-Start,
with operator being the sequence recognized by the DBD, Start being the nucleotide where the transcription starts, and N being any base (A, T, C or G).
In a preferred aspect, a transcription terminator is placed upstream of the operator, preferably a transcription terminator having a sequence as shown in SEQ ID NO: 69.
In a more specific aspect, the DBD is a cI protein and the promoter P6 or P6’ has an operator selected among the OR1, OR2, OR3, OL1, OL2 and OL3 lambda operators operably linked to a sequence GCGGCTTGACACTTTATGCTTCCGGCTCGGATACTGTGTGGA (SEQ ID NO: 68) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO: 68 and no modification in the region with bold and underlined nucleotides.
In an even more specific aspect, the DBD is a cI protein and the promoter P6 or P6’ has the following sequence
CACCGCCAGAGATACATTAGGCACCGGCGGCTTGACACTTTATGCTTCCGGCTCGGATACTGTGGA (SEQ ID NO: 24) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO: 24 and no modification in the region with bold and underlined nucleotides.
Preferably the vector or the set of vectors is a low copy vector.
The present invention also relates to the B2H system with the improvements and its uses, independently of the modules 1 and 2.
Accordingly, the present invention relates to a method for determining a capacity of a ligand molecule and variants of the ligand molecule of binding a target molecule in a bacterial cell, wherein the bacterial cell comprises a two-hybrid system (B2H) comprising: a promoter (P), a sequence defining a ribosome binding site (RBS) and a reporter gene, the P sequence being operably linked to the RBS sequence and the reporter gene, and a fusion protein (FPR) comprising the target molecule and a DNA binding domain (DBD), said DBD being capable of binding to a site located at proximity of the promoter P so as to promote the expression of the reporter gene when the target molecule is bound to the ligand molecule or a variant thereof, and a fusion protein (FPL) comprising the ligand molecule or a variant thereof and transcription subunits (TrSu) capable of recruiting an RNA polymerase, or a fusion protein (FPL) comprising the ligand molecule or a variant thereof and a DNA binding domain (DBD), said DBD being capable of binding to a site located at proximity of the promoter P so as to promote the expression of the reporter gene when the target molecule is bound to the ligand molecule or a variant thereof, and a fusion protein (FPR) comprising the target molecule and transcription subunits (TrSu) capable of recruiting an RNA polymerase, and the method comprises the measure of the level of expression of the reporter gene, thereby determining the capacity of a ligand molecule and a variant thereof of binding a target molecule; wherein the promoter (P) has the following structure:
Operator – (N)11-GGCGG-N-TTGACA-(N)15-19-GATACT-(N)6-Start,
with operator being the sequence recognized by the DBD, Start being the nucleotide where the transcription starts, and N being any base (A, T, C or G); and wherein the fusion protein comprising the DBD is operably linked to a strong promoter and a weak RBS.
Preferably, a transcription terminator is placed upstream of the operator, preferably a transcription terminator having a sequence as shown in SEQ ID NO: 69.
In a particular aspect, the DBD is a cI protein and the promoter (P) has an operator selected among OR1, OR2, OR3, OL1, OL2 and OL3 lambda operators operably linked to a sequence GCGGCTTGACACTTTATGCTTCCGGCTCGGATACTGTGTGGA (SEQ ID NO: 68) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO: 68 and no modification in the region with bold and underlined nucleotides.
In a more particular aspect, the DBD is a cI protein and the promoter (P) has the following sequence ACACCGCCAGAGATACATTAGGCACCGGCGGCTTGACACTTTATGCTTCCGGCTCGGATACTGTGGA (SEQ ID NO: 24) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO: 24 and no modification in the region with bold and underlined nucleotides.
Optionally, the strong promoter with the weak RBS has the following sequence GACATCCCTATCAGTGATAGAGATACTGCTAGCACTTAAGTAGACCAGCTCGC GGTCATATA (SEQ ID NO: 70) or a sequence having at least 95 % of identity with SEQ ID NO: 70 and no modification in the region with bold and underlined nucleotides.
Optionally, the weak RBS has the sequence as shown in SEQ ID NO: 20 and the strong promoter has a sequence as shown in SEQ ID NO: 21.
Optionally, the method comprises the comparison of the level of expression of the reporter gene of the ligand molecule to the level of expression of the reporter gene of the variant, thereby determining the effect of the modification in the variant on the binding to the target molecule.
The present invention relates to any use of the method in any kind of applications. For instance, the B2H system is well adapted to interface mapping of interacting proteins.
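The promoter structure recited above (Operator, 11 arbitrary bases, GGCGG, one arbitrary base, TTGACA, 15 to 19 arbitrary bases, GATACT, 6 arbitrary bases, Start) can be checked on a candidate sequence with a regular expression. The following Python sketch is purely illustrative: the function names, the placeholder operator and the test sequences are assumptions made for the example and are not sequences of the disclosure.

```python
import re

N = "[ACGT]"  # N is any base (A, T, C or G)


def promoter_pattern(operator):
    """Compile Operator-(N)11-GGCGG-N-TTGACA-(N)15-19-GATACT-(N)6-Start."""
    return re.compile(
        re.escape(operator)
        + N + "{11}" + "GGCGG" + N + "TTGACA"
        + N + "{15,19}" + "GATACT"
        + N + "{6}" + "(?P<start>" + N + ")"
    )


def find_start(sequence, operator):
    """Return the 0-based index of the Start nucleotide, or None if no match."""
    match = promoter_pattern(operator).search(sequence.upper())
    return match.start("start") if match else None


# Hypothetical placeholder operator and spacers, for illustration only.
OPERATOR = "ACGTACGTACGTACGTA"
candidate = (OPERATOR + "CATTAGGCACC" + "GGCGG" + "C" + "TTGACA"
             + "CTTTATGCTTCCGGCTCG" + "GATACT" + "GTGTGG" + "A")
# Same construct with a 12-nt spacer, which is outside the 15-19 range.
too_short = candidate.replace("CTTTATGCTTCCGGCTCG", "CTTTATGCTTCC")

print(find_start(candidate, OPERATOR))
print(find_start(too_short, OPERATOR))
```

In this sketch the first call returns the position of the Start nucleotide and the second returns None, because the spacer between TTGACA and GATACT is too short.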
This system is well adapted to deep mutational scanning. Then, in a particular aspect, the present invention relates to a method for mapping amino acids in two interacting molecules (ligand and target), wherein variants of the ligand are prepared and the effect of the amino acid substitution(s) on their interaction with the target protein is determined by the method as detailed above. The variants of the ligand can be generated by deep mutational scanning, in which selected amino acid positions are substituted by one or several amino acids, preferably by all amino acids.
The present invention also relates to a B2H system for determining the capacity of a ligand molecule and of variants of the ligand molecule to bind a target molecule, comprising a bacterial cell comprising the following expression cassettes:
an expression cassette (eC4) comprising a sequence encoding a reporter gene operably linked to a promoter (P6), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the reporter gene when the target molecule is bound to a ligand molecule or a variant thereof, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising the ligand molecule or a variant thereof and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein; or
an expression cassette (eC4) comprising a sequence encoding a reporter gene operably linked to a promoter (P6), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising the ligand molecule or a variant thereof and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the reporter gene when the target molecule is bound to a ligand molecule or a variant thereof, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising the target molecule and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein;
wherein the promoter (P6) has the following structure:
Operator – (N)11-GGCGG-N-TTGACA-(N)15-19-GATACT-(N)6-Start,
with operator being the sequence recognized by the DBD, Start being the nucleotide where the transcription starts, and N being any base (A, T, C or G); and wherein the promoter (P7) and/or the promoter (P8) is/are a strong promoter operably linked to a weak RBS.
Preferably, a transcription terminator is placed upstream of the operator, preferably a transcription terminator having a sequence as shown in SEQ ID NO: 69.
In a more specific aspect, the DBD is a cI protein and the promoter P6 has an operator selected among OR1, OR2, OR3, OL1, OL2 and OL3 lambda operators operably linked to a sequence GCGGCTTGACACTTTATGCTTCCGGCTCGGATACTGTGTGGA (SEQ ID NO: 68) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO: 68 and no modification in the region with bold and underlined nucleotides.
In an even more specific aspect, the DBD is a cI protein and the promoter P6 has the following sequence: ACACCGCCAGAGATACATTAGGCACCGGCGGCTTGACACTTTATGCTTCCGGCTCGGATACTGTGGA (SEQ ID NO: 24) or a sequence having at least 80, 85, 90 or 95 % of identity with SEQ ID NO: 24 and no modification in the region with bold and underlined nucleotides.
In a particular aspect, the sequences of the FPR and/or FPL component, preferably FPL, of the B2H system are associated with the weak RBS named RBS7 (SEQ ID NO: 20) and the strong promoter pLTetO (SEQ ID NO: 21). In a particular aspect, the sequences of the FPR and/or FPL component of the B2H system are operably linked to a combination of the promoter pLTetO with RBS7 that has the following sequence GACATCCCTATCAGTGATAGAGATACTGCTAGCACTTAAGTAGACCAGCTCGC GGTCATATA (SEQ ID NO: 70) or a sequence having at least 95 % of identity with SEQ ID NO: 70 and no modification in the region with bold and underlined nucleotides.
Module 4: Arrest of the evolution
The fourth module comprises means for stopping the generation of diversity carried out by the first and second modules of the disclosure.
The fourth module is an optional module that can be added to the combination of the three other modules in order to stop the evolution process, in particular when a ligand of interest has been generated in a bacterial cell. The advantage of stopping the generation of diversity by using the fourth module is the possibility to preserve the altered copy of the gene L that is expressed by the B2H system, i.e. by avoiding its replacement by another variant of the gene L. In addition, although the generation of diversity is stopped by the fourth module, the expression of the selected variant and its detection by the third module continue, thus allowing the isolation of the corresponding cells and the identification and characterization of the variant by suitable techniques known by the person skilled in the art.
In particular, the fourth module is functionally coupled to the B2H system of the third module. This functional coupling results from the fact that the sequence coding for the arrest factor of the fourth module is operationally linked to a promoter controlled by the B2H, especially to the reporter gene and its promoter or to the repressor gene and its promoter. In other words, the expression of the arrest factor depends on the binding or non-binding between FPL and FPR. The “arrest factor” of the fourth module refers to a protein, such as an enzyme, that actively triggers the arrest of the generation of diversity. In addition, other elements can cooperate with the arrest factor in order to allow the arrest of the generation of diversity.
The arrest factor of the fourth module impairs the HR function and/or the RT function. In a more preferred aspect, the arrest factor of the fourth module impairs both the HR function and the RT function. Impairment of the RT function abolishes the generation of altered copies of the gene L, while impairment of the HR function prevents these altered copies from being integrated in an expression cassette of the FPL or FPR of the B2H system.
In a preferred aspect, the arrest factor of the fourth module is expressed by the B2H system of the third module when the latter detects a binding between the FPL and the FPR.
According to this aspect, an effective ligand variant is generated from an original gene L that codes for an ineffective ligand. The arrest of the generation of diversity then favours the identification of this effective ligand variant. In this aspect, the expression of the arrest factor is controlled by the promoter of the reporter gene or the repressor gene.
The sequence encoding the arrest factor can then be expressed by a polycistronic construct allowing the expression of the reporter gene and the arrest factor or the expression of the repressor gene and the arrest factor. Alternatively, the expression of the reporter or repressor gene and of the arrest factor can be controlled by similar but distinct promoters, all controlled by the B2H system.
Optionally, the arrest factor is an invertase. In a particular aspect, the fourth module comprises a DNA invertase that recognizes DNA sequences that are flanked by a pair of DNA invertase sites.
According to this aspect, the expression of the DNA invertase is controlled by the B2H system and the DNA invertase sites flank the DNA sequence coding the RT and/or the DNA sequence coding the HR, thereby allowing their targeting by the DNA invertase. Optionally, the DNA invertase can be the Bxb1 DNA invertase (e.g., SEQ ID NO: 25) and the DNA invertase sites correspond to Bxb1 attB (e.g., SEQ ID NO: 26) and Bxb1 attP (e.g., SEQ ID NO: 27). More particularly, attP is located in the reverse/complementary strand of the attB sequence. Other invertases and DNA invertase sites are known by the person skilled in the art and can be used in the fourth module.
Alternatively, the arrest factor can be a highly specific restriction enzyme. Highly specific refers to restriction enzymes having a long recognition site, preferably of at least 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 bp. In a particular aspect, the fourth module comprises a highly specific restriction enzyme that recognizes DNA sequences that are flanked by a pair of restriction enzyme sites. According to this aspect, the expression of the restriction enzyme is controlled by the B2H system and the restriction enzyme sites flank the DNA sequence coding the RT and/or the DNA sequence coding the HR factor, thereby allowing their targeting by the restriction enzyme. Once the binding between the target molecule and the ligand molecule occurs, the restriction enzyme introduces double-stranded breaks at the restriction sites that flank the DNA sequences encoding the RT and/or the HR factor and thereby removes the DNA sequences encoding the RT and/or the HR factor. The restriction enzyme can be wildtype, such as I-SceI, I-CreI and the like, or artificial, such as zinc finger nucleases or meganucleases, especially of the LAGLIDADG family.
In another alternative, the method for generating diversity in the gene L can be stopped by using a transcription repressor.
In this aspect, the B2H further comprises a gene encoding a transcription repressor under the control of the promoter P or P’, and this transcription repressor is capable of stopping or repressing the expression of the DNA sequences encoding the RT and/or the HR factor, thereby stopping the method for generating diversity in a gene L once the binding between the target molecule and the ligand molecule occurs. Optionally, the repressor under the control of the second promoter P’ could be capable of stopping the expression of the DNA sequences encoding the RT and/or the HR factor. In other words, the expression of the DNA sequences encoding the RT and/or the HR factor can be controlled by the repressor under the control of the second promoter P’.
The present invention relates to a bacterial cell comprising the above-mentioned components of the first module, the second module including HR, and the components of the third module, that further comprises at least one arrest factor of the fourth module in any aspect, and to uses thereof for leading to the arrest of the generation of diversity in a gene L.
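The effect of the invertase and restriction enzyme alternatives described above on a site-flanked coding segment can be pictured with a small string model. The following Python sketch is a deliberately idealized illustration: the site strings and the payload are hypothetical placeholders (not the Bxb1 attB/attP or I-SceI sequences), the sites are left unchanged after inversion, and excision is modelled as simple removal of the flanked segment together with its sites.

```python
def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def flanked_span(dna, left_site, right_site):
    """Return (start, end) of the segment between the two sites, or None."""
    i = dna.find(left_site)
    if i < 0:
        return None
    start = i + len(left_site)
    end = dna.find(right_site, start)
    return None if end < 0 else (start, end)


def invert_between(dna, left_site, right_site):
    """Toy invertase: reverse-complement the segment flanked by the sites."""
    span = flanked_span(dna, left_site, right_site)
    if span is None:
        return dna  # no pair of sites, nothing happens
    start, end = span
    return dna[:start] + reverse_complement(dna[start:end]) + dna[end:]


def excise_between(dna, left_site, right_site):
    """Toy restriction enzyme: drop the flanked segment and both sites."""
    span = flanked_span(dna, left_site, right_site)
    if span is None:
        return dna
    start, end = span
    return dna[:start - len(left_site)] + dna[end + len(right_site):]


# Hypothetical placeholder sites and payload (e.g. standing for the RT gene).
LEFT_SITE, RIGHT_SITE = "GTCGTC", "CTGCTG"
cassette = "TT" + LEFT_SITE + "ATGAAA" + RIGHT_SITE + "CC"

print(invert_between(cassette, LEFT_SITE, RIGHT_SITE))
print(excise_between(cassette, LEFT_SITE, RIGHT_SITE))
```

In both cases the coding segment is no longer present in its original orientation downstream of its promoter, which is the property the fourth module relies on to stop the RT and/or HR function.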
The present invention relates to a method for screening a ligand molecule capable of binding a target molecule from variants encoded by altered copies of a gene L, comprising any aspect of the previously described steps of the methods implementing Module 3, wherein the B2H system further comprises at least one arrest factor according to the fourth module, preferably a DNA invertase such as the Bxb1 DNA invertase capable of targeting DNA invertase sites that flank DNA sequences encoding the RT and/or the HR; or a restriction enzyme such as I-SceI capable of introducing double-stranded breaks at restriction sites that flank DNA sequences encoding the RT and/or the HR factor and thereby of removing the DNA sequences encoding the RT and/or the HR factor; or a transcription repressor capable of stopping or repressing the expression of the DNA sequences encoding the RT and/or the HR factor.
In a first aspect, the present invention further relates to a vector or set of vectors as described for modules 1, 2 and 3 and said vector or set of vectors have the following features:
- a sequence encoding a DNA invertase gene operably linked to P6 in the eC4 expression cassette; and,
- DNA invertase sites flanking the sequence encoding the RT and/or the HR, respectively in the eC1 and eC3 expression cassettes.
In a second aspect, the present invention further relates to a vector or set of vectors as described for modules 1, 2 and 3 and said vector or set of vectors have the following features:
- a sequence encoding a restriction enzyme gene operably linked to P6 in the eC4 expression cassette; and,
- restriction enzyme sites flanking the sequence encoding the RT and/or the HR, respectively in the eC1 and eC3 expression cassettes.
In a third aspect, the present invention further relates to a vector or set of vectors as described for modules 1, 2 and 3 and said vector or set of vectors have the following features:
- a sequence encoding a transcription repressor gene operably linked to P6 in the eC4 expression cassette; and,
- the sequence encoding the RT and/or the HR, respectively in the eC1 and eC3 expression cassettes, can be negatively controlled by the transcription repressor.
The present invention also relates to a bacterial cell comprising the vector or set of vectors as described above with:
- a sequence encoding a DNA invertase gene operably linked to P6 in the eC4 expression cassette; and DNA invertase sites flanking the sequence encoding the RT and/or the HR, respectively in the eC1 and eC3 expression cassettes; or
- a gene encoding a highly specific restriction enzyme (such as I-SceI) linked to the promoter P6 in the eC4 expression cassette, and the eC1 further comprises restriction sites flanking the sequence encoding RBD-RT and/or the eC3 further comprises restriction sites flanking the sequence encoding the HR factor gene; or
- the eC4 further comprises a sequence encoding a transcription repressor gene operably linked to P6 and the expression of the sequence encoding RBD-RT of the eC1 and/or the sequence encoding the HR factor gene of the eC3 can be stopped or negatively controlled by said transcription repressor.
Preferably the vector or the set of vectors is a low copy vector.
Bacterial cells
The present invention relates to a recombinant bacterial cell comprising elements of modules 1, 2, 3 and 4, of modules 1, 2 and 3 or of modules 1 and 2, in particular the vector or set of vectors as defined in any of the modules 1, 2, 3 and 4.
The bacterial cell can be any prokaryotic cell suitable for having functional modules 1, 2, 3 or 4. For instance, bacterial cells could belong to Escherichia coli, Vibrio natriegens, Bacillus subtilis, Bacillus megaterium, Neisseria lactamica, Salmonella, Klebsiella, Pseudomonas, Caulobacter, Rhizobium and the like. Other bacteria of interest are disclosed in the following publications: Ferrer-Miralles et al, 2013, Microbial Cell Factories, 12, 113; Pharm et al, 2019, Front. Microbiol., 10, Article 1404; Weinstock et al, 2016, Nature Methods, 13, 849-851; Vos et al, 2009, The ISME Journal, 3, 199-208.
In preferred aspects, the bacterial cell is a competent bacterial cell, preferably a competent bacterial cell suitable for transformation with a vector or set of vectors comprising elements of the modules 1, 2, 3 or 4.
In a more preferred aspect, the competent bacterial cell provides an optimal level of expression from a low number of copies. Competent strains that provide such an advantageous feature are well known to the person skilled in the art, especially among Escherichia coli strains. For instance, the competent bacterial cell is derived from the BL21(DE3) strain, DH10B, Marionette Clo (Addgene Ref #108251), in particular with the removal of a chloramphenicol resistance gene (coding for chloramphenicol resistance protein, SEQ ID NO: 32), or Acella™ (Zageno, Ref # 36795).
In a particular aspect, the bacterium has a genotype F- ompT hsdSB(rB- mB-) gal dcm (DE3) ΔendA ΔrecA such as Acella™, a genotype F- ompT hsdSB(rB-, mB-) gal dcm rne131 (DE3) such as BL21(DE3) Star cells, or a genotype F- mcrA Δ(mrr-hsdRMS-mcrBC) Φ80dlacZΔM15 ΔlacX74 endA1 recA1 deoR Δ(ara,leu)7697 araD139 galU galK nupG rpsL λ- Marionette(Δ R) such as a strain derived from Marionette Clo, or MG1655 (ybhB-bioAB)::[lcI857 N(cro- 9)] tetA recJ- sbcB- ΔaraBAD ΔmutS such as strain bMS_453 (kindly provided by Church Lab, Harvard, MIT).
In a preferred aspect, the bacterial cell has an improved plasmid stability. In another preferred aspect, the bacterial cell has a reduced endogenous recombination. In a more preferred aspect, the bacterial cell has both an improved plasmid stability and a reduced endogenous recombination. In preferred aspects, the bacterial cell has an increased proliferation rate.
The stability of oligonucleotides in the bacterial cell can be increased by means referred to as preservative effectors. Different types of preservative effectors can be used and optionally combined according to the second module, such as effectors impairing the function of the MMR system or effectors increasing RNA or DNA stability in the bacterial cell.
The present invention relates to a bacterial cell comprising at least one preservative effector in any aspect or combinations thereof and the use thereof for generating diversity in a gene of interest and for increasing the stability of oligonucleotides in the bacterial cell, thereby improving the generation of diversity in a gene L.
Optionally, the bacterial cell has a constitutive or inducible modification improving RNA stability. In the bacterial cell, the RNA stability is important to ensure the formation of retrotranscribing complexes, such as RTC. Preferably, the improved RNA stability of the bacterial cell is due to a reduced RNAse activity while sustaining normal growth of the bacterial cell.
More preferably, the reduced RNAse activity of the bacterial cell is due to mutations on at least one RNAse gene, such as rne, pnp, or rnr, that respectively encode the RNAse E, the PNPase and the RNAse R (Ikeda et al, 2011, Molecular Microbiology, 79, 419-432; Lopez et al, 1999, Molecular Microbiology, 33, -199; Bechhofer et al, 2019, Critical Reviews in Biochemistry and Molecular Biology, 54, 242- ). Even more preferably, the mutations on at least one RNAse gene do not alter the normal growth of the bacterial cell. Optionally, the bacterial cell may constitutively express an RNAse E variant defined by the rne131 mutation.
The present invention relates to a method for generating diversity in a gene L, wherein the bacterial cell further comprises at least one preservative effector capable of impairing RNAse activity such as rhlB or a fragment 711-844 of RNAse E, and/or capable of impairing the MMR function such as dam, and/or capable of increasing stability of single strand DNA such as a mutant ssDNA exonuclease.
Optionally, the preservative effector capable of increasing the RNA stability can be an effector that competes with RNAse E for interaction with the protein Hfq. Indeed, the above-mentioned interaction between RNAse E and the Hfq protein promotes the degradation of Hfq-bound RNAs. Strategies that inhibit this interaction can therefore improve the half-life of Hfq-bound RNAs, with beneficial effects on cDNA synthesis by reverse transcription.
The effector capable of increasing the RNA stability can be an RNA helicase such as rhlB, whose sequence corresponds to SEQ ID NO: 61, or can be a fragment 711-844 of RNAse E (SEQ ID NO: (Ikeda et al, 2011, 79, 419-432). Since rhlB interacts with RNAse E at the same epitope recognized by Hfq, the over-expression of rhlB can inhibit the interaction between Hfq and RNAse E by competition.
Alternatively, the effector capable of increasing the RNA stability can be the fragment (711-844) of RNAse E. The binding of the RNAse(711-844) peptide to the Hfq protein prevents Hfq from interacting with the whole functional RNAse E that includes the N-terminal catalytic region.
Then, the bacterial cell may express constitutively or inducibly an RNA helicase such as rhlB or a fragment 711-844 of RNAse E as detailed above.
In yet another aspect, alternative or additional, the preservative effector can be an effector that increases the stability of ssDNA strands.
Optionally, the bacterial cell has a constitutive or inducible modification reducing linear DNA degradation. Preferably, the reduced linear DNA degradation of the bacterial cell is due to a reduced ssDNAse and/or dsDNAse activity of the bacterial cell. More preferably, the reduced ssDNAse activity of the bacterial cell is due to mutations on at least one ssDNA exonuclease gene, such as xonA, recJ, xseA or exoX.
In particular, the mutant ssDNA exonuclease whose exonuclease function is reduced or invalidated can be a mutant xonA (such as SEQ ID NO: 64), a mutant xseA (such as SEQ ID NO: 66), a mutant exoX (such as SEQ ID NO: 65), or a mutant recJ (such as SEQ ID NO: 67) (Mosberg et al, 2012, PLOS One, 7, e44638; Gallagher et al, 2014, Nature Protocols, 9, 2301-2316; Dutra et al, 2007, PNAS, 104, 216-221; Simon et al, 2018, ACS Synth Biol, 7, 0-2611). Generally, the invalidated gene is generated by knockout or by introduction of a STOP codon in the coding sequence and/or by introducing a change in the open reading frame.
The preservative effector can be an effector capable of impairing the function of the MMR system. Optionally, the bacterial cell has a constitutive or inducible modification impairing the MMR system. Preferably, the impairment of the MMR system of the bacterial cell is due to mutations on MMR component genes, such as mutL, mutS, mutH or uvrD, in particular a dominant mutant of MutS, a dominant mutant of MutL or a dominant mutant of MutH (Junop et al, 2003, DNA Repair, 87-405; Yang et al, 2004, Molecular Microbiology, 53, 283-295). Alternatively or in addition, the impairment of the MMR system of the bacterial cell can be caused by the over-expression of a DNA methylase such as dam. Indeed, the over-expression of Dam can increase DNA methylation and impair the recognition of neosynthesized cDNA copies of gene L during mismatch repair. Since the decrease in MMR function should also result in higher levels of mutations over non-target sites, preservative effectors that impair the MMR function are preferably over-expressed by transient methods in the bacterial cell. In particular aspects, the bacterial cell belongs to Nuc5-, EcNR3, or EcM2.1 strains (Gallagher et al, 2014, Nat. Protoc., 9, 2301-2316) or the TOP10 dXseA/dMutS strain (Simon, Morrow and Ellington, 2018, ACS Synth. Biol., acssynbio.8b00273). Nuclease-invalidated strains can be found among George Church Lab’s strains available at Addgene: addgene.org/search/catalog/bacterial-strains/?q=george+church.
In preferred aspects, the bacterial cell is capable of over-expressing a recombinase, in particular a recombinase such as lambda phage recombination factors, in particular in an inducible way, for instance when the temperature is shifted above 37°C. An example of such a bacterial cell is the DY380 strain. Alternative recombineering strains, including DY380, can be found at the Court lab recombineering website (https://redrecombineering.ncifcrf.gov).
Accordingly, the bacterial cell may have one or more of the following features: constitutive or inducible improvement in RNA stability, decrease of linear DNA degradation, impairment of the DNA mismatch repair system, and increased proliferation.
Combinations of modules
The present invention relates to the combination of modules 1 and 2, preferably with the co-localization strategy, modules 1, 2 and 3, optionally with the co-localization strategy, and modules 1, 2, 3 and 4, optionally with the co-localization strategy.
Therefore, it relates to bacterial cells and/or vectors or set of vectors comprising the elements of these modules as disclosed above. Optionally, all the elements can be comprised in the bacterial cells. Optionally, some of the elements can be comprised in the bacterial cells and the others on vectors or set of vectors. Optionally, all the elements can be comprised on vectors or set of vectors.
The present invention relates to the use of these bacterial cells and/or vectors or set of vectors for generating diversity and selecting variants.
The bacterial cells and/or vector or set of vectors can be provided as a kit for generating diversity and selecting variants. The present invention relates to this kit, and the use thereof for generating diversity and selecting variants.
The present invention also relates to a vector or set of vectors comprising the elements as defined below and a bacterial cell comprising this vector or set of vectors or comprising the elements as defined below, the elements being:
- a transcription cassette (tC1) comprising a sequence encoding a tpRNA operably linked to a promoter (P1), said tpRNA comprising from 5’ to 3’: a gene L, an RTtag sequence operably linked to the gene L and a SPBM1 sequence, wherein said tC1 is suitable for allowing, in the bacterial cell, the transcription of a tpRNA, wherein the SPBM1 is capable of binding to an SP present in the bacterial cell at a first specific binding site (SPS1);
- a transcription cassette (tC2) comprising a sequence encoding a prRNA operably linked to a promoter (P2), said prRNA comprising: an RBM sequence positioned in 5’ end, preferably the RBM of SEQ ID NO: 7, an SPBM2 sequence, preferably the SPBM2 of SEQ ID NO: 18, and an RTprimer, preferably an RTprimer of SEQ ID NO: 13, wherein said tC2 is suitable for allowing, in the bacterial cell, the transcription of a prRNA, wherein the RTprimer is capable of complementary pairing to the RTtag, the SPBM2 is capable of binding to the SP, the sequence encoding the prRNA optionally further comprising a sequence encoding a tRNA sequence contiguously positioned downstream of the RTprimer sequence, a site cleavable by an RNAse of the bacterial cell being present between said tRNA sequence and said RTprimer, thereby allowing the production of a well-defined 3’ prRNA end;
- an expression cassette (eC1) comprising a sequence encoding an RBD-RT fusion protein operably linked to a promoter (P3), said RBD-RT comprising a reverse transcriptase (RT) sequence, especially TF1 RT (e.g., of SEQ ID NO: 1), MMLV RT (e.g., SEQ ID
NO: 3) or HIV-1 RT (e.g., of SEQ ID NO: 2), and an RBD sequence, preferably an RBD of SEQ ID NO: 5, wherein said eC1 is suitable for allowing, in the bacterial cell, the expression of the RBD-RT fusion protein, wherein the RBD is capable of binding to the RBM of prRNA, - an expression cassette (eC2) comprising a sequence encoding the SP operably linked to a promoter (P4), preferably said SP being the Hfq protein, preferably the Hfq of SEQ ID NO: 15, wherein eC2 is suitable for allowing, in the bacterial cell, the expression of the SP, preferably the Hfq protein, and
- an expression cassette (eC3) comprising an HR gene operably linked to a promoter (P5), wherein said eC3 is suitable for allowing, in the bacterial cell, the expression of an HR capable of integrating the altered copies of the gene L into a DNA vector or into the genome of the bacterial cell, said vector or genome comprising a copy of the gene L, thereby preserving the altered copies of the gene L from degradation.
Optionally, the vector or set of vectors or the bacterial cell comprising this vector or set of vectors further comprises:
an expression cassette (eC4) comprising a sequence encoding a reporter gene operably linked to a promoter (P6), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising an insertion site suitable for the insertion of the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L;
an expression cassette (eC4) comprising a sequence encoding a reporter gene operably linked to a promoter (P6), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC5 is suitable for
allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising an insertion site suitable for the insertion of the gene L and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand
encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L; an expression cassette (eC4) comprising a sequence encoding a reporter gene operably linked to a promoter (P6), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of a FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L; an expression cassette (eC4) comprising a sequence encoding a reporter gene operably linked to a promoter (P6), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising the gene L and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the reporter gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC6 is suitable for 
allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L.

Optionally, the vector or set of vectors or the bacterial cells further comprises: a sequence encoding a DNA invertase gene operably linked to P6 in the eC4 expression cassette; and DNA invertase sites flanking the sequence encoding the RT and/or the HR, respectively in the eC1 and eC3 expression cassettes.
Optionally, the vector or set of vectors or the bacterial cells further present the following features: the eC1 further comprises restriction sites flanking the sequence encoding RBD-RT and/or the eC3 further comprises restriction sites flanking the sequence encoding HR factor gene, and the eC4 further comprises a sequence encoding a restriction enzyme gene operably linked to P6.

Optionally, the vector or set of vectors or the bacterial cells further present the following features: the eC4 further comprises a sequence encoding a transcription repressor gene operably linked to P6, and the expression of the sequence encoding RBD-RT of the eC1 and/or the sequence encoding HR factor gene of the eC3 can be stopped by said transcription repressor gene.

The present invention further relates to a vector or set of vectors, said vector or set of vectors comprising:
- a transcription cassette (tC1) comprising a sequence encoding a pre-tpRNA operably linked to a promoter (P1), said pre-tpRNA comprising from 5’ to 3’: an insertion site suitable for the insertion of a gene L, an RTtag sequence, preferably an RTtag of SEQ ID NO: 14, operably linked to the gene L to be inserted and a SPBM1 sequence, preferably a SPBM1 of SEQ ID NO: 17, wherein said tC1 is suitable for allowing, in the bacterial cell, the transcription of a tpRNA including an inserted gene L, wherein the SPBM1 is capable of binding to an SP present in the bacterial cell at a first specific binding site (SPS1);
- a transcription cassette (tC2) comprising a sequence encoding a prRNA operably linked to a promoter (P2), said prRNA comprising: an RBM sequence positioned in 5’ end, preferably the RBM of SEQ ID NO: 7, an SPBM2 sequence, preferably the SPBM2 of SEQ ID NO: 18, and an RTprimer, preferably an RTprimer of SEQ ID NO: 13, wherein said tC2 is suitable for allowing, in the bacterial cell, the transcription of a prRNA, wherein the RTprimer is capable of complementary pairing to the RTtag, the SPBM2 is capable of binding to the SP, the sequence
encoding the prRNA optionally further comprising a sequence encoding a tRNA sequence contiguously positioned downstream of the RTprimer sequence, a site cleavable by an RNAse of the bacterial cell is present between said tRNA sequence and said RTprimer, thereby allowing the production of a well-defined 3’ prRNA end; - an expression cassette (eC1) comprising a sequence encoding an RBD-RT fusion protein operably linked to a promoter (P3), said RBD-RT comprising a reverse transcriptase (RT) sequence, especially TF1 RT (e.g., of SEQ ID NO: 1), MMLV RT (e.g., SEQ ID NO: 3) or HIV-1 RT (e.g., of SEQ ID NO: 2), and an RBD sequence, preferably an RBD of SEQ ID NO: 5, wherein said eC1 is suitable for allowing, in the bacterial cell, the expression
of the RBD-RT fusion protein, wherein the RBD is capable of binding to the RBM of prRNA,
- optionally, an expression cassette (eC2) comprising a sequence encoding the SP operably linked to a promoter (P4), preferably said SP being the Hfq protein, preferably the Hfq of SEQ ID NO: 15, wherein eC2 is suitable for allowing, in the bacterial cell, the expression of the SP, preferably the Hfq protein, and
- an expression cassette (eC3) comprising an HR gene operably linked to a promoter (P5), wherein said eC3 is suitable for allowing, in the bacterial cell, the expression of an HR capable of integrating the altered copies of the gene L into a DNA vector or into the genome of the bacterial cell, said vector or genome comprising a copy of the gene L, thereby preserving the altered copies of the gene L from degradation.

Optionally, the vector or set of vectors further comprises: an expression cassette (eC4) comprising a sequence encoding a repressor gene operably linked to a promoter (P6), an expression cassette (eC4’) comprising a sequence encoding a reporter gene operably linked to a promoter (P6’), the expression of the reporter gene being negatively controlled by the repressor encoded by (eC4), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising an insertion site suitable for the insertion of the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC6 is
suitable for allowing, in the bacterial cell, the expression of a FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L; an expression cassette (eC4) comprising a sequence encoding a repressor gene operably linked to a promoter (P6),
an expression cassette (eC4’) comprising a sequence encoding a reporter gene operably linked to a promoter (P6’), the expression of the reporter gene being negatively controlled by the repressor encoded by (eC4), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising an insertion site suitable for the insertion of the gene L and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L; an expression cassette (eC4) comprising a sequence encoding a repressor gene operably linked to a promoter (P6), an expression cassette (eC4’) comprising a sequence encoding a reporter gene operably linked to a promoter (P6’), the expression of the reporter gene being negatively controlled by the repressor encoded by (eC4), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression 
of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising the gene L and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of a FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L;
an expression cassette (eC4) comprising a sequence encoding a repressor gene operably linked to a promoter (P6), an expression cassette (eC4’) comprising a sequence encoding a reporter gene operably linked to a promoter (P6’), the expression of the reporter gene being negatively controlled by the repressor encoded by (eC4), an expression cassette (eC5) comprising a sequence encoding an FPR protein operably linked to a promoter (P7), said FPR comprising a target domain and transcription subunits (TrSu) capable of recruiting an RNA polymerase, wherein said eC5 is suitable for allowing, in the bacterial cell, the expression of an FPR protein, and an expression cassette (eC6) comprising a sequence encoding an FPL protein operably linked to a promoter (P8), said FPL comprising the gene L and a DBD sequence, said DBD being capable of binding to a site located at proximity of the promoter P6 so as to promote the expression of the repressor gene when the target molecule is bound to a variant encoded by an altered copy of the gene L, wherein said eC6 is suitable for allowing, in the bacterial cell, the expression of an FPL protein comprising either a ligand encoded by the gene L or a variant thereof encoded by an HR-integrated altered copy of gene L.

Optionally, the vector or set of vectors further comprises: a sequence encoding a DNA invertase gene operably linked to P6 in the eC4 expression cassette; and DNA invertase sites flanking the sequence encoding the RT and/or the HR, respectively in the eC1 and eC3 expression cassettes.

Optionally, the vector or set of vectors or the bacterial cells further present the following features: the eC1 further comprises restriction sites flanking the sequence encoding RBD-RT and/or the eC3 further comprises restriction sites flanking the sequence encoding HR factor gene, and the eC4 further comprises a sequence encoding a restriction enzyme gene operably linked to P6.
Optionally, the vector or set of vectors or the bacterial cells further present the following features: the eC4 further comprises a sequence encoding a transcription repressor gene operably linked to P6, and the expression of the sequence encoding RBD-RT of the eC1 and/or the sequence encoding HR factor gene of the eC3 can be stopped by said transcription repressor gene.

Optionally, some coding sequences can be arranged in polycistronic constructs and their expression can be controlled by the same promoter. For instance, the RT, especially RBD-RT, and the HR can be assembled as a bicistronic construct and their expression can be controlled by the same promoter. The FPL and FPR coding regions can also constitute bicistronic constructs controlled by the same promoter. Finally, bi- or polycistronic constructions can be used for generating signals correlated to the interaction between FPL and FPR. Preferentially, fluorescent
or luminescent proteins can be coupled to antibiotic resistance markers and/or genes related to the system arrest such as DNA invertases, restriction enzymes or repressors.

EXAMPLES

Due to the complexity of the 4-module system, the inventors began by implementing the modules and testing them in pairs before implementing a complete 4-module system. The four-module system is schematically disclosed in Figures 1 and 2.

Example 1: Test of RT/HR coupling

In order to test the coupling of RT (reverse transcription) and HR (homologous recombination) modules in a bacterial cell, an artificial biological system implemented in two plasmids was constructed (Figure 3A). The first plasmid (Figure 3B, VN591; SEQ ID NO: 37) harbors a modified kanamycin resistance gene with an internal stop codon. Consequently, a truncated (not functional) protein is produced and does not grant kanamycin resistance to transformed bacterial cells (KanOff gene). The second plasmid (Figure 3C, VN575; SEQ ID NO: 38) provides a coding region of a retroviral reverse transcriptase (“MMLV_RT” corresponding to SEQ ID NO: 3, also referred to as RT) and a lambda phage recombination factor (“Bet”, also referred to as λBet), both comprised in a bicistronic construct. The same plasmid also contains a region which encompasses the following segments: a) a segment homologous to the KanOff gene of the first plasmid (VN591) immediately downstream of the stop codon, followed by; b) a group I intron capable of spontaneous self-splicing from RNA in bacterial cells (td intron); c) a segment homologous to the KanOff gene, immediately upstream of the stop codon and; d) a sequence corresponding to the reverse complement (RTtag) of an RNA oligonucleotide (for instance an endogenous small RNA (sRNA) or a designed transcript) that should function as primer for reverse transcription (RT primer).

As illustrated in Figure 3A (bottom-left part), the transcription of this region generates an RNA including an internal intron (KanOn precursor).
The intron region self-splices, giving rise to an intronless RNA product (KanOn RNA) corresponding to the fusion of the homologous regions present at the extremities of the unprocessed RNA plus the RTtag. The intronless transcript and the RT primer then hybridize by their complementary regions and the RT enzyme synthesizes a complementary DNA strand (KanOn cDNA) complementary to the flanking regions of the internal stop codon present in the KanOff gene. Thus, the free 3’OH of the RT primer is used for DNA polymerization and the KanOn RNA is used as a template. When both VN591 and VN575 plasmids co-transform bacterial cells, the KanOn cDNA produced is used by the λBet protein for homologous recombination and the outcome should be the deletion of the internal stop codon on
the KanOff gene and, consequently, the rescue of kanamycin resistance. Therefore, only if KanOn cDNA is generated from intronless RNA molecules by reverse transcription and recombines with the KanOff gene can the corresponding cells proliferate in the presence of kanamycin, because recombination products involving exclusively the plasmids do not rescue kanamycin resistance. Indeed, the alternative possibility of direct recombination involving DNA from the plasmids would not rescue kanamycin resistance because the intron sequence (about 437 base pairs) generates insertions, several stop codons and a frame-shift that would abrogate the expression of a functional kanamycin resistance gene.

To test this hypothesis, Acella cells, a BL21(DE3) derived strain that provides better plasmid stability and reduces recombination not mediated by lambda factors, were co-transformed with VN575 and VN591 (plasmids described in Figure 3B and 3C) and cultivated overnight (37°C, 200 rpm) in SOB medium supplemented with antibiotics (75µg/ml Ampicillin, 25µg/ml Chloramphenicol). Next, the saturated culture was diluted (1:100) and induced (aTc 100ng/ml; IPTG 100µM). After about 3 hours incubation (37°C, 200rpm), the cells (O.D.=0.5-0.6) were plated on LB-Agar plates containing kanamycin (1mg/ml) and IPTG (100µM) for colony counting and sequencing. The kanamycin resistant clones contained exactly the expected final sequence (without stop codon) and no colony was found containing the intron region. Thus, the feasibility of the coupling between RT and HR modules was considered validated. It is the first time that the feasibility of reverse transcription using a retroviral reverse transcriptase and no tRNA derived primer structure (or 3' self-primer strategy) as primer is demonstrated in E. coli, thereby freeing the intracellular use of these enzymes from this requirement.
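The base-pairing relationship used in this example (the RTtag being the reverse complement of the RNA acting as RT primer) can be illustrated with a minimal sketch. The primer sequence below is a hypothetical placeholder, not one of the sequences of the invention (SEQ ID NO: 13 and 14 are not reproduced here), and the function names are illustrative assumptions.

```python
# Minimal sketch: derive an RTtag as the reverse complement of an RT primer (RNA).
# The sequence used here is a made-up placeholder, NOT a sequence from the patent.

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    seq = seq.upper()
    if set(seq) - set("ACGU"):
        raise ValueError("expected an RNA sequence containing only A, C, G, U")
    return seq.translate(_RNA_COMPLEMENT)[::-1]


def can_prime(rt_primer: str, rt_tag: str) -> bool:
    """True if the primer is fully complementary to the tag (perfect pairing only)."""
    return reverse_complement_rna(rt_primer) == rt_tag.upper()


hypothetical_primer = "GGAUCCGUAACGUU"  # placeholder sequence
hypothetical_rttag = reverse_complement_rna(hypothetical_primer)
print(hypothetical_rttag, can_prime(hypothetical_primer, hypothetical_rttag))
```

The sketch only checks perfect complementarity; it does not model RNA secondary structure or partial annealing, both of which are discussed as efficiency factors in this example.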
The efficiency of the coupling between RT and HR (evaluated by the frequency of selected kanamycin resistant clones) should rely on several steps and factors including: a) the expression levels of RT and λBet; b) the transcription level of the intron containing RNA and its self-splicing efficiency; c) the concentration of intracellular oligonucleotides that should function as primer for reverse transcription; d) the secondary structure stability of each RNA involved and their half-life; e) recognition of dsRNA stretches by the RT and the efficiency of cDNA synthesis; f) degradation of the RNA strand of the DNA/RNA hybrid; g) the rate of cDNA degradation by intracellular single-strand exonucleases (such as xonA, xseA, exoX and recJ) and; h) λBet (or other annealing protein) promoted recombination of the synthesized cDNA (KanOn cDNA) with the target plasmid (KanOff gene).

In the assays using intron containing RNAs, the observed frequency (counted colonies/total plated cells) of kanamycin resistant colonies was about 4.02 x 10^-9 (Figure 10, (1)), demonstrating that
the system works but with a low efficiency. Some possible explanations for this are: a) the presence of the intron, which does not self-splice efficiently in the current context; b) structural elements of the involved RNAs can be stable enough to impair double-stranded RNA (dsRNA) annealing and, subsequently, reverse transcription; c) the fast turnover of RNAs in Escherichia coli (Selinger and cols, 2003, found that the half-life of total mRNA was about 6.8 minutes and that some mRNAs have a half-life ≤ 2.5 minutes) reduces the probability of reverse transcription complex formation (RNA template, RNA primer and reverse transcriptase); d) insufficient amounts of annealing protein (λBet monomer) are expressed (from the bicistronic RNA including the RT enzyme), thereby impairing the formation of λBet functional multimers.

Example 2: Implementation of the co-localization strategy

In order to address some of the above-mentioned potential problems, a new system was designed to recruit the KanOn RNA, RNA primer and RT enzyme on a scaffold in order to increase the half-life of the involved RNAs, to promote dsRNA annealing, to increase the local concentration of the ternary complex members (RT template, RT primer and RT enzyme) and, consequently, to improve the likelihood of cDNA synthesis. The selected scaffold was the Hfq protein. Importantly, for this recruitment strategy to work, specific RNA secondary structures are required. Thus, the RNAs involved in the complex comprise specific RNA regions dedicated either to interaction with the protein scaffold (in some embodiments SPBM1 and SPBM2) or to RT interactions (in some embodiments, RBM) (Figure 4).

The implementation of this new strategy was tested using DY380 cells, which over-express lambda recombination factors when the temperature is shifted above 37°C. Cells were co-transformed with the KanOff plasmid (Figure 3B, VN591) and the new KanOn plasmid (Figure 5B, VN669; SEQ ID NO: 39) and cultivated overnight (30°C, 200rpm) in SOC medium supplemented with antibiotics (75 µg/ml Ampicillin, 25µg/ml Chloramphenicol).
Next, the saturated culture was diluted (1:65), incubated (30°C, 200rpm) for 2 hours (O.D. > 0.1) and induced (aTc 100ng/ml; IPTG 100µM, …°C for 12 minutes). After about 2 hours, cells were plated on LB-Agar plates containing kanamycin (1mg/ml) and IPTG (100µM) for colony counting and sequencing. Surprisingly, this new strategy resulted in an improved frequency of 3.12 x 10^-6 (Figure 10, (2)), more than 750 times more efficient than the former system including the td intron and no recruitment of RNAs and RT enzyme (Example 1). The sequencing results indicate that DNA products correspond exactly to the expected sequence.

Also, the strategy concerning the generation of the RT primer could be applied to the intracellular generation of RNAs with a defined sequence at 3’. The latter strategy consists in fusing an RNA
region to a tRNA containing a leader sequence that should be split off by a host cell RNAse, such as RNAse P (Figure 4C).

Example 3: Adaptation of the system to ligand screening using an enhanced Bacterial Two-Hybrid (B2H) system

Concerning the third module (eB2H), the inventors first tested currently available B2Hs (bacterial two-hybrid systems), such as the ones created by the teams of Ann Hochschild (Harvard University, USA; Nickels, 2009) and Rama Ranganathan (Green Center for Systems Biology, USA; McLaughlin, 2012). In order to compare them, the original systems were modified to harmonize the plasmids used, the reporter gene (eGFP, SEQ ID NO: 33) and the complex formation partners (FPL and FPR); thus, the only relevant element differing was the two-hybrid responsive promoter. Protein-protein interactions (PPIs) with varying strengths, ranging from 3 to 8000 nM, were tested to evaluate their signal intensities and their correlation to the affinities. Based on the results (Figure 6A), the inventors noticed that the former does not provide a sufficiently strong signal output and the second does not show good correlation between ligand affinities and signal intensity. Therefore, the inventors had to create by rational genetic engineering a new two-hybrid responsive promoter (Figure 6B) that combines stronger genetic output and improved correlation between affinity and signal intensity. One promoter variant (epB2H, SEQ ID NO: 24) was chosen after comparing some alternatives, and the enhanced B2H (eB2H) can provide significant signals even for µM affinities and robust signals for nM affinities. Moreover, with this responsive promoter the signal output correlates well with the complex affinity (Figure 6A).
The tests were carried out by co-transforming BL21(DE3) Star cells with plasmids harboring each of the promoter variants (respectively, VN520, VN552 and VN550 corresponding to SEQ ID NOs: 40-42) and the target gene fused to the λ cI DNA binding domain (cI-Asf1) plus one of the plasmids containing different rpoA-peptide fusions (rpoA, RNA polymerase alpha subunit). Each peptide interacts with Asf1 with a different affinity (VN515_IP1: 8000nM, VN516_IP2: 560nM, VN517_IP3: 84nM, VN518_IP4: 3nM, VN519_IP3mutA: no-interaction; corresponding to SEQ ID NOs: 43-47). Co-transformed cells were cultivated (200 rpm, 37°C, overnight) in LB supplemented with ampicillin (75 µg/ml) and chloramphenicol (25 µg/ml), saturated cultures were diluted 100X and fresh cultures were cultivated for 2h (37°C, 200rpm). Next, the cultures were induced (20µM IPTG) and grown overnight (20°C, 200rpm). The next day, culture samples were diluted in PBS and analyzed by flow cytometry (Millipore Guava easyCyte HT). The mean fluorescence intensity (MFI) of each sample was calculated and plotted against the reported affinity for each peptide binder (Figure 6A). Thus, it can be noted that, compared to the B2H
systems from Hochschild and Ranganathan, the signal output of the selected responsive promoter correlates well with the complex affinity (Figure 6A, “Ramos/martin (2 plasmids)” curve).

The inventors also created a single vector encompassing all biological elements required for the B2H system to work, and generated a series of derivatives corresponding to the peptides with varying affinities, which were tested under the same conditions but using only chloramphenicol (25 μg/mL) as antibiotic for selection of transformed cells (VN750_IP1: 8000nM, VN751_IP2: 560 nM, VN752_IP3: 84nM, VN753_IP4: 3nM, VN754_IP3mutA: no-interaction; corresponding to SEQ ID NOs: 48-52) (Figure 6A, “Ramos/martin (1 plasmid)” curve). Interestingly, the single vector configuration allows the B2H system to be more sensitive, with higher MFI values compared to the dual plasmid configuration described above.

Finally, the inventors constructed a series of vectors that inversely correlate the sensed affinity with the resulting gene expression signal. The signal inversion was obtained by replacing the reporter/marker genes in the previous constructions by a repressor (SrpR) that blocks the transcription from a promoter (T7-SrprOx2) associated to the expression of the reporter/marker genes (Figure 6A, “Ramos/martin reverse (2 plasmids)” curve). The latter vector series was implemented in a two-plasmid setting and tested under the same conditions using ampicillin (75 μg/mL) and chloramphenicol (25μg/mL). Interestingly, since the fluorescence signals are relatively high in the low affinity range (10^3 to 10^5 nM), the reverse configuration can be particularly interesting for detecting low affinity bindings.

Example 4: B2H optimization

In addition to the improved responsive promoter, other modifications of the B2H system were introduced in order to decrease the stochastic behavior.

cI fusion regulation

The expression of the cI fusion element (comprising the DNA binding domain, DBD) was regulated by the promoter lacUV5 (IPTG induced) and its strong RBS in the plasmid VN1197 (SEQ ID NO: 53). In VN1296 (SEQ ID NO: 54), this promoter and its associated RBS were replaced by a strong promoter (pLTetO) associated with a weak RBS. This promoter and this RBS were selected from a library composed of 3 promoters of varying strengths (pLTetO, J23113 and J23116) and 24 RBS variants that were designed using an RBS Library calculator (https://salislab.net/software/RBSLibraryCalculatorSearchMode, containing RBSs from weak to moderate strength).
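The size of the promoter+RBS library described above follows directly from combining the 3 promoters with the 24 RBS variants. A minimal sketch of that enumeration is given below; the RBS names (RBS1 to RBS24) are an assumed naming convention used only for illustration, chosen to be consistent with the "RBS7" variant mentioned in this example.

```python
from itertools import product

# Promoters named in the text; RBS variant names are hypothetical placeholders.
PROMOTERS = ["pLTetO", "J23113", "J23116"]
RBS_VARIANTS = [f"RBS{i}" for i in range(1, 25)]  # 24 variants, names assumed

# Every promoter paired with every RBS variant.
library = list(product(PROMOTERS, RBS_VARIANTS))

print(len(library))                    # number of promoter+RBS combinations
print(("pLTetO", "RBS7") in library)   # the combination retained in this example
```

With only 72 combinations, the library is small enough to be covered many times over by a single transformation, which is consistent with screening it by colony fluorescence.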
Briefly, for promoter+RBS selection, the Acella strain was transformed with the library and plated on LB-Agar chloramphenicol containing anhydrotetracycline hydrochloride (aTc, 200ng/ml) and IPTG (250µM). The most fluorescent colonies were inoculated in liquid media for plasmid extraction and DNA sequencing. The couple pLTetO+RBS7 was found to be the most prevalent among the combinations that yield high fluorescence.

The RNA transcribed as an output of the bacterial two-hybrid system

In VN1197, it consisted of a tricistronic construction composed of the following elements: RBS + URFP + RBS + heme oxygenase + weak RBS + kanamycin resistance. In VN1296, the RNA output was replaced by a simpler version composed of the following elements: weak RBS + kanamycin resistance.

The strain

VN1197 was tested in Acella while VN1296 was tested in the SB33 strain (having the genome of Marionette Clo (Addgene: 108251) with the removal of the chloramphenicol resistance gene). The genome of SB33 is: F- mcrA Δ(mrr-hsdRMS-mcrBC) Φ80dlacZΔM15 ΔlacX74 endA1 recA1 deoR Δ(ara,leu) 7697 araD139 galU galK nupG rpsL λ- Marionette(ΔCmR).

Then, the inventors tested the effects of the above-mentioned modifications on the stochastic effects by comparing silent mutations of the wild type sequence (Figure 7). The inventors observed that the use of the strong promoter with a weak RBS allowed a considerable improvement of the stochastic effect, by reducing the dispersion of enrichment values.

Example 5: Adding of a “STOP” module

To implement the fourth module (diversity generation arrest or “STOP”), a variant of the third module was implemented in a plasmid similar to VN550 (plasmid VN419; SEQ ID NO: 55) in which the two-hybrid responsive promoter controls the transcription of a bicistronic RNA consisting of a DNA invertase gene (Bxb1) and a fluorescent reporter gene (eGFP) (Figure 8A). In the second plasmid (VN376;
SEQ ID NO: 56), a DNA region representing the bicistronic RT enzyme + λ Bet protein coded in one strand and a kanamycin resistance gene coded in the reverse complementary strand was flanked by DNA invertase sites. A moderate strength promoter was placed upstream of this region and the whole fragment was inserted between two different transcription terminators (Figure 8B). As these plasmids also contained DBD-target (cI-PDZ) and TrSu-L fusions (rpoA-L; L = G4S or CRIPT (Cysteine-rich PDZ-binding peptide)), the Bxb1 DNA invertase should be sufficiently expressed only if the hybrid fusions interact (cI-PDZ + rpoA-CRIPT), thereby inverting the DNA region between the Bxb1 attB and attP sites (Figure 8C). Consequently, RT enzyme and λ Bet should no longer be expressed and the kanamycin resistance gene (coding for kanamycin resistance protein, SEQ ID NO: 34) should now be expressed, thus allowing the cells to be selected in presence of kanamycin.

The weak RBS (ribosome binding site) controlling Bxb1 translation (Figure 8A) was selected from a library generated using the RBS calculator (Salis, Mirsky & Voigt, 2010) containing variants with predicted strengths between 0.099 and 477.818 au. For the convenient selection of RBS strength, the fragment of Figure 8B was replaced by a variant fragment [inline schematic], and RBSs that do not allow DNA inversion when there is no interaction between hybrid fusions were selected in presence of streptomycin (aadA encodes a streptomycin/spectinomycin resistance protein, SEQ ID NO: 35). Then, the sub-library was used to select, in presence of kanamycin, a new sub-library of RBSs that allow DNA inversion when the fusion proteins interact.
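The logic of the “STOP” module described in this example can be summarized as a one-way switch. The sketch below is a deliberately simplified, hypothetical model: class and function names are assumptions, the inversion is treated as irreversible (an assumption of the sketch), and expression levels, RBS strength and leakiness are ignored.

```python
from dataclasses import dataclass


@dataclass
class InvertibleCassette:
    """DNA region flanked by Bxb1 attB/attP sites (simplified model)."""
    inverted: bool = False

    def expressed_genes(self) -> set:
        # Forward orientation: RT + lambda Bet are transcribed.
        # Inverted orientation: the kanamycin resistance gene is transcribed.
        return {"kanR"} if self.inverted else {"RT", "lambda_Bet"}


def apply_stop_module(cassette: InvertibleCassette, fusions_interact: bool) -> InvertibleCassette:
    """Bxb1 is assumed to be expressed only when the two-hybrid fusions interact.

    Once inverted, the cassette is assumed to stay inverted.
    """
    if fusions_interact:
        cassette.inverted = True
    return cassette


cassette = InvertibleCassette()
apply_stop_module(cassette, fusions_interact=False)  # e.g. cI-PDZ / rpoA-stop
print(cassette.expressed_genes())
apply_stop_module(cassette, fusions_interact=True)   # e.g. cI-PDZ / rpoA-CRIPT
print(cassette.expressed_genes())
```

In this simplified view, diversity generation (RT + λ Bet) and antibiotic selection are mutually exclusive states of the same cassette, which is what allows the interaction itself to stop further evolution.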

To test if the evolution arrest mechanism worked as expected, BL21(DE3) Star cells (F- ompT hsdSB (rB-, mB-) gal dcm rne131 (DE3)) or Acella cells (F- ompT hsdSB(rB- mB-) gal dcm (DE3) ΔendA ΔrecA, BL21(DE3)) were co-transformed with plasmids VN419 (containing the cI-PDZ fusion) and either VN376 or VN405 (respectively: premature stop codon resulting in no fusion peptide, or CRIPT fusion peptide; corresponding to SEQ ID NOs: 56 and 58), and induced cells (as described for the third module with enhanced B2H) of the corresponding pairs (no-binding: cI-PDZ/rpoA-stop or; 800nM affinity: cI-PDZ/rpoA-CRIPT) were obtained on LB-Agar supplemented with suitable antibiotics (37°C, overnight). The sequencing results confirm that for colonies representing the non-interacting pair (cI-PDZ/rpoA-stop), the DNA region flanked by the Bxb1 attB and attP sites is not inverted, in contrast to colonies representing the interaction cI-PDZ/rpoA-CRIPT.

Example 6: Whole system implementation

Since the interactions between the couples of interacting modules (RT and HR; eB2H and STOP) were validated, a new implementation was created to unequivocally and conveniently estimate the efficiency of the whole system (including the four modules in the same cell, represented in Figure 1 and Figure 2) by introducing an antibiotic resistance gene (Shble*, invalidated zeocin resistance; Shble coding for a zeocin resistance protein, SEQ ID NO: 36) between the transcription subunit (TrSu, rpoA) and the ligand (SpyTag_D7A, a peptide that interacts with the SpyCatcher domain with an affinity around 200nM (Zakeri et al., 2012)) in the hybrid construction, therefore creating an
extended ligand (Shble*-SpyTag_D7A). Due to the presence of a stop codon in the 5’ region coding for the antibiotic resistance sequence (Shble*) that also introduces a frame shift in the downstream open reading frame, the transformed cells are neither resistant to zeocin nor fluorescent (plasmid VN1238, Figure 9A; corresponding to SEQ ID NO: 59). If the stop codon is correctly edited in this set-up, the corresponding cells become resistant to zeocin and fluorescent because the two-hybrid fusions (cI-SpyCatcher, rpoA-Shble-SpyTag_D7A) should now interact, therefore triggering B2H markers and reporters. For convenience, the modules for generation of diversity (RT adapted for the co-localization strategy and HR, plasmid VN1228, Figure 9B; corresponding to SEQ ID NO: 60) and for selection of clones (eB2H and STOP, plasmid VN1238, Figure 9) were separately implemented; the whole system is thereby reconstituted (Figure 9D) by the transformation of cells using both plasmids.

To estimate the frequency of edited cells due to the action of the RT and HR modules, the inventors transformed bMS_453 cells with the whole system composed of four modules (plasmids VN1228 and VN1238). Briefly, electrocompetent cells were prepared at room temperature using the protocol described by Tu and cols (2016), and transformed cells were recovered in 1mL SOC media and incubated for 90 minutes. Next, cells were inoculated in 10mL of LB media supplemented with carbenicillin (75μg/mL), chloramphenicol (25 μg/mL), aTc (200 ng/mL), IPTG (20 μM) and incubated overnight. The cultures were diluted (1:200) and incubated for 6 hours; then a dilution corresponding to 500 cells (for the calculations, the concentration of 5 x 10^8 cells/mL was considered equivalent to O.D. 600nm = 1) was plated on LB-agar supplemented with carbenicillin (75 μg/mL), chloramphenicol (25 μg/mL) and IPTG (20μM) in order to count the number of viable cells. Different amounts of cells (5 x 10^2 to 5 x 10^6) were plated on LB-agar supplemented with zeocin (… μg/mL) and IPTG (20 μM) to evaluate the number of edited/evolved cells. All cultures were kept at 31°C and liquid cultures were shaken at 190 rpm.

The number of viable cells plated in zeocin/IPTG media was corrected based on the proportion of colonies obtained in Carbenicillin/Chloramphenicol media, and the frequency of edited/evolved cells was estimated by the ratio between the number of selected cells and the expected number of viable cells that were plated. In contrast to non-edited cells, the majority of selected colonies (zeocin resistant) exhibited intense green fluorescence, indicating that the interaction between hybrid proteins was appropriately sensed. Selected colonies were sequenced and the results indicate that the premature stop codon was reverted and that the expression of the invertase protein (Bxb1) was sufficient to invert the DNA corresponding to the generation of diversity main effectors (RT+HR) and to activate the expression of the ORF related to Spectinomycin resistance (50 colonies were verified on LB-agar spectinomycin, 50 μg/mL). Furthermore, the analysis of
colonies in solid media without zeocin indicated that fluorescent colonies correspond to about 5% of the population.

Example 7: Efficiency test for systems with distinct module implementations

The efficiency of different system implementations, expressed as edited cell frequencies, is available in Figure 10. Briefly, cells were transformed with different sets of plasmids allowing different system implementations, cultured under induction prior to being plated on LB-agar containing antibiotics (kanamycin or zeocin), and counted. Some protocols for cell transformation and culture are detailed in previous Examples 1 (first bar), 2 (second bar), 4 (third and ninth bars) and 6 (sixth bar).

Interestingly, comparing the phenotype frequencies obtained with the different system implementations highlights the respective benefit of the various system modules. Firstly, it can be noted that the use of nuclease-mutated strains (for instance bMS_453), even in the presence of the third module (B2H, (3)), significantly increases the phenotype frequency, up to 3.08x10^-4, thus indicating an improved generation of diversity compared to the system with only the first and second modules implementing the co-localization strategy in cells harboring wild-type nucleases. In contrast, this increase in phenotype frequency is less pronounced (5.79x10^-5) for the implementation of the whole system comprising the four modules. This can be explained by an early cessation of the diversity generation process caused by the editing of the stop codon of the Shble* sequence, which allows the expression of a functional ligand (SpyTag_D7A), thereby allowing the expression of the invertase Bxb1 and, consequently, evolution arrest.

In addition, the replacement of the HR (λ Bet) by an RNA helicase (rhlB, (4) and (7)) or a DNA methylase (dam, (6) and (8)) leads to relative decreases in phenotype frequency compared to systems implementing three (3) or four (6) modules. This can be explained by the absence of the HR, which significantly reduces the functional coupling between the first (RT) and third (B2H) modules, thereby reducing the integration of Shble gene variants into the VN1238 plasmid. However, it is interesting to note that, even in the absence of HR, the rhlB and dam effectors, coupled with the B2H module, induce a significant improvement in phenotype frequency compared to the “naïve” implementation with no co-localization (1). Nevertheless, the use of effectors alone cannot compensate for the absence of HR (respectively, implementations 4 and 5 compared to 3, and implementations 7 and 8 compared to 6). It is also noticeable that rhlB exhibits better performance than dam expression in these cases and can potentially improve the system in the context of HR.
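The edited-cell frequencies compared above rely on the colony-count correction described in Example 6: the number of cells plated on selective medium is scaled by the proportion of cells that form colonies on medium without zeocin, and the number of selected colonies is divided by that expected number of viable cells. The following Python lines are a minimal sketch of this arithmetic only. The function names and all colony counts are hypothetical and chosen for illustration; they are not data from the examples. The only value taken from the text is the conversion of O.D.600nm = 1 to 5x10^8 cells/mL.

```python
# Conversion stated in the text: O.D.600nm = 1 is taken as 5x10^8 cells/mL.
CELLS_PER_ML_AT_OD1 = 5e8


def volume_for_cells_ml(n_cells, od600):
    """Culture volume (mL) expected to contain n_cells at the measured O.D."""
    return n_cells / (od600 * CELLS_PER_ML_AT_OD1)


def plating_efficiency(control_colonies, control_cells_plated=500):
    """Fraction of plated cells forming colonies on plates without zeocin."""
    return control_colonies / control_cells_plated


def edited_frequency(selected_colonies, cells_plated, efficiency):
    """Selected colonies divided by the expected number of viable cells plated."""
    expected_viable = cells_plated * efficiency
    return selected_colonies / expected_viable


# Hypothetical counts, for illustration only (not data from the examples).
eff = plating_efficiency(control_colonies=400)
freq = edited_frequency(selected_colonies=120, cells_plated=5e6, efficiency=eff)
print(f"plating efficiency: {eff:.2f}")
print(f"edited-cell frequency: {freq:.2e}")
```

The same conversion factor can be used to estimate the culture volume that contains a given number of cells at a measured optical density, via `volume_for_cells_ml`.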
Example 8: Error-rate estimations for TF1 RT

In order to evaluate the in vivo error profile of the TF1 reverse transcriptase, bMS_453 cells were double transformed with VN1270+VN1269 (system 1) or VN1237+VN1228 (system 2). The VN1237 plasmid was previously described herein as VN1238, and VN1228 has also been previously described herein. VN1270 is a derivative of the VN1237 B2H single plasmid, obtained by replacing the original antibiotic resistance gene (intended for chloramphenicol selection) with the Bla gene (for ampicillin selection). VN1269 is a modified version of the plasmid described by Schubert et al. (Schubert et al., bioRxiv 2020.03.05.975441; doi: https://doi.org/10.1101/2020.03.05.975441), which encodes a chloramphenicol resistance gene and is intended for retron reverse-transcriptase based editing of the same locus targeted by VN1228 (i.e., the ShBle Stop that invalidates zeocin resistance).

The transformed cells were cultured in LB containing ampicillin (75 µg/ml) and chloramphenicol ([…] µg/ml) (31°C, 190 rpm, overnight). Then, fresh dilutions were made from the saturated cultures in […] ml tubes (O.D.600nm = 0.01, 10 ml) and kept at 31°C for 1 hour and 30 minutes (O.D.600nm < 0.3), when system 1 was induced with arabinose (50 mM) and IPTG (20nM) while system 2 was induced with aTc (200 ng/ml) and IPTG (20nM). Next, the cultures were incubated in a thermomixer (Eppendorf) at 42°C, 900 rpm, for 14 minutes and put back at 31°C, 190 rpm, for about 6 hours and 30 minutes. Finally, 10^8 cells of the obtained culture (O.D.600nm ~ 3.0) were inoculated into 10 ml of LB containing zeocin (20 µg/ml) and IPTG (20 µM).

The plasmids were extracted from zeocin resistant cells and used as template for PCR reactions (350 ng for 100 µl reactions) designed for the amplification of the targeted region in the B2H plasmids (i.e. ShBle Stop in VN1237 or VN1270) using Q5 polymerase. The PCR products were agarose gel purified and used (0.062 pmol) in a 3-way Golden Gate reaction (10 µl; NEB, Golden Gate Assembly Kit BsaI-HF®v2, E1601S) with the 5’ adaptor fragment (0.025 pmol) and the 3’ adaptor fragment (0.025 pmol). The 5’ and 3’ fragments contained demultiplexing and UMI (unique molecular identifier) sequences and the regions required for Illumina NGS. Ligated products were column purified (GeneJET PCR Purification, Thermo, K0701) and PCR amplified using the 5’ and 3’ primers; the product of the expected size was gel purified and sequenced (2x150 paired-end reads, Illumina NovaSeq 6000 platform, Novogene, UK). To decrease sequencing errors, the targeted cDNA region was fully covered by both paired-end reads in order to reconstruct high quality assemblies by bioinformatics analysis. This strategy allows the efficient deep sequencing of single molecules in order to improve statistical reliability and to suppress sequencing errors.

On the one hand, under the described conditions, system 1 (retron based editing) shows 27.35% of mutated sequences (in other words, 72.65% of the sequences corresponded to the expected product, faithful to the […]ented reverse transcription template). On the other hand, system 2 (TF1 RT based, using the described concepts) resulted in 99.81% mutated sequences. Focused analysis of the mutated sequences indicates a higher insertion frequency for system 2 (7.65E-03 insertions per base) compared to system 1 (3.25E-05 insertions per base). The majority of these events correspond to “A” insertions in poly-A regions for system 2, which is compatible with the previously described TF1 RT profile (Kirshenboim et al., Virology. 2007 Sep 30;366(2):263-76. doi: 10.1016/j.virol.2007.04.002. Epub 2007 May 23. PMID: 17524442). Similar frequencies of mutation by nucleotide misincorporation were observed for both systems (system 1: 7.34E-04 mutations per base; system 2: 6.37E-04 mutations per base).

NGS full amplicon sequence, double tagged with unique molecular identifiers (UMIs)
TGATACGGCGACCACCGAGATCTACACTAATCTTAACACTCTTTCCCTACACGAC TCTTCCGATCTCGCHHNHHNHATTCGGAAGCTTTCGTTGACTTACGTGATGTAC CAGCCTGAAGTGAAAGAAGAGAAACCAGAGGCGGCCGCAGCCAAGTTGACCAG CAGTTCCGGTGCTCACCGCGCGCGACGTCGCCGGAGCGGTCGAGTTCTGGACCG CGGCTCGGGTTCTCACCTDNDDNDDGCGAGATCGGAAGAGCACACGTCTGAACT AGTCACCTTGCTAGATCTCGTATGCCGTCTTCTGCTTG (SEQ ID NO: 71)

UMI (Unique Molecular Identifier): the corresponding region is indicated in bold and the region of variable size is underlined (three sequences are expected at this site: CGC, CT or A). The PRIMER BINDING SITE for the amplification of the full DNA fragment for NGS sequencing (Illumina platform) is indicated.

For each UMI, the constant size region (HHNHHNH or DNDDNDD) corresponds to 3888 sequences that can be found fused to 3 different variable regions, for a total of 11664 possible UMIs. By combining the UMIs at both sides, a theoretical diversity of 136048896 is achieved.
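The UMI diversity figures quoted above can be checked by counting the sequences encoded by each degenerate pattern. The sketch below assumes the standard IUPAC meaning of the degenerate codes (H = A, C or T; D = A, G or T; N = any base) and assumes, as the text states for each UMI, that the constant size region can be fused to any of the three variable regions. The function and variable names are illustrative only.

```python
from math import prod

# Standard IUPAC nucleotide codes needed for the two UMI patterns.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "ACT",   # not G
    "D": "AGT",   # not C
    "N": "ACGT",  # any base
}


def degenerate_count(pattern):
    """Number of distinct sequences encoded by a degenerate pattern."""
    return prod(len(IUPAC[base]) for base in pattern)


variable_regions = ("CGC", "CT", "A")  # the three variable-size sequences

core_5p = degenerate_count("HHNHHNH")  # constant size region, 5' UMI
core_3p = degenerate_count("DNDDNDD")  # constant size region, 3' UMI

umis_5p = core_5p * len(variable_regions)
umis_3p = core_3p * len(variable_regions)
combined = umis_5p * umis_3p           # one UMI on each side of the amplicon

print(core_5p, core_3p, umis_5p, umis_3p, combined)
```

Each constant size region contains five three-fold degenerate positions and two four-fold degenerate positions, which is where the value of 3888 comes from; squaring the per-side total gives the combined diversity.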
5’ fragment
TGATACGGCGACCACCGAGATCTACACTAATCTTAACACTCTTTCCCTACACGAC TCTTCCGATCTCGCHHNHHNHATTCTGAGACCTTTCCC (SEQ ID NO: 72)

3’ fragment
GAAAGGTCTCAACCTDNDDNDDGCGAGATCGGAAGAGCACACGTCTGAACTCCA CACCTTGCTAGATCTCGTATGCCGTCTTCTGCTTG (SEQ ID NO: 73)

Internal fragment
GAAAGGTCTCAATTCGGAAGCTTTCGTTGACTTACGTGATGTACGTCAGCCTGAA GAAAGAAGAGAAACCAGAGGCGGCCGCAGCCAAGTTGACCAGTGCAGTTCCGGT TCACCGCGCGCGACGTCGCCGGAGCGGTCGAGTTCTGGACCGACCGGCTCGGGT TCACCTTGAGACCTTTCCC (SEQ ID NO: 74)

> OR2004 : Internal fragment forward primer
GAAAGGTCTCAATTCGGAAGCTTTCGTTGACTTACG (SEQ ID NO: 75)

> OR2005 : Internal fragment reverse primer
GAAAGGTCTCAAGGTGAGAACCCGAGCCGG (SEQ ID NO: 76)