WO2024240965A2 - Droplet-based screening method - Google Patents
- Publication number
- WO2024240965A2 (PCT/EP2024/077083)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- interest
- polynucleotide
- polypeptide
- cells
- sequence
- Legal status
- Pending
Classifications
- G01N15/1429: Signal processing (optical investigation techniques, e.g. flow cytometry)
- C12N15/1075: Isolating an individual clone by screening libraries by coupling phenotype to genotype, not provided for in other groups of this subclass
- C12N15/1086: Preparation or screening of expression libraries, e.g. reporter assays
- G01N15/1459: Optical investigation techniques, e.g. flow cytometry without spatial resolution of the texture or inner structure of the particle, e.g. processing of pulse signals the analysis being performed on a sample stream
- G01N15/149: Optical investigation techniques, e.g. flow cytometry specially adapted for sorting particles, e.g. by their size or optical properties
- G01N2015/1006: Investigating individual particles for cytology
- G01N2500/10: Screening for compounds of potential therapeutic value involving cells
Definitions
- the present invention relates to methods for screening a biological library using a microfluidic chip.
- the invention also relates to nucleic acid sequences, vectors, and host cells which have been isolated and/or generated by the methods of the invention.
- the methods of the invention also relate to identification of polynucleotides of interest and/or cells having desired characteristics.
- Droplets sorted in the chip, sometimes referred to as microdroplets or microencapsulations, typically have an average diameter of about 20 micrometers and serve as compartments or minuscule reaction vessels. They can contain live microbial cells that, for example, secrete an enzyme. Additionally or alternatively, the droplets can contain cell extracts that enable the expression of a protein encoded by a polynucleotide of interest.
- the droplets may also contain other components, for example, a fluorogenic enzyme substrate that can reveal the activity of an enzyme.
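For a sense of scale, the stated average diameter of about 20 micrometers corresponds to a droplet volume of roughly 4 picolitres. This assumes the droplets are spherical, and uses the conversion 1 µm³ = 1 fL:

```latex
V = \frac{\pi d^{3}}{6} = \frac{\pi\,(20\ \mu\mathrm{m})^{3}}{6} \approx 4.19 \times 10^{3}\ \mu\mathrm{m}^{3} = 4.19 \times 10^{3}\ \mathrm{fL} \approx 4.2\ \mathrm{pL}
```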
- the method of the invention is as robust as conventional methods (as demonstrated in Example 3) while showing reduced variability (evidenced by decreased standard deviation as shown in Example 4). Decreased standard deviation facilitates the generation of better performing computational models.
- the method of the invention enables the screening of larger libraries. Screening a higher number of library members allows for the generation of more effective models (as shown in Example 6). The method of the invention also identifies library members that are difficult or impossible to identify using other methods (as shown in Example 7).
- Droplet microfluidics accelerates the generation of biological screening data by a factor of ca. 1000, at less than 1% of the cost of conventional high-throughput screening (HTS) methods.
- the methods of the invention require a lower number of cells and/or library members in the starting sample without compromising the read-out quality.
- the scoring of the polynucleotide of interest allows a detailed study of the relationship between variants of the polynucleotide of interest and a desired effect, e.g., enzyme activity or enzyme yield.
- the methods of the invention allow a higher resolution of read-outs, i.e., after droplet separation it is possible to differentiate between multiple subsets based on multiple threshold values. For example, in some cases the highest binding activity of a polypeptide towards a substrate or inhibitor is not favorable, and a moderate binding activity within a “sweet spot” is preferred.
- variant polynucleotide sequences correlated with positive effects, e.g., increased enzyme yield or increased enzyme activity
- variant sequences correlated with less-desired effects, e.g., reduced enzyme yield or reduced enzyme activity
- the results obtained with the methods of the invention unexpectedly correlate strongly with the results of microtiter plate (MTP) screening methods.
- using the methods of the instant invention makes it possible to skip or replace the well-established MTP screening methods, which are known to be time- and resource-demanding.
- the methods of the invention have also shown a lower standard deviation (STD) than MTP. This lower STD is of particular advantage when the resulting data are used as input for a machine learning model, because less variable training data yield a higher-confidence model, i.e., a more precise and robust algorithm.
- STD standard deviation
- the method of the invention makes it possible to identify promising members of a large library that would not have been identified using conventional screening methods, because these methods are limited to smaller libraries.
- the methods of the invention provide a strategy in which droplets are sorted into multiple (at least three) output channels (pools). The abundance of each individual sequence in each pool is then measured and used to calculate a score for each polynucleotide sequence. In this manner, droplet screening technology can be employed to efficiently generate extensive data sets that take into account every polynucleotide sequence present in each pool.
- the scoring, identification and/or sequencing of one or more polynucleotides of interest in one or more output channels can be used to train a computational model to obtain further insights into sequence properties, to improve desired sequence characteristics (e.g., increased yield and/or increased enzyme activity), and/or to generate synthetic sequences with such improved characteristics.
- desired sequence characteristics e.g., increased yield, and/or increased enzyme activity
- synthetic sequences with such improved characteristics.
- the invention relates to a method for screening a biological library, the method comprising the steps of: a) providing a microfluidic device comprising a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303), b) providing an emulsion of droplets comprising a library of polynucleotides of interest and a screenable product, c) determining the amount of screenable product of one or more droplets in the microfluidic device, d) sorting the one or more droplets with the droplet sorter (200) into a receiving output channel of the at least three output channels (301, 302, 303), wherein the receiving output channel is determined based on the amount of screenable product per droplet, and wherein at least three receiving output channels receive a plurality of droplets comprising an amount of screenable product above and/or below one or more predetermined threshold levels, e) identifying one or more polynucleotides of interest present in the at least three
- the invention relates to a host cell comprising in its genome the synthetic polynucleotide of interest generated in additional step h), and/or a polynucleotide of interest identified in step e).
- in a third aspect, the invention relates to a method of producing a polypeptide of interest, the method comprising the step of cultivating the cell according to the second aspect under conditions conducive to production of the polypeptide.
- Figure 1 shows a schematic overview of a microfluidic device with multiple output channels according to one embodiment of the method of the invention.
- Figure 2 shows the assay responses for 100,000 droplets and the thresholds used for separation into the five output channels (pools 1-5).
- Figure 3 shows the relative abundance for 102 signal peptide variants sorted into five output channels (pools 1-5).
- Figure 4 shows the correlation between the scores of the droplet method of the invention and the MTP assay.
- Figure 5 shows the standard deviation (σ) for MTP fermentations (A) and droplet fermentations (B) based on counts and relative protein yield.
- Figure 6 shows the fraction of proline-containing sequences amongst the sequences obtained from an MTP screen (A) and a ranking of signal peptide sequences obtained from the same MTP screen (B).
- Figure 8 shows the correlation coefficient between predictions and observed values dependent on the size of the training data.
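Figures 4 and 8 both summarize the agreement between two sets of values: droplet scores versus MTP scores, and model predictions versus observed values. The text does not name the coefficient that was used. The following sketch therefore assumes the Pearson correlation coefficient. The function name and example values are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # covariance and variances, without the common 1/(n-1) factor (it cancels)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(sxx * syy)


# hypothetical example: droplet scores vs. MTP scores for four variants
droplet_scores = [1.2, 2.5, 3.1, 4.8]
mtp_scores = [0.9, 2.2, 3.4, 4.5]
print(round(pearson_r(droplet_scores, mtp_scores), 3))
```

A value close to 1 indicates that the two screening methods rank the variants in nearly the same linear order.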
- Non-limiting examples of DNA sequence variants include a library of wildtype cells comprising native DNA sequence variants.
- the library of polynucleotides of interest is comprised in wildtype cells.
- Non-limiting examples of amino acid variants include a library of purified polypeptide variants, and a library of recombinant cells expressing polypeptide variants.
- machine learning algorithms include, but are not limited to:
- Linear regression: a foundational algorithm that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data points. It is commonly used to predict numerical values, such as housing prices based on factors like square footage and location.
- Decision trees: a method that uses a tree-like structure to make decisions based on multiple conditions. Each internal node in the tree represents a decision based on a particular feature, eventually leading to a leaf node with the final prediction or classification. Decision trees are employed for tasks like classification, where an algorithm determines the category of an input based on its features.
- a random forest model is a machine learning algorithm used for diverse applications such as classification, regression, and data analysis.
- the algorithm constructs an ensemble of numerous decision trees. Notably, each decision tree is built from a subset of the training dataset and a randomized selection of input features.
- the strength of the random forest model stems from combining the predictions of multiple decision trees. This combination of trees trained on resampled data, termed "bagging" (bootstrap aggregating), yields higher accuracy and greater robustness than individual trees.
- the random forest model reduces the risk of overfitting. Consequently, it performs markedly better at making accurate predictions on new, previously unseen data instances.
- the random forest model handles high-dimensional datasets and complex feature interdependencies well, which makes it particularly applicable to complex real-world problems. It can also accommodate missing data values, maintaining accuracy even when portions of the data are incomplete.
- Neural networks: complex algorithms inspired by the structure and function of biological neural networks. They consist of layers of interconnected nodes (neurons) that process and transform data. Deep learning, a subset of neural network methods, involves multiple hidden layers and is utilized for tasks like natural language processing, image generation, and autonomous driving.
- Reinforcement learning: an approach where an algorithm learns to make sequences of decisions by interacting with an environment to maximize a cumulative reward. This is often used in robotics, game playing, and autonomous systems.
- Naive Bayes: a probabilistic algorithm based on Bayes' theorem that is particularly effective for text classification tasks like spam detection and sentiment analysis.
- PCA Principal Component Analysis
- Generative adversarial network (GAN): a specialized class of machine learning algorithm that involves two neural networks, a generator and a discriminator, engaged in a competitive process.
- the generator creates synthetic data instances (such as images or text) that resemble real data, while the discriminator evaluates whether a given data instance is real or generated.
- the two networks iteratively refine their performance, with the generator aiming to produce increasingly realistic data and the discriminator improving its ability to differentiate between real and generated data.
- a non-limiting example of a suitable GAN is disclosed in WO2024/133344 (Novozymes A/S).
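The random forest procedure described above can be sketched in a few lines. Each tree is trained on a bootstrap sample and a random subset of the input features, and the predictions of all trees are averaged. This is a minimal, standard-library-only illustration under simplifying assumptions:

- each "tree" is a depth-1 regression tree (a stump);
- the inputs are numeric feature vectors, for example a hypothetical encoding of each signal peptide variant;
- the target is one score per variant.

All function names are hypothetical, and a practical model would use a full library implementation.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]


def fit_stump(X: List[Row], y: List[float], features: Sequence[int]) -> Model:
    """Depth-1 regression tree: choose the split with the lowest squared error."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must separate the samples
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no usable split: predict the sample mean
        constant = mean(y)
        return lambda row: constant
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def fit_forest(X: List[Row], y: List[float], n_trees: int = 50,
               n_features: int = 1, seed: int = 0) -> Model:
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # sample with replacement
        feats = rng.sample(range(p), min(n_features, p))  # random feature subset
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    # the ensemble prediction is the average over all stumps
    return lambda row: mean(s(row) for s in stumps)


# hypothetical toy data: one feature, two clearly separated score levels
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
y = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
model = fit_forest(X, y, n_trees=50, n_features=1, seed=0)
print(model([0.0]), model([13.0]))
```

Averaging many trees fitted to resampled data smooths out the idiosyncrasies of any single tree. This is the source of the robustness attributed to the random forest model above.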
- ddPCR: The term “Droplet Digital PCR” or “ddPCR” refers to a molecular biology technique for the precise analysis and quantification of nucleic acids, including DNA and RNA, within a sample. It was devised to address inherent limitations of conventional polymerase chain reaction (PCR) methods by enabling accurate detection and measurement of rare target sequences or subtle variations in target concentration.
- PCR polymerase chain reaction
- the sample containing the target nucleic acid is partitioned into numerous individual droplets, each acting as an independent reaction compartment. This partitioning allows the target nucleic acid to be amplified in isolation, minimizing amplification biases and interference from non-target molecules.
- the droplets undergo fluorescence-based analysis, determining the presence or absence of the amplified target sequence within each individual droplet.
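Targets distribute randomly among the droplets, so some droplets receive more than one copy. For this reason, ddPCR quantification commonly applies a Poisson correction to the fraction of positive droplets instead of counting positives directly. The following is a minimal sketch assuming Poisson-distributed targets. The function names are hypothetical, and the droplet volume is an instrument-specific parameter that the caller must supply.

```python
from math import log


def ddpcr_copies_per_droplet(positive: int, total: int) -> float:
    """Mean target copies per droplet, lambda = -ln(1 - positive / total)."""
    if not 0 <= positive < total:
        # all droplets positive means the assay is saturated and lambda is unbounded
        raise ValueError("need 0 <= positive < total")
    return -log(1 - positive / total)


def ddpcr_concentration(positive: int, total: int, droplet_volume_ul: float) -> float:
    """Target copies per microlitre of partitioned reaction volume."""
    return ddpcr_copies_per_droplet(positive, total) / droplet_volume_ul


# hypothetical read-out: 6321 of 10000 droplets fluorescent
print(ddpcr_copies_per_droplet(6321, 10000))  # close to 1 copy per droplet
```

Counting positives alone would give 0.63 copies per droplet here. The Poisson correction accounts for the droplets that received two or more copies.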
- Droplet sorter means an arrangement within the microfluidic device which allows the sorting of droplets into three or more output channels, wherein the sorting is based on the amount of screenable product detected in the droplet.
- the sorting is carried out by using one or more sorting means, e.g., electrodes or valves.
- the amount of screenable product of the droplet is detected by one or more sensing means, and communicated to the sorting means, e.g., two or more electrodes.
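The routing rule described above can be illustrated as a threshold lookup: the measured amount of screenable product selects one of three or more output channels. This is a minimal sketch with the following assumptions:

- the predetermined thresholds are sorted;
- channel indices are hypothetical (0 = lowest signal band);
- the sensing and electrode hardware is not modelled.

```python
from bisect import bisect_right
from collections import Counter
from typing import Iterable, List, Sequence


def assign_channel(signal: float, thresholds: Sequence[float]) -> int:
    """Output-channel index for one droplet: n thresholds give n + 1 channels.

    A signal exactly equal to a threshold goes to the higher channel
    (a convention assumed for this sketch).
    """
    return bisect_right(thresholds, signal)


def sort_droplets(signals: Iterable[float], thresholds: Sequence[float]) -> List[int]:
    if len(thresholds) < 2:
        raise ValueError("three or more output channels need at least two thresholds")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    return [assign_channel(s, thresholds) for s in signals]


# hypothetical thresholds giving five channels (pools), as in the Figure 2 set-up
thresholds = [10.0, 20.0, 30.0, 40.0]
channels = sort_droplets([5.0, 15.0, 25.0, 35.0, 45.0, 20.0], thresholds)
print(channels)           # [0, 1, 2, 3, 4, 2]
print(Counter(channels))  # number of droplets received by each channel
```

Keeping a middle channel as its own pool is one way to capture the moderate "sweet spot" activities mentioned earlier. Only the highest band would be kept in a simple two-way sort.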
- expression means any step involved in the production of a polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
- Expression vector refers to a linear or circular DNA construct comprising a DNA sequence encoding a polypeptide, which coding sequence is operably linked to a suitable control sequence capable of effecting expression of the DNA in a suitable host.
- control sequences may include a promoter to effect transcription, an optional operator sequence to control transcription, a sequence encoding suitable ribosome binding sites on the mRNA, enhancers and sequences which control termination of transcription and translation.
- Extension means an addition of one or more amino acids to the amino and/or carboxyl terminus of a polypeptide, wherein the “extended” polypeptide has enzyme activity.
- fragment means a polypeptide having one or more amino acids absent from the amino and/or carboxyl terminus of the mature polypeptide, wherein the fragment has enzyme activity.
- Fusion polypeptide is a polypeptide in which one polypeptide is fused at the N-terminus and/or the C-terminus of a polypeptide of the present invention.
- a fusion polypeptide is produced by fusing a polynucleotide encoding another polypeptide to a polynucleotide of the present invention, or by fusing two or more polynucleotides of the present invention together.
- Techniques for producing fusion polypeptides are known in the art, and include ligating the coding sequences encoding the polypeptides so that they are in frame and that expression of the fusion polypeptide is under control of the same promoter(s) and terminator.
- Fusion polypeptides may also be constructed using intein technology in which fusion polypeptides are created post-translationally (Cooper et al., 1993, EMBO J. 12: 2575-2583; Dawson et al., 1994, Science 266: 776-779).
- a fusion polypeptide can further comprise a cleavage site between the two polypeptides. Upon secretion of the fusion protein, the site is cleaved releasing the two polypeptides. Examples of cleavage sites include, but are not limited to, the sites disclosed in Martin et al., 2003, J. Ind. Microbiol. Biotechnol. 3: 568-576; Svetina et al., 2000, J.
- heterologous means, with respect to a host cell, that a polypeptide or nucleic acid does not naturally occur in the host cell.
- heterologous means, with respect to a polypeptide or nucleic acid, that a control sequence, e.g., promoter, of a polypeptide or nucleic acid is not naturally associated with the polypeptide or nucleic acid, i.e., the control sequence is from a gene other than the gene encoding the mature polypeptide.
- Host Strain or Host Cell is an organism comprising a polynucleotide of interest.
- exemplary host strains are microorganism cells (e.g., bacteria, filamentous fungi, and yeast) capable of expressing a polypeptide of interest and/or fermenting saccharides, and/or probiotic microorganisms.
- a recombinant host strain or recombinant host cell is an organism into which an expression vector, phage, virus, or other DNA construct, including a polynucleotide encoding a polypeptide of interest (e.g., an amylase), has been introduced.
- exemplary recombinant host strains are microorganism cells (e.g., bacteria, filamentous fungi, and yeast) capable of expressing the polypeptide of interest and/or fermenting saccharides.
- the term "host cell" includes protoplasts created from cells.
- Isolated means a polypeptide, nucleic acid, cell, or other specified material or component that has been separated from at least one other material or component, including but not limited to, other proteins, nucleic acids, cells, etc.
- An isolated polypeptide, nucleic acid, cell or other material is thus in a form that does not occur in nature.
- An isolated polypeptide includes, but is not limited to, a culture broth containing the secreted polypeptide expressed in a host cell.
- Mature polypeptide means a polypeptide in its mature form following N-terminal and/or C-terminal processing (e.g., removal of signal peptide).
- Mature polypeptide coding sequence means a polynucleotide that encodes a mature polypeptide.
- the microfluidic device comprises a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303).
- the microfluidic device also comprises a plurality of liquid inlets and/or liquid outlets.
- the device comprises an incubation chamber (500).
- Native means a nucleic acid or polypeptide naturally occurring in a host cell.
- Nucleic acid encompasses DNA, RNA, heteroduplexes, and synthetic molecules capable of encoding a polypeptide. Nucleic acids may be single-stranded or double-stranded, and may contain chemical modifications. The terms “nucleic acid” and “polynucleotide” are used interchangeably. Because the genetic code is degenerate, more than one codon may be used to encode a particular amino acid, and the present compositions and methods encompass nucleotide sequences that encode a particular amino acid sequence. Unless otherwise indicated, nucleic acid sequences are presented in 5'-to-3' orientation.
- nucleic acid construct means a nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally occurring gene or is modified to contain segments of nucleic acids in a manner that would not otherwise exist in nature or which is synthetic, and which comprises one or more control sequences operably linked to the nucleic acid sequence.
- operably linked means that specified components are in a relationship (including but not limited to juxtaposition) permitting them to function in an intended manner.
- a regulatory sequence is operably linked to a coding sequence such that expression of the coding sequence is under control of the regulatory sequence.
- the polynucleotide of interest encodes a protease.
- Suitable proteases include those of bacterial, fungal, plant, viral or animal origin e.g. microbial or vegetable origin. Microbial origin is preferred. Chemically modified or protein engineered variants are included. It may be an alkaline protease, such as a serine protease or a metalloprotease.
- a serine protease may for example be of the S1 family, such as trypsin, or the S8 family such as subtilisin.
- a metalloprotease may, for example, be a thermolysin, e.g., from family M4, or another metalloprotease such as those from the M5, M7 or M8 families.
- Serine endopeptidases hydrolyse the substrate N-succinyl-Ala-Ala-Pro-Phe p-nitroanilide.
- the reaction was performed at room temperature at pH 9.0.
- the release of pNA results in an increase of absorbance at 405 nm and this increase is proportional to the enzymatic activity measured against a standard.
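The read-out described above (increase in absorbance at 405 nm from released pNA, compared with a standard) can be reduced to an initial rate and a ratio. The following is a minimal sketch assuming two things: the readings lie in the linear range of the assay, and the standard's activity is known. The function names, units and example values are hypothetical.

```python
from typing import Sequence


def slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of values vs. times, e.g. change in A405 per minute."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired readings")
    mt = sum(times) / n
    mv = sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    if den == 0:
        raise ValueError("readings must be taken at different times")
    return num / den


def relative_activity(times: Sequence[float], a405_sample: Sequence[float],
                      a405_standard: Sequence[float], standard_activity: float) -> float:
    """Sample activity, scaled by the ratio of pNA release rates."""
    return standard_activity * slope(times, a405_sample) / slope(times, a405_standard)


# hypothetical kinetic readings (minutes, absorbance units)
t = [0.0, 1.0, 2.0, 3.0]
sample = [0.1, 0.3, 0.5, 0.7]
standard = [0.1, 0.2, 0.3, 0.4]
print(relative_activity(t, sample, standard, standard_activity=10.0))
```

Fitting a slope over several readings, rather than taking a single end-point difference, makes the rate estimate less sensitive to noise in any one measurement.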
- purified means a nucleic acid, polypeptide or cell that is substantially free from other components as determined by analytical techniques well known in the art (e.g., a purified polypeptide or nucleic acid may form a discrete band in an electrophoretic gel, chromatographic eluate, and/or a media subjected to density gradient centrifugation).
- a purified nucleic acid or polypeptide is at least about 50% pure, usually at least about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, about 99.5%, about 99.6%, about 99.7%, about 99.8% or more pure (e.g., percent by weight or on a molar basis).
- a composition is enriched for a molecule when there is a substantial increase in the concentration of the molecule after application of a purification or enrichment technique.
- the term "enriched" refers to a compound, polypeptide, cell, nucleic acid, amino acid, or other specified material or component that is present in a composition at a relative or absolute concentration that is higher than a starting composition.
- the term “purified” as used herein refers to the polypeptide or cell being essentially free from components (especially insoluble components) from the production organism. In other aspects, the term “purified” refers to the polypeptide being essentially free of components (especially insoluble components) from the native organism from which it is obtained. In one aspect, the polypeptide is separated from some of the soluble components of the organism and culture medium from which it is recovered. The polypeptide may be purified (i.e., separated) by one or more of the unit operations filtration, precipitation, or chromatography.
- the polypeptide may be purified such that only minor amounts of other proteins, in particular, other polypeptides, are present.
- purified as used herein may refer to removal of other components, particularly other proteins and most particularly other enzymes present in the cell of origin of the polypeptide.
- the polypeptide may be "substantially pure", i.e., free from other components from the organism in which it is produced, e.g., a host organism for recombinantly produced polypeptide.
- the polypeptide is at least 40% pure by weight of the total polypeptide material present in the preparation.
- the polypeptide is at least 50%, 60%, 70%, 80% or 90% pure by weight of the total polypeptide material present in the preparation.
- a "substantially pure polypeptide” may denote a polypeptide preparation that contains at most 10%, preferably at most 8%, more preferably at most 6%, more preferably at most 5%, more preferably at most 4%, more preferably at most 3%, even more preferably at most 2%, most preferably at most 1%, and even most preferably at most 0.5% by weight of other polypeptide material with which the polypeptide is natively or recombinantly associated.
- the substantially pure polypeptide is at least 92% pure, preferably at least 94% pure, more preferably at least 95% pure, more preferably at least 96% pure, more preferably at least 97% pure, more preferably at least 98% pure, even more preferably at least 99% pure, most preferably at least 99.5% pure by weight of the total polypeptide material present in the preparation.
- the polypeptide of the present invention is preferably in a substantially pure form (i.e., the preparation is essentially free of other polypeptide material with which it is natively or recombinantly associated). This can be accomplished, for example, by preparing the polypeptide by well-known recombinant methods or by classical purification methods.
- Recombinant is used in its conventional meaning to refer to the manipulation, e.g., cutting and rejoining, of nucleic acid sequences to form constellations different from those found in nature.
- the term recombinant refers to a cell, nucleic acid, polypeptide or vector that has been modified from its native state.
- recombinant cells express genes that are not found within the native (non-recombinant) form of the cell, or express native genes at different levels or under different conditions than found in nature.
- the term “recombinant” is synonymous with “genetically modified” and “transgenic”.
- Recover means the removal of a polypeptide from at least one fermentation broth component selected from the list of a cell, a nucleic acid, or other specified material, e.g., recovery of the polypeptide from the whole fermentation broth, or from the cell-free fermentation broth, by polypeptide crystal harvest, by filtration, e.g.
- Score: In the context of the invention, a score is calculated for each of the one or more polynucleotides of interest.
- the score is the sum, over all output channels, of the normalized relative abundance of the polynucleotide of interest in each output channel multiplied by the sorting threshold score for that output channel.
- the score is calculated as described in Example 3.
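The score definition above can be written as S_v = Σ_c ã_{v,c} · T_c. Here ã_{v,c} is the normalized relative abundance of polynucleotide v in output channel c, and T_c is the sorting threshold score assigned to that channel. Example 3 is not reproduced here, so the normalization below is an assumption:

1. The relative abundance is the read count of v in channel c divided by the total reads in c.
2. These values are then normalized so that, for each v, they sum to 1 across channels.

The input format, function name and numbers are hypothetical.

```python
from typing import Dict, List


def variant_scores(counts: Dict[str, List[int]],
                   threshold_scores: List[float]) -> Dict[str, float]:
    """Score = sum over channels of normalized relative abundance x threshold score.

    counts[v][c] is the number of reads of polynucleotide v in output channel c
    (a hypothetical input format, e.g. from sequencing each pool).
    """
    n = len(threshold_scores)
    for v, row in counts.items():
        if len(row) != n:
            raise ValueError(f"{v}: expected {n} channel counts, got {len(row)}")
    # total reads per channel, used for the relative abundance within each channel
    totals = [sum(row[c] for row in counts.values()) for c in range(n)]
    scores = {}
    for v, row in counts.items():
        rel = [row[c] / totals[c] if totals[c] else 0.0 for c in range(n)]
        s = sum(rel)
        if s == 0:
            continue  # not observed in any channel: no score
        # assumed normalization: abundances of v across channels sum to 1
        scores[v] = sum((r / s) * t for r, t in zip(rel, threshold_scores))
    return scores


# hypothetical counts for three signal peptides in five pools (low -> high signal)
counts = {"SP1": [0, 0, 0, 0, 10], "SP2": [10, 0, 0, 0, 0], "SP3": [5, 0, 0, 0, 5]}
print(variant_scores(counts, [1.0, 2.0, 3.0, 4.0, 5.0]))
```

With this weighting, a variant found only in the highest-signal pool receives that pool's threshold score. A variant split evenly between two pools lands halfway between their scores.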
- Screenable product means a molecule which is detectable by the sensing means (600).
- the screenable product includes but is not limited to fluorescent molecules (e.g., green fluorescent protein (GFP), mCherry, mVenus, DsRed, EGFP, Nile red (9-(diethylamino)benzo[a]phenoxazin-5-one), a fluorescent vitamin, DAPI (4′,6-diamidino-2-phenylindole), and BODIPY), and fluorogenic molecules, e.g., fluorogenic rhodamine.
- GFP green fluorescent protein
- the screenable product is added to the emulsion, or is generated from a substrate by a process taking place in the droplet, e.g., during incubation.
- the screenable product is a polypeptide expressed in the droplets.
- the screenable product is a host cell in the droplets.
- the screenable product comprises an absorbing molecule.
- the absorbing molecule comprises para-nitroaniline (pNA).
- the amount of the screenable product in the droplet may be inversely proportional to the amount of a polypeptide of interest expressed in the droplet, and/or by the host cells, e.g., when the polypeptide of interest binds or degrades the screenable product.
- the amount of the screenable product in the droplet may be proportional to the amount of a polypeptide of interest expressed in the droplet, and/or by the host cells, e.g., when the polypeptide of interest degrades a substrate, which results in formation of the screenable product, or when the screenable product incorporates into the host cells or parts thereof (e.g., host cell membrane, or host cell wall), for example Nile Red.
- the screenable product can thus, for example, be used as a proxy for one or more of the features selected from the list of cell growth, cell division, polypeptide of interest expression, polypeptide of interest binding, polypeptide of interest stability, and polypeptide of interest activity.
- more than one screenable product is present in the droplets, e.g., to determine two or more different features selected from the aforementioned features.
- Sequence identity: The relatedness between two amino acid sequences or between two nucleotide sequences is described by the parameter “sequence identity”.
- the sequence identity between two amino acid sequences is determined as the output of “longest identity” using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), preferably version 6.6.0 or later.
- the parameters used are a gap open penalty of 10, a gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix.
- In order for the Needle program to report the longest identity, the -nobrief option must be specified in the command line.
- the output of Needle labeled “longest identity” is calculated as follows:
- the sequence identity between two polynucleotide sequences is determined as the output of “longest identity” using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, supra) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, supra), preferably version 6.6.0 or later.
- the parameters used are a gap open penalty of 10, a gap extension penalty of 0.5, and the EDNAFULL (EMBOSS version of NCBI NUC4.4) substitution matrix.
- the -nobrief option must likewise be specified in the command line.
- the output of Needle labeled “longest identity” is used as the percent identity and is calculated as follows: (Identical Deoxyribonucleotides x 100)/(Length of Alignment - Total Number of Gaps in Alignment).
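The percent-identity calculation can be sketched in Python as follows. This is a minimal sketch that assumes the pairwise alignment has already been produced (for example by Needle run with -nobrief) and is supplied as two gapped strings of equal length; it applies the conventional “longest identity” formula (identical positions x 100)/(alignment length - total gap positions). The helper name `longest_identity` is hypothetical, and this is not the EMBOSS implementation.

```python
def longest_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences (hypothetical helper).

    Computes (identical positions x 100) / (alignment length - gap positions),
    where a gap position is any alignment column containing a gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_a.upper(), aligned_b.upper()))
    # Identical residues/nucleotides: same symbol in both rows, not a gap.
    identical = sum(1 for x, y in columns if x == y and x != gap)
    # Columns where at least one of the two sequences carries a gap.
    gap_columns = sum(1 for x, y in columns if gap in (x, y))
    ungapped_length = len(columns) - gap_columns
    if ungapped_length == 0:
        raise ValueError("alignment has no ungapped columns")
    return identical * 100 / ungapped_length


# 6 identical columns; 9 columns in total, 2 of them gapped -> 600/7
print(round(longest_identity("ACDEFGHIK", "ACDQFG--K"), 1))  # 85.7
```

The same function applies to amino acid and nucleotide alignments, because only the scoring matrix used to build the alignment differs between the two cases, not the identity formula.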
- Signal peptide: A “signal peptide” is a sequence of amino acids attached to the N-terminal portion of a protein, which facilitates the secretion of the protein outside the cell.
- the mature form of an extracellular protein lacks the signal peptide, which is cleaved off during the secretion process.
- Subsequence means a polynucleotide having one or more nucleotides absent from the 5' and/or 3' end of a mature polypeptide coding sequence; wherein the subsequence encodes a fragment having enzyme activity.
- variant means a polypeptide having enzyme activity comprising a man-made mutation, i.e., a substitution, insertion (including extension), and/or deletion (e.g., truncation), at one or more positions.
- a substitution means replacement of the amino acid occupying a position with a different amino acid;
- a deletion means removal of the amino acid occupying a position; and
- an insertion means adding 1-5 amino acids (e.g., 1-3 amino acids, in particular, 1 amino acid) adjacent to and immediately following the amino acid occupying a position.
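The three kinds of change defined above can be illustrated on a string of one-letter amino acid codes. This minimal sketch assumes 1-based positions, as in conventional variant nomenclature. The function names are hypothetical, and the sketch does not describe how library variants are actually constructed.

```python
def _check_position(seq: str, pos: int) -> None:
    # Positions are 1-based, as in conventional variant nomenclature.
    if not 1 <= pos <= len(seq):
        raise IndexError(f"position {pos} is outside 1..{len(seq)}")


def substitute(seq: str, pos: int, new_aa: str) -> str:
    """Replace the residue at `pos` with a different amino acid."""
    _check_position(seq, pos)
    if len(new_aa) != 1 or seq[pos - 1] == new_aa:
        raise ValueError("a substitution introduces one different amino acid")
    return seq[: pos - 1] + new_aa + seq[pos:]


def delete(seq: str, pos: int) -> str:
    """Remove the residue occupying `pos`."""
    _check_position(seq, pos)
    return seq[: pos - 1] + seq[pos:]


def insert_after(seq: str, pos: int, residues: str) -> str:
    """Add 1-5 residues immediately after the residue at `pos`."""
    _check_position(seq, pos)
    if not 1 <= len(residues) <= 5:
        raise ValueError("an insertion adds 1-5 amino acids")
    return seq[:pos] + residues + seq[pos:]


parent = "MKTAYIAK"
print(substitute(parent, 3, "S"))    # MKSAYIAK
print(delete(parent, 4))             # MKTYIAK
print(insert_after(parent, 2, "G"))  # MKGTAYIAK
```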
- Wild-type in reference to an amino acid sequence or nucleic acid sequence means that the amino acid sequence or nucleic acid sequence is a native or naturally-occurring sequence.
- naturally-occurring refers to anything (e.g., proteins, amino acids, or nucleic acid sequences) that is found in nature.
- non-naturally occurring refers to anything that is not found in nature (e.g., recombinant nucleic acids and protein sequences produced in the laboratory or modification of the wild-type sequence).
- the invention relates to a method for screening a biological library, the method comprising the steps of: a) providing a microfluidic device comprising a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303); b) providing an emulsion of droplets comprising a library of polynucleotides of interest and a screenable product; c) determining the amount of screenable product in one or more droplets in the microfluidic device; d) sorting the one or more droplets with the droplet sorter (200) into a receiving output channel of the at least three output channels (301, 302, 303), wherein the receiving output channel is determined based on the amount of screenable product per droplet, and wherein at least three receiving output channels receive a plurality of droplets comprising an amount of screenable product above and/or below one or more predetermined threshold levels; e) identifying one or more polynucleotides of interest present in …
- the emulsion of droplets comprises one or more host cells.
- each host cell comprises one or more polynucleotides of interest of the library of polynucleotides of interest.
- each droplet comprises at most one host cell, or a plurality of host cells derived from the same parent host cell.
- each droplet comprises at most one polynucleotide of interest.
- the screenable product is produced by the host cells.
- the formation of the screenable product is catalyzed by an enzyme; preferably, the enzyme is encoded by the polynucleotide of interest.
- the screenable product is encoded by the one or more polynucleotides of interest.
- the screenable product is produced by a polypeptide expressed by the host cells.
- the screenable product is produced by a polypeptide encoded by the one or more polynucleotides of interest.
- the screenable product is a polypeptide expressed by the host cells.
- the screenable product is an enzyme.
- the enzyme is expressed by the host cells.
- the enzyme is selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phytase, polyphenoloxidase, prote…
- the screenable product is degraded by the host cells.
- the screenable product is degraded by the polypeptide encoded by the one or more polynucleotides of interest.
- the screenable product is degraded by a polypeptide expressed by the host cells.
- the screenable product is an enzyme substrate, preferably for an enzyme selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phyta…
- the amount of screenable product is inversely proportional to one or more of cell number, cell growth, cell division, cell viability, or cell growth rate.
- the amount of screenable product is proportional to one or more of cell number, cell growth, cell division, cell viability, or cell growth rate.
- the screenable product comprises or consists of one or more host cells.
- the screenable product comprises or consists of substantially all the host cells in a droplet.
- the score is proportional, e.g., normalized, to the number of identical DNA sequences for a first polynucleotide of interest present in an output channel.
- the score is the total number of identical DNA sequences for a first polynucleotide of interest present in an output channel.
- the score is proportional, e.g., normalized, to the number of identical DNA sequences for a second polynucleotide of interest present in an output channel.
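The per-channel scores above can be sketched in Python. This sketch assumes that the polynucleotides of interest in each output channel have been read out as one sequence string per molecule, for example after amplification and sequencing. The raw score is the count of identical sequences. For the normalized score, the sketch divides by the total number of sequences read in that channel; this choice of normalization is an assumption, because the text only says “e.g., normalized”. The helper name is hypothetical.

```python
from collections import Counter


def channel_scores(
    reads_by_channel: dict[str, list[str]], normalize: bool = True
) -> dict[str, dict[str, float]]:
    """Score each polynucleotide of interest per output channel.

    Raw score: number of identical sequences in the channel.
    Normalized score (assumed normalization): raw score / total sequences
    read in that channel.
    """
    scores: dict[str, dict[str, float]] = {}
    for channel, reads in reads_by_channel.items():
        counts = Counter(reads)
        total = sum(counts.values())
        if normalize and total > 0:
            scores[channel] = {seq: n / total for seq, n in counts.items()}
        else:
            scores[channel] = {seq: float(n) for seq, n in counts.items()}
    return scores


reads = {"301": ["ACGT", "ACGT", "TTGA"], "305": ["TTGA"]}
print(channel_scores(reads, normalize=False)["301"])  # {'ACGT': 2.0, 'TTGA': 1.0}
print(channel_scores(reads)["305"])                   # {'TTGA': 1.0}
```

Normalizing within each channel makes scores comparable between pools that received different numbers of droplets.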
- the microfluidic device comprises an incubation zone (500).
- the incubation does not take place in the microfluidic chip.
- the incubation takes place on and/or in the microfluidic device.
- the droplet sorter comprises one or more sensing means (600), preferably located downstream of the incubation zone (500) and/or upstream of the sorting means (401, 402).
- the one or more sensing means (600) comprises a fluorescence sensor.
- the one or more sensing means (600) comprises an absorption sensor.
- the one or more sensing means (600) comprises an image sensor, e.g., a CMOS sensor, or a CCD sensor, or a PMT sensor.
- the one or more sensing means (600) comprises a NEMS (nanoelectromechanical system) sensor.
- the one or more sensing means (600) comprises a mass analyzer suitable for mass spectrometry, e.g., a quadrupole mass analyzer, a TOF mass analyzer, an ion trap mass analyzer, an orbitrap mass analyzer, a magnetic sector mass analyzer, a Q-TOF mass analyzer, or an FT-ICR mass analyzer.
- step e) comprises DNA amplification of the one or more polynucleotides of interest within each output channel.
- the DNA amplification is a PCR method.
- the one or more polynucleotides of interest are identified by a DNA barcode.
- the droplet sorter (200) comprises one or more sorting means (401, 402).
- the one or more sorting means comprises at least two electrodes.
- the one or more sorting means consists of one electrode.
- the one or more sorting means consists of two electrodes.
- the biological library comprises or consists of wild-type cells with different genotypes and/or different phenotypes.
- the biological library comprises or consists of recombinant cells.
- the biological library encodes different variants of the same polypeptide of interest, preferably the polypeptide of interest is an enzyme.
- the biological library encodes different signal peptide variants.
- the biological library encodes different promoter variants.
- the biological library comprises different codon-optimized DNA sequences encoding the same amino acid sequence of a polypeptide of interest, e.g., a signal peptide, and/or an enzyme.
- the polynucleotide of interest encodes a polypeptide of interest.
- the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a polypeptide of interest.
- the polynucleotide of interest comprises a first polynucleotide of interest encoding a control sequence, and a second polynucleotide of interest encoding a polypeptide of interest.
- the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a control sequence.
- the control sequence is a promoter sequence, a signal peptide, a leader sequence, a polyadenylation sequence, a propeptide sequence, or a transcription terminator.
- the polynucleotide of interest comprises a first polynucleotide of interest encoding a signal peptide, and a second polynucleotide of interest encoding a polypeptide of interest, wherein the first polynucleotide of interest is operatively linked to the second polynucleotide of interest and located upstream of the second polynucleotide of interest.
- the polynucleotide of interest comprises a first polynucleotide of interest comprising a promoter sequence, and a second polynucleotide of interest encoding a polypeptide of interest, wherein the first polynucleotide of interest is operatively linked to the second polynucleotide of interest and located upstream of the second polynucleotide of interest.
- the biological library comprises identical second polynucleotides of interest, and a plurality of variants of the first polynucleotides of interest.
- the biological library comprises identical first polynucleotides of interest, and a plurality of variants of the second polynucleotides of interest.
- the first polynucleotide of interest is heterologous to the second polynucleotide of interest.
- the first polynucleotide of interest is endogenous to the second polynucleotide of interest.
- the one or more polynucleotides of interest comprise a promoter, a polynucleotide encoding a signal peptide, a polynucleotide encoding a polypeptide of interest, or a native host cell gene.
- the polynucleotide of interest is substantially the whole genome of the host cell.
- the one or more polynucleotides of interest are heterologous to the host cell.
- the one or more polynucleotides of interest are endogenous to the host cell.
- the first polynucleotide of interest is heterologous to the host cell.
- the first polynucleotide of interest is endogenous to the host cell.
- the second polynucleotide of interest is heterologous to the host cell.
- the second polynucleotide of interest is endogenous to the host cell.
- the first and second polynucleotides of interest are heterologous to the host cell.
- the first and second polynucleotides of interest are endogenous to the host cell.
- the one or more polynucleotides of interest encode a polypeptide of interest.
- the polypeptide of interest is an enzyme, a nanobody, an antibody, an antibody-fragment, a fluorescent polypeptide, e.g., GFP, or an alpha-lactalbumin.
- the amount of screenable product in the droplet is proportional to the amount of a polypeptide encoded by the one or more polynucleotides of interest.
- the amount of screenable product in the droplet is inversely proportional to the amount of a polypeptide encoded by the one or more polynucleotides of interest.
- the biological library comprises at least 100, at least 200, at least 500, at least 1 000, at least 2 000, at least 3 000, at least 5 000, at least 10 000, at least 100 000, at least 1 000 000, at least 10 000 000, at least 50 000 000, or at least 100 000 000 different polynucleotides of interest.
- the biological library comprises at least 100, at least 200, at least 500, at least 1 000, at least 2 000, at least 3 000, at least 5 000, at least 10 000, at least 100 000, at least 200 000, at least 500 000, at least 1 000 000, at least 5 000 000, at least 10 000 000, or at least 100 000 000 different host cells.
- the amount of screenable product in the droplet is proportional to one or more of: stability of the polypeptide of interest, transcription of the polypeptide of interest, translation of the polypeptide of interest, secretion of the polypeptide of interest, yield of the polypeptide of interest, binding strength of the polypeptide of interest to a target molecule, and activity of the polypeptide of interest.
- the amount of screenable product in the droplet is inversely proportional to one or more of: stability of the polypeptide of interest, transcription of the polypeptide of interest, translation of the polypeptide of interest, secretion of the polypeptide of interest, yield of the polypeptide of interest, binding strength of the polypeptide of interest to a target molecule, and activity of the polypeptide of interest.
- the amount of screenable product in the droplet is proportional to one or more of: cell number, viability of the host cell, cell division rate of the host cell, cell growth rate of the host cell, cell size of the host cell, and protein secretion of the host cell.
- the amount of screenable product in the droplet is inversely proportional to one or more of: cell number, viability of the host cell, cell division rate of the host cell, cell growth rate of the host cell, cell size of the host cell, and protein secretion of the host cell.
- the one or more droplets comprise a substrate.
- the substrate comprises or consists of the screenable product.
- the substrate is a fluorescent substrate.
- the substrate is a fluorogenic Rhodamine.
- the substrate is a fluorochrome.
- the substrate is a fluorogenic substrate.
- the substrate comprises a fluorophore, e.g., fluorescein, or fluorescein-labelled starch. In one embodiment, the substrate is Nile red.
- the substrate is DAPI (4’,6-diamidino-2-phenylindole).
- each droplet, before the optional incubation, comprises an average occupancy of at most 0.01, at most 0.02, at most 0.03, at most 0.04, at most 0.05, at most 0.06, at most 0.07, at most 0.08, at most 0.09, at most 0.1, at most 0.2, at most 0.3, at most 0.4, at most 0.5, at most 0.6, or at most 0.7 cells; preferably at most 0.1 cells.
- each droplet comprises an average occupancy of at most 0.01, at most 0.02, at most 0.03, at most 0.04, at most 0.05, at most 0.06, at most 0.07, at most 0.08, at most 0.09, at most 0.1, at most 0.2, at most 0.3, at most 0.4, at most 0.5, at most 0.6, or at most 0.7 polynucleotides of interest; preferably at most 0.1 polynucleotides of interest.
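Average occupancies well below one are typical because random droplet loading is commonly modeled as a Poisson process. That model is an assumption here, since the text does not name one. Under it, with a mean of λ cells per droplet, the fraction of droplets holding exactly k cells is e^(-λ)·λ^k/k!. The following sketch uses hypothetical helper names to show how many droplets are empty and how many occupied droplets hold more than one cell.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet holds exactly k cells when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def occupancy_summary(lam: float) -> dict[str, float]:
    if lam <= 0:
        raise ValueError("mean occupancy must be positive")
    p_empty = poisson_pmf(0, lam)
    p_single = poisson_pmf(1, lam)
    p_multiple = 1.0 - p_empty - p_single
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": p_multiple,
        # Share of non-empty droplets that contain two or more cells.
        "multiple_among_occupied": p_multiple / (1.0 - p_empty),
    }


for lam in (0.01, 0.1, 0.7):
    s = occupancy_summary(lam)
    print(f"lambda={lam}: empty={s['empty']:.3f}, "
          f"multiple among occupied={s['multiple_among_occupied']:.3f}")
```

Under this model, an average occupancy of 0.1 leaves about 90% of droplets empty, and roughly 5% of occupied droplets contain more than one cell. At 0.7, about 31% of occupied droplets would contain more than one cell, which would weaken the link between a droplet's signal and a single library member.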
- the droplet sorting is facilitated by an acoustic wave generated by one or more acoustic wave generators (401, 402) adjacent to the droplet sorter.
- the droplet sorting is facilitated by a local pressure change generated by one or more pressure-controlled outlets (401, 402) adjacent to the droplet sorter, e.g., wherein the one or more pressure-controlled outlets are comprised in one or more output channels.
- the amount of screenable product in step c) is determined using a fluorescence-based signal, absorbance, Raman spectroscopy, mass spectrometry (MS), or MALDI-MS.
- a relative and/or an absolute amount of the screenable product per droplet is determined by the one or more sensing means (600).
- one or more output channels comprise at least 10 000 droplets, at least 50 000 droplets, at least 100 000 droplets, at least 500 000 droplets, at least 1 000 000 droplets, at least 2 000 000 droplets, at least 5 000 000 droplets, at least 10 000 000 droplets, or at least 100 000 000 droplets.
- the droplet sorter comprises at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten output channels.
- the host cell is a yeast host cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
- the host cell is a filamentous fungal host cell, e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus …
- the host cell is a prokaryotic host cell, e.g., a Gram-positive cell selected from the group consisting of Bacillus, Clostridium, Enterococcus, Geobacillus, Lactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, and Streptomyces cells, or a Gram-negative cell selected from the group consisting of Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma cells, such as Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, Bacillus thuringiensis, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp. …
- the host cell is Bacillus subtilis.
- the host cell is Bacillus licheniformis.
- the host cell is Trichoderma reesei.
- the host cell is Aspergillus niger.
- the host cell is Aspergillus oryzae.
- the host cell is a Bifidobacterium, e.g., Bifidobacterium animalis, or Bifidobacterium animalis subsp. lactis. It is envisioned that the method of the present invention may employ a library of isolated polynucleotides and in vitro expression systems, as well as recombinant cells and/or wild-type cells. However, a preferred embodiment of the invention is a setup comprising host cells, i.e., wherein the library is comprised within a host cell.
- each individual polynucleotide of the library is in its own separate host cell.
- each droplet in step (b) of the first aspect comprises at most a single host cell, which optionally can be incubated to grow into a plurality of cells before determining the amount of screenable product in step c).
- the substrate for the one or more enzymes is fluorogenic, and the activity of the enzyme converts the fluorogenic substrate into a fluorescent product (the screenable product).
- the polynucleotide library member inside the droplet needs to be identified.
- the polynucleotide is identified through DNA sequencing.
- the polynucleotide may also have been outfitted with an identifying sequence tag to serve as a "bar code" when the library was constructed, thus obviating the need for sequencing. Based on the identification of the bar-code, the DNA sequence of the polynucleotide would then immediately be known and it would, thus, be identified.
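Identification by bar code can be sketched as a lookup: extract the tag from a read, then map it to the library member recorded when the library was built. In this sketch, the flanking sequences `LEFT_FLANK`/`RIGHT_FLANK`, the table `BARCODE_TABLE`, and the helper names are all hypothetical. The sketch assumes exact matches only, so it ignores sequencing errors and barcodes that contain the flank motif.

```python
from typing import Optional

# Hypothetical flanking sequences around the barcode and a hypothetical
# barcode-to-variant table recorded when the library was constructed.
LEFT_FLANK = "GATC"
RIGHT_FLANK = "CTAG"
BARCODE_TABLE = {"AAACCC": "variant_017", "GGGTTT": "variant_042"}


def extract_barcode(
    read: str, left: str = LEFT_FLANK, right: str = RIGHT_FLANK
) -> Optional[str]:
    """Return the sequence between the first left flank and the next right flank."""
    start = read.find(left)
    if start == -1:
        return None
    start += len(left)
    end = read.find(right, start)
    if end == -1:
        return None
    return read[start:end]


def identify(read: str, table: dict[str, str] = BARCODE_TABLE) -> Optional[str]:
    """Map a read to its library member via the barcode, or None if unknown."""
    barcode = extract_barcode(read)
    return None if barcode is None else table.get(barcode)


print(identify("TTGATCAAACCCCTAGTT"))  # variant_017
```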
- the one or more polynucleotides of interest are identified in step e) by DNA sequencing of the one or more polynucleotides of interest.
- the aliquots are usually much smaller in volume than the droplets, but they may in principle range in size up to the same volume as the droplets or even larger. In the examples below, the aliquots are significantly smaller than the droplets.
- microfluidic devices that enable the application of an electric field to merge or coalesce two or more droplets are disclosed, for example, in WO 2007/061448.
- Another way to introduce small aliquots of an aqueous liquid into an aqueous droplet in a microfluidic device is known as "pico-injection" and is disclosed, for example, in WO 2010/151776.
- the aliquots were introduced into the droplets by merging or coalescing the aliquots and the droplets through the application of an electric field.
- the aliquots are introduced into the droplets by merging or coalescing the aliquots and the droplets through the application of an electric field or by injection.
- Figure 1 shows one embodiment of the invention, wherein the device comprises a droplet sorter (200) with five output channels (301, 302, 303, 304, 305) and an incubation zone (500).
- the device furthermore comprises sensing means (600) and two electrodes (401 , 402).
- Droplets comprising host cells and screenable product are shown as circles. Schematically, the amount of screenable product present in each droplet is represented by black filling, the amount of black being proportional to the amount of screenable product in the droplet.
- Flow directed from the incubation zone (500) to the output channels (301, 302, 303, 304, 305) allows droplets to pass the sensing means (600), which determines the amount of screenable product in each droplet (step c)).
- the sensing means (600) communicates the amount of screenable product to the electrodes (401 , 402). Based on the information about the amount of screenable product in each droplet, the electrodes apply an electric field which allows sorting of the droplet into one of the five output channels (step d)).
- droplets with a high amount of screenable product are sorted into the top output channel (305) and collected in pool 5, while droplets with no or a low amount of screenable product are sorted into the lowest output channel (301) and collected in pool 1.
- droplets with intermediate amounts of screenable product are sorted into the remaining three output channels (302, 303, and 304) and collected in pools 2-4.
- the design with three or more output channels allows parallel sorting into multiple output channels, using multiple predetermined threshold values, wherein no sample volume is lost.
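The sorting decision of step d) with several predetermined thresholds can be sketched as a binning function. The threshold values below are hypothetical and expressed in arbitrary signal units. The five channels mirror Figure 1. The sketch covers only the decision logic; it does not model the timing of the sensing means (600) or the electrodes (401, 402).

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical thresholds (arbitrary signal units) splitting the signal range
# into five bins, one per output channel 301-305 as in Figure 1.
THRESHOLDS = [10.0, 20.0, 40.0, 80.0]
CHANNELS = [301, 302, 303, 304, 305]


def assign_channel(signal: float, thresholds=THRESHOLDS, channels=CHANNELS) -> int:
    """Return the output channel for a droplet with the measured signal.

    A signal exactly equal to a threshold is sent to the higher channel.
    """
    if len(channels) != len(thresholds) + 1:
        raise ValueError("need exactly one more channel than thresholds")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    return channels[bisect_right(thresholds, signal)]


signals = [2.5, 12.0, 35.0, 55.0, 150.0, 9.9, 80.0]
pools = Counter(assign_channel(s) for s in signals)
print(sorted(pools.items()))  # [(301, 2), (302, 1), (303, 1), (304, 1), (305, 2)]
```

Every droplet lands in exactly one pool, which reflects the point above that multi-channel sorting loses no sample volume. A binary sorter would instead need repeated passes to apply several thresholds.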
- the methods of the present invention utilize biological libraries of variants (amino acid sequences, DNA sequences and/or host cell variants), but also enable the generation of synthetic variant sequences based on the read-out of the method.
- synthetic sequence variants are generated by substitution, deletion or addition of one or several amino acids (for polypeptide variants) or one or several nucleotides (for DNA sequence variants).
- the polypeptide variant is derived from a mature polypeptide by substitution, deletion or addition of one or several amino acids.
- the number of amino acid substitutions, deletions and/or insertions introduced into the polypeptide is up to 15, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15.
- amino acid changes may be of a minor nature, that is conservative amino acid substitutions or insertions that do not significantly affect the folding and/or activity of the protein; small deletions, typically of 1-30 amino acids; small amino- or carboxyl-terminal extensions, such as an amino-terminal methionine residue; a small linker peptide of up to 20-25 residues; or a small extension that facilitates purification by changing net charge or another function, such as a poly-histidine tract, an antigenic epitope or a binding module.
- Essential amino acids in a polypeptide can be identified according to procedures known in the art, such as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham and Wells, 1989, Science 244: 1081-1085). In the latter technique, single alanine mutations are introduced at every residue in the molecule, and the resultant molecules are tested for enzyme activity to identify amino acid residues that are critical to the activity of the molecule. See also Hilton et al., 1996, J. Biol. Chem. 271: 4699-4708.
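Designing an alanine-scanning set can be sketched as follows: one variant per position, with that residue replaced by alanine. The helper name is hypothetical. The variant names use the common “original residue, 1-based position, new residue” notation (e.g., K2A), chosen here for illustration. Positions that are already alanine are skipped because no substitution is possible there.

```python
def alanine_scan(seq: str) -> list[tuple[str, str]]:
    """Return (name, sequence) for every single-alanine variant of `seq`."""
    seq = seq.upper()
    variants = []
    for pos, residue in enumerate(seq, start=1):
        if residue == "A":
            continue  # already alanine; no substitution possible
        mutant = seq[: pos - 1] + "A" + seq[pos:]
        variants.append((f"{residue}{pos}A", mutant))
    return variants


for name, mutant in alanine_scan("MKAL"):
    print(name, mutant)
# M1A AKAL
# K2A MAAL
# L4A MKAA
```

A set like this could be encoded as a biological library and screened in droplets. Variants whose screenable signal drops sharply would then flag candidate essential residues.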
- the active site of the enzyme or other biological interaction can also be determined by physical analysis of structure, as determined by such techniques as nuclear magnetic resonance, crystallography, electron diffraction, or photoaffinity labeling, in conjunction with mutation of putative contact site amino acids. See, for example, de Vos et al., 1992, Science 255: 306-312; Smith et al., 1992, J. Mol. Biol. 224: 899-904; Wlodaver et al., 1992, FEBS Lett. 309: 59-64.
- the identity of essential amino acids can also be inferred from an alignment with a related polypeptide, and/or be inferred from sequence homology and conserved catalytic machinery with a related polypeptide or within a polypeptide or protein family with polypeptides/proteins descending from a common ancestor, typically having similar three- dimensional structures, functions, and significant sequence similarity.
- protein structure prediction tools can be used for protein structure modelling to identify essential amino acids and/or active sites of polypeptides. See, for example, Jumper et al., 2021, “Highly accurate protein structure prediction with AlphaFold”, Nature 596: 583-589.
- Single or multiple amino acid substitutions, deletions, and/or insertions can be made and tested using known methods of mutagenesis, recombination, and/or shuffling, followed by a relevant screening procedure, such as those disclosed by Reidhaar-Olson and Sauer, 1988, Science 241: 53-57; Bowie and Sauer, 1989, Proc. Natl. Acad. Sci. USA 86: 2152-2156; WO 95/17413; or WO 95/22625.
- DNA-variant sequences are derived by substitution, deletion or addition of one or several nucleotides.
- the polynucleotide may also be mutated by introduction of nucleotide substitutions that do not result in a change in the amino acid sequence of the polypeptide, but which correspond to the codon usage of the host organism intended for production of the enzyme, or by introduction of nucleotide substitutions that may give rise to a different amino acid sequence.
- for a general description of nucleotide substitution, see, e.g., Ford et al., 1991, Protein Expression and Purification 2: 95-107.
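Codon-usage substitutions, and the codon-optimized library members described above, must leave the encoded amino acid sequence unchanged. A quick check is to translate both coding sequences and compare them. This sketch assumes the standard genetic code (NCBI translation table 1), which holds for most nuclear genes in the listed hosts. The helper names are hypothetical, and the sketch does not perform codon optimization itself.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... GGG; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence (A/C/G/T only)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i : i + 3]] for i in range(0, len(dna), 3))


def is_synonymous(dna_a: str, dna_b: str) -> bool:
    """True if both coding sequences encode the same amino acid sequence."""
    return translate(dna_a) == translate(dna_b)


print(is_synonymous("ATGCTGAAA", "ATGTTAAAG"))  # True  (M-L-K in both)
print(is_synonymous("ATGCTGAAA", "ATGCCGAAA"))  # False (CCG encodes P)
```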
- DNA sequences for the library design may be obtained from microorganisms of any genus.
- polypeptide sequences comprising e.g., an enzyme, a signal peptide, or a nanobody may be obtained from microorganisms of any genus.
- the term “obtained from” as used herein in connection with a given source shall mean that the polypeptide encoded by a polynucleotide is produced by the source or by a strain in which the polynucleotide of the invention has been inserted.
- the polypeptide obtained from a given source is secreted extracellularly.
- the invention encompasses both the perfect and imperfect states, and other taxonomic equivalents, e.g., anamorphs, regardless of the species name by which they are known. Those skilled in the art will readily recognize the identity of appropriate equivalents.
- the polypeptides may be identified and obtained from other sources including microorganisms isolated from nature (e.g., soil, composts, water, etc.) or DNA samples obtained directly from natural materials (e.g., soil, composts, water, etc.) using the above-mentioned probes. Techniques for isolating microorganisms and DNA directly from natural habitats are well known in the art. A polynucleotide encoding the polypeptide may then be obtained by similarly screening a genomic DNA or cDNA library of another microorganism or mixed DNA sample.
- the polynucleotide can be isolated or cloned by utilizing techniques that are known to those of ordinary skill in the art (see, e.g., Davis et al., 2012, Basic Methods in Molecular Biology, Elsevier).
Screening a biological library comprising control sequences
- the present invention also relates to screening a biological library, wherein the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest comprising a first polynucleotide of interest encoding a control sequence, and a second polynucleotide of interest encoding a polypeptide of interest.
- the biological library comprises a plurality of variants of the control sequence.
- the second polynucleotide of interest is operably linked to one or more control sequences (first polynucleotide of interest) that direct the expression of the second polynucleotide of interest in a suitable host cell under conditions compatible with the control sequences.
- the control sequence may be manipulated in a variety of ways to provide for expression of the polypeptide of interest, and/or to create a control sequence library. Manipulation of the control sequence prior to its insertion into a vector may be desirable or necessary depending on the expression vector. Techniques for modifying the control sequences utilizing recombinant DNA methods are well known in the art.
- the control sequence may be a promoter, a polynucleotide that is recognized by a host cell for expression of a polynucleotide encoding a polypeptide of the present invention.
- the promoter contains transcriptional control sequences that mediate the expression of the polypeptide.
- the promoter may be any polynucleotide that shows transcriptional activity in the host cell including mutant, truncated, and hybrid promoters, and may be obtained from genes encoding extracellular or intracellular polypeptides either homologous or heterologous to the host cell.
- Suitable promoters for directing transcription of the polynucleotide of the present invention in a bacterial host cell are described in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Lab., NY; Davis et al., 2012, supra; and Song et al., 2016, PLOS One 11(7): e0158447.
- the control sequence may also be a transcription terminator, which is recognized by a host cell to terminate transcription.
- the terminator is operably linked to the 3’-terminus of the polynucleotide encoding the polypeptide. Any terminator that is functional in the host cell may be used in the present invention.
- Preferred terminators for bacterial host cells may be obtained from the genes for Bacillus clausii alkaline protease (aprH), Bacillus licheniformis alpha-amylase (amyL), and Escherichia coli ribosomal RNA (rrnB).
- Preferred terminators for filamentous fungal host cells may be obtained from Aspergillus or Trichoderma species, such as from the genes for Aspergillus niger glucoamylase, Trichoderma reesei beta-glucosidase, Trichoderma reesei cellobiohydrolase I, and Trichoderma reesei endoglucanase I, e.g., the terminators described in Mukherjee et al., 2013, “Trichoderma: Biology and Applications”, and by Schmoll and Dattenböck, 2016, “Gene Expression Systems in Fungi: Advancements and Applications”, Fungal Biology.
- Preferred terminators for yeast host cells may be obtained from the genes for Saccharomyces cerevisiae enolase, Saccharomyces cerevisiae cytochrome C (CYC1), and Saccharomyces cerevisiae glyceraldehyde-3-phosphate dehydrogenase.
- Other useful terminators for yeast host cells are described by Romanos et al., 1992, Yeast 8: 423-488.
- the control sequence may also be an mRNA stabilizer region, located downstream of a promoter and upstream of the coding sequence of a gene, which increases expression of the gene.
- Suitable mRNA stabilizer regions may be obtained from a Bacillus thuringiensis cryIIIA gene (WO 94/25612) and a Bacillus subtilis SP82 gene (Hue et al., 1995, J. Bacteriol. 177: 3465-3471).
- mRNA stabilizer regions for fungal cells are described in Geisberg et al., 2014, Cell 156(4): 812-824, and in Morozov et al., 2006, Eukaryotic Cell 5(11): 1838-1846.
- the control sequence may also be a leader, a non-translated region of an mRNA that is important for translation by the host cell.
- the leader is operably linked to the 5’-terminus of the polynucleotide encoding the polypeptide. Any leader that is functional in the host cell may be used. Suitable leaders for bacterial host cells are described by Hambraeus et al., 2000, Microbiology 146(12): 3051-3059, and by Kaberdin and Blasi, 2006, FEMS Microbiol. Rev. 30(6): 967-979.
- Preferred leaders for filamentous fungal host cells may be obtained from the genes for Aspergillus oryzae TAKA amylase and Aspergillus nidulans triose phosphate isomerase.
- Suitable leaders for yeast host cells may be obtained from the genes for Saccharomyces cerevisiae enolase (ENO-1), Saccharomyces cerevisiae 3-phosphoglycerate kinase, Saccharomyces cerevisiae alpha-factor, and Saccharomyces cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP).
- the control sequence may also be a polyadenylation sequence, a sequence operably linked to the 3’-terminus of the polynucleotide which, when transcribed, is recognized by the host cell as a signal to add polyadenosine residues to transcribed mRNA. Any polyadenylation sequence that is functional in the host cell may be used.
- Preferred polyadenylation sequences for filamentous fungal host cells are obtained from the genes for Aspergillus nidulans anthranilate synthase, Aspergillus niger glucoamylase, Aspergillus niger alpha-glucosidase, Aspergillus oryzae TAKA amylase, and Fusarium oxysporum trypsin-like protease.
- the control sequence may also be a signal peptide coding region that encodes a signal peptide linked to the N-terminus of a polypeptide and directs the polypeptide into the cell’s secretory pathway.
- the 5’-end of the coding sequence of the polynucleotide may inherently contain a signal peptide coding sequence naturally linked in translation reading frame with the segment of the coding sequence that encodes the polypeptide.
- the 5’-end of the coding sequence may contain a signal peptide coding sequence that is heterologous to the coding sequence.
- a heterologous signal peptide coding sequence may be required where the coding sequence does not naturally contain a signal peptide coding sequence.
- a heterologous signal peptide coding sequence may simply replace the natural signal peptide coding sequence to enhance secretion of the polypeptide. Any signal peptide coding sequence that directs the expressed polypeptide into the secretory pathway of a host cell may be used.
- Effective signal peptide coding sequences for bacterial host cells are the signal peptide coding sequences obtained from the genes for Bacillus NCIB 11837 maltogenic amylase, Bacillus licheniformis subtilisin, Bacillus licheniformis beta-lactamase, Bacillus stearothermophilus alpha-amylase, Bacillus stearothermophilus neutral proteases (nprT, nprS, nprM), and Bacillus subtilis prsA. Further signal peptides are described by Freudl, 2018, Microbial Cell Factories 17: 52.
- Effective signal peptide coding sequences for filamentous fungal host cells are the signal peptide coding sequences obtained from the genes for Aspergillus niger neutral amylase, Aspergillus niger glucoamylase, Aspergillus oryzae TAKA amylase, Humicola insolens cellulase, Humicola insolens endoglucanase V, Humicola lanuginosa lipase, and Rhizomucor miehei aspartic proteinase, such as the signal peptide described by Xu et al., 2018, Biotechnology Letters 40: 949-955.
- Useful signal peptides for yeast host cells are obtained from the genes for Saccharomyces cerevisiae alpha-factor and Saccharomyces cerevisiae invertase. Other useful signal peptide coding sequences are described by Romanos et al., 1992, supra.
- the control sequence may also be a propeptide coding sequence that encodes a propeptide positioned at the N-terminus of a polypeptide.
- the resultant polypeptide is known as a proenzyme or propolypeptide (or a zymogen in some cases).
- a propolypeptide is generally inactive and can be converted to an active polypeptide by catalytic or autocatalytic cleavage of the propeptide from the propolypeptide.
- the propeptide coding sequence may be obtained from the genes for Bacillus subtilis alkaline protease (aprE), Bacillus subtilis neutral protease (nprT), Myceliophthora thermophila laccase (WO 95/33836), Rhizomucor miehei aspartic proteinase, and Saccharomyces cerevisiae alpha-factor.
- the propeptide sequence is positioned next to the N-terminus of a polypeptide and the signal peptide sequence is positioned next to the N-terminus of the propeptide sequence.
- the polypeptide may comprise only a part of the signal peptide sequence and/or only a part of the propeptide sequence.
- the final or isolated polypeptide may comprise a mixture of mature polypeptides and polypeptides which comprise, either partly or in full length, a propeptide sequence and/or a signal peptide sequence.
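The positional rules above (promoter and leader 5' of the coding sequence, the signal peptide coding sequence next to the N-terminus of the propeptide, the propeptide next to the N-terminus of the mature polypeptide, and the terminator at the 3'-terminus) can be summarized in a small data structure. The sketch below is hypothetical: the class name, field names, and placeholder letters are assumptions, and it only concatenates elements in 5'→3' order. It does not check reading frames, start or stop codons, or compatibility with any host cell.

```python
from dataclasses import dataclass


@dataclass
class ExpressionCassette:
    """Hypothetical container for the elements discussed above, each written 5'->3'.

    Optional elements default to an empty string (i.e., absent).
    """
    promoter: str
    coding_mature: str          # encodes the mature polypeptide
    terminator: str             # operably linked to the 3'-terminus
    leader: str = ""            # 5' non-translated region
    signal_peptide: str = ""    # encodes the N-terminal signal peptide
    propeptide: str = ""        # lies between signal peptide and mature polypeptide

    def precursor_coding(self) -> str:
        """Coding region in translation order: signal peptide, propeptide, mature part."""
        return self.signal_peptide + self.propeptide + self.coding_mature

    def assemble(self) -> str:
        """Concatenate 5'->3': promoter, leader, precursor coding region, terminator."""
        return self.promoter + self.leader + self.precursor_coding() + self.terminator


# Placeholder letters stand in for real sequences; they are not genetic elements.
cassette = ExpressionCassette(
    promoter="PPPP", leader="LL", signal_peptide="SSS",
    propeptide="RR", coding_mature="MMMMMM", terminator="TTTT",
)
print(cassette.assemble())  # PPPPLLSSSRRMMMMMMTTTT
```

Keeping optional elements as empty strings lets the same structure describe a construct with or without a leader, signal peptide, or propeptide.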
- It may also be desirable to add regulatory sequences that regulate expression of the polypeptide relative to the growth of the host cell.
- Examples of regulatory sequences are those that cause expression of the gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound.
- Regulatory sequences in prokaryotic systems include the lac, tac, and trp operator systems.
- In yeast, the ADH2 system or GAL1 system may be used.
- In filamentous fungi, the Aspergillus niger glucoamylase promoter, Aspergillus oryzae TAKA alpha-amylase promoter, Aspergillus oryzae glucoamylase promoter, Trichoderma reesei cellobiohydrolase I promoter, and Trichoderma reesei cellobiohydrolase II promoter may be used.
- Other examples of regulatory sequences are those that allow for gene amplification. In fungal systems, these regulatory sequences include the dihydrofolate reductase gene that is amplified in the presence of methotrexate, and the metallothionein genes that are amplified with heavy metals.
- the control sequence may also be a transcription factor, a polynucleotide encoding a polynucleotide-specific DNA-binding polypeptide that controls the rate of the transcription of genetic information from DNA to mRNA by binding to a specific polynucleotide sequence.
- the transcription factor may function alone and/or together with one or more other polypeptides or transcription factors in a complex by promoting or blocking the recruitment of RNA polymerase.
- Transcription factors are characterized by comprising at least one DNA-binding domain which often attaches to a specific DNA sequence adjacent to the genetic elements which are regulated by the transcription factor.
- the transcription factor may regulate the expression of a protein of interest either directly, i.e., by activating the transcription of the gene encoding the protein of interest by binding to its promoter, or indirectly, i.e., by activating the transcription of a further transcription factor which regulates the transcription of the gene encoding the protein of interest, such as by binding to the promoter of the further transcription factor.
- Suitable transcription factors for fungal host cells are described in WO 2017/144177.
- Suitable transcription factors for prokaryotic host cells are described in Seshasayee et al., 2011, Subcellular Biochemistry 52: 7-23, as well as in Balleza et al., 2009, FEMS Microbiol. Rev. 33(1): 133-151.
- the method of the present invention also utilizes recombinant expression vectors comprising a polynucleotide of interest.
- the various nucleotide and control sequences may be joined together to produce a recombinant expression vector that may include one or more convenient restriction sites to allow for insertion or substitution of the polynucleotide of interest at such sites.
- the polynucleotide may be expressed by inserting the polynucleotide or a nucleic acid construct comprising the polynucleotide into an appropriate vector for expression.
- the coding sequence is located in the vector so that the coding sequence is operably linked with the appropriate control sequences for expression.
- the recombinant expression vector may be any vector (e.g., a plasmid or virus) that can be conveniently subjected to recombinant DNA procedures and can bring about expression of the polynucleotide.
- the choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced.
- the vector may be a linear or closed circular plasmid.
- the vector may be an autonomously replicating vector, i.e., a vector that exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome.
- the vector may contain any means for assuring self-replication.
- the vector may be one that, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated.
- a single vector or plasmid, or two or more vectors or plasmids that together contain the total DNA to be introduced into the genome of the host cell, or a transposon, may be used.
- the vector preferably contains one or more selectable markers that permit easy selection of transformed, transfected, transduced, or the like cells.
- a selectable marker is a gene the product of which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to auxotrophs, and the like.
- the vector preferably contains at least one element that permits integration of the vector into the host cell's genome or autonomous replication of the vector in the cell independent of the genome.
- the vector may rely on the polynucleotide’s sequence encoding the polypeptide or any other element of the vector for integration into the genome by homologous recombination, such as homology-directed repair (HDR), or non-homologous recombination, such as non-homologous end-joining (NHEJ).
- the vector may further comprise an origin of replication enabling the vector to replicate autonomously in the host cell in question.
- the origin of replication may be any plasmid replicator mediating autonomous replication that functions in a cell.
- the term “origin of replication” or “plasmid replicator” means a polynucleotide that enables a plasmid or vector to replicate in vivo.
- More than one copy of a polynucleotide of interest may be inserted into a host cell to increase production of a polypeptide. For example, 2 or 3 or 4 or 5 or more copies are inserted into a host cell.
- An increase in the copy number of the polynucleotide can be obtained by integrating at least one additional copy of the sequence into the host cell genome or by including an amplifiable selectable marker gene with the polynucleotide where cells containing amplified copies of the selectable marker gene, and thereby additional copies of the polynucleotide, can be selected for by cultivating the cells in the presence of the appropriate selectable agent.
- the invention relates to a host cell comprising in its genome a polynucleotide sequence of interest generated in additional step h), and/or a polynucleotide sequence identified in step e).
- the present invention also relates to host cells which are not recombinant, i.e., wild-type host cells.
- host cells include but are not limited to probiotics, e.g. wherein the host cell is a Bifidobacterium, e.g., Bifidobacterium animalis, or Bifidobacterium animalis subsp. lactis.
- the present invention also relates to recombinant host cells, comprising a polynucleotide of interest, and/or comprising a polynucleotide operably linked to one or more control sequences that direct the production of a polypeptide of interest.
- a construct or vector comprising a polynucleotide is introduced into a host cell so that the construct or vector is maintained as a chromosomal integrant or as a self-replicating extrachromosomal vector, as described earlier.
- the choice of a host cell will to a large extent depend upon the gene encoding the polypeptide and its source.
- the polypeptide can be native or heterologous to the recombinant host cell.
- at least one of the one or more control sequences can be heterologous to the polynucleotide encoding the polypeptide.
- the recombinant host cell may comprise a single copy, or at least two copies, e.g., three, four, five, or more copies of the polynucleotide of the present invention.
- the host cell may be any mammalian cell useful in the recombinant production of a polypeptide of interest, e.g., a Chinese hamster ovary cell, a BHK cell, a mouse cell, or a HEK cell.
- the host cell may be any microbial cell useful in the recombinant production of a polypeptide of interest, e.g., a prokaryotic cell or a fungal cell.
- the prokaryotic host cell may be any Gram-positive or Gram-negative bacterium.
- Gram-positive bacteria include, but are not limited to, Bacillus, Bifidobacterium (e.g., BB-12®), Clostridium, Enterococcus, Geobacillus, Lactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, and Streptomyces.
- Gram-negative bacteria include, but are not limited to, Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma.
- the bacterial host cell may be any Bacillus cell including, but not limited to, Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, and Bacillus thuringiensis cells.
- the Bacillus cell is a Bacillus amyloliquefaciens, Bacillus licheniformis, or Bacillus subtilis cell.
- Bacillus classes/genera/species shall be defined as described in Patel and Gupta, 2020, Int. J. Syst. Evol. Microbiol. 70: 406-438.
- the bacterial host cell may also be any Streptococcus cell including, but not limited to, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp. zooepidemicus cells.
- the bacterial host cell may also be any Streptomyces cell including, but not limited to, Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, and Streptomyces lividans cells.
- Methods for introducing DNA into prokaryotic host cells are well known in the art, and any suitable method can be used, including but not limited to protoplast transformation, competent cell transformation, electroporation, conjugation, and transduction, with DNA introduced as a linearized or circular polynucleotide. Persons skilled in the art will readily be able to identify a suitable method for introducing DNA into a given prokaryotic cell depending, e.g., on the genus. Methods for introducing DNA into prokaryotic host cells are described, for example, in Heinze et al., 2018, BMC Microbiology 18: 56; Burke et al., 2001, Proc. Natl. Acad. Sci. USA 98: 6289-6294; Choi et al., 2006, J. Microbiol. Methods 64: 391-397; and Donald et al., 2013, J. Bacteriol. 195(11): 2612-2620.
- the host cell may be a fungal cell.
- “Fungi” as used herein includes the phyla Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota as well as the Oomycota and all mitosporic fungi (as defined by Hawksworth et al., In, Ainsworth and Bisby’s Dictionary of The Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK).
- Fungal cells may be transformed by protoplast-mediated transformation, Agrobacterium-mediated transformation, electroporation, biolistic methods, and shock-wave-mediated transformation, as reviewed by Li et al., 2017, Microbial Cell Factories 16: 168, and by the procedures described in EP 238023; Yelton et al., 1984, Proc. Natl. Acad. Sci. USA 81: 1470-1474; Christensen et al., 1988, Bio/Technology 6: 1419-1422; and Lubertozzi and Keasling, 2009, Biotechn. Advances 27: 53-75.
- any method known in the art for introducing DNA into a fungal host cell can be used, and the DNA can be introduced as linearized or as circular polynucleotide.
- the fungal host cell may be a yeast cell.
- yeast as used herein includes ascosporogenous yeast (Endomycetales), basidiosporogenous yeast, and yeast belonging to the Fungi Imperfecti (Blastomycetes). For purposes of this invention, yeast shall be defined as described in Biology and Activities of Yeast (Skinner, Passmore, and Davenport, editors, Soc. App. Bacteriol. Symposium Series No. 9, 1980).
- the yeast host cell may be a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
- the yeast host cell is a Pichia or Komagataella cell, e.g., a Pichia pastoris cell (Komagataella phaffii).
- the fungal host cell may be a filamentous fungal cell.
- “Filamentous fungi” include all filamentous forms of the subdivision Eumycota and Oomycota (as defined by Hawksworth et al., 1995, supra).
- the filamentous fungi are generally characterized by a mycelial wall composed of chitin, cellulose, glucan, chitosan, mannan, and other complex polysaccharides. Vegetative growth is by hyphal elongation and carbon catabolism is obligately aerobic. In contrast, vegetative growth by yeasts such as Saccharomyces cerevisiae is by budding of a unicellular thallus and carbon catabolism may be fermentative.
- the filamentous fungal host cell may be an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell.
- the filamentous fungal host cell is an Aspergillus, Trichoderma or Fusarium cell. In a further preferred embodiment, the filamentous fungal host cell is an Aspergillus niger, Aspergillus oryzae, Trichoderma reesei, or Fusarium venenatum cell.
- the filamentous fungal host cell may be an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Bjerkandera adusta, Ceriporiopsis aneirina, Ceriporiopsis caregiea, Ceriporiopsis gilvescens, Ceriporiopsis pannocinta, Ceriporiopsis rivulosa, Ceriporiopsis subrufa, Ceriporiopsis subvermispora, Chrysosporium inops, Chrysosporium keratinophilum, Chrysosporium lucknowense, Chrysosporium merdarium, Chrysosporium pannicola, Chrysosporium queenslandicum, Chrysosporium tropicum, Chrysosporium zona
- the host cell is isolated.
- the host cell is purified.
- the present invention also relates to methods of producing a polypeptide of interest, comprising (a) cultivating a cell according to the second aspect, under conditions conducive for production of the polypeptide; and optionally, (b) recovering the polypeptide.
- the cell is a Bacillus cell.
- the cell is a Bacillus licheniformis cell.
- the cell is an Aspergillus cell.
- the cell is an Aspergillus niger cell.
- the cell is an Aspergillus oryzae cell.
- the cell is a Trichoderma reesei cell.
- the present invention also relates to methods of producing a host cell broth, comprising (a) cultivating a host cell according to the second aspect, under conditions conducive for production of the host cell; and optionally, (b) recovering the host cell.
- the recovered cell is a Bifidobacterium, e.g., Bifidobacterium animalis, or Bifidobacterium animalis subsp. lactis.
- the host cell is cultivated in a nutrient medium suitable for production of the polypeptide using methods known in the art.
- the cell may be cultivated by shake flask cultivation, or small-scale or large-scale fermentation (including continuous, batch, fed-batch, or solid-state, and/or microcarrier-based fermentations) in laboratory or industrial fermentors in a suitable medium and under conditions allowing the polypeptide to be expressed and/or isolated.
- suitable media are available from commercial suppliers or may be prepared according to published compositions (e.g., in catalogues of the American Type Culture Collection). If the polypeptide is secreted into the nutrient medium, the polypeptide can be recovered directly from the medium. If the polypeptide is not secreted, it can be recovered from cell lysates.
- the polypeptide may be detected using methods known in the art that are specific for the polypeptide, including, but not limited to, the use of specific antibodies, formation of an enzyme product, disappearance of an enzyme substrate, or an assay determining the relative or specific activity of the polypeptide.
- the polypeptide may be recovered from the medium using methods known in the art, including, but not limited to, collection, centrifugation, filtration, extraction, spray-drying, evaporation, or precipitation. In one aspect, a whole fermentation broth comprising the polypeptide is recovered. In another aspect, a cell-free fermentation broth comprising the polypeptide is recovered.
- the polypeptide may be purified by a variety of procedures known in the art to obtain substantially pure polypeptides and/or polypeptide fragments (see, e.g., Wingfield, 2015, Current Protocols in Protein Science 80(1): 6.1.1-6.1.35; Labrou, 2014, Protein Downstream Processing, 1129: 3-10).
- the polypeptide is not recovered.
- the invention relates to the methods according to the first aspect, additionally comprising step g) training a computational model, e.g., machine learning algorithm, with sequence data obtained from step e) and/or score data obtained from step f).
- the computational model of step g) is selected from a linear regression, a decision tree, a random forest model, a support vector machine (SVM), a neural network, K-means clustering, a naive Bayes classifier, a Gaussian mixture model (GMM), or a generative model.
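Of the model classes listed for step g), linear regression is the simplest to show end to end. The sketch below assumes that step e) yields DNA sequences of equal length and step f) yields one numeric score per sequence; it one-hot encodes each sequence and fits a linear model by full-batch gradient descent on the mean squared error. The data, learning rate, and number of epochs are illustrative assumptions, not values taken from the method.

```python
def one_hot(seq: str, alphabet: str = "ACGT") -> list[float]:
    """Flatten a DNA sequence into a one-hot vector with len(alphabet) values per position."""
    features: list[float] = []
    for base in seq:
        features.extend(1.0 if base == a else 0.0 for a in alphabet)
    return features


def fit_linear(X: list[list[float]], y: list[float], lr: float = 0.1, epochs: int = 2000):
    """Fit weights w and bias b minimizing mean squared error by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += 2.0 * err * xi[j] / n
            grad_b += 2.0 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: list[float], b: float, seq: str) -> float:
    """Score a sequence with the fitted linear model."""
    return sum(wj * xj for wj, xj in zip(w, one_hot(seq))) + b


# Hypothetical step e)/f) output: equal-length sequences and their screening scores.
sequences = ["AAAA", "AAAT", "TTTA", "TTTT"]
scores = [1.0, 0.8, 0.2, 0.0]
w, b = fit_linear([one_hot(s) for s in sequences], scores)
print(round(predict(w, b, "AATT"), 2))  # model score for an unseen variant
```

A linear model on one-hot features treats each position independently, so it cannot capture interactions between positions; the other listed model classes (e.g., random forests or neural networks) relax that assumption.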
- the computational model is executed by an electronic device in a method for providing a candidate biological sequence, the method comprising:
- the model is a generative model.
- the generative model is non-unidirectional.
- the input biological sequence comprises one or more polynucleotides of interest identified in step e).
- the input biological sequence is one or more of: an amino acid sequence of a polypeptide of interest, a nucleic acid sequence encoding a polypeptide of interest, a control sequence, e.g., an expression control sequence, and a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
- the candidate biological sequence is one or more of: a control sequence, e.g., an expression control sequence, a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, an amino acid sequence of a polypeptide of interest, and a nucleic acid sequence encoding a polypeptide of interest.
- the input biological sequence is an amino acid sequence of a polypeptide of interest and/or a nucleic acid sequence encoding a polypeptide of interest.
- the candidate biological sequence is a nucleic acid sequence increasing compatibility with a host cell.
- the input biological sequence is an amino acid sequence of a polypeptide of interest and/or a nucleic acid sequence encoding a polypeptide of interest.
- the candidate biological sequence is a control sequence, e.g., an expression control sequence, and/or a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
- the input biological sequence is a control sequence, e.g., an expression control sequence, and/or a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, and wherein the candidate biological sequence is a nucleic acid sequence encoding a polypeptide of interest.
- the generative model is one or more of: a generative adversarial network model, a Wasserstein generative adversarial network model, a diffusion model, and a variational autoencoder.
- applying the generative non-unidirectional model to the input data comprises partitioning the generative non-unidirectional model into a plurality of generators, wherein each generator of the plurality of generators is configured to determine, based on the input data, one or more candidate biological sequences for a subset of nucleotides and/or a subset of amino acids and a predetermined criterion.
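One way to read the partitioning above is that each generator is responsible for a subset of positions, and the partial proposals are merged and then checked against a predetermined criterion. The toy sketch below assumes that reading. Each "generator" here is only a per-position sampler fitted to base frequencies in a few example sequences, and the criterion is a hypothetical GC-content window; it is not a generative adversarial network, diffusion model, or variational autoencoder, just an illustration of the partition-and-merge structure.

```python
import random
from collections import Counter


def fit_position_generator(examples: list[str], positions) -> dict[int, Counter]:
    """Toy 'generator': base counts at the positions this generator owns."""
    return {p: Counter(seq[p] for seq in examples) for p in positions}


def sample(generator: dict[int, Counter], rng: random.Random) -> dict[int, str]:
    """Draw one base per owned position, proportional to its observed count."""
    proposal = {}
    for p, counts in generator.items():
        bases, weights = zip(*sorted(counts.items()))
        proposal[p] = rng.choices(bases, weights=weights)[0]
    return proposal


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


def propose(generators, length, rng, gc_min=0.4, gc_max=0.6, max_tries=1000):
    """Merge partial proposals from all generators; return the first that meets the criterion."""
    for _ in range(max_tries):
        merged: dict[int, str] = {}
        for g in generators:
            merged.update(sample(g, rng))
        candidate = "".join(merged[p] for p in range(length))
        if gc_min <= gc_fraction(candidate) <= gc_max:  # hypothetical predetermined criterion
            return candidate
    return None


examples = ["ATGCGC", "ATGCAT", "TTGCGC", "ATCCGA"]  # hypothetical input sequences
generators = [fit_position_generator(examples, range(0, 3)),  # owns positions 0-2
              fit_position_generator(examples, range(3, 6))]  # owns positions 3-5
candidate = propose(generators, length=6, rng=random.Random(0))
print(candidate)
```

Because each generator only sees its own positions, the criterion check after merging is what enforces properties that depend on the whole sequence.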
- determining the candidate biological sequence by applying the model to the input data comprises: predicting, using the generator, a compatibility of the candidate biological sequence with the host cell;
- the predetermined criterion is based on one or more of:
- the method comprises training the model based on a training set of biological sequences, wherein the training set of biological sequences includes training data indicative of one or more biological sequences related to the host cell.
- the training set of biological sequences is heterologous to the genus of the host cell, preferably heterologous to one or more species of the host cell.
- the training data comprises training input data indicative of one or more of: an amino acid sequence of a polypeptide of interest, a nucleic acid sequence encoding a polypeptide of interest, a control sequence, e.g., an expression control sequence, and a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
- the training data comprises training output data indicative of one or more of: a control sequence, e.g., an expression control sequence, a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, an amino acid sequence of a polypeptide of interest, and a nucleic acid sequence encoding a polypeptide of interest.
- training the model comprises predicting, using a discriminator taking as input the training set of biological sequences and a training candidate biological sequence, a score indicative of the training candidate biological sequence being a referenced biological sequence.
- the method comprises obtaining, from a test environment data repository, experimental data associated with the candidate biological sequence and the host cell, wherein the experimental data indicates a yield performance of the candidate biological sequence associated with the host cell.
- the method comprises validating the candidate biological sequence based on the experimental data.
- the method comprises selecting one or more generators based on the experimental data.
- the method comprises adapting the model based on the experimental data.
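The feedback steps above (obtaining yield data, validating candidates, selecting generators) could be implemented in many ways. As a minimal sketch, assuming each tested candidate is recorded with the generator that proposed it and a measured yield, the function below validates candidates against a hypothetical yield threshold and keeps the generators with the highest mean yield. The record layout, the threshold, and the `keep` parameter are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def select_generators(results, yield_threshold, keep=1):
    """Return (validated candidate sequences, ids of the best-performing generators).

    results: list of (generator_id, candidate_sequence, measured_yield) tuples,
    e.g. as retrieved from a test environment data repository (hypothetical layout).
    """
    validated = [seq for _, seq, y in results if y >= yield_threshold]
    by_generator = defaultdict(list)
    for gen_id, _, y in results:
        by_generator[gen_id].append(y)
    ranked = sorted(by_generator, key=lambda g: mean(by_generator[g]), reverse=True)
    return validated, ranked[:keep]


# Hypothetical experimental records (yields in arbitrary units).
results = [("g1", "ATGC", 0.9), ("g1", "ATGA", 0.7),
           ("g2", "TTGC", 0.3), ("g2", "TTGA", 0.5)]
validated, best = select_generators(results, yield_threshold=0.6)
print(validated, best)  # ['ATGC', 'ATGA'] ['g1']
```

Adapting the model could then mean, for instance, retraining only the selected generators on the validated candidates; that step is not shown here.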
- obtaining input data indicative of an input biological sequence comprises obtaining the input data for the input biological sequence from a database and/or a memory of the electronic device.
- the invention also relates to an electronic device comprising a memory circuitry, a processor circuitry, and an interface, wherein the electronic device is configured to perform any of the methods according to the invention.
- the invention also relates to a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by an electronic device, cause the electronic device to perform any of the methods of the invention.
- the method comprises an additional step h) of generating one or more synthetic polynucleotides of interest based on the output of the computational model.
- the one or more synthetic polynucleotides generated in step h) comprise or consist of a candidate biological sequence.
- the one or more synthetic polynucleotides of interest generated in step h) are codon-optimized.
- the one or more synthetic polynucleotides of interest generated in step h) encode a polypeptide with increased substrate binding, increased receptor binding, increased substrate specificity, increased specific activity, and/or increased stability.
- the one or more synthetic polynucleotides of interest generated in step h) result in increased expression of a polypeptide of interest.
- the one or more synthetic polynucleotides of interest generated in step h) comprise a control sequence.
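The method does not specify how codon optimization in step h) is performed. One simple strategy, shown here only as a hypothetical sketch, back-translates a protein sequence using a single preferred codon per amino acid. The table covers just five symbols: each listed codon does encode the stated amino acid (or stop) in the standard genetic code, but which synonymous codon counts as "preferred" is a placeholder, not codon-usage data for any host. GC content, repeats, and restriction sites are ignored.

```python
# Hypothetical "preferred codon" table (one-letter amino acid code -> codon).
# Codons are valid in the standard genetic code; the preference is a placeholder.
PREFERRED_CODON = {
    "M": "ATG", "K": "AAA", "L": "CTG", "S": "AGC", "*": "TAA",
}


def back_translate(protein: str, table: dict[str, str] = PREFERRED_CODON) -> str:
    """Return a DNA sequence encoding `protein`, using one preferred codon per residue."""
    missing = sorted(set(protein) - set(table))
    if missing:
        raise ValueError(f"no codon defined for: {''.join(missing)}")
    return "".join(table[aa] for aa in protein)


print(back_translate("MKLS*"))  # ATGAAACTGAGCTAA
```

Using only the single most frequent codon is the crudest form of codon optimization; a host-specific table derived from real codon-usage data would be needed before applying this to an actual construct.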
- the biological sequence pairs can be found in a database, such as a public database and/or a private database (e.g., the National Center for Biotechnology Information (NCBI) database and/or a nucleotide archive such as EMBL). The extracted learning may then be transferred to the experimental settings.
- the present disclosure allows compatibility rules to be learned from native sequence pairs provided by a database, and the learned compatibility rules to be provided. The compatibility rules may further be adapted to experimental settings.
- the present disclosure enables interaction between machine learning approaches and experimental approaches.
- Machine learning-based analysis of biological sequences and experimental screening approaches contribute two different layers of learning.
- the first layer of learning allows extraction of complex biological rules that need to be obeyed, while the second layer of learning accumulates data specific to the experimental settings.
- the extracted learning is used to provide (using a deep learning approach, such as a generative model) a relevant subset of candidate biological sequences that can then feasibly be screened using experimental methods.
- the disclosed technique may lead to unlocking the potential of experimental screening approaches by markedly reducing the complexity of the process of finding satisfactory ‘partner’ biological sequences.
- the actual quality of a candidate biological sequence is validated by experiments.
- the learnings can be used as feedback to update the model (thus ‘informing’ the model about the quality of the suggestions).
- the present disclosure provides a method, performed by an electronic device, for providing a candidate biological sequence.
- the method can be a computer-implemented method.
- the method comprises obtaining input data indicative of an input biological sequence, e.g. from a biological library with diverse sequences.
- the input data can be associated with the input biological sequence and/or be representative of the input biological sequence.
- the input data comprises data representative of the input biological sequence, such as data representative of one or more properties of the input biological sequence.
- the one or more properties of the input data include one or more of: a sequence of amino acids, a sequence of nucleic acids, a three-dimensional structure of the input biological polypeptide sequence (e.g., obtained with AlphaFold2), a folding of the input biological sequence, and a pairing of nucleic acids.
- the method comprises determining the candidate biological sequence by applying a model, e.g., a generative model, to the input data.
- the candidate biological sequence is determined for compatibility with a host cell, e.g., targeting compatibility with a given host cell, and/or for increasing compatibility with the given host cell.
- the model applied to the input data aims at increasing one or more expression steps for a polypeptide of interest in a host cell, e.g. increasing or modifying one or more of: transcription, post-transcriptional modification, translation, post-translational modification, folding, secretion, phenotypic trait, and yield for a polypeptide of interest in a host cell.
- Yield may be intracellular and/or extracellular. In other words, yield may be seen as a target performance parameter to optimize when determining the candidate biological sequence. Yield may be optimized via various steps, such as modified secretion, modified transcription, or modified translation, separately or jointly.
- the model generates, based on the input data, the candidate biological sequence.
- the candidate biological sequence may be determined based on one or more of: the score generated in step f), host cell data, input data, and information indicating the type of biological sequence to be determined as the candidate biological sequence.
- the strain library comprises 102 different signal peptide-encoding polynucleotides, which were ordered as synthetic DNA and fused upstream of a protease-encoding DNA sequence (encoding a serine endopeptidase).
- the library was transformed into Bacillus licheniformis strain MOL3320 as described in patent US 2019/0185847 A1. Selection was done on ERM. The resulting strains expressed the protease with different signal peptide variants.
- the generated strains were then fermented in compartments, i.e., in 50 pL droplets of nutrient-controlled medium in fluorinated oil (HFE 7500) produced on a microfluidic droplet production chip.
- the droplets were stabilized with 2 wt% fluorosurfactant (008-Fluorosurfactant, RanBiotech).
- the resulting emulsion was incubated in a collection vial at 37°C for 4 days.
- the serine endopeptidase secreted by the host cells hydrolyses a proprietary fluorogenic rhodamine substrate.
- fluorogenic rhodamine substrates include Rhodamine 110-bis-(succinoyl-L-alanyl-L-alanyl-L-prolyl-L-phenylalanyl amide) (CPC Scientific Inc., San Jose, CA). The substrate was added to each droplet on a microfluidic chip, and after 4 minutes of incubation the fluorescent assay response was measured (shown in Fig. 2).
- the release of Rhodamine 110 resulted in an increase of fluorescence at 520 nm. The increase is proportional to the enzymatic activity measured against a standard.
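Relating the 520 nm fluorescence increase to enzymatic activity via a standard can be pictured as a linear calibration. The following is a minimal sketch, assuming the signal is linear in activity over the calibrated range; the standard series and the helper names `fit_standard_curve` and `activity_from_rfu` are hypothetical and are not taken from the examples.

```python
from statistics import mean


def fit_standard_curve(activities, rfus):
    """Ordinary least-squares fit of RFU = slope * activity + intercept."""
    x_bar, y_bar = mean(activities), mean(rfus)
    sxx = sum((x - x_bar) ** 2 for x in activities)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(activities, rfus))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def activity_from_rfu(rfu, slope, intercept):
    """Invert the calibration line to estimate activity from a measured RFU."""
    return (rfu - intercept) / slope


# Hypothetical standard series: known activity (arbitrary units) vs. RFU at 520 nm.
standards = [0.0, 1.0, 2.0, 4.0]
signals = [2500.0, 3500.0, 4500.0, 6500.0]
slope, intercept = fit_standard_curve(standards, signals)
print(slope, intercept)                              # 1000.0 2500.0
print(activity_from_rfu(5000.0, slope, intercept))   # 2.5
```

The intercept absorbs the background signal of droplets without activity, so only the slope carries the activity information.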
- the measured fluorescence signal level is directly related to the protease concentration in each droplet.
- using the device shown in Fig. …, each droplet was sorted into one of the five output channels depending on its measured fluorescence level.
- the five output channels were connected to five collection tubes, and after collection of at least 1000 droplets in each tube, we separated the collection tubes from the microfluidic device.
- the collected cell pools were named Pool 1, Pool 2, Pool 3, Pool 4, and Pool 5 (see Fig. 1).
- the signal peptide sequences upstream of the protease-encoding DNA sequence contained in each pool were amplified via PCR, and were subsequently sent for DNA sequencing.
- Pool 1 comprised empty droplets (peak at around 2500 RFU) and droplets with no or very weak activity.
- sorting the library into five pools allows each signal peptide variant to be investigated according to the protease activity of the corresponding droplet.
- each library member is analyzed, providing an analysis of the complete library without losing data on any library member, as each signal peptide coding sequence is sequenced in the step following sorting (see Example 2).
- the strains were fermented for approx. 120 hours, and protease activity was measured at the end of fermentation.
- the five droplet sorting pools are ordered according to the fluorescence thresholds used for separation (Pool 5 contains droplets measured with the highest fluorescence signals, and Pool 1 with the lowest fluorescence signals).
- SP sequences with intermediate protease activities were found in Pool 2 (mid-low protease activities) and in Pool 4 (mid-high protease activities). Sorting into more than two pools, e.g., into 5 pools as in this example, increases the output resolution and allows identification not only of the very best or worst performers but also of sequences that lie in between. With regard to signal peptides, for example, such an approach is particularly beneficial when aiming for fine-tuned expression of a polypeptide of interest.
- the screening method allowed the complete library to be screened efficiently while sorting the library members into five pools, enabling a detailed analysis and understanding of each library member.
This example validates the results of the multi-channel sorted SP library (Examples 1 and 2) against the results obtained from cultivating the same SP library in an MTP format.
- a score is calculated for a given sequence.
- the fraction of the corresponding reads in a pool is determined by dividing the number of reads of the given sequence by the total number of reads obtained when sequencing the entire pool. This is done for every pool generated in the experiment.
- the relative proportions of the given sequence in each of the pools are calculated across all pools.
- the score of the given sequence is calculated by summing, over all pools, the product of the relative proportion in each pool and the corresponding selection threshold of that pool.
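The three scoring steps above (per-pool read fraction, relative proportion across pools, threshold-weighted sum) can be sketched in Python. This is an illustration under stated assumptions: the function name `droplet_score`, the argument layout and the example threshold values are hypothetical, and which numeric value represents each pool's selection threshold is not fixed by the text.

```python
from typing import Sequence


def droplet_score(seq_reads: Sequence[int],
                  pool_total_reads: Sequence[int],
                  thresholds: Sequence[float]) -> float:
    """Score one sequence from its read counts across the sorted pools.

    seq_reads[i]        -- reads of the given sequence in pool i
    pool_total_reads[i] -- total reads obtained when sequencing pool i
    thresholds[i]       -- selection threshold used for pool i (hypothetical values)
    """
    if not (len(seq_reads) == len(pool_total_reads) == len(thresholds)):
        raise ValueError("one entry per pool is required")
    # Step 1: fraction of each pool's reads that belong to this sequence.
    fractions = [r / t if t else 0.0 for r, t in zip(seq_reads, pool_total_reads)]
    total = sum(fractions)
    if total == 0:
        return 0.0  # sequence not observed in any pool
    # Step 2: relative proportion of the sequence in each pool, across all pools.
    proportions = [f / total for f in fractions]
    # Step 3: sum of (relative proportion x pool threshold) over all pools.
    return sum(p * thr for p, thr in zip(proportions, thresholds))


score = droplet_score([0, 50, 100], [100, 200, 400], [1.0, 2.0, 3.0])
print(score)  # 2.5
```

Dividing by each pool's total read count first means that a pool sequenced more deeply (pool 3 in the example, with twice the reads but the same fraction as pool 2) does not inflate the score.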
- a score was obtained for each signal peptide sequence cultivated in MTP format, based on the protease activity shown for each sequence.
- the multi-channel droplet sorting of the invention represents an improved and substantially cheaper screening method, saving both time and sample volume while providing a high-resolution output when screening large biological libraries (see Fig. 3).
- Example 4 Microdroplet method reduces the standard deviation of the assigned scores
- Example 5 Processing the screening results with a computational model
This example validates the results of the multi-channel microdroplet-sorted SP library (Examples 1 and 2) against the results obtained from cultivating the same SP library in an MTP format.
- In Fig. 6A, the y-axis indicates the fraction of proline-containing sequences.
- the fraction of signal peptides containing proline depends on the yield of the signal peptide, i.e., circa two-thirds of the good sequences contain at least one proline, whereas only circa one-third of the bad sequences do.
- the model concludes that the presence of proline in a signal peptide is a strong indicator for good expression of the investigated POI.
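The proline statistic behind Fig. 6A reduces to the share of sequences containing at least one proline in each yield class. A minimal sketch; the function name `proline_fraction` is hypothetical, and the two sequence lists are invented placeholders chosen to mirror the reported two-thirds versus one-third split, not data from the study.

```python
def proline_fraction(sequences):
    """Fraction of sequences containing at least one proline ('P')."""
    if not sequences:
        return 0.0
    return sum(1 for s in sequences if "P" in s.upper()) / len(sequences)


# Hypothetical signal peptide fragments, grouped by yield class.
good = ["MKKLLPLAVA", "MRPSLLALAG", "MKKIALFLGA"]
bad = ["MKKWLLAALA", "MRPLLVLAFA", "MKQSLLAVLA"]
print(round(proline_fraction(good), 2), round(proline_fraction(bad), 2))  # 0.67 0.33
```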
- Example 6 Increased Training Data Size from Microdroplets improves performance of machine learning model
- Example 7 Microdroplet method allows identification of superior library members
This example compares the results of the multi-channel sorted SP libraries from Examples 1 and 2 to the results obtained from screening the same SP library in an MTP format. In contrast to the previous examples, which used 5 pools, this example sorted the library into 7 pools. Each library member is given a unique signal peptide identifier and consists of a different DNA sequence encoding a signal peptide.
- the SP library was sorted into 7 pools, i.e., pool 1 to pool 7. Droplets with the lowest signal were sorted into pool 1, whereas droplets with the highest signal were sorted into pool 7. Pools 2-6 comprised cells with library members that showed signals lower than the threshold for pool 7 and higher than the threshold for pool 1. In other words, signal thresholds increased from pool 1 to pool 7.
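Assigning each droplet to one of seven pools with thresholds that increase from pool 1 to pool 7 amounts to locating its signal among six ascending boundaries. A minimal sketch using Python's `bisect` module; the boundary values in RFU, the helper name `assign_pool`, and the convention that a signal lying exactly on a boundary goes to the higher pool are assumptions, and the sorter hardware itself is not modeled.

```python
from bisect import bisect_right
from collections import Counter


def assign_pool(signal, boundaries):
    """Return the 1-based pool index for a droplet signal.

    boundaries holds the ascending thresholds separating adjacent pools,
    so len(boundaries) + 1 pools exist. Signals below boundaries[0] go to
    pool 1; a signal exactly on a boundary goes to the higher pool.
    """
    return bisect_right(boundaries, signal) + 1


# Hypothetical RFU thresholds separating pools 1..7 (six boundaries).
boundaries = [3000, 4000, 5000, 6000, 7000, 8000]
signals = [2500, 2600, 4500, 7200, 9100]
counts = Counter(assign_pool(s, boundaries) for s in signals)
print(dict(counts))  # {1: 2, 3: 1, 6: 1, 7: 1}
```

A binary search keeps the per-droplet decision at O(log n) in the number of pools, although with seven pools a linear scan would be equally adequate.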
- Table 1 shows the results for 304 library members identified using the microdroplet method. Sequences identified both in the MTP screen and in the microdroplet screen are shaded gray (e.g., SP_GAN_208). For each sequence, Table 1 shows the number of droplets comprising that sequence in each pool. For example, sequence SP_GAN_205 appeared in 61 droplets of pool 7 and in 2 droplets of pool 1. Additionally, for the sequences identified with the microdroplet method, the table shows a score calculated as described in Example 3. For sequences also identified in MTP, Table 1 shows the relative activity determined during MTP cultivation. Importantly, the sequences in Table 1 are ranked by descending droplet score, i.e., the highest droplet scores are at the top of Table 1, whereas the lowest droplet scores are at the bottom.
- the method of the invention allows identification of library members which otherwise would have been overlooked and/or not found using conventional methods such as MTP. These library members are shown as lines with a clear background in Table 1. Lines with a gray background represent library members that have been identified using MTP.
- the sequence SP_GAN_208 (marked in gray) was, in terms of scoring, the best-performing library member identified in the MTP screen. The same library member was also identified as a well-performing sequence during the microdroplet screen. However, the microdroplet method of the invention identified 6 additional library members with a higher score than SP_GAN_208, namely SP_GAN_205, SP_GAN_206, SP_GAN_217, SP_GAN_126, SP_GAN_47 and SP_GAN_232. Thus, the method of the invention is highly beneficial for addressing biotechnological challenges, e.g., increasing expression of a POI with a new signal peptide sequence.
- a method for screening a biological library comprising the steps of: a) providing a microfluidic device comprising a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303), b) providing an emulsion of droplets comprising a library of polynucleotides of interest and a screenable product, c) determining the amount of screenable product of one or more droplets in the microfluidic device, d) sorting the one or more droplets with the droplet sorter (200) into a receiving output channel of the at least three output channels (301, 302, 303), wherein the receiving output channel is determined based on the amount of screenable product per droplet, and wherein at least three receiving output channels receive a plurality of droplets comprising an amount of screenable product above and/or below one or more predetermined threshold levels, e) identifying one or more polynucleotide of interest present
- each host cell comprises one or more polynucleotides of interest of the library of polynucleotides of interest.
- each droplet comprises at most one host cell, or a plurality of host cells derived from the same parent host cell.
- each droplet comprises at most one polynucleotide of interest.
- the enzyme is selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phyta
- the screenable product is an enzyme substrate, preferably for an enzyme selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic
- the screenable product is a fluorescent product.
- the fluorescent product is converted from a fluorogenic substrate by an enzyme encoded by the polynucleotide of interest.
- the amount of screenable product is inversely proportional to one or more of cell number, cell growth, cell division, cell viability, or cell growth rate.
- the amount of screenable product is proportional to one or more of cell number, cell growth, cell division, cell viability, or cell growth rate.
- the screenable product comprises or consists of one or more host cells.
- the screenable product comprises or consists of substantially all the host cells in a droplet.
- the screenable product is a product of an enzymatic reaction, preferably of a reaction catalyzed by an enzyme selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosi
- the score is proportional, e.g., normalized, to the number of identical DNA sequences for a first polynucleotide of interest present in an output channel.
- the method of any one of the preceding paragraphs, wherein the score is the total number of identical DNA sequences for a first polynucleotide of interest present in an output channel.
- the method of any one of the preceding paragraphs, wherein the score is proportional, e.g., normalized, to the number of identical DNA sequences for a second polynucleotide of interest present in an output channel.
- the score is the total number of identical DNA sequences for a second polynucleotide of interest present in an output channel.
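Counting identical DNA sequences per output channel, both as raw totals and normalized to the channel's read count, can be sketched with `collections.Counter`. This assumes the reads have already been trimmed to the polynucleotide of interest; the function name `channel_counts` and the example reads are hypothetical, and the channel keys reuse the reference numerals 301 and 302 purely as labels.

```python
from collections import Counter


def channel_counts(reads_by_channel):
    """Count identical DNA sequences per output channel.

    reads_by_channel maps a channel label to its list of sequenced reads.
    Returns (raw, normalized): raw counts per sequence, and the same counts
    divided by the channel's total number of reads.
    """
    raw, normalized = {}, {}
    for channel, reads in reads_by_channel.items():
        counts = Counter(read.upper() for read in reads)
        total = sum(counts.values())
        raw[channel] = dict(counts)
        normalized[channel] = {seq: n / total for seq, n in counts.items()}
    return raw, normalized


raw, norm = channel_counts({
    "301": ["ATGAAA", "ATGAAA", "ATGCCC", "atgaaa"],
    "302": ["ATGCCC"],
})
print(raw["301"]["ATGAAA"], norm["301"]["ATGAAA"])  # 3 0.75
```

The raw count corresponds to the "total number of identical DNA sequences" variant of the score, and the normalized value to the "proportional, e.g., normalized" variant.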
- the microfluidic device comprises an incubation zone (500).
- the cells comprised in one droplet are genetically identical, i.e., the cells are derived from one parental host cell, preferably the same parental host cell.
- the droplet sorter comprises one or more sensing means (600), preferably located downstream of the incubation zone (500), and/or upstream of the sorting means (401, 402).
- the one or more sensing means (600) comprises a fluorescence sensor.
- the one or more sensing means (600) comprises an absorption sensor.
- the one or more sensing means (600) comprises an image sensor, e.g., a CMOS sensor, or a CCD sensor, or a PMT sensor.
- the one or more sensing means (600) comprises a NEMS (nanoelectromechanical system) sensor.
- the one or more sensing means (600) comprises a mass analyzer suitable for mass spectrometry, e.g., a quadrupole mass analyzer, a TOF mass analyzer, an ion trap mass analyzer, an orbitrap mass analyzer, a magnetic sector mass analyzer, a Q-TOF mass analyzer, or an FT-ICR mass analyzer.
- step e) comprises DNA amplification of the one or more polynucleotides of interest within each output channel.
- step e) comprises DNA sequencing of the one or more polynucleotides of interest, e.g., after PCR amplification, or by nanopore sequencing.
- in step e), the one or more polynucleotides of interest are identified by a DNA barcode.
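The text does not specify a barcode design, so the following is only a sketch under assumptions: fixed-length, uppercase barcodes located at the start of each read, a tolerance of one mismatch (Hamming distance), and rejection of ambiguous matches. The barcode sequences, member IDs and the function names `hamming` and `identify_by_barcode` are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def identify_by_barcode(read, barcodes, max_mismatches=1):
    """Return the library member whose barcode best matches the read prefix.

    barcodes maps an uppercase barcode sequence to a library member ID; all
    barcodes are assumed to share one length and to sit at the read start.
    Returns None if no barcode is within max_mismatches or the best match
    is tied between two barcodes.
    """
    if not barcodes:
        return None
    length = len(next(iter(barcodes)))
    prefix = read[:length].upper()
    if len(prefix) < length:
        return None  # read too short to contain a barcode
    scored = sorted((hamming(prefix, bc), member) for bc, member in barcodes.items())
    best_dist, best_member = scored[0]
    if best_dist > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous: two barcodes equally close
    return best_member


barcodes = {"ACGTAC": "SP_001", "TTGCAA": "SP_002"}
print(identify_by_barcode("ACGTACATGAAA", barcodes))  # SP_001
print(identify_by_barcode("ACGAACATGAAA", barcodes))  # SP_001, within one mismatch
print(identify_by_barcode("GGGGGGATGAAA", barcodes))  # None
```

Tolerating a single mismatch recovers reads with one sequencing error, while the tie check avoids assigning a read to the wrong library member when barcodes are too similar.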
- step g) training a computational model, e.g., a machine learning algorithm, with sequence data obtained from step e) and/or score data obtained from step f).
- the computational model of step g) is selected from the list of a linear regression, a decision tree, a random forest model, a support vector machine (SVM), a neural network, a K-means clustering, a naive Bayes, a Gaussian mixture model (GMM), or a generative model.
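As one minimal instance of step g), a decision stump (a depth-1 decision tree, the simplest member of the decision-tree option listed above) can be trained on droplet scores. The single feature, proline count, is an assumption motivated by Example 5; the sequences, scores and the helper names `proline_count` and `fit_stump` are hypothetical, and a practical model would use far richer sequence features.

```python
from statistics import mean


def proline_count(seq):
    """Single illustrative feature: number of prolines ('P') in the sequence."""
    return seq.upper().count("P")


def fit_stump(sequences, scores):
    """Fit a depth-1 regression tree (decision stump) on proline count.

    Every distinct feature value except the smallest is tried as a cut
    point; the cut with the lowest sum of squared errors is kept, and each
    side predicts the mean score of its training sequences.
    """
    xs = [proline_count(s) for s in sequences]
    best = None
    for cut in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, scores) if x < cut]
        right = [y for x, y in zip(xs, scores) if x >= cut]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, cut, left_mean, right_mean)
    if best is None:
        raise ValueError("proline count is constant; no split is possible")
    _, cut, left_mean, right_mean = best
    return lambda seq: left_mean if proline_count(seq) < cut else right_mean


# Hypothetical (sequence, droplet score) pairs from steps e) and f).
model = fit_stump(["MKKLLA", "MKRLLA", "MKPLLA", "MPKPLA"], [1.0, 1.5, 3.0, 3.5])
print(model("MAPLLA"), model("MAKLLA"))  # 3.25 1.25
```

Ensembles of such trees on richer features would move toward the random forest option listed in the same paragraph.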
- determining the candidate biological sequence by applying a model, e.g., a generative model, to the input data, preferably wherein the generative model is non-unidirectional; and providing biological sequence data indicative of the candidate biological sequence.
- the input biological sequence is one or more of: an amino acid sequence of a polypeptide of interest, a nucleic acid sequence encoding a polypeptide of interest, a control sequence, e.g., an expression control sequence, and a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
- the candidate biological sequence is one or more of: a control sequence, e.g., an expression control sequence, a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, an amino acid sequence of a polypeptide of interest, and a nucleic acid sequence encoding a polypeptide of interest.
- the input biological sequence is an amino acid sequence of a polypeptide of interest and/or a nucleic acid sequence encoding a polypeptide of interest, and the candidate biological sequence is a nucleic acid sequence increasing compatibility with a host cell.
- the input biological sequence is an amino acid sequence of a polypeptide of interest and/or a nucleic acid sequence encoding a polypeptide of interest, and the candidate biological sequence is a control sequence, e.g., an expression control sequence, and/or a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
- the input biological sequence is a control sequence, e.g., an expression control sequence, and/or a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, and wherein the candidate biological sequence is a nucleic acid sequence encoding a polypeptide of interest.
- the generative model is one or more of: a generative adversarial network (GAN) model, a Wasserstein generative adversarial network model, a diffusion model, and a variational autoencoder.
- applying the generative non-unidirectional model to the input data comprises partitioning the generative non-unidirectional model into a plurality of generators, wherein each generator of the plurality of generators is configured to determine, based on the input data, one or more candidate biological sequences for a subset of nucleotides and/or a subset of amino acids and a predetermined criterion.
- determining the candidate biological sequence by applying the model to the input data comprises: predicting, using the generator, a compatibility of the candidate biological sequence with the host cell;
- the method comprises training the model based on a training set of biological sequences, wherein the training set of biological sequences includes training data indicative of one or more biological sequences related to the host cell.
- the training set of biological sequences is heterologous to the genus of the host cell, preferably heterologous to one or more species of the host cell.
- the training data comprises training input data indicative of one or more of: an amino acid sequence of a polypeptide of interest, a nucleic acid sequence encoding a polypeptide of interest, a control sequence, e.g., an expression control sequence, and a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
- the training data comprises training output data indicative of one or more of: a control sequence, e.g., an expression control sequence, a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, an amino acid sequence of a polypeptide of interest, and a nucleic acid sequence encoding a polypeptide of interest.
- training the model comprises predicting, using a discriminator taking as input the training set of biological sequences, and a training candidate biological sequence, a score indicative of the training candidate biological sequence being a referenced biological sequence.
- the method comprises obtaining, from a test environment data repository, experimental data associated with the candidate biological sequence and the host cell, wherein the experimental data indicates a yield performance of the candidate biological sequence associated with the host cell.
- obtaining input data indicative of an input biological sequence comprises obtaining the input data for the input biological sequence from a database and/or a memory of the electronic device.
- An electronic device comprising a memory circuitry, a processor circuitry, and an interface, wherein the electronic device is configured to perform any of the methods according to any one of the preceding paragraphs.
- a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device cause the electronic device to perform any of the methods of any one of the preceding paragraphs.
- the droplet sorter (200) comprises one or more sorting means (401, 402).
- the one or more sorting means comprise or consist of one or more electrodes, one or more acoustic wave generators, one or more valves, and/or one or more pressure-controlled outlets.
- the one or more sorting means comprises at least two electrodes.
- the biological library comprises or consists of wild-type cells with different genotype and/or different phenotype.
- the biological library comprises different codon-optimized DNA sequences encoding the same amino acid sequence of a polypeptide of interest, e.g., a signal peptide, and/or an enzyme.
- the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a polypeptide of interest.
- the polynucleotide of interest comprises a first polynucleotide of interest encoding a control sequence, and a second polynucleotide of interest encoding a polypeptide of interest.
- the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a control sequence.
- the control sequence is a promoter sequence, a signal peptide, a leader sequence, a polyadenylation sequence, a propeptide sequence, or a transcription terminator.
- the polynucleotide of interest comprises a first polynucleotide of interest encoding a signal peptide, and a second polynucleotide of interest encoding a polypeptide of interest, wherein the first polynucleotide of interest is operatively linked to the second polynucleotide of interest and located upstream of the second polynucleotide of interest.
- the polynucleotide of interest comprises a first polynucleotide of interest comprising a promoter sequence, and a second polynucleotide of interest encoding a polypeptide of interest, wherein the first polynucleotide of interest is operatively linked to the second polynucleotide of interest and located upstream of the second polynucleotide of interest.
- the biological library comprises identical second polynucleotides of interest, and a plurality of variants of the first polynucleotides of interest.
- the biological library comprises identical first polynucleotides of interest, and a plurality of variants of the second polynucleotides of interest.
- the one or more polynucleotides of interest comprise a promoter, a polynucleotide encoding a signal peptide, a polynucleotide encoding a polypeptide of interest, or a native host cell gene.
- the method of any one of the preceding paragraphs, wherein the first and second polynucleotides of interest are endogenous to the host cell.
- the one or more polynucleotides of interest encode a polypeptide of interest.
- the polypeptide of interest is an enzyme, a nanobody, an antibody, an antibody fragment, a fluorescent polypeptide, e.g., GFP, or an alpha-lactalbumin.
- the amount of screenable product in the droplet is proportional to the amount of a polypeptide encoded by the one or more polynucleotides of interest.
- the amount of screenable product in the droplet is inversely proportional to the amount of a polypeptide encoded by the one or more polynucleotides of interest.
- the biological library comprises at least 100, at least 200, at least 500, at least 1 000, at least 2 000, at least 3 000, at least 5 000, at least 10 000, at least 100 000, at least 1 000 000, at least 10 000 000, at least 50 000 000, or at least 100 000 000 different polynucleotides of interest.
- the biological library comprises at least 100, at least 200, at least 500, at least 1 000, at least 2 000, at least 3 000, at least 5 000, at least 10 000, at least 100 000, at least 200 000, at least 500 000, at least 1 000 000, at least 5 000 000, at least 10 000 000, or at least 100 000 000 different host cells.
- the amount of screenable product in the droplet is proportional to one or more of: stability of the polypeptide of interest, transcription of the polypeptide of interest, translation of the polypeptide of interest, secretion of the polypeptide of interest, yield of the polypeptide of interest, binding strength of the polypeptide of interest to a target molecule, and activity of the polypeptide of interest.
- the amount of screenable product in the droplet is inversely proportional to one or more of: stability of the polypeptide of interest, transcription of the polypeptide of interest, translation of the polypeptide of interest, secretion of the polypeptide of interest, yield of the polypeptide of interest, binding strength of the polypeptide of interest to a target molecule, and activity of the polypeptide of interest.
- the amount of screenable product in the droplet is proportional to one or more of: cell number, viability of the host cell, cell division rate of the host cell, cell growth rate of the host cell, cell size of the host cell, and protein secretion of the host cell.
- the amount of screenable product in the droplet is inversely proportional to one or more of: cell number, viability of the host cell, cell division rate of the host cell, cell growth rate of the host cell, cell size of the host cell, and protein secretion of the host cell.
- the substrate comprises a fluorophore, e.g., fluorescein, or fluorescein-labelled starch.
- each droplet, before the optional incubation, comprises an average occupation of at most 0.01, at most 0.02, at most 0.03, at most 0.04, at most 0.05, at most 0.06, at most 0.07, at most 0.08, at most 0.09, at most 0.1, at most 0.2, at most 0.3, at most 0.4, at most 0.5, at most 0.6, or at most 0.7 cells; preferably at most 0.1 cells.
- each droplet comprises an average occupation of at most 0.01 polynucleotide of interest, at most 0.02 polynucleotide of interest, at most 0.03 polynucleotide of interest, at most 0.04 polynucleotide of interest, at most 0.05 polynucleotide of interest, at most 0.06 polynucleotide of interest, at most 0.07 polynucleotide of interest, at most 0.08 polynucleotide of interest, at most 0.09 polynucleotide of interest, at most 0.1 polynucleotide of interest, at most 0.2 polynucleotide of interest, at most 0.3 polynucleotide of interest, at most 0.4 polynucleotide of interest, at most 0.5 polynucleotide of interest, at most 0.6 polynucleotide of interest, or at most 0.7 polynucleotide of interest; preferably at most 0.1 polyn
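If cells are encapsulated at random, the number of cells per droplet is commonly modeled as Poisson-distributed with the average occupation as its mean. This model is an assumption, since the text states only the average. The sketch below shows what an average occupation of 0.1 implies for empty, single-cell and multi-cell droplets; the function names `poisson_pmf` and `loading_summary` are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(exactly k cells in a droplet) for mean occupation lam (Poisson model)."""
    return lam ** k * exp(-lam) / factorial(k)


def loading_summary(lam):
    """Fractions of empty, single-cell and multi-cell droplets."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


summary = loading_summary(0.1)
# empty ~ 0.905, single ~ 0.090, multiple ~ 0.0047
occupied = 1.0 - summary["empty"]
print(round(summary["multiple"] / occupied, 3))  # 0.049
```

Under this model, about 90% of droplets stay empty at an occupation of 0.1, and only about 5% of occupied droplets receive more than one cell, which is consistent with the preference for at most one host cell per droplet.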
- the droplet sorting is facilitated by an electric field generated by one or more electrodes (401, 402) adjacent to the droplet sorter.
- the droplet sorting is facilitated by an acoustic wave generated by one or more acoustic wave generators (401, 402) adjacent to the droplet sorter.
- the droplet sorting is facilitated by a local pressure change generated by one or more pressure-controlled outlets (401, 402) adjacent to the droplet sorter, e.g., wherein the one or more pressure-controlled outlets are comprised in one or more output channels.
- in step c), the amount of screenable product is determined using a fluorescence-based signal, absorbance, Raman spectroscopy, mass spectrometry (MS), or MALDI-MS.
- a relative and/or an absolute amount of the screenable product per droplet is determined by the one or more sensing means (600).
- one or more output channels comprise at least 10 000 droplets, at least 50 000 droplets, at least 100 000 droplets, at least 500 000 droplets, at least 1 000 000 droplets, at least 2 000 000 droplets, at least 5 000 000 droplets, at least 10 000 000 droplets, or at least 100 000 000 droplets.
- the droplet sorter comprises at least four output channels, at least five output channels, at least six output channels, at least seven output channels, at least 8 output channels, at least 9 output channels, or at least 10 output channels.
- the host cell is a yeast host cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
- the host cell is a filamentous fungal host cell, e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus n
- the host cell is a prokaryotic host cell, e.g., a Gram-positive cell selected from the group consisting of Bacillus, Clostridium, Enterococcus, Geobacillus, Lactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, and Streptomyces cells, or a Gram-negative cell selected from the group consisting of Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma cells, such as Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, Bacillus thuringiensis, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp.
- Bifidobacterium, e.g., Bifidobacterium animalis or Bifidobacterium animalis subsp. lactis.
- a host cell comprising in its genome a polynucleotide sequence of interest generated in step h), and/or a polynucleotide sequence identified in step e).
- the host cell of any one of the preceding paragraphs which comprises at least two copies, e.g., three, four, five, or more copies of the polynucleotide sequence of interest.
- a method of producing a polypeptide of interest comprising the steps of cultivating the cell according to any one of the preceding paragraphs, under conditions conducive for production of the polypeptide.
- a nucleic acid construct or expression vector comprising a polynucleotide of interest identified in step e), and/or a polynucleotide sequence generated in step h).
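The average-occupation limits listed in the first item above (e.g., at most 0.1 polynucleotide of interest per droplet) are commonly motivated by loading statistics. The sketch below assumes, as is usual for random encapsulation but not stated in the text, that library members are distributed over droplets according to a Poisson distribution with mean λ. It reports the fraction of empty droplets and the fraction of occupied droplets that hold more than one library member, i.e., droplets whose signal cannot be linked to a single genotype. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet contains exactly k library members."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def loading_summary(lam: float) -> dict:
    """Summarize droplet occupancy for mean occupation lam (Poisson assumption)."""
    p0 = poisson_pmf(0, lam)
    p1 = poisson_pmf(1, lam)
    occupied = 1.0 - p0
    multiple = 1.0 - p0 - p1  # P(k >= 2)
    return {
        "empty": p0,
        "single": p1,
        "multiple": multiple,
        # Share of occupied droplets that mix two or more genotypes.
        "multiple_among_occupied": multiple / occupied,
    }


for lam in (0.01, 0.1, 0.3, 0.7):
    s = loading_summary(lam)
    print(f"lambda={lam}: empty={s['empty']:.3f}, "
          f"multiple among occupied={s['multiple_among_occupied']:.3f}")
```

Under this assumption, λ = 0.1 leaves roughly 90% of droplets empty, while only about 5% of the occupied droplets contain two or more library members; raising λ fills more droplets but increases genotype mixing.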
Abstract
The present invention relates to methods for screening a biological library. The invention also relates to nucleic acid sequences, vectors, and host cells which have been isolated and/or generated by the methods of the invention.
Description
DROPLET-BASED SCREENING METHOD
Background of the Invention
Field of the Invention
The present invention relates to methods for screening a biological library using a microfluidic chip. The invention also relates to nucleic acid sequences, vectors, and host cells which have been isolated and/or generated by the methods of the invention.
The methods of the invention also relate to identification of polynucleotides of interest and/or cells having desired characteristics. Droplets sorted in the chip are sometimes referred to as microdroplets or microencapsulations; they typically have an average diameter of about 20 micrometers and are used as compartments, i.e., minuscule reaction vessels. They can contain live microbial cells that, for example, secrete an enzyme. Additionally or alternatively, the droplets can contain cell extracts that enable the expression of a protein encoded by a polynucleotide of interest. The droplets may also contain other components, for example, a fluorogenic enzyme substrate that can reveal the activity of an enzyme.
Description of the Related Art
Decreasing costs for ordering synthetic DNA and creating DNA libraries enable the generation of larger libraries with high sequence diversity. However, although DNA sequencing costs are also decreasing, characterizing the sequences in a time- and cost-efficient manner remains challenging. Conventional high-throughput screening (HTS) methods using microtiter plates (MTP) are known to be both costly and slow. While library sorting using two-way fluorescence-activated cell or droplet sorting (FACS/FADS) methods or two-way microdroplet sorters is already available, these methods are inefficient because separation into multiple bins cannot be performed in a single step. For example, sorting into multiple fractions must be done sequentially, which requires a large starting sample, is time-inefficient, and results in the loss of a significant portion of the starting sample.
Thus, there is a need for an improved library sorting and sequencing method.
Summary of the Invention
The inventors of the presented methods have discovered, to their surprise, that compared to conventional high-throughput screening (HTS) methods, sorting of microdroplets into at least three output channels with subsequent sequencing brings several combined benefits:
• Efficiency: The method of the invention reduces both the time and cost required per library member.
• Improved Performance: The method of the invention increases the overall efficiency of the process.
• Robustness and Accuracy: The method of the invention is as robust as conventional methods (as demonstrated in Example 3) while showing reduced variability (evidenced by decreased standard deviation as shown in Example 4). Decreased standard deviation facilitates the generation of better performing computational models.
• Scalability: The method of the invention enables the screening of larger libraries. Screening a higher number of library members allows for the generation of more effective models (as shown in Example 6). The method of the invention also identifies library members that are difficult or impossible to identify using other methods (as shown in Example 7).
The above-mentioned advantages are elaborated in more detail below.
Droplet microfluidics increases the speed at which biological screening data can be generated by a factor of ca. 1000, at less than 1% of the cost of conventional HTS methods. When a biological library is sorted according to the invention, fewer library members and less reagent are discarded than with a conventional two-way sorting method. Thus, the methods of the invention require a lower number of cells and/or library members in the starting sample without compromising the read-out quality. Furthermore, after sorting into three or more receiving channels, scoring the polynucleotides of interest allows a detailed study of the relationship between variants of the polynucleotide of interest and a desired effect, e.g., enzyme activity or enzyme yield. Droplet sorting into two channels (two sub-sets) only allows droplet separation based on a single threshold value; in contrast, the methods of the invention provide a higher resolution of read-outs, i.e., they make it possible to differentiate between multiple sub-sets after droplet separation based on multiple threshold values. For example, in some cases it is not the highest binding activity of a polypeptide towards a substrate or inhibitor that is favorable; instead, a moderate binding activity in a “sweet spot” is preferred.
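The difference between a single threshold (two-way sorting) and multiple thresholds can be illustrated with a short sketch that routes each droplet to an output channel based on its measured signal. The threshold values, signal readings, and function names below are hypothetical; in the device itself, the routing decision is made by the sensing and sorting means rather than in software like this.

```python
from bisect import bisect_right
from collections import Counter


def assign_channel(signal: float, thresholds: list[float]) -> int:
    """Return the output-channel index (pool) for one droplet.

    With n ascending thresholds there are n + 1 channels: channel 0 receives
    signals below thresholds[0]; channel n receives signals at or above
    thresholds[-1]. A two-way sorter corresponds to a single threshold.
    """
    return bisect_right(thresholds, signal)


def sort_droplets(signals, thresholds) -> Counter:
    """Count how many droplets each output channel receives."""
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    return Counter(assign_channel(s, thresholds) for s in signals)


# Hypothetical readings (arbitrary units) and four thresholds -> five pools.
signals = [0.2, 1.5, 3.1, 0.9, 7.4, 5.0, 2.2, 9.8]
thresholds = [1.0, 2.5, 4.0, 6.0]
print(sort_droplets(signals, thresholds))
```

Keeping the thresholds as an ordered list means that adding output channels only requires adding threshold values, which mirrors the multi-bin sorting described above.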
Not only variant polynucleotide sequences correlated with positive effects (e.g., increased enzyme yield or increased enzyme activity), but also variant sequences correlated with less-desired effects (e.g., reduced enzyme yield or reduced enzyme activity) are part of the detailed output and can be considered for further rounds of library screening and/or mutation.
As shown in the examples, the results obtained with the methods of the invention unexpectedly correlate strongly with the results of MTP screening methods. Thus, the methods of the instant invention make it possible to skip or replace the well-established MTP screening methods, which are known to be time- and resource-demanding. While maintaining the strong correlation with the MTP results, the methods of the invention have also shown a lower standard deviation (STD) than MTP. This lower STD is of particular advantage when the resulting data are used as input for a machine learning model, because a lower STD results in a higher-confidence model, i.e., a more precise and robust algorithm.
Advantageously, as shown in Table 1 and Example 7, the method of the invention makes it possible to identify promising members of a large library that would not have been identified using conventional screening methods, as these methods are limited to smaller libraries.
Harnessing the power of machine learning hinges on the availability of extensive datasets. In a conventional droplet screening process, only a small fraction of droplets is isolated and analyzed. Hence, for most library members, e.g., when screening a cell library, the phenotype-genotype link is lost. To enable characterization of the entire library diversity, the methods of the invention provide a strategy in which droplets are sorted into multiple (at least three) output channels (pools). The abundance of each individual sequence in each pool is then measured and used to calculate a score for each polynucleotide sequence. In this manner, droplet screening technology can be employed to efficiently generate extensive datasets that take into account every polynucleotide sequence present in each pool.
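The text does not specify the scoring formula at this point, so the following is only one plausible sketch of turning per-pool sequencing counts into a single score per polynucleotide. It assumes that read counts are first normalized within each pool (to correct for differences in droplet number and sequencing depth between pools) and that the score is the abundance-weighted mean pool index. Pool numbering, sequence names, and the weighting scheme are all hypothetical.

```python
from collections import defaultdict


def relative_abundance(pool_counts: dict) -> dict:
    """Convert raw read counts in one pool into fractions summing to 1."""
    total = sum(pool_counts.values())
    if total == 0:
        return {}
    return {seq: n / total for seq, n in pool_counts.items()}


def weighted_pool_scores(pools: list[dict]) -> dict:
    """Score each sequence as the abundance-weighted mean of its pool index.

    pools[i] maps sequence -> read count for pool i (0 = lowest signal).
    Scores near 0 indicate mostly low-signal pools; scores near
    len(pools) - 1 indicate mostly high-signal pools.
    """
    per_seq = defaultdict(lambda: [0.0] * len(pools))
    for i, counts in enumerate(pools):
        for seq, frac in relative_abundance(counts).items():
            per_seq[seq][i] = frac
    scores = {}
    for seq, fracs in per_seq.items():
        total = sum(fracs)
        if total > 0:
            scores[seq] = sum(i * f for i, f in enumerate(fracs)) / total
    return scores


# Hypothetical read counts for three sequences sorted into three pools.
pools = [
    {"SP_A": 900, "SP_B": 80, "SP_C": 20},
    {"SP_A": 100, "SP_B": 500, "SP_C": 200},
    {"SP_A": 10, "SP_B": 90, "SP_C": 700},
]
for seq, score in sorted(weighted_pool_scores(pools).items()):
    print(seq, round(score, 2))
```

Because every sequence detected in any pool receives a score, this kind of calculation keeps the phenotype-genotype link for the whole library rather than only for the top-sorted fraction.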
Thus, advantageously, the scoring, identification and/or sequencing of one or more polynucleotides of interest in one or more output channels can be used to train a computational model to obtain further insights into the sequence properties, to improve desired sequence characteristics (e.g., increased yield and/or increased enzyme activity), and/or to generate synthetic sequences with such improved characteristics. When the output is used for computational models, it is also important to keep the number and degree of artefacts to a minimum by maintaining homogeneous incubation conditions. Artefacts, e.g., edge effects, often occur in MTPs because the outer wells are exposed to slightly different conditions than the more centrally located wells, for example due to heat capacity and evaporation effects. For microdroplets, these artefacts can be avoided or kept to a minimum, which significantly increases the quality of the model.
In a first aspect, the invention relates to a method for screening a biological library, the method comprising the steps of: a) providing a microfluidic device comprising a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303),
b) providing an emulsion of droplets comprising a library of polynucleotides of interest and a screenable product, c) determining the amount of screenable product of one or more droplets in the microfluidic device, d) sorting the one or more droplets with the droplet sorter (200) into a receiving output channel of the at least three output channels (301, 302, 303), wherein the receiving output channel is determined based on the amount of screenable product per droplet, and wherein at least three receiving output channels receive a plurality of droplets comprising an amount of screenable product above and/or below one or more predetermined threshold levels, e) identifying one or more polynucleotides of interest present in the at least three output channels (301, 302, 303) and obtaining sequence data for the one or more polynucleotides of interest, and f) for one or more output channels (301, 302, 303), assigning a score to each of the one or more polynucleotides of interest, wherein the score is calculated based on the abundance of each of the one or more polynucleotides of interest in one of the one or more output channels (301, 302, 303).
In a second aspect, the invention relates to a host cell comprising in its genome the synthetic polynucleotide of interest generated in additional step h), and/or a polynucleotide of interest identified in step e).
In a third aspect, the invention relates to a method of producing a polypeptide of interest, the method comprising the steps of cultivating the cell according to the second aspect, under conditions conducive for production of the polypeptide.
Brief Description of the Drawings
Figure 1 shows a schematic overview of a microfluidic device with multiple output channels according to one embodiment of the method of the invention.
Figure 2 shows the assay responses for 100 000 droplets and the thresholds used for separation into the five output channels (pools 1-5).
Figure 3 shows the relative abundance for 102 signal peptide variants sorted into five output channels (pools 1-5).
Figure 4 shows the correlation between the scores of the droplet method of the invention and the MTP assay.
Figure 5 shows the standard deviation (σ) for MTP fermentations (A) and droplet fermentations (B) based on counts and relative protein yield.
Figure 6 shows the fraction of proline-containing sequences among the sequences obtained from an MTP screen (A) and a ranking of signal peptide sequences obtained from the same MTP screen (B).
Figure 7 shows the fraction of proline-containing sequences among the sequences obtained from a microdroplet screen (A) and a ranking of signal peptide sequences obtained from the same microdroplet screen (B).
Figure 8 shows the correlation coefficient between predictions and observed values dependent on the size of the training data.
Definitions
In accordance with this detailed description, the following definitions apply. Note that the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise.
Unless defined otherwise or clearly indicated by context, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Biological library: The term “biological library” means a library comprising a plurality of variants, including but not limited to one or more of DNA sequence variants, amino acid sequence variants, and cell variants.
Non-limiting examples for DNA sequence variants include a library of recombinant cells comprising said DNA sequence variants, and cell-free systems comprising said DNA sequence variants. Thus, in one embodiment the library of polynucleotides of interest is comprised in recombinant host cells.
Non-limiting examples for DNA sequence variants include a library of wildtype cells comprising native DNA sequence variants. Thus, in one embodiment the library of polynucleotides of interest is comprised in wildtype cells.
Non-limiting examples for amino acid variants include a library of purified polypeptide variants and a library of recombinant cells expressing polypeptide variants.
Non-limiting examples for cell variants include a library of wildtype cells and a library of recombinant cells comprising DNA sequence variants.
cDNA: The term "cDNA" means a DNA molecule that can be prepared by reverse transcription from a mature, spliced mRNA molecule obtained from a eukaryotic or prokaryotic cell. cDNA lacks intron sequences that may be present in the corresponding genomic DNA. The initial, primary RNA transcript is a precursor to mRNA that is processed through a series of steps, including splicing, before appearing as mature spliced mRNA.
Coding sequence: The term “coding sequence” means a polynucleotide, which directly specifies the amino acid sequence of a polypeptide. The boundaries of the coding sequence are generally determined by an open reading frame, which begins with a start codon, such as ATG, GTG, or TTG, and ends with a stop codon, such as TAA, TAG, or TGA. The coding sequence may be a genomic DNA, cDNA, synthetic DNA, or a combination thereof.
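The open-reading-frame rule in this definition can be sketched as a simple scan: find a start codon (ATG, GTG, or TTG) and read in-frame triplets until the first stop codon (TAA, TAG, or TGA). This hypothetical helper scans only the given strand in its three reading frames and returns the longest complete ORF; real gene-finding tools additionally consider the reverse complement, minimum lengths, and context such as ribosome binding sites.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest start-to-stop ORF on the forward strand.

    The returned sequence includes the stop codon. An empty string is
    returned if no complete ORF is found.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # position of the open start codon in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


print(longest_orf("CCATGGCTAAATGATTTGGCATAG"))
```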
Computational model: The term “computational model” refers to a computational procedure or process designed to enable computers or machines to learn from data and improve their performance on a specific task without being explicitly programmed for that task. Computational models include machine learning algorithms. These algorithms utilize statistical techniques to recognize patterns, relationships, and correlations within datasets, enabling them to make predictions, classifications, or decisions based on new or unseen data.
Examples for machine learning algorithms include, but are not limited to:
Linear Regression: A foundational algorithm that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data points. It is commonly used for tasks like predicting numerical values, such as housing prices based on factors like square footage and location.
Decision Trees: A method that uses a tree-like structure to make decisions based on multiple conditions. Each node in the tree represents a decision based on a particular feature, eventually leading to a leaf node with the final prediction or classification. Decision trees are employed for tasks like classification, where an algorithm determines the category of an input based on features.
Random Forest: An ensemble learning technique that combines multiple decision trees to enhance predictive accuracy and mitigate overfitting. Each tree in the forest makes a prediction, and the final result is determined by aggregating the predictions of all trees. Random forests are applied in various domains, such as image classification and medical diagnosis.
A random forest model is an advanced machine learning algorithm for diverse applications such as classification, regression, and data analysis. During its training phase, the algorithm constructs an ensemble comprising numerous decision trees. Notably, each decision tree is established utilizing a subset of the training dataset and a randomized assortment of input features.
The distinctive strength of the random forest model stems from its capacity to combine the predictions of multiple decision trees. This aggregation of trees trained on bootstrap samples, termed "bagging" (bootstrap aggregating), yields higher accuracy and greater robustness than individual trees. By averaging out the errors of individual trees and reducing their variability, the random forest model counteracts overfitting. Consequently, it typically makes more accurate predictions on new, previously unseen data instances.
Additionally, the random forest model handles high-dimensional datasets and complex feature interdependencies well, making it particularly applicable to complex real-world problems. It is also notable for its ability to accommodate missing data values, maintaining accuracy even when portions of the data are incomplete.
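The bagging idea described above can be illustrated with a deliberately small, dependency-free sketch: each regression tree is grown on a bootstrap sample of the training data, each split considers only a random subset of features, and the forest prediction is the average of the tree predictions. This is a toy regressor for illustration, not the model used in the examples; missing-value handling is not implemented, and for real work a maintained library such as scikit-learn's `RandomForestRegressor` would be the usual choice. All names here are hypothetical.

```python
import random
import statistics


def _sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def _best_split(X, y, features):
    """Return (feature, threshold) minimizing summed squared error, or None."""
    best = None  # (sse, feature, threshold)
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    return None if best is None else best[1:]


def _grow(X, y, depth, max_depth, k, rng):
    """Grow a regression tree; each split looks at k randomly chosen features."""
    if depth >= max_depth or len(set(y)) == 1:
        return statistics.fmean(y)  # leaf: mean target value
    split = _best_split(X, y, rng.sample(range(len(X[0])), k))
    if split is None:  # sampled features are constant in this node
        return statistics.fmean(y)
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            _grow([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, k, rng),
            _grow([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, k, rng))


def _predict_one(node, row):
    """Walk to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class ToyRandomForest:
    """Bagged regression trees with random feature subsets (illustrative only)."""

    def __init__(self, n_trees=50, max_depth=4, n_features=None, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.n_features, self.seed = n_features, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        k = min(self.n_features or max(1, len(X[0]) // 3), len(X[0]))
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
            self.trees.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                                    0, self.max_depth, k, rng))
        return self

    def predict(self, X):
        # Forest prediction = average of the individual tree predictions.
        return [statistics.fmean([_predict_one(t, row) for t in self.trees]) for row in X]


X = [[i, (i * 7) % 5] for i in range(20)]  # feature 0 informative, feature 1 noise
y = [float(i) for i in range(20)]
model = ToyRandomForest(n_trees=30, max_depth=3).fit(X, y)
print(model.predict([[2, 0], [17, 0]]))
```

In a screening context, `X` could hold numeric encodings of library sequences and `y` the abundance-based scores; both the encoding and the hyperparameters shown here are assumptions.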
Support Vector Machines (SVM): A classification algorithm that finds the optimal hyperplane to separate different classes of data points by maximizing the margin between them. SVMs are used for tasks like text classification, image recognition, and bioinformatics.
Neural Networks: Complex algorithms inspired by the structure and function of biological neural networks. They consist of layers of interconnected nodes (neurons) that process and transform data. Deep learning, a subset of neural networks, involves multiple hidden layers and is utilized for tasks like natural language processing, image generation, and autonomous driving.
K-Means Clustering: An unsupervised learning algorithm used to partition a dataset into distinct clusters based on similarities in the data points' features. It is employed in market segmentation, customer profiling, and image compression.
Reinforcement Learning: An approach where an algorithm learns to make sequences of decisions by interacting with an environment to maximize a cumulative reward. This is often used in robotics, game playing, and autonomous systems.
Naive Bayes: A probabilistic algorithm based on Bayes' theorem that is particularly effective for text classification tasks like spam detection and sentiment analysis.
Principal Component Analysis (PCA): A dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional representation while preserving as much variance as possible. PCA is used in image compression, data visualization, and feature extraction.
Gaussian Mixture Model (GMM): A probabilistic model that assumes data is generated from a mixture of several Gaussian distributions. GMMs are applied in various fields, including speech recognition, image segmentation, and anomaly detection.
Generative Adversarial Network (GAN): A specialized class of machine learning algorithm that involves two neural networks, a generator and a discriminator, engaged in a competitive process. The generator creates synthetic data instances (such as images or text) that resemble real data, while the discriminator evaluates whether a given data instance is real or generated. The two networks iteratively refine their performance, with the generator aiming to produce increasingly realistic data and the discriminator improving its ability to differentiate between real and generated data. A non-limiting example of a suitable GAN is disclosed in WO2024/133344 (Novozymes A/S).
Control sequences: The term “control sequences” means nucleic acid sequences involved in regulation of expression of a polynucleotide in a specific organism or in vitro. Each control sequence may be native (i.e., from the same gene) or heterologous (i.e., from a different gene) to the polynucleotide encoding the polypeptide, and native or heterologous to each other. Such control sequences include, but are not limited to, leader, polyadenylation, prepropeptide, propeptide, signal peptide, promoter, terminator, enhancer, and transcription or translation initiator and terminator sequences. At a minimum, the control sequences include a promoter, and transcriptional and translational stop signals. The control sequences may be provided with linkers for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the polynucleotide encoding a polypeptide.
ddPCR: The term “Droplet Digital PCR” or “ddPCR” refers to an advanced molecular biology technique employed for the precise analysis and quantification of nucleic acids, including DNA and RNA, within a sample. This method represents an innovation over conventional polymerase chain reaction (PCR) methodologies, devised to address the inherent limitations of traditional PCR by enabling accurate measurement and detection of rare target sequences or subtle variations in target concentrations.
In the ddPCR process, the sample containing the target nucleic acid is partitioned into numerous individual droplets, each operating as an independent reaction compartment. This partitioning step allows the target nucleic acid to be amplified in isolation, minimizing the potential for amplification biases and interference from non-target molecules. Following amplification, each droplet is analyzed by fluorescence to determine the presence or absence of the amplified target sequence.
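Absolute quantification in ddPCR commonly relies on Poisson statistics: because a positive droplet may contain more than one target copy, the fraction p of positive droplets is converted to a mean number of copies per droplet, λ = −ln(1 − p), and the concentration follows by dividing λ by the droplet volume. The sketch below applies that standard correction. The droplet volume is instrument-specific, and the 0.85 nL used in the usage line is an assumed value for illustration; the function name is hypothetical.

```python
import math


def ddpcr_copies_per_ul(positive: int, total: int, droplet_volume_nl: float) -> float:
    """Estimate target copies per microlitre of reaction from droplet counts.

    Applies the Poisson correction lambda = -ln(1 - p), where p is the
    fraction of positive droplets.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    p = positive / total
    lam = -math.log(1.0 - p)                  # mean copies per droplet
    return lam / (droplet_volume_nl * 1e-3)   # nL -> uL


# Assumed droplet volume of 0.85 nL, for illustration only.
print(round(ddpcr_copies_per_ul(positive=4000, total=15000, droplet_volume_nl=0.85), 1))
```

A saturated run (all droplets positive) has no finite estimate under this model, which is why the function rejects `positive == total`.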
Droplet sorter: The term “droplet sorter” (200) means an arrangement within the microfluidic device which allows the sorting of droplets into three or more output channels, wherein the sorting is based on the amount of screenable product detected in the droplet. For example, the sorting is carried out by using one or more sorting means, e.g., electrodes or valves. The amount of screenable product of the droplet is detected by one or more sensing means, and communicated to the sorting means, e.g., two or more electrodes.
Expression: The term “expression” means any step involved in the production of a polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
Expression vector: An "expression vector" refers to a linear or circular DNA construct comprising a DNA sequence encoding a polypeptide, which coding sequence is operably linked to a suitable control sequence capable of effecting expression of the DNA in a suitable host. Such control sequences may include a promoter to effect transcription, an optional operator sequence to control transcription, a sequence encoding suitable ribosome binding sites on the mRNA, enhancers and sequences which control termination of transcription and translation.
Extension: The term “extension” means an addition of one or more amino acids to the amino and/or carboxyl terminus of a polypeptide, wherein the “extended” polypeptide has enzyme activity.
Fragment: The term “fragment” means a polypeptide having one or more amino acids absent from the amino and/or carboxyl terminus of the mature polypeptide, wherein the fragment has enzyme activity.
Fusion polypeptide: The term “fusion polypeptide” means a polypeptide in which one polypeptide is fused at the N-terminus and/or the C-terminus of a polypeptide of the present invention. A fusion polypeptide is produced by fusing a polynucleotide encoding another polypeptide to a polynucleotide of the present invention, or by fusing two or more polynucleotides of the present invention together. Techniques for producing fusion polypeptides are known in the art, and include ligating the coding sequences encoding the polypeptides so that they are in frame and that expression of the fusion polypeptide is under control of the same promoter(s) and terminator. Fusion polypeptides may also be constructed using intein technology in which fusion polypeptides are created post-translationally (Cooper et al., 1993, EMBO J. 12: 2575-2583; Dawson et al., 1994, Science 266: 776-779). A fusion polypeptide can further comprise a cleavage site between the two polypeptides. Upon secretion of the fusion protein, the site is cleaved, releasing the two polypeptides. Examples of cleavage sites include, but are not limited to, the sites disclosed in Martin et al., 2003, J. Ind. Microbiol. Biotechnol. 3: 568-576; Svetina et al., 2000, J. Biotechnol. 76: 245-251; Rasmussen-Wilson et al., 1997, Appl. Environ. Microbiol. 63: 3488-3493; Ward et al., 1995, Biotechnology 13: 498-503; Contreras et al., 1991, Biotechnology 9: 378-381; Eaton et al., 1986, Biochemistry 25: 505-512; Collins-Racie et al., 1995, Biotechnology 13: 982-987; Carter et al., 1989, Proteins: Structure, Function, and Genetics 6: 240-248; and Stevens, 2003, Drug Discovery World 4: 35-48.
Heterologous: The term "heterologous" means, with respect to a host cell, that a polypeptide or nucleic acid does not naturally occur in the host cell. The term "heterologous" means, with respect to a polypeptide or nucleic acid, that a control sequence, e.g., promoter, of a polypeptide or nucleic acid is not naturally associated with the polypeptide or nucleic acid, i.e., the control sequence is from a gene other than the gene encoding the mature polypeptide.
Host Strain or Host Cell: A "host strain" or "host cell" is an organism comprising a polynucleotide of interest. Exemplary host strains are microorganism cells (e.g., bacteria, filamentous fungi, and yeast) capable of expressing a polypeptide of interest and/or fermenting saccharides, and/or probiotic microorganisms.
A recombinant host strain or recombinant host cell is an organism into which an expression vector, phage, virus, or other DNA construct, including a polynucleotide encoding a polypeptide of interest (e.g., an amylase), has been introduced. Exemplary recombinant host strains are microorganism cells (e.g., bacteria, filamentous fungi, and yeast) capable of expressing the polypeptide of interest and/or fermenting saccharides. The term "host cell" includes protoplasts created from cells.
Introduced: The term "introduced" in the context of inserting a nucleic acid sequence into a cell, means "transfection", "transformation" or "transduction," as known in the art.
Isolated: The term “isolated” means a polypeptide, nucleic acid, cell, or other specified material or component that has been separated from at least one other material or component, including but not limited to, other proteins, nucleic acids, cells, etc. An isolated polypeptide, nucleic acid, cell or other material is thus in a form that does not occur in nature. An isolated polypeptide includes, but is not limited to, a culture broth containing the secreted polypeptide expressed in a host cell.
Mature polypeptide: The term “mature polypeptide” means a polypeptide in its mature form following N-terminal and/or C-terminal processing (e.g., removal of signal peptide).
Mature polypeptide coding sequence: The term “mature polypeptide coding sequence” means a polynucleotide that encodes a mature polypeptide.
Microfluidic device: According to the invention, the microfluidic device comprises a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303). Typically, the microfluidic device also comprises a plurality of liquid inlets and/or liquid outlets. In one embodiment, the device comprises an incubation chamber (500).
Native: The term "native" means a nucleic acid or polypeptide naturally occurring in a host cell.
Nucleic acid: The term "nucleic acid" encompasses DNA, RNA, heteroduplexes, and synthetic molecules capable of encoding a polypeptide. Nucleic acids may be single-stranded or double-stranded and may contain chemical modifications. The terms "nucleic acid" and "polynucleotide" are used interchangeably. Because the genetic code is degenerate, more than one codon may be used to encode a particular amino acid, and the present compositions and methods encompass nucleotide sequences that encode a particular amino acid sequence. Unless otherwise indicated, nucleic acid sequences are presented in 5'-to-3' orientation.
Nucleic acid construct: The term "nucleic acid construct" means a nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally occurring gene or is modified to contain segments of nucleic acids in a manner that would not otherwise exist in nature or which is synthetic, and which comprises one or more control sequences operably linked to the nucleic acid sequence.
Operably linked: The term "operably linked" means that specified components are in a relationship (including but not limited to juxtaposition) permitting them to function in an intended manner. For example, a regulatory sequence is operably linked to a coding sequence such that expression of the coding sequence is under control of the regulatory sequence.
Protease: In one aspect, the polynucleotide of interest encodes a protease. Suitable proteases include those of bacterial, fungal, plant, viral or animal origin, e.g., microbial or vegetable origin. Microbial origin is preferred. Chemically modified or protein-engineered variants are included. It may be an alkaline protease, such as a serine protease or a metalloprotease. A serine protease may, for example, be of the S1 family, such as trypsin, or the S8 family, such as subtilisin. A metalloprotease may, for example, be a thermolysin from, e.g., family M4, or another metalloprotease such as those from the M5, M7 or M8 families. Serine endopeptidases hydrolyse the substrate N-succinyl-Ala-Ala-Pro-Phe-p-nitroanilide. In the context of the examples, the reaction was performed at room temperature at pH 9.0. The release of pNA (p-nitroaniline) results in an increase of absorbance at 405 nm, and this increase is proportional to the enzymatic activity measured against a standard.
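The absorbance-based readout can be illustrated with a short calculation. This is a hypothetical sketch, not the protocol used in the examples: it assumes the reaction is followed kinetically, takes the least-squares slope of A405 over time for the sample, and converts that slope to activity using a linear standard curve fitted to standards of known activity. The helper names `linear_fit` and `activity_from_kinetics` and all numbers are illustrative assumptions.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = m * x + b; returns (m, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def activity_from_kinetics(times_min, a405, std_activity, std_slope):
    """Hypothetical helper: convert a sample's A405 time course into activity.

    times_min, a405: sample time points (minutes) and absorbance at 405 nm.
    std_activity, std_slope: known activities of standards and their measured
    A405 slopes, assumed to lie on a straight line.
    """
    # Rate of pNA release, expressed as the A405 increase per minute.
    sample_slope, _ = linear_fit(times_min, a405)
    # Standard curve: slope = m * activity + b; invert it for the sample.
    m, b = linear_fit(std_activity, std_slope)
    return (sample_slope - b) / m


# Made-up data: A405 rises by 0.05 per minute; standards of 0, 1 and 2
# activity units gave slopes of 0.00, 0.02 and 0.04 per minute.
print(activity_from_kinetics([0, 1, 2, 3], [0.10, 0.15, 0.20, 0.25],
                             [0, 1, 2], [0.00, 0.02, 0.04]))
```

Fitting a slope rather than using a single end-point reading is a design choice made here for robustness; an end-point measurement against a standard would equally satisfy the proportionality described above.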
Purified: The term “purified” means a nucleic acid, polypeptide or cell that is substantially free from other components as determined by analytical techniques well known in the art (e.g., a purified polypeptide or nucleic acid may form a discrete band in an electrophoretic gel, chromatographic eluate, and/or a medium subjected to density gradient centrifugation). A purified nucleic acid or polypeptide is at least about 50% pure, usually at least about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, about 99.5%, about 99.6%, about 99.7%, about 99.8% or more pure (e.g., percent by weight or on a molar basis). In a related sense, a composition is enriched for a molecule when there is a substantial increase in the concentration of the molecule after application of a purification or enrichment technique. The term "enriched" refers to a compound, polypeptide, cell, nucleic acid, amino acid, or other specified material or component that is present in a composition at a relative or absolute concentration that is higher than in a starting composition.
In one aspect, the term "purified" as used herein refers to the polypeptide or cell being essentially free from components (especially insoluble components) from the production organism. In other aspects, the term "purified" refers to the polypeptide being essentially free of components (especially insoluble components) from the native organism from which it is obtained. In one aspect, the polypeptide is separated from some of the soluble components of the organism and culture medium from which it is recovered. The polypeptide may be purified (i.e., separated) by one or more of the unit operations of filtration, precipitation, or chromatography.
Accordingly, the polypeptide may be purified such that only minor amounts of other proteins, in particular, other polypeptides, are present. The term "purified" as used herein may refer to removal of other components, particularly other proteins and most particularly other enzymes present in the cell of origin of the polypeptide. The polypeptide may be "substantially pure", i.e., free from other components from the organism in which it is produced, e.g., a host organism for recombinantly produced polypeptide. In one aspect, the polypeptide is at least 40% pure by weight of the total polypeptide material present in the preparation. In one aspect, the polypeptide is at least 50%, 60%, 70%, 80% or 90% pure by weight of the total polypeptide material present in the preparation. As used herein, a "substantially pure polypeptide" may denote a polypeptide preparation that contains at most 10%, preferably at most 8%, more preferably at most 6%, more preferably at most 5%, more preferably at most 4%, more preferably at most 3%, even more preferably at most 2%, most preferably at most 1%, and even most preferably at most 0.5% by weight of other polypeptide material with which the polypeptide is natively or recombinantly associated.
It is, therefore, preferred that the substantially pure polypeptide is at least 92% pure, preferably at least 94% pure, more preferably at least 95% pure, more preferably at least 96% pure, more preferably at least 97% pure, more preferably at least 98% pure, even more preferably at least 99% pure, most preferably at least 99.5% pure by weight of the total polypeptide material present in the preparation. The polypeptide of the present invention is preferably in a substantially pure form (i.e., the preparation is essentially free of other polypeptide material with which it is natively or recombinantly associated). This can be accomplished, for example, by preparing the polypeptide by well-known recombinant methods or by classical purification methods.
Recombinant: The term "recombinant" is used in its conventional meaning to refer to the manipulation, e.g., cutting and rejoining, of nucleic acid sequences to form constellations different from those found in nature. The term recombinant refers to a cell, nucleic acid, polypeptide or vector that has been modified from its native state. Thus, for example, recombinant cells express genes that are not found within the native (non-recombinant) form of the cell, or express native genes at different levels or under different conditions than found in nature. The term “recombinant” is synonymous with “genetically modified” and “transgenic”.
Recover: The terms "recover" and “recovery” mean the removal of a polypeptide from at least one fermentation broth component selected from the list of a cell, a nucleic acid, or other specified material, e.g., recovery of the polypeptide from the whole fermentation broth, or from the cell-free fermentation broth, by polypeptide crystal harvest; by filtration, e.g., depth filtration (by use of filter aids or packed filter media, cloth filtration in chamber filters, rotary-drum filtration, drum filtration, rotary vacuum-drum filters, candle filters, horizontal leaf filters or similar, using sheet or pad filtration in framed or modular setups) or membrane filtration (using sheet filtration, module filtration, candle filtration, microfiltration, or ultrafiltration in either cross-flow, dynamic cross-flow or dead-end operation); by centrifugation (using decanter centrifuges, disc stack centrifuges, hydrocyclones or similar); or by precipitating the polypeptide and using relevant solid-liquid separation methods to harvest the polypeptide from the broth media by classification separation by particle size. Recovery encompasses isolation and/or purification of the polypeptide.
Score: In the context of the invention, a score is calculated for each of the one or more polynucleotides of interest. In one embodiment, the score is the sum, over the output channels, of the normalized relative abundance of the polynucleotide of interest in each output channel multiplied by the sorting threshold score for the corresponding output channel. In a preferred embodiment, the score is calculated as described in Example 3.
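The score definition can be sketched in a few lines. Example 3 is not reproduced here, so the normalization is an assumption: the relative abundance of each polynucleotide of interest within a channel (its read count divided by that channel's total reads) is normalized so that its values across channels sum to one, and the per-channel sorting threshold scores are supplied by the user. The channel keys reuse the reference numerals 301 to 303 purely as labels; `variant_scores` and the counts are hypothetical.

```python
from typing import Dict

# channel label -> {polynucleotide-of-interest id: read count in that channel}
Counts = Dict[str, Dict[str, int]]


def variant_scores(counts: Counts, channel_score: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical scoring: sum over channels of the normalized relative
    abundance multiplied by that channel's sorting threshold score."""
    # 1) Relative abundance of each variant within each channel.
    rel: Dict[str, Dict[str, float]] = {}
    for channel, per_variant in counts.items():
        total = sum(per_variant.values())
        for variant, n in per_variant.items():
            rel.setdefault(variant, {})[channel] = n / total if total else 0.0

    # 2) Normalize each variant's abundances across channels to sum to 1,
    #    weight them by the channel's threshold score, and sum.
    scores: Dict[str, float] = {}
    for variant, per_channel in rel.items():
        norm = sum(per_channel.values())
        scores[variant] = (
            sum(ab / norm * channel_score[ch] for ch, ab in per_channel.items())
            if norm else 0.0
        )
    return scores


counts = {
    "301": {"A": 90, "B": 10},  # assumed lowest-signal channel
    "302": {"A": 50, "B": 50},
    "303": {"A": 10, "B": 90},  # assumed highest-signal channel
}
print(variant_scores(counts, {"301": 1.0, "302": 2.0, "303": 3.0}))
```

With these made-up counts, variant B, which is enriched in the channel with the highest threshold score, receives the higher score.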
Screenable product: The term “screenable product” means a molecule which is detectable by the sensing means (600). The screenable product includes, but is not limited to, fluorescent molecules (e.g., green fluorescent protein (GFP), mCherry, mVenus, DsRed, EGFP, Nile red (9-(diethylamino)benzo[a]phenoxazin-5-one), a fluorescent vitamin, DAPI (4’,6-diamidino-2-phenylindole), and BODIPY) and fluorogenic molecules, e.g., fluorogenic rhodamine. For example, the screenable product is added to the emulsion, or is generated from a substrate by a process taking place in the droplet, e.g., during incubation. In another example, the screenable product is a polypeptide expressed in the droplets. In another example, the screenable product is a host cell in the droplets. In yet another example, the screenable product comprises an absorbing molecule. In one embodiment, the absorbing molecule comprises para-nitroaniline (pNA).
For example, the amount of the screenable product in the droplet may be inversely proportional to the amount of a polypeptide of interest expressed in the droplet, and/or by the host cells, e.g., when the polypeptide of interest binds or degrades the screenable product.
For example, the amount of the screenable product in the droplet may be proportional to the amount of a polypeptide of interest expressed in the droplet, and/or by the host cells, e.g., when the polypeptide of interest degrades a substrate, which results in formation of the screenable product, or when the screenable product incorporates into the host cells or parts thereof (e.g., host cell membrane, or host cell wall), for example Nile Red.
The screenable product can thus, for example, be used as a proxy for one or more of the features selected from the list of cell growth, cell division, polypeptide of interest expression, polypeptide of interest binding, polypeptide of interest stability, and polypeptide of interest activity. In some examples, more than one screenable product is present in the droplets, e.g., to determine two or more different features selected from the aforementioned features.
Sequence identity: The relatedness between two amino acid sequences or between two nucleotide sequences is described by the parameter “sequence identity”.
For purposes of the present invention, the sequence identity between two amino acid sequences is determined as the output of “longest identity” using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), preferably version 6.6.0 or later. The parameters used are a gap open penalty of 10, a gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix. In order for the Needle program to report the longest identity, the -nobrief option must be specified in the command line. The output of Needle labeled “longest identity” is calculated as follows:
(Identical Residues x 100)/(Length of Alignment - Total Number of Gaps in Alignment)
For purposes of the present invention, the sequence identity between two polynucleotide sequences is determined as the output of “longest identity” using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, supra) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, supra), preferably version 6.6.0 or later. The parameters used are a gap open penalty of 10, a gap extension penalty of 0.5, and the EDNAFULL (EMBOSS version of NCBI NUC4.4) substitution matrix. In order for the Needle program to report the longest identity, the -nobrief option must be specified in the command line. The output of Needle labeled “longest identity” is calculated as follows:
(Identical Deoxyribonucleotides x 100)/(Length of Alignment - Total Number of Gaps in Alignment)
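The identity formula can be evaluated directly from a pairwise alignment. The sketch below does not perform the Needleman-Wunsch alignment itself (per the definition, that is done by EMBOSS Needle with the stated parameters); it assumes an already aligned pair of equal-length strings in which gaps are written as "-", and it counts every alignment column containing a gap toward the "Total Number of Gaps in Alignment". The same arithmetic applies to amino acid and nucleotide alignments. The function name `longest_identity` is hypothetical.

```python
def longest_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity from a precomputed alignment, following the formula:
    (identical positions x 100) / (alignment length - gap-containing columns).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    # Identical residues: same non-gap character in both rows of a column.
    identical = sum(1 for a, b in zip(aln_a, aln_b) if a == b and a != gap)
    # Assumed gap count: columns where either row holds a gap character.
    gap_columns = sum(1 for a, b in zip(aln_a, aln_b) if gap in (a, b))
    return identical * 100 / (len(aln_a) - gap_columns)


# Made-up alignment: length 7, one gap column, five identical columns.
print(longest_identity("ACDE-FG", "ACDQKFG"))
```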
Signal Peptide: A "signal peptide" is a sequence of amino acids attached to the N-terminal portion of a protein, which facilitates the secretion of the protein outside the cell. The mature form of an extracellular protein lacks the signal peptide, which is cleaved off during the secretion process.
Subsequence: The term “subsequence” means a polynucleotide having one or more nucleotides absent from the 5' and/or 3' end of a mature polypeptide coding sequence; wherein the subsequence encodes a fragment having enzyme activity.
Variant: The term “variant” means a polypeptide having enzyme activity comprising a man-made mutation, i.e., a substitution, insertion (including extension), and/or deletion (e.g., truncation), at one or more positions. A substitution means replacement of the amino acid occupying a position with a different amino acid; a deletion means removal of the amino acid occupying a position; and an insertion means adding 1-5 amino acids (e.g., 1-3 amino acids, in particular, 1 amino acid) adjacent to and immediately following the amino acid occupying a position.
Wild-type: The term "wild-type" in reference to an amino acid sequence or nucleic acid sequence means that the amino acid sequence or nucleic acid sequence is a native or naturally occurring sequence. As used herein, the term "naturally occurring" refers to anything (e.g., proteins, amino acids, or nucleic acid sequences) that is found in nature. Conversely, the term "non-naturally occurring" refers to anything that is not found in nature (e.g., recombinant nucleic acids and protein sequences produced in the laboratory or modification of the wild-type sequence).
Detailed Description of the Invention
In a first aspect, the invention relates to a method for screening a biological library, the method comprising the steps of: a) providing a microfluidic device comprising a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303), b) providing an emulsion of droplets comprising a library of polynucleotides of interest and a screenable product, c) determining the amount of screenable product in one or more droplets in the microfluidic device, d) sorting the one or more droplets with the droplet sorter (200) into a receiving output channel of the at least three output channels (301, 302, 303), wherein the receiving output channel is determined based on the amount of screenable product per droplet, and wherein at least three receiving output channels receive a plurality of droplets comprising an amount of screenable product above and/or below one or more predetermined threshold levels, e) identifying one or more polynucleotides of interest present in the at least three output channels (301, 302, 303) and obtaining sequence data for the one or more polynucleotides of interest, and f) for one or more output channels (301, 302, 303), assigning a score to each of the one or more polynucleotides of interest, wherein the score is calculated based on the abundance of each of the one or more polynucleotides of interest in one of the one or more output channels (301, 302, 303).
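Step d) can be pictured as mapping each droplet's measured signal onto one of the ordered output channels using the predetermined threshold levels. The sketch below is hypothetical: it assumes two ascending thresholds separate three channels, that a signal exactly equal to a threshold goes to the higher channel, and that each simulated droplet is a (variant identifier, signal) pair so that the per-channel lists can feed the scoring in step f). In a real screen, the identity of each droplet's polynucleotide of interest is only recovered by sequencing in step e).

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


def assign_channel(signal: float, thresholds: Sequence[float],
                   channels: Sequence[str]) -> str:
    """Pick the output channel for one droplet; thresholds must be ascending."""
    # bisect_right counts thresholds <= signal, so a signal equal to a
    # threshold is sent to the higher channel.
    return channels[bisect_right(thresholds, signal)]


def sort_droplets(droplets: Sequence[Tuple[str, float]],
                  thresholds: Sequence[float],
                  channels: Sequence[str]) -> Dict[str, List[str]]:
    """Simulate step d): route (variant id, signal) droplets to channels."""
    if len(channels) != len(thresholds) + 1:
        raise ValueError("need exactly one more channel than thresholds")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be ascending")
    received: Dict[str, List[str]] = {ch: [] for ch in channels}
    for variant, signal in droplets:
        received[assign_channel(signal, thresholds, channels)].append(variant)
    return received


# Hypothetical thresholds of 100 and 500 signal units and three channels.
droplets = [("A", 50.0), ("B", 120.0), ("C", 700.0), ("A", 650.0)]
print(sort_droplets(droplets, [100.0, 500.0], ["301", "302", "303"]))
```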
In one embodiment, the emulsion of droplets comprises one or more host cells.
In one embodiment, each host cell comprises one or more polynucleotide of interest of the library of polynucleotides of interest.
In one embodiment, in step b) each droplet comprises at most one host cell, or a plurality of host cells derived from the same parent host cell.
In one embodiment, in step b) each droplet comprises at most one polynucleotide of interest.
In one embodiment, the screenable product is produced by the host cells.
In one embodiment, the formation of the screenable product is catalyzed by an enzyme; preferably, the enzyme is encoded by the polynucleotide of interest.
In one embodiment, the screenable product is encoded by the one or more polynucleotide of interest.
In one embodiment, the screenable product is produced by a polypeptide expressed by the host cells.
In one embodiment, the screenable product is produced by a polypeptide encoded by the one or more polynucleotide of interest.
In one embodiment, the screenable product is a polypeptide expressed by the host cells.
In one embodiment, the screenable product is an enzyme.
In one embodiment, the enzyme is expressed by the host cells.
In one embodiment, the enzyme is selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phytase, polyphenoloxidase, proteolytic enzyme, ribonuclease, transglutaminase, xylanase, or beta-xylosidase.
In one embodiment, the screenable product is degraded by the host cells.
In one embodiment, the screenable product is degraded by the polypeptide encoded by the one or more polynucleotide of interest.
In one embodiment, the screenable product is degraded by a polypeptide expressed by the host cells.
In one embodiment, the screenable product is an enzyme substrate, preferably for an enzyme selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phytase, polyphenoloxidase, proteolytic enzyme, ribonuclease, transglutaminase, xylanase, or beta-xylosidase.
In one embodiment, the screenable product is a fluorescent product.
In one embodiment, the fluorescent product is converted from a fluorogenic substrate by an enzyme encoded by the polynucleotide of interest.
In one embodiment, the amount of screenable product is inversely proportional to one or more of cell number, cell growth, cell division, cell viability, or cell growth rate.
In one embodiment, the amount of screenable product is proportional to one or more of cell number, cell growth, cell division, cell viability, or cell growth rate.
In one embodiment, the screenable product comprises or consists of one or more host cells.
In one embodiment, the screenable product comprises or consists of substantially all the host cells in a droplet.
In one embodiment, the screenable product is a product of an enzymatic reaction, preferably of a reaction catalyzed by an enzyme selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phytase, polyphenoloxidase, proteolytic enzyme, ribonuclease, transglutaminase, xylanase, or beta-xylosidase.
In one embodiment, the score is proportional, e.g., normalized, to the number of identical DNA sequences for a first polynucleotide of interest present in an output channel.
In one embodiment, the score is the total number of identical DNA sequences for a first polynucleotide of interest present in an output channel.
In one embodiment, the score is proportional, e.g., normalized, to the number of identical DNA sequences for a second polynucleotide of interest present in an output channel.
In one embodiment, the score is the total number of identical DNA sequences for a second polynucleotide of interest present in an output channel.
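Counting identical DNA sequences in one output channel, and normalizing those counts, can be sketched as below. This assumes the sequencing reads for a channel have already been trimmed to the polynucleotide of interest, so that identical variants give identical strings; the function names and reads are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_identical(reads: Iterable[str]) -> Counter:
    """Raw form of the score: number of identical DNA sequences per variant."""
    return Counter(reads)


def normalized_abundance(counts: Counter) -> Dict[str, float]:
    """Normalized form: each count divided by the channel's total reads."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()} if total else {}


# Made-up, pre-trimmed reads from a single output channel.
channel_reads = ["ATGAAA", "ATGAAA", "ATGAAC", "ATGAAA"]
counts = count_identical(channel_reads)
print(counts["ATGAAA"], normalized_abundance(counts)["ATGAAA"])
```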
In one embodiment, the microfluidic device comprises an incubation zone (500).
In one embodiment, the incubation zone (500) is located upstream of the droplet sorter (200) and/or upstream of one or more sorting means (401, 402).
In one embodiment, the method comprises incubation of the emulsion of droplets under conditions allowing cell growth, and/or allowing transcription from DNA to RNA, and/or allowing translation from RNA to a polypeptide; preferably, the incubation takes place prior to step c).
In one embodiment, the incubation does not take place in the microfluidic chip.
In one embodiment, the incubation takes place on and/or in the microfluidic device.
In one embodiment, after incubation, the cells comprised in one droplet are genetically identical, i.e., the cells are derived from one parental host cell, preferably the same parental host cell.
In one embodiment, the droplet sorter comprises one or more sensing means (600), preferably located downstream of the incubation zone (500), and/or upstream of the sorting means (401, 402).
In one embodiment, the one or more sensing means (600) comprises a fluorescence sensor.
In one embodiment, the one or more sensing means (600) comprises an absorption sensor.
In one embodiment, the one or more sensing means (600) comprises an image sensor, e.g., a CMOS sensor, or a CCD sensor, or a PMT sensor.
In one embodiment, the one or more sensing means (600) comprises a NEMS (nanoelectromechanical system) sensor.
In one embodiment, the one or more sensing means (600) comprises a mass analyzer suitable for mass spectrometry, e.g., a quadrupole mass analyzer, a TOF mass analyzer, an ion trap mass analyzer, an orbitrap mass analyzer, a magnetic sector mass analyzer, a Q-TOF mass analyzer, or an FT-ICR mass analyzer.
In one embodiment, step e) comprises DNA amplification of the one or more polynucleotide of interest within each output channel.
In one embodiment, the DNA amplification is a PCR method.
In one embodiment, the DNA amplification is a ddPCR (droplet digital PCR) method.
In one embodiment, step e) comprises DNA sequencing of the one or more polynucleotide of interest, e.g., after PCR amplification, or by nanopore sequencing.
In one embodiment, during step e) the one or more polynucleotide of interest is identified by a DNA barcode.
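Identification by DNA barcode can be sketched as a lookup from barcode to polynucleotide of interest. The read layout is a hypothetical assumption: the barcode is taken as the first `bc_len` bases of each read and must match a known barcode exactly. Real sequencing pipelines often tolerate sequencing errors in the barcode, which this sketch does not. All names and sequences are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def identify_by_barcode(read: str, barcode_table: Dict[str, str],
                        bc_len: int) -> Optional[str]:
    """Return the polynucleotide-of-interest id whose barcode starts the read.

    Hypothetical layout: barcode = first bc_len bases; exact match only.
    """
    return barcode_table.get(read[:bc_len])


def tally_by_barcode(reads: Iterable[str], barcode_table: Dict[str, str],
                     bc_len: int) -> Counter:
    """Count reads per identified variant; unmatched reads go to 'unassigned'."""
    tally: Counter = Counter()
    for read in reads:
        variant = identify_by_barcode(read, barcode_table, bc_len)
        tally[variant if variant is not None else "unassigned"] += 1
    return tally


table = {"ACGT": "variant_1", "TTAA": "variant_2"}
reads = ["ACGTGGGC", "TTAACCCA", "ACGTAAAT", "GGGGTTTA"]
print(tally_by_barcode(reads, table, bc_len=4))
```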
In one embodiment, the droplet sorter (200) comprises one or more sorting means (401, 402).
In one embodiment, the one or more sorting means comprises or consists of one or more electrodes, one or more acoustic wave generators, one or more valves, and/or one or more pressure-controlled outlets.
In one embodiment, the one or more sorting means comprises at least two electrodes.
In one embodiment, the one or more sorting means consists of one electrode.
In one embodiment, the one or more sorting means consists of two electrodes.
In one embodiment, the biological library comprises or consists of wild-type cells with different genotype and/or different phenotype.
In one embodiment, the biological library comprises or consists of recombinant cells.
In one embodiment, the biological library encodes different variants of the same polypeptide of interest, preferably the polypeptide of interest is an enzyme.
In one embodiment, the biological library encodes different signal peptide variants.
In one embodiment, the biological library encodes different promoter variants.
In one embodiment, the biological library comprises different codon-optimized DNA sequences encoding the same amino acid sequence of a polypeptide of interest, e.g., a signal peptide, and/or an enzyme.
In one embodiment, the polynucleotide of interest encodes a polypeptide of interest.
In one embodiment, the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a polypeptide of interest.
In one embodiment, the polynucleotide of interest comprises a first polynucleotide of interest encoding a control sequence, and a second polynucleotide of interest encoding a polypeptide of interest.
In one embodiment, the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a control sequence.
In one embodiment, the control sequence is a promoter sequence, a signal peptide, a leader sequence, a polyadenylation sequence, a propeptide sequence, or a transcription terminator.
In one embodiment, the polynucleotide of interest comprises a first polynucleotide of interest encoding a signal peptide, and a second polynucleotide of interest encoding a polypeptide of interest, wherein the first polynucleotide of interest is operatively linked to the second polynucleotide of interest and located upstream of the second polynucleotide of interest.
In one embodiment, the polynucleotide of interest comprises a first polynucleotide of interest comprising a promoter sequence, and a second polynucleotide of interest encoding a polypeptide of interest, wherein the first polynucleotide of interest is operatively linked to the second polynucleotide of interest and located upstream of the second polynucleotide of interest.
In one embodiment, the biological library comprises identical second polynucleotides of interest, and a plurality of variants of the first polynucleotides of interest.
In one embodiment, the biological library comprises identical first polynucleotides of interest, and a plurality of variants of the second polynucleotides of interest.
In one embodiment, the first polynucleotide of interest is heterologous to the second polynucleotide of interest.
In one embodiment, the first polynucleotide of interest is endogenous to the second polynucleotide of interest.
In one embodiment, the one or more polynucleotide of interest comprises a promoter, a polynucleotide encoding a signal peptide, a polynucleotide encoding a polypeptide of interest, or a native host cell gene.
In one embodiment, the polynucleotide of interest is substantially the whole genome of the host cell.
In one embodiment, the one or more polynucleotide of interest is heterologous to the host cell.
In one embodiment, the one or more polynucleotide of interest is endogenous to the host cell.
In one embodiment, the first polynucleotide of interest is heterologous to the host cell.
In one embodiment, the first polynucleotide of interest is endogenous to the host cell.
In one embodiment, the second polynucleotide of interest is heterologous to the host cell.
In one embodiment, the second polynucleotide of interest is endogenous to the host cell.
In one embodiment, the first and second polynucleotide of interest are heterologous to the host cell.
In one embodiment, the first and second polynucleotide of interest are endogenous to the host cell.
In one embodiment, the one or more polynucleotide of interest encodes a polypeptide of interest.
In one embodiment, the polypeptide of interest is an enzyme, a nanobody, an antibody, an antibody-fragment, a fluorescent polypeptide, e.g., GFP, or an alpha-lactalbumin.
In one embodiment, the amount of screenable product in the droplet is proportional to the amount of a polypeptide encoded by the one or more polynucleotide of interest.
In one embodiment, the amount of screenable product in the droplet is inversely proportional to the amount of a polypeptide encoded by the one or more polynucleotide of interest.
In one embodiment, the biological library comprises at least 100 different polynucleotides of interest, at least 200 different polynucleotides of interest, at least 500 different polynucleotides of interest, at least 1 000 different polynucleotides of interest, at least 2 000 different polynucleotides of interest, at least 3 000 different polynucleotides of interest, at least 5 000 different polynucleotides of interest, at least 10 000 different polynucleotides of interest, at least 100 000 different polynucleotides of interest, at least 1 000 000 different polynucleotides of interest, at least 10 000 000 different polynucleotides of interest, at least 50 000 000 different polynucleotides of interest, or at least 100 000 000 different polynucleotides of interest.
In one embodiment, the biological library comprises at least 100 different host cells, at least 200 different host cells, at least 500 different host cells, at least 1 000 different host cells, at least 2 000 different host cells, at least 3 000 different host cells, at least 5 000 different host cells, at least 10 000 different host cells, at least 100 000 different host cells, at least 200 000 different host cells, at least 500 000 different host cells, at least 1 000 000 different host cells, at least 5 000 000 different host cells, at least 10 000 000 different host cells, or at least 100 000 000 different host cells.
In one embodiment, the amount of screenable product in the droplet is proportional to one or more of: stability of the polypeptide of interest, transcription of the polypeptide of interest, translation of the polypeptide of interest, secretion of the polypeptide of interest, yield of the polypeptide of interest, binding strength of the polypeptide of interest to a target molecule, and activity of the polypeptide of interest.
In one embodiment, the amount of screenable product in the droplet is inversely proportional to one or more of: stability of the polypeptide of interest, transcription of the polypeptide of interest, translation of the polypeptide of interest, secretion of the polypeptide of interest, yield of the polypeptide of interest, binding strength of the polypeptide of interest to a target molecule, and activity of the polypeptide of interest.
In one embodiment, the amount of screenable product in the droplet is proportional to one or more of: cell number, viability of the host cell, cell division rate of the host cell, cell growth rate of the host cell, cell size of the host cell, and protein secretion of the host cell.
In one embodiment, the amount of screenable product in the droplet is inversely proportional to one or more of: cell number, viability of the host cell, cell division rate of the host cell, cell growth rate of the host cell, cell size of the host cell, and protein secretion of the host cell.
In one embodiment, the one or more droplets comprise a substrate.
In one embodiment, the substrate comprises or consists of the screenable product.
In one embodiment, the substrate is a fluorescent substrate.
In one embodiment, the substrate is a fluorogenic Rhodamine.
In one embodiment, the substrate is a fluorochrome.
In one embodiment, the substrate is a fluorogenic substrate.
In one embodiment, the substrate comprises a fluorophore, e.g., fluorescein, or fluorescein-labelled starch. In one embodiment, the substrate is Nile red.
In one embodiment, the substrate is DAPI (4’,6-diamidino-2-phenylindole).
In one embodiment, each droplet, before the optional incubation, comprises an average occupation of at most 0.01 cells, at most 0.02 cells, at most 0.03 cells, at most 0.04 cells, at most 0.05 cells, at most 0.06 cells, at most 0.07 cells, at most 0.08 cells, at most 0.09 cells, at most 0.1 cells, at most 0.2 cells, at most 0.3 cells, at most 0.4 cells, at most 0.5 cells, at most 0.6 cells, or at most 0.7 cells; preferably at most 0.1 cells.
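The preference for an average occupation of at most 0.1 cells can be motivated with the Poisson model commonly used to describe random encapsulation; the Poisson assumption is ours and is not stated in the source. Assuming cells are loaded independently, so that the number of cells per droplet follows a Poisson distribution with mean λ, the sketch below computes the fractions of empty, singly occupied and multiply occupied droplets. At λ = 0.1, roughly 90% of droplets are empty and fewer than 5% of occupied droplets contain more than one cell. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet holds exactly k cells at mean occupancy lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def loading_fractions(lam: float) -> dict:
    """Fractions of empty, single and multiple occupancy (Poisson assumption)."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Share of occupied droplets that contain more than one cell.
        "multiple_among_occupied": multiple / (1.0 - empty),
    }


for lam in (0.01, 0.1, 0.5):
    f = loading_fractions(lam)
    print(f"lambda={lam}: empty={f['empty']:.3f}, "
          f"multiple/occupied={f['multiple_among_occupied']:.3f}")
```

Under this model, lowering λ further reduces co-encapsulation but raises the proportion of empty droplets passing through the sorter.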
In one embodiment, each droplet comprises an average occupation of at most 0.01 polynucleotide of interest, at most 0.02 polynucleotide of interest, at most 0.03 polynucleotide of interest, at most 0.04 polynucleotide of interest, at most 0.05 polynucleotide of interest, at most 0.06 polynucleotide of interest, at most 0.07 polynucleotide of interest, at most 0.08 polynucleotide of interest, at most 0.09 polynucleotide of interest, at most 0.1 polynucleotide of interest, at most 0.2 polynucleotide of interest, at most 0.3 polynucleotide of interest, at most 0.4 polynucleotide of interest, at most 0.5 polynucleotide of interest, at most 0.6 polynucleotide of interest, or at most 0.7 polynucleotide of interest; preferably at most 0.1 polynucleotide of interest.
In one embodiment, the droplet sorting is facilitated by an electric field generated by one or more electrodes (401, 402) adjacent to the droplet sorter.
In one embodiment, the droplet sorting is facilitated by an acoustic wave generated by one or more acoustic wave generators (401, 402) adjacent to the droplet sorter.
In one embodiment, the droplet sorting is facilitated by a local pressure change generated by one or more pressure-controlled outlets (401, 402) adjacent to the droplet sorter, e.g., wherein the one or more pressure-controlled outlets are comprised in one or more output channels.
In one embodiment, the amount of screenable product in step c) is determined using a fluorescence-based signal, absorbance, Raman spectroscopy, mass spectrometry (MS), or MALDI-MS.
In one embodiment, a relative and/or an absolute amount of the screenable product per droplet is determined by the one or more sensing means (600).
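One way the relative and absolute amounts could be derived from a raw sensor signal is sketched below, assuming a linear response of the sensing means: an absolute amount is obtained by inverting a standard curve fitted to calibration droplets of known product concentration, and a relative amount by normalising each droplet signal to the population median. The linear response, the calibration values and all function names are hypothetical and serve only as illustration.

```python
from statistics import median


def fit_standard_curve(concentrations, signals):
    """Ordinary least-squares fit of signal = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def absolute_amount(signal, slope, intercept):
    """Invert the standard curve to estimate the product concentration in one droplet."""
    return (signal - intercept) / slope


def relative_amounts(signals):
    """Express each droplet signal relative to the median signal of the population."""
    m = median(signals)
    return [s / m for s in signals]


# Hypothetical calibration droplets: known product concentration (uM) and measured signal (a.u.).
slope, intercept = fit_standard_curve([0, 1, 2, 4], [10, 30, 50, 90])
print(absolute_amount(70, slope, intercept))  # 3.0
print(relative_amounts([20, 40, 80]))         # [0.5, 1.0, 2.0]
```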
In one embodiment, after step d), one or more output channels comprise at least 10 000 droplets, at least 50 000 droplets, at least 100 000 droplets, at least 500 000 droplets, at least 1 000 000 droplets, at least 2 000 000 droplets, at least 5 000 000 droplets, at least 10 000 000 droplets, or at least 100 000 000 droplets.
In one embodiment, the droplet sorter comprises at least four output channels, at least five output channels, at least six output channels, at least seven output channels, at least eight output channels, at least nine output channels, or at least ten output channels.
In one embodiment, the host cell is a yeast host cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
In one embodiment, the host cell is a filamentous fungal host cell, e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Bjerkandera adusta, Ceriporiopsis aneirina, Ceriporiopsis caregiea, Ceriporiopsis gilvescens, Ceriporiopsis pannocinta, Ceriporiopsis rivulosa, Ceriporiopsis subrufa, Ceriporiopsis subvermispora, Chrysosporium inops, Chrysosporium keratinophilum, Chrysosporium lucknowense, Chrysosporium merdarium, Chrysosporium pannicola, Chrysosporium queenslandicum, Chrysosporium tropicum, Chrysosporium zonatum, Coprinus cinereus, Coriolus hirsutus, Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fusarium trichothecioides, Fusarium venenatum, Humicola insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila, Neurospora crassa, Penicillium purpurogenum, Phanerochaete chrysosporium, Phlebia radiata, Pleurotus eryngii, Talaromyces emersonii, Thielavia terrestris, Trametes villosa, Trametes versicolor, Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, or Trichoderma viride cell.
In one embodiment, the host cell is a prokaryotic host cell, e.g., a Gram-positive cell selected from the group consisting of Bacillus, Clostridium, Enterococcus, Geobacillus, Lactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, and Streptomyces cells, or a Gram-negative cell selected from the group consisting of Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma cells, such as a Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, Bacillus thuringiensis, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, Streptococcus equi subsp. zooepidemicus, Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, or Streptomyces lividans cell.
In one embodiment, the host cell is Bacillus subtilis.
In one embodiment, the host cell is Bacillus licheniformis.
In one embodiment, the host cell is Trichoderma reesei.
In one embodiment, the host cell is Aspergillus niger.
In one embodiment, the host cell is Aspergillus oryzae.
In one embodiment, the host cell is a Bifidobacterium, e.g., Bifidobacterium animalis, or Bifidobacterium animalis subsp. lactis.
It is envisioned that the method of the present invention may employ a library of isolated polynucleotides and in vitro expression systems, as well as recombinant cells and/or wild-type cells. However, a preferred embodiment of the invention is a setup comprising host cells, i.e., wherein the library is comprised within host cells.
In a preferred embodiment, each individual polynucleotide of the library is in its own separate host cell. Accordingly, in one embodiment, each droplet in step (b) of the first aspect comprises at most a single host cell, which optionally can be incubated to grow into a plurality of cells before determining the amount of screenable product in step c).
The features of the one or more polynucleotide of interest of the biological library are assayed either qualitatively or quantitatively, e.g., by detecting the conversion of an enzyme substrate into a detectable or quantifiable enzyme product (the screenable product). For example, a fluorogenic enzyme substrate is added, which is converted into a fluorescent enzyme product (the screenable product) that is then detected or measured.
Accordingly, in a preferred embodiment of the invention, the substrate for the one or more enzyme is fluorogenic and the activity of the enzyme converts the fluorogenic substrate into a fluorescent product (screenable product).
Once a particularly interesting feature has been detected in a selected droplet or in a plurality of droplets collected in one or more output channels, the polynucleotide library member inside the droplet needs to be identified. Typically, the polynucleotide is identified through DNA sequencing.
The polynucleotide may also have been outfitted with an identifying sequence tag serving as a "bar code" when the library was constructed, thus obviating the need for sequencing. Once the bar code has been identified, the DNA sequence of the polynucleotide is immediately known, and the polynucleotide is thereby identified.
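The bar-code lookup described above amounts to a dictionary from tag to library member. The sketch below assumes, for illustration, that each tag is read out as a short string of fixed length at a known position (however it is detected) and that a tag-to-variant table was recorded when the library was built. The table contents, the tag length and position, and the function name are all hypothetical.

```python
from collections import Counter

# Hypothetical table recorded during library construction: each bar code maps to the
# library member (and therefore the known DNA sequence) that it tags.
BARCODE_TO_VARIANT = {
    "ACGTACGT": "variant_001",
    "TTGCAAGC": "variant_002",
    "GGATCCTA": "variant_003",
}
BARCODE_LENGTH = 8  # assumption: the tag is the first 8 bases of each read-out string


def identify_variants(reads):
    """Count library members in one sorted pool by their bar codes; report unmatched reads."""
    counts = Counter()
    unassigned = 0
    for read in reads:
        variant = BARCODE_TO_VARIANT.get(read[:BARCODE_LENGTH])
        if variant is None:
            unassigned += 1
        else:
            counts[variant] += 1
    return counts, unassigned


pool_reads = ["ACGTACGTAAAA", "ACGTACGTCCCC", "GGATCCTAGGGG", "NNNNNNNNTTTT"]
counts, unassigned = identify_variants(pool_reads)
print(counts.most_common())  # [('variant_001', 2), ('variant_003', 1)]
print(unassigned)            # 1
```

An exact-match lookup is the simplest choice; tolerating read errors in the tag would require a distance-based match, which this sketch does not attempt.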
In a preferred aspect of the invention, the one or more polynucleotide of interest is identified in step e) by DNA sequencing of the one or more polynucleotide of interest.
In the first aspect of the invention, several aliquots of solutions are introduced into the droplets. The aliquots are usually much smaller in volume than the droplets, but they may in principle range in size up to the same volume as the droplets or even larger. In the examples below, the aliquots are significantly smaller than the droplets. There are many ways of introducing an aliquot into a droplet in a microfluidic device or, in other words, of merging or coalescing two droplets.
The design of microfluidic devices that enable the application of an electric field to merge or coalesce two or more droplets is disclosed, for example, in WO 2007/061448. Another way to introduce small aliquots of an aqueous liquid into an aqueous droplet in a microfluidic device is known as "pico-injection" and is disclosed, for example, in WO 2010/151776.
In the examples below, the aliquots were introduced into the droplets by merging or coalescing the aliquots and the droplets through the application of an electric field.
Accordingly, in a preferred embodiment of the first aspect, the aliquots are introduced into the droplets by merging or coalescing the aliquots and the droplets through the application of an electric field, or by injection.
Figure 1 shows one embodiment of the invention, wherein the device comprises a droplet sorter (200) with five output channels (301, 302, 303, 304, 305) and an incubation zone (500). The device furthermore comprises sensing means (600) and two electrodes (401, 402). Droplets comprising host cells and screenable product are shown as circles. The amount of screenable product present in each droplet is represented schematically by a black filling, the amount of black being proportional to the amount of screenable product in that droplet.
Flow directed from the incubation zone (500) to the output channels (301, 302, 303, 304, 305) allows droplets to pass the sensing means (600), which determines the amount of screenable product in each droplet (step c)). The sensing means (600) communicates the amount of screenable product to the electrodes (401, 402). Based on the information about the amount of screenable product in each droplet, the electrodes apply an electric field that sorts the droplet into one of the five output channels (step d)). In this example, droplets with a high amount of screenable product are sorted into the top output channel (305) and collected in pool 5, while droplets with no or a low amount of screenable product are sorted into the lowest output channel (301) and collected in pool 1. Droplets with intermediate amounts of screenable product are sorted into the remaining three output channels (302, 303, and 304) and collected in pools 2-4. The design with three or more output channels allows parallel sorting into multiple output channels using multiple predetermined threshold values, without any loss of sample volume.
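The multi-threshold sorting logic of Figure 1 can be sketched as a mapping from each measured signal to one of the five output channels: four predetermined thresholds split the signal range into five intervals, one per channel. The threshold values are hypothetical placeholders, and sending a signal that equals a threshold to the higher pool is an assumption of this sketch, not a feature stated for the device.

```python
from bisect import bisect_right

# Hypothetical predetermined thresholds (arbitrary signal units).
# Four thresholds define five intervals, one per output channel 301-305.
THRESHOLDS = [100.0, 250.0, 500.0, 1000.0]
CHANNELS = [301, 302, 303, 304, 305]


def assign_channel(signal: float) -> int:
    """Return the output channel for a droplet; a signal equal to a threshold goes to the higher pool."""
    return CHANNELS[bisect_right(THRESHOLDS, signal)]


def sort_droplets(signals):
    """Group droplet signals by output channel; every droplet lands in exactly one pool."""
    pools = {ch: [] for ch in CHANNELS}
    for s in signals:
        pools[assign_channel(s)].append(s)
    return pools


pools = sort_droplets([5.0, 120.0, 260.0, 999.0, 1500.0, 0.0])
for ch, members in pools.items():
    print(ch, members)
```

Because the intervals cover the whole signal range without gaps, every droplet is assigned to some pool, which mirrors the statement that no sample volume is lost.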
Library design and generation of variant sequences
The methods of the present invention utilize biological libraries of variants (amino acid sequences, DNA sequences and/or host cell variants), but also enable the generation of synthetic variant sequences based on the read-out of the method.
In one aspect, synthetic sequence variants are generated by substitution, deletion or addition of one or several amino acids (for polypeptide variants) or one or several nucleotides (for DNA sequence variants).
In one aspect, the polypeptide variant is derived from a mature polypeptide by substitution, deletion or addition of one or several amino acids. In one aspect, the number of amino acid substitutions, deletions and/or insertions introduced into the polypeptide is up to 15, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15. The amino acid changes may be of a minor nature, that is conservative amino acid substitutions or insertions that do not significantly affect the folding and/or activity of the protein; small deletions, typically of 1-30 amino acids; small amino- or carboxyl-terminal extensions, such as an amino-terminal methionine residue; a small linker peptide of up to 20-25 residues; or a small extension that facilitates purification by changing net charge or another function, such as a poly-histidine tract, an antigenic epitope or a binding module.
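As an illustration of the numerical constraint above (at most 15 substitutions), the sketch below generates a substitution variant of a parent polypeptide at distinct, randomly chosen positions and labels each change in the usual wild-type/position/new-residue notation (e.g., K5A). It picks replacement residues uniformly at random and therefore does not model the conservative substitutions mentioned above; the parent sequence and function name are hypothetical.

```python
import random

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_substitution_variant(parent: str, n_subs: int, rng: random.Random):
    """Return (variant, labels) with n_subs substitutions at distinct positions (1 <= n_subs <= 15)."""
    if not 1 <= n_subs <= min(15, len(parent)):
        raise ValueError("n_subs must be between 1 and 15 and not exceed the sequence length")
    positions = sorted(rng.sample(range(len(parent)), n_subs))
    residues = list(parent)
    labels = []
    for pos in positions:
        wild_type = residues[pos]
        # Choose any standard residue different from the parent residue (not necessarily conservative).
        new = rng.choice([aa for aa in STANDARD_AMINO_ACIDS if aa != wild_type])
        residues[pos] = new
        labels.append(f"{wild_type}{pos + 1}{new}")  # 1-based numbering, e.g. "K5A"
    return "".join(residues), labels


parent = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical parent sequence
rng = random.Random(42)  # fixed seed for reproducibility
variant, labels = random_substitution_variant(parent, 3, rng)
print(labels, variant)
```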
Essential amino acids in a polypeptide can be identified according to procedures known in the art, such as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham and Wells, 1989, Science 244: 1081-1085). In the latter technique, single alanine mutations are introduced at every residue in the molecule, and the resultant molecules are tested for enzyme activity to identify amino acid residues that are critical to the activity of the molecule. See also, Hilton et al., 1996, J. Biol. Chem. 271: 4699-4708. The active site of the enzyme or other biological interaction can also be determined by physical analysis of structure, as determined by such techniques as nuclear magnetic resonance, crystallography, electron diffraction, or photoaffinity labeling, in conjunction with mutation of putative contact site amino acids. See, for example, de Vos et al., 1992, Science 255: 306-312; Smith et al., 1992, J. Mol. Biol. 224: 899-904; Wlodaver et al., 1992, FEBS Lett. 309: 59-64. The identity of essential amino acids can also be inferred from an alignment with a related polypeptide, and/or be inferred from sequence homology and conserved catalytic machinery with a related polypeptide or within a polypeptide or protein family with polypeptides/proteins descending from a common ancestor, typically having similar three-dimensional structures, functions, and significant sequence similarity. Additionally or alternatively, protein structure prediction tools can be used for protein structure modelling to identify essential amino acids and/or active sites of polypeptides. See, for example, Jumper et al., 2021, “Highly accurate protein structure prediction with AlphaFold”, Nature 596: 583-589.
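The library design step of alanine-scanning mutagenesis, one single-alanine mutant per residue, can be sketched as follows. Positions that are already alanine are skipped because A to A is not a mutation; the example peptide and the function name are hypothetical, and the sketch covers only the enumeration of mutants, not the activity testing.

```python
def alanine_scan(sequence: str):
    """Yield (label, mutant) for every single-alanine mutant of `sequence`.

    Positions already occupied by alanine are skipped. Labels use 1-based numbering, e.g. "K2A".
    """
    for i, residue in enumerate(sequence):
        if residue == "A":
            continue
        mutant = sequence[:i] + "A" + sequence[i + 1:]
        yield f"{residue}{i + 1}A", mutant


# Hypothetical short peptide, for illustration only.
for label, mutant in alanine_scan("MKAW"):
    print(label, mutant)
# M1A AKAW
# K2A MAAW
# W4A MKAA
```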
Single or multiple amino acid substitutions, deletions, and/or insertions can be made and tested using known methods of mutagenesis, recombination, and/or shuffling, followed by a relevant screening procedure, such as those disclosed by Reidhaar-Olson and Sauer, 1988, Science 241: 53-57; Bowie and Sauer, 1989, Proc. Natl. Acad. Sci. USA 86: 2152-2156; WO 95/17413; or WO 95/22625. Other methods that can be used include error-prone PCR, phage display (e.g., Lowman et al., 1991, Biochemistry 30: 10832-10837; US 5,223,409; WO 92/06204), and region-directed mutagenesis (Derbyshire et al., 1986, Gene 46: 145; Ner et al., 1988, DNA 7: 127).
Mutagenesis/shuffling methods can be combined with high-throughput, automated screening methods to detect activity of cloned, mutagenized polypeptides expressed by host cells (Ness et al., 1999, Nature Biotechnology 17: 893-896). Mutagenized DNA molecules that encode active polypeptides can be recovered from the host cells and rapidly sequenced using standard methods in the art. These methods allow the rapid determination of the importance of individual amino acid residues in a polypeptide.
In one aspect, DNA variant sequences are derived by substitution, deletion or addition of one or several nucleotides.
The polynucleotide may also be mutated by introduction of nucleotide substitutions that do not result in a change in the amino acid sequence of the polypeptide, but which correspond to the codon usage of the host organism intended for production of the enzyme, or by introduction of nucleotide substitutions that may give rise to a different amino acid sequence. For a general description of nucleotide substitution, see, e.g., Ford et al., 1991, Protein Expression and Purification 2: 95-107.
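The distinction above between silent substitutions (which leave the amino acid sequence unchanged) and substitutions that alter the polypeptide can be checked by translation. The sketch below builds the standard genetic code and classifies a single-nucleotide substitution. It does not choose codons according to any host's codon usage, since that would require host-specific codon-usage data not given here; the coding sequence and function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
GENETIC_CODE_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), GENETIC_CODE_AA)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon, trailing partial codons are ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


def classify_substitution(dna: str, position: int, new_base: str) -> str:
    """Classify a single-nucleotide substitution (0-based position) as silent or amino-acid changing."""
    mutant = dna[:position] + new_base + dna[position + 1:]
    return "silent" if translate(mutant) == translate(dna) else "changes amino acid sequence"


cds = "ATGCTGAAA"  # hypothetical coding sequence encoding Met-Leu-Lys
print(translate(cds))                      # MLK
print(classify_substitution(cds, 5, "A"))  # CTG -> CTA, both Leu: silent
print(classify_substitution(cds, 6, "G"))  # AAA -> GAA, Lys -> Glu: changes amino acid sequence
```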
DNA sequences for the library design may be obtained from microorganisms of any genus.
Similarly, polypeptide sequences comprising e.g., an enzyme, a signal peptide, or a nanobody may be obtained from microorganisms of any genus. For purposes of the present invention, the term “obtained from” as used herein in connection with a given source shall mean that the polypeptide encoded by a polynucleotide is produced by the source or by a strain in which the polynucleotide of the invention has been inserted. In one aspect, the polypeptide obtained from a given source is secreted extracellularly.
It will be understood that for the aforementioned species, the invention encompasses both the perfect and imperfect states, and other taxonomic equivalents, e.g., anamorphs, regardless of the species name by which they are known. Those skilled in the art will readily recognize the identity of appropriate equivalents.
The polypeptides may be identified and obtained from other sources including microorganisms isolated from nature (e.g., soil, composts, water, etc.) or DNA samples obtained directly from natural materials (e.g., soil, composts, water, etc.) using the above-mentioned probes. Techniques for isolating microorganisms and DNA directly from natural habitats are well known in the art. A polynucleotide encoding the polypeptide may then be obtained by similarly screening a genomic DNA or cDNA library of another microorganism or mixed DNA sample. Once a polynucleotide encoding a polypeptide has been detected with the probe(s), the polynucleotide can be isolated or cloned by utilizing techniques that are known to those of ordinary skill in the art (see, e.g., Davis et al., 2012, Basic Methods in Molecular Biology, Elsevier).
Screening a biological library comprising control sequences
The present invention also relates to screening a biological library, wherein the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest comprising a first polynucleotide of interest encoding a control sequence, and a second polynucleotide of interest encoding a polypeptide of interest. In one embodiment, the biological library comprises a plurality of variants of the control sequence.
Preferably, the second polynucleotide of interest is operably linked to one or more control sequences (first polynucleotide of interest) that direct the expression of the second polynucleotide of interest in a suitable host cell under conditions compatible with the control sequences.
The control sequence may be manipulated in a variety of ways to provide for expression of the polypeptide of interest, and/or to create a control sequence library. Manipulation of the control sequence prior to its insertion into a vector may be desirable or necessary depending on the expression vector. Techniques for modifying the control sequences utilizing recombinant DNA methods are well known in the art.
Promoters
The control sequence may be a promoter, a polynucleotide that is recognized by a host cell for expression of a polynucleotide encoding a polypeptide of the present invention. The promoter contains transcriptional control sequences that mediate the expression of the polypeptide. The promoter may be any polynucleotide that shows transcriptional activity in the host cell including mutant, truncated, and hybrid promoters, and may be obtained from genes encoding extracellular or intracellular polypeptides either homologous or heterologous to the host cell.
Examples of suitable promoters for directing transcription of the polynucleotide of the present invention in a bacterial host cell are described in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Lab., NY, Davis et al., 2012, supra, and Song et al., 2016, PLOS One 11(7): e0158447.
Examples of suitable promoters for directing transcription of the polynucleotide of the present invention in a filamentous fungal host cell are promoters obtained from Aspergillus, Fusarium, Rhizomucor and Trichoderma cells, such as the promoters described in Mukherjee et al., 2013, “Trichoderma: Biology and Applications”, and by Schmoll and Dattenböck, 2016, “Gene Expression Systems in Fungi: Advancements and Applications”, Fungal Biology.
For expression in a yeast host, examples of useful promoters are described by Smolke et al., 2018, “Synthetic Biology: Parts, Devices and Applications” (Chapter 6: Constitutive and Regulated Promoters in Yeast: How to Design and Make Use of Promoters in S. cerevisiae), and by Schmoll and Dattenböck, 2016, “Gene Expression Systems in Fungi: Advancements and Applications”, Fungal Biology.
Terminators
The control sequence may also be a transcription terminator, which is recognized by a host cell to terminate transcription. The terminator is operably linked to the 3’-terminus of the polynucleotide encoding the polypeptide. Any terminator that is functional in the host cell may be used in the present invention.
Preferred terminators for bacterial host cells may be obtained from the genes for Bacillus clausii alkaline protease (aprH), Bacillus licheniformis alpha-amylase (amyL), and Escherichia coli ribosomal RNA (rrnB).
Preferred terminators for filamentous fungal host cells may be obtained from Aspergillus or Trichoderma species, such as obtained from the genes for Aspergillus niger glucoamylase, Trichoderma reesei beta-glucosidase, Trichoderma reesei cellobiohydrolase I, and Trichoderma reesei endoglucanase I, such as the terminators described in Mukherjee et al., 2013, “Trichoderma: Biology and Applications”, and by Schmoll and Dattenböck, 2016, “Gene Expression Systems in Fungi: Advancements and Applications”, Fungal Biology.
Preferred terminators for yeast host cells may be obtained from the genes for Saccharomyces cerevisiae enolase, Saccharomyces cerevisiae cytochrome C (CYC1), and Saccharomyces cerevisiae glyceraldehyde-3-phosphate dehydrogenase. Other useful terminators for yeast host cells are described by Romanos et al., 1992, Yeast 8: 423-488.
mRNA Stabilizers
The control sequence may also be an mRNA stabilizer region downstream of a promoter and upstream of the coding sequence of a gene which increases expression of the gene.
Examples of suitable mRNA stabilizer regions are obtained from a Bacillus thuringiensis cryIIIA gene (WO 94/25612) and a Bacillus subtilis SP82 gene (Hue et al., 1995, J. Bacteriol. 177: 3465-3471).
Examples of mRNA stabilizer regions for fungal cells are described in Geisberg et al., 2014, Cell 156(4): 812-824, and in Morozov et al., 2006, Eukaryotic Cell 5(11): 1838-1846.
Leader Sequences
The control sequence may also be a leader, a non-translated region of an mRNA that is important for translation by the host cell. The leader is operably linked to the 5’-terminus of the polynucleotide encoding the polypeptide. Any leader that is functional in the host cell may be used.
Suitable leaders for bacterial host cells are described by Hambraeus et al., 2000, Microbiology 146(12): 3051-3059, and by Kaberdin and Blasi, 2006, FEMS Microbiol. Rev. 30(6): 967-979.
Preferred leaders for filamentous fungal host cells may be obtained from the genes for Aspergillus oryzae TAKA amylase and Aspergillus nidulans triose phosphate isomerase.
Suitable leaders for yeast host cells may be obtained from the genes for Saccharomyces cerevisiae enolase (ENO-1), Saccharomyces cerevisiae 3-phosphoglycerate kinase, Saccharomyces cerevisiae alpha-factor, and Saccharomyces cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP).
Polyadenylation Sequences
The control sequence may also be a polyadenylation sequence, a sequence operably linked to the 3’-terminus of the polynucleotide which, when transcribed, is recognized by the host cell as a signal to add polyadenosine residues to transcribed mRNA. Any polyadenylation sequence that is functional in the host cell may be used.
Preferred polyadenylation sequences for filamentous fungal host cells are obtained from the genes for Aspergillus nidulans anthranilate synthase, Aspergillus niger glucoamylase, Aspergillus niger alpha-glucosidase, Aspergillus oryzae TAKA amylase, and Fusarium oxysporum trypsin-like protease.
Useful polyadenylation sequences for yeast host cells are described by Guo and Sherman, 1995, Mol. Cellular Biol. 15: 5983-5990.
Signal Peptides
As shown in the examples, the control sequence may also be a signal peptide coding region that encodes a signal peptide linked to the N-terminus of a polypeptide and directs the polypeptide into the cell’s secretory pathway. The 5’-end of the coding sequence of the polynucleotide may inherently contain a signal peptide coding sequence naturally linked in translation reading frame with the segment of the coding sequence that encodes the polypeptide. Alternatively, the 5’-end of the coding sequence may contain a signal peptide coding sequence that is heterologous to the coding sequence. A heterologous signal peptide coding sequence may be required where the coding sequence does not naturally contain a signal peptide coding sequence. Alternatively, a heterologous signal peptide coding sequence may simply replace the natural signal peptide coding sequence to enhance secretion of the polypeptide. Any signal peptide coding sequence that directs the expressed polypeptide into the secretory pathway of a host cell may be used.
Effective signal peptide coding sequences for bacterial host cells are the signal peptide coding sequences obtained from the genes for Bacillus NCIB 11837 maltogenic amylase, Bacillus licheniformis subtilisin, Bacillus licheniformis beta-lactamase, Bacillus stearothermophilus alpha-amylase, Bacillus stearothermophilus neutral proteases (nprT, nprS, nprM), and Bacillus subtilis prsA. Further signal peptides are described by Freudl, 2018, Microbial Cell Factories 17: 52.
Effective signal peptide coding sequences for filamentous fungal host cells are the signal peptide coding sequences obtained from the genes for Aspergillus niger neutral amylase, Aspergillus niger glucoamylase, Aspergillus oryzae TAKA amylase, Humicola insolens cellulase, Humicola insolens endoglucanase V, Humicola lanuginosa lipase, and Rhizomucor miehei aspartic proteinase, such as the signal peptide described by Xu et al., 2018, Biotechnology Letters 40: 949-955.
Useful signal peptides for yeast host cells are obtained from the genes for Saccharomyces cerevisiae alpha-factor and Saccharomyces cerevisiae invertase. Other useful signal peptide coding sequences are described by Romanos et al., 1992, supra.
Propeptides
The control sequence may also be a propeptide coding sequence that encodes a propeptide positioned at the N-terminus of a polypeptide. The resultant polypeptide is known as a proenzyme or propolypeptide (or a zymogen in some cases). A propolypeptide is generally inactive and can be converted to an active polypeptide by catalytic or autocatalytic cleavage of the propeptide from the propolypeptide. The propeptide coding sequence may be obtained from the genes for Bacillus subtilis alkaline protease (aprE), Bacillus subtilis neutral protease (nprT), Myceliophthora thermophila laccase (WO 95/33836), Rhizomucor miehei aspartic proteinase, and Saccharomyces cerevisiae alpha-factor.
Where both signal peptide and propeptide sequences are present, the propeptide sequence is positioned next to the N-terminus of a polypeptide and the signal peptide sequence is positioned next to the N-terminus of the propeptide sequence. Additionally or alternatively, when both signal peptide and propeptide sequences are present, the polypeptide may comprise only a part of the signal peptide sequence and/or only a part of the propeptide sequence. Alternatively, the final or isolated polypeptide may comprise a mixture of mature polypeptides and polypeptides which comprise, either partly or in full length, a propeptide sequence and/or a signal peptide sequence.
Regulatory Sequences
It may also be desirable to add regulatory sequences that regulate expression of the polypeptide relative to the growth of the host cell. Examples of regulatory sequences are those that cause expression of the gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound.
Regulatory sequences in prokaryotic systems include the lac, tac, and trp operator systems. In yeast, the ADH2 system or GAL1 system may be used. In filamentous fungi, the Aspergillus niger glucoamylase promoter, Aspergillus oryzae TAKA alpha-amylase promoter, Aspergillus oryzae glucoamylase promoter, Trichoderma reesei cellobiohydrolase I promoter, and Trichoderma reesei cellobiohydrolase II promoter may be used. Other examples of regulatory sequences are those that allow for gene amplification. In fungal systems, these regulatory sequences include the dihydrofolate reductase gene that is amplified in the presence of methotrexate, and the metallothionein genes that are amplified with heavy metals.
Transcription Factors
The control sequence may also be a transcription factor, a polynucleotide encoding a polynucleotide-specific DNA-binding polypeptide that controls the rate of the transcription of genetic information from DNA to mRNA by binding to a specific polynucleotide sequence. The transcription factor may function alone and/or together with one or more other polypeptides or transcription factors in a complex by promoting or blocking the recruitment of RNA polymerase. Transcription factors are characterized by comprising at least one DNA-binding domain, which often binds to a specific DNA sequence adjacent to the genetic elements regulated by the transcription factor. The transcription factor may regulate the expression of a protein of interest either directly, i.e., by activating the transcription of the gene encoding the protein of interest by binding to its promoter, or indirectly, i.e., by activating the transcription of a further transcription factor which regulates the transcription of the gene encoding the protein of interest, such as by binding to the promoter of the further transcription factor. Suitable transcription factors for fungal host cells are described in WO 2017/144177. Suitable transcription factors for prokaryotic host cells are described in Seshasayee et al., 2011, Subcellular Biochemistry 52: 7-23, as well as in Balleza et al., 2009, FEMS Microbiol. Rev. 33(1): 133-151.
Expression Vectors
The method of the present invention also utilizes recombinant expression vectors comprising a polynucleotide of interest. The various nucleotide and control sequences may be joined together to produce a recombinant expression vector that may include one or more convenient restriction sites to allow for insertion or substitution of the polynucleotide of interest at such sites. Alternatively, the polynucleotide may be expressed by inserting the polynucleotide or a nucleic acid construct comprising the polynucleotide into an appropriate vector for expression.
In creating the expression vector, the coding sequence is located in the vector so that the coding sequence is operably linked with the appropriate control sequences for expression.
The recombinant expression vector may be any vector (e.g., a plasmid or virus) that can be conveniently subjected to recombinant DNA procedures and can bring about expression of the polynucleotide. The choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced. The vector may be a linear or closed circular plasmid.
The vector may be an autonomously replicating vector, i.e., a vector that exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome. The vector may contain any means for assuring self-replication. Alternatively, the vector may be one that, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated. Furthermore, a single vector or plasmid or two or more vectors or plasmids that together contain the total DNA to be introduced into the genome of the host cell, or a transposon, may be used.
The vector preferably contains one or more selectable markers that permit easy selection of transformed, transfected, transduced, or the like cells. A selectable marker is a gene the product of which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to auxotrophs, and the like.
The vector preferably contains at least one element that permits integration of the vector into the host cell's genome or autonomous replication of the vector in the cell independent of the genome.
For integration into the host cell genome, the vector may rely on the polynucleotide’s sequence encoding the polypeptide or any other element of the vector for integration into the genome by homologous recombination, such as homology-directed repair (HDR), or non-homologous recombination, such as non-homologous end-joining (NHEJ).
For autonomous replication, the vector may further comprise an origin of replication enabling the vector to replicate autonomously in the host cell in question. The origin of replication may be any plasmid replicator mediating autonomous replication that functions in a cell. The term “origin of replication” or “plasmid replicator” means a polynucleotide that enables a plasmid or vector to replicate in vivo.
More than one copy of a polynucleotide of interest may be inserted into a host cell to increase production of a polypeptide. For example, 2, 3, 4, 5, or more copies may be inserted into a host cell. An increase in the copy number of the polynucleotide can be obtained by integrating at least one additional copy of the sequence into the host cell genome or by including an amplifiable selectable marker gene with the polynucleotide, where cells containing amplified copies of the selectable marker gene, and thereby additional copies of the polynucleotide, can be selected for by cultivating the cells in the presence of the appropriate selectable agent.
Host Cells
In a second aspect, the invention relates to a host cell comprising in its genome a polynucleotide sequence of interest generated in additional step h), and/or a polynucleotide sequence identified in step e).
The present invention also relates to host cells which are not recombinant, i.e., wild-type host cells. Such host cells include, but are not limited to, probiotics, e.g., a Bifidobacterium such as Bifidobacterium animalis or Bifidobacterium animalis subsp. lactis.
The present invention also relates to recombinant host cells, comprising a polynucleotide of interest, and/or comprising a polynucleotide operably linked to one or more control sequences that direct the production of a polypeptide of interest.
A construct or vector comprising a polynucleotide is introduced into a host cell so that the construct or vector is maintained as a chromosomal integrant or as a self-replicating extrachromosomal vector as described earlier. The choice of a host cell will to a large extent depend upon the gene encoding the polypeptide and its source. The polypeptide can be native or heterologous to the recombinant host cell. Also, at least one of the one or more control sequences can be heterologous to the polynucleotide encoding the polypeptide. The recombinant host cell may comprise a single copy, or at least two copies, e.g., three, four, five, or more copies of the polynucleotide of the present invention.
The host cell may be any mammalian cell useful in the recombinant production of a polypeptide of interest, e.g., a Chinese hamster ovary (CHO) cell, a BHK cell, a mouse cell, or a HEK cell.
The host cell may be any microbial cell useful in the recombinant production of a polypeptide of interest, e.g., a prokaryotic cell or a fungal cell.
The prokaryotic host cell may be any Gram-positive or Gram-negative bacterium. Gram-positive bacteria include, but are not limited to, Bacillus, Bifidobacteria, e.g., BB-12®, Clostridium, Enterococcus, Geobacillus, Lactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, and Streptomyces. Gram-negative bacteria include, but are not limited to, Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma.
The bacterial host cell may be any Bacillus cell including, but not limited to, Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, and Bacillus thuringiensis cells. In an embodiment, the Bacillus cell is a Bacillus amyloliquefaciens, Bacillus licheniformis, or Bacillus subtilis cell.
For purposes of this invention, Bacillus classes/genera/species shall be defined as described in Patel and Gupta, 2020, Int. J. Syst. Evol. Microbiol. 70: 406-438.
The bacterial host cell may also be any Streptococcus cell including, but not limited to, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, and Streptococcus equi subsp. zooepidemicus cells.
The bacterial host cell may also be any Streptomyces cell including, but not limited to, Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, and Streptomyces lividans cells.
Methods for introducing DNA into prokaryotic host cells are well known in the art, and any suitable method can be used, including but not limited to protoplast transformation, competent cell transformation, electroporation, conjugation, and transduction, with the DNA introduced as a linearized or circular polynucleotide. Persons skilled in the art will readily be able to identify a suitable method for introducing DNA into a given prokaryotic cell depending, e.g., on the genus. Methods for introducing DNA into prokaryotic host cells are described, for example, in Heinze et al., 2018, BMC Microbiology 18: 56, Burke et al., 2001, Proc. Natl. Acad. Sci. USA 98: 6289-6294, Choi et al., 2006, J. Microbiol. Methods 64: 391-397, and Donald et al., 2013, J. Bacteriol. 195(11): 2612-2620.
The host cell may be a fungal cell. “Fungi” as used herein includes the phyla Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota as well as the Oomycota and all mitosporic fungi (as defined by Hawksworth et al., In, Ainsworth and Bisby’s Dictionary of The Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK).
Fungal cells may be transformed by processes involving protoplast-mediated transformation, Agrobacterium-mediated transformation, electroporation, biolistic methods, and shock-wave-mediated transformation, as reviewed by Li et al., 2017, Microbial Cell Factories 16: 168, and by procedures described in EP 238023, Yelton et al., 1984, Proc. Natl. Acad. Sci. USA 81: 1470-1474, Christensen et al., 1988, Bio/Technology 6: 1419-1422, and Lubertozzi and Keasling, 2009, Biotechn. Advances 27: 53-75. However, any method known in the art for introducing DNA into a fungal host cell can be used, and the DNA can be introduced as a linearized or circular polynucleotide.
The fungal host cell may be a yeast cell. “Yeast” as used herein includes ascosporogenous yeast (Endomycetales), basidiosporogenous yeast, and yeast belonging to the Fungi Imperfecti (Blastomycetes). For purposes of this invention, yeast shall be defined as described in Biology and Activities of Yeast (Skinner, Passmore, and Davenport, editors, Soc. App. Bacteriol. Symposium Series No. 9, 1980).
The yeast host cell may be a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell. In a preferred embodiment, the yeast host cell is a Pichia or Komagataella cell, e.g., a Pichia pastoris cell (Komagataella phaffii).
The fungal host cell may be a filamentous fungal cell. “Filamentous fungi” include all filamentous forms of the subdivision Eumycota and Oomycota (as defined by Hawksworth et al., 1995, supra). The filamentous fungi are generally characterized by a mycelial wall composed of chitin, cellulose, glucan, chitosan, mannan, and other complex polysaccharides. Vegetative growth is by hyphal elongation and carbon catabolism is obligately aerobic. In contrast, vegetative growth by yeasts such as Saccharomyces cerevisiae is by budding of a unicellular thallus and carbon catabolism may be fermentative.
The filamentous fungal host cell may be an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell. In a preferred embodiment, the filamentous fungal host cell is an Aspergillus, Trichoderma or Fusarium cell. In a further preferred embodiment, the filamentous fungal host cell is an Aspergillus niger, Aspergillus oryzae, Trichoderma reesei, or Fusarium venenatum cell.
For example, the filamentous fungal host cell may be an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Bjerkandera adusta, Ceriporiopsis aneirina, Ceriporiopsis caregiea, Ceriporiopsis gilvescens, Ceriporiopsis pannocinta, Ceriporiopsis rivulosa, Ceriporiopsis subrufa, Ceriporiopsis subvermispora, Chrysosporium inops, Chrysosporium keratinophilum, Chrysosporium lucknowense, Chrysosporium merdarium, Chrysosporium pannicola, Chrysosporium queenslandicum, Chrysosporium tropicum, Chrysosporium zonatum, Coprinus cinereus, Coriolus hirsutus, Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fusarium trichothecioides, Fusarium venenatum, Humicola insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila, Neurospora crassa, Penicillium purpurogenum, Phanerochaete chrysosporium, Phlebia radiata, Pleurotus eryngii, Talaromyces emersonii, Thielavia terrestris, Trametes villosa, Trametes versicolor, Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, or Trichoderma viride cell.
In an aspect, the host cell is isolated.
In another aspect, the host cell is purified.
Methods of Production
In a third aspect, the present invention also relates to methods of producing a polypeptide of interest, comprising (a) cultivating a cell according to the second aspect under conditions conducive for production of the polypeptide; and optionally, (b) recovering the polypeptide. In one aspect, the cell is a Bacillus cell. In another aspect, the cell is a Bacillus licheniformis cell. In another aspect, the cell is an Aspergillus cell. In one aspect, the cell is an Aspergillus niger cell. In one aspect, the cell is an Aspergillus oryzae cell. In one aspect, the cell is a Trichoderma reesei cell.
The present invention also relates to methods of producing a host cell broth, comprising (a) cultivating a host cell according to the second aspect, under conditions conducive for production of the host cell; and optionally, (b) recovering the host cell.
In one aspect the recovered cell is a Bifidobacterium, e.g., Bifidobacterium animalis, or Bifidobacterium animalis subsp. lactis.
The host cell is cultivated in a nutrient medium suitable for production of the polypeptide using methods known in the art. For example, the cell may be cultivated by shake flask cultivation, or small-scale or large-scale fermentation (including continuous, batch, fed-batch, or solid-state, and/or microcarrier-based fermentations) in laboratory or industrial fermentors in a suitable medium and under conditions allowing the polypeptide to be expressed and/or isolated. Suitable media are available from commercial suppliers or may be prepared according to published compositions (e.g., in catalogues of the American Type Culture Collection). If the polypeptide is secreted into the nutrient medium, the polypeptide can be recovered directly from the medium. If the polypeptide is not secreted, it can be recovered from cell lysates.
The polypeptide may be detected using methods known in the art that are specific for the polypeptide, including, but not limited to, the use of specific antibodies, formation of an enzyme product, disappearance of an enzyme substrate, or an assay determining the relative or specific activity of the polypeptide.
The polypeptide may be recovered from the medium using methods known in the art, including, but not limited to, collection, centrifugation, filtration, extraction, spray-drying, evaporation, or precipitation. In one aspect, a whole fermentation broth comprising the polypeptide is recovered. In another aspect, a cell-free fermentation broth comprising the polypeptide is recovered.
The polypeptide may be purified by a variety of procedures known in the art to obtain substantially pure polypeptides and/or polypeptide fragments (see, e.g., Wingfield, 2015, Current Protocols in Protein Science 80(1): 6.1.1-6.1.35; Labrou, 2014, Protein Downstream Processing, 1129: 3-10).
In an alternative aspect, the polypeptide is not recovered.
Computational Model
In one aspect, the invention relates to the methods according to the first aspect, additionally comprising step g) training a computational model, e.g., a machine learning algorithm, with sequence data obtained from step e) and/or score data obtained from step f).
In one embodiment, the computational model of step g) is selected from a linear regression, a decision tree, a random forest model, a support vector machine (SVM), a neural network, a K-means clustering, a naive Bayes classifier, a Gaussian mixture model (GMM), or a generative model.
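As an illustration of step g), the following is a minimal sketch of the simplest model in this list, a linear regression, here in its L2-regularized (ridge) form fitted to one-hot encoded sequences. It assumes equal-length nucleotide sequences from step e) paired with numeric scores from step f); the sequences, scores, and the `alpha` value are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

# Assumption: equal-length nucleotide sequences. Amino acid sequences would
# use a 20-letter alphabet instead.
ALPHABET = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a sequence as a flattened one-hot vector of length len(seq) * 4."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, base in enumerate(seq.upper()):
        x[i, ALPHABET.index(base)] = 1.0
    return x.ravel()


def design_matrix(seqs) -> np.ndarray:
    """Stack one-hot rows and prepend an intercept column of ones."""
    X = np.array([one_hot(s) for s in seqs])
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_ridge(seqs, scores, alpha: float = 1e-3) -> np.ndarray:
    """Closed-form ridge regression: w = (X^T X + P)^-1 X^T y,
    where P = alpha * I except that the intercept is not penalized."""
    X = design_matrix(seqs)
    y = np.asarray(scores, dtype=float)
    penalty = alpha * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def predict(w: np.ndarray, seqs) -> np.ndarray:
    """Predicted score for each sequence."""
    return design_matrix(seqs) @ w


# Hypothetical step e) sequences paired with hypothetical step f) scores.
train_seqs = ["ACGT", "ACGA", "TCGT", "GGGT"]
train_scores = [4.2, 3.1, 2.5, 1.0]
w = fit_ridge(train_seqs, train_scores)
print(predict(w, ["ACGT", "ACGG"]))
```

The ridge penalty keeps the closed-form solve well-posed when, as is typical for sequence libraries, there are more one-hot features than scored sequences.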
In one embodiment, the computational model is applied by an electronic device in a method for providing a candidate biological sequence, the method comprising:
- obtaining input data indicative of an input biological sequence;
- determining the candidate biological sequence by applying a model to the input data; and
- providing biological sequence data indicative of the candidate biological sequence.
In one embodiment the model is a generative model.
In one embodiment, the generative model is non-unidirectional.
In one embodiment, the input biological sequence comprises one or more polynucleotides of interest identified in step e).
In one embodiment, the input biological sequence is one or more of: an amino acid sequence of a polypeptide of interest, a nucleic acid sequence encoding a polypeptide of interest, a control sequence, e.g., an expression control sequence, and a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
In one embodiment, the candidate biological sequence is one or more of: a control sequence, e.g., an expression control sequence, a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, an amino acid sequence of a polypeptide of interest, and a nucleic acid sequence encoding a polypeptide of interest.
In one embodiment, the input biological sequence is an amino acid sequence of a polypeptide of interest and/or a nucleic acid sequence encoding a polypeptide of interest, and wherein the candidate biological sequence is a nucleic acid sequence increasing compatibility with a host cell.
In one embodiment, the input biological sequence is an amino acid sequence of a polypeptide of interest and/or a nucleic acid sequence encoding a polypeptide of interest, and wherein the candidate biological sequence is a control sequence, e.g., an expression control sequence, and/or a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
In one embodiment, the input biological sequence is a control sequence, e.g., an expression control sequence, and/or a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, and wherein the candidate biological sequence is a nucleic acid sequence encoding a polypeptide of interest.
In one embodiment the model is a generative model.
In one embodiment the generative model is non-unidirectional.
In one embodiment, the generative model is one or more of: a generative adversarial network model, a Wasserstein generative adversarial network model, a diffusion model, and a variational autoencoder.
In one embodiment, applying the generative non-unidirectional model to the input data comprises partitioning the generative non-unidirectional model into a plurality of generators, wherein each generator of the plurality of generators is configured to determine, based on the input data and a predetermined criterion, one or more candidate biological sequences for a subset of nucleotides and/or a subset of amino acids.
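One possible reading of this partitioning is sketched below under stated assumptions: each generator is responsible for a contiguous segment of sequence positions, and its per-base sampling weights stand in for a trained generative model. The names `SegmentGenerator`, `partition`, and `generate`, and all weights, are hypothetical; a real generator would be learned, and the predetermined criterion would then be applied to the assembled candidates.

```python
import random


class SegmentGenerator:
    """Hypothetical sub-generator responsible for positions [start, end).

    The per-base sampling weights stand in for a trained generative model.
    """

    def __init__(self, start: int, end: int, weights: dict, seed: int = 0):
        self.start, self.end = start, end
        self.bases = list(weights)
        self.weights = [weights[b] for b in self.bases]
        self.rng = random.Random(seed)  # independent, reproducible stream per generator

    def sample(self) -> str:
        """Draw one segment of length end - start."""
        k = self.end - self.start
        return "".join(self.rng.choices(self.bases, weights=self.weights, k=k))


def partition(length: int, weights_per_part: list) -> list:
    """Split positions 0..length-1 into contiguous segments, one generator per segment."""
    n = len(weights_per_part)
    bounds = [round(i * length / n) for i in range(n + 1)]
    return [SegmentGenerator(bounds[i], bounds[i + 1], w, seed=i)
            for i, w in enumerate(weights_per_part)]


def generate(generators: list) -> str:
    """Assemble one full-length candidate by concatenating segments in position order."""
    return "".join(g.sample() for g in sorted(generators, key=lambda g: g.start))


# A 30-nt candidate from three generators with different (made-up) base preferences.
gens = partition(30, [
    {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},      # AT-rich segment
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # unbiased segment
    {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1},      # GC-rich segment
])
print(generate(gens))
```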
In one embodiment, determining the candidate biological sequence by applying the model to the input data comprises:
- predicting, using the generator, a compatibility of the candidate biological sequence with the host cell; and
- determining the candidate biological sequence having a predicted compatibility meeting the predetermined criterion.
In one embodiment, the predetermined criterion is based on one or more of:
- a proportion of the set of nucleotides in the candidate biological sequence;
- a class of host cell;
- a host cell genus or species;
- a GC content of a host cell genome;
- a GC content of the candidate biological sequence; and
- a parameter associated with a property of the candidate biological sequence.
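The compatibility prediction and predetermined criterion can be sketched as a filter over candidates. This minimal sketch assumes that the criterion is a window around the host genome GC content (one item from the list above) and that compatibility comes from some predictor function; `predict_compatibility`, `min_compatibility`, and `tolerance` are hypothetical placeholders, not values from the source.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def meets_criterion(candidate: str, host_genome_gc: float, tolerance: float = 0.05) -> bool:
    """Hypothetical criterion: candidate GC content within +/- tolerance of the host genome GC."""
    return abs(gc_content(candidate) - host_genome_gc) <= tolerance


def select_candidates(candidates, predict_compatibility, host_genome_gc,
                      min_compatibility=0.5, tolerance=0.05):
    """Keep candidates that satisfy the GC criterion and whose predicted
    compatibility is at least min_compatibility, best-scoring first."""
    scored = [(predict_compatibility(c), c) for c in candidates
              if meets_criterion(c, host_genome_gc, tolerance)]
    passing = [(s, c) for s, c in scored if s >= min_compatibility]
    passing.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [c for _, c in passing]


def toy_predictor(candidate: str) -> float:
    """Hypothetical stand-in for a learned compatibility predictor."""
    return 0.9 if candidate.startswith("A") else 0.1


print(select_candidates(["ATGC", "GGCC", "ATAT", "AGCT"], toy_predictor, host_genome_gc=0.5))
```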
In one embodiment, the method comprises training the model based on a training set of biological sequences, wherein the training set of biological sequences includes training data indicative of one or more biological sequences related to the host cell.
In one embodiment, the training set of biological sequences is heterologous to the genus of the host cell, preferably heterologous to one or more species of the host cell.
In one embodiment, the training data comprises training input data indicative of one or more of: an amino acid sequence of a polypeptide of interest, a nucleic acid sequence encoding a polypeptide of interest, a control sequence, e.g., an expression control sequence, and a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
In one embodiment, the training data comprises training output data indicative of one or more of: a control sequence, e.g., an expression control sequence, a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, an amino acid sequence of a polypeptide of interest, and a nucleic acid sequence encoding a polypeptide of interest.
In one embodiment, training the model comprises predicting, using a discriminator that takes as input the training set of biological sequences and a training candidate biological sequence, a score indicative of the training candidate biological sequence being a reference biological sequence.
In one embodiment, the method comprises obtaining, from a test environment data repository, experimental data associated with the candidate biological sequence and the host cell, wherein the experimental data indicates a yield performance of the candidate biological sequence associated with the host cell.
In one embodiment, the method comprises validating the candidate biological sequence based on the experimental data.
In one embodiment, the method comprises selecting one or more generators based on the experimental data.
In one embodiment, the method comprises adapting the model based on the experimental data.
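Selecting generators based on experimental data could, for example, mean ranking generators by the mean yield measured for their candidates. The record format `(generator_id, measured_yield)` and the `top_k` parameter are assumptions made for illustration; the source does not specify how the test environment data repository is structured.

```python
from collections import defaultdict
from statistics import mean


def select_generators(records, top_k=2):
    """Rank generators by mean measured yield and return the top_k ids.

    records: iterable of (generator_id, measured_yield) pairs, a hypothetical
    export from the test environment data repository.
    """
    by_generator = defaultdict(list)
    for generator_id, measured_yield in records:
        by_generator[generator_id].append(measured_yield)
    ranked = sorted(by_generator, key=lambda g: mean(by_generator[g]), reverse=True)
    return ranked[:top_k]


# Made-up yields for candidates produced by three generators.
records = [("g1", 0.2), ("g1", 0.4), ("g2", 0.9), ("g3", 0.5), ("g3", 0.7)]
print(select_generators(records))
```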
In one embodiment, obtaining input data indicative of an input biological sequence comprises obtaining the input data for the input biological sequence from a database and/or a memory of the electronic device.
The invention also relates to an electronic device comprising a memory circuitry, a processor circuitry, and an interface, wherein the electronic device is configured to perform any of the methods according to the invention.
The invention also relates to a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by an electronic device, cause the electronic device to perform any of the methods of the invention.
In one embodiment, the method comprises an additional step h) of generating one or more synthetic polynucleotides of interest based on the output of the computational model.
In one embodiment, the one or more synthetic polynucleotides generated in step h) comprise or consist of a candidate biological sequence.
In one embodiment, the one or more synthetic polynucleotides of interest generated in step h) are codon-optimized.
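Codon optimization can be done in many ways. The simplest, sketched here as an assumption rather than as the method of the invention, back-translates the polypeptide using the most frequent codon for each amino acid in a host codon usage table. The `USAGE` table below is illustrative and is not real host data; practical tools also weigh factors such as GC content and repeated motifs.

```python
def codon_optimize(protein: str, codon_usage: dict) -> str:
    """Back-translate a protein using the highest-frequency codon per amino acid.

    codon_usage maps amino acid (one-letter code, '*' for stop) to {codon: frequency}.
    """
    out = []
    for aa in protein.upper():
        if aa not in codon_usage:
            raise KeyError(f"no codon usage entry for amino acid {aa!r}")
        codons = codon_usage[aa]
        out.append(max(codons, key=codons.get))  # most frequent codon
    return "".join(out)


# Illustrative (hypothetical) usage frequencies for a few residues only.
USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "V": {"GTT": 0.4, "GTG": 0.3, "GTC": 0.2, "GTA": 0.1},
    "*": {"TAA": 0.6, "TAG": 0.2, "TGA": 0.2},
}

print(codon_optimize("MKV*", USAGE))
```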
In one embodiment, the one or more synthetic polynucleotides of interest generated in step h) encode a polypeptide with increased substrate binding, increased receptor binding, increased substrate specificity, increased specific activity, and/or increased stability.
In one embodiment, the one or more synthetic polynucleotides of interest generated in step h) result in increased expression of a polypeptide of interest.
In one embodiment, the one or more synthetic polynucleotides of interest generated in step h) comprise a control sequence.
The chances of finding satisfactory ‘partner’ biological sequences using conventional screening strategies are limited due to the combinatorial nature of the problem. For example, finding the best 10-mer peptide (i.e., a peptide consisting of 10 amino acids) to be used in combination with a given biological sequence of interest requires screening 20^10 (approximately 1.02 × 10^13) different biological sequences, if only the 20 amino acids of the standard genetic code are considered. Finding satisfactory sequences by screening is therefore a tedious, and possibly unfeasible, task within a reasonable time frame, especially as biological sequences usually comprise more than 10 amino acids.
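The size of such a search space is simply the alphabet size raised to the sequence length. The short calculation below makes the growth concrete for peptides of several lengths and for all 30-nt DNA sequences (the coding length of a 10-mer, stop codons included); the helper name `search_space` is introduced here for illustration only.

```python
def search_space(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length


for n in (5, 10, 20):
    # Peptides of length n over the 20 standard amino acids.
    print(f"{n}-mer peptides: {search_space(n):.3e}")
# prints 3.200e+06, 1.024e+13 and 1.049e+26 for n = 5, 10 and 20

# All 30-nt DNA sequences, i.e. the coding length of a 10-mer.
print(f"30-nt sequences: {search_space(30, alphabet_size=4):.3e}")  # 1.153e+18
```

Each additional residue multiplies the space by 20, which is why exhaustive experimental screening quickly becomes impractical.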
The present disclosure allows learnings to be extracted from native biological sequence pairs, e.g., sequence pairs as they occur in nature. For example, the biological sequence pairs can be found in a database, such as a public and/or a private database (e.g., the National Center for Biotechnology Information (NCBI) database and/or a nucleotide archive, e.g., EMBL). The extracted learnings may then be transferred to experimental settings. In other words, the present disclosure allows compatibility rules to be learned from native sequence pairs provided by a database, and these learned compatibility rules may be further adapted to experimental settings.
The present disclosure enables interaction between machine learning approaches and experimental approaches. Machine learning-based analysis of biological sequences and experimental screening contribute two different layers of learnings. For example, the first layer allows extraction of complex biological rules that need to be obeyed, while the second layer accumulates data specific to the experimental settings. In the disclosed technique, for example, the extracted learnings are used to provide (using a deep learning approach, such as a generative model) a relevant subset of candidate biological sequences that can then feasibly be screened using experimental methods.
By applying the model, e.g., a generative model, the disclosed technique may unlock the potential of experimental screening approaches by markedly reducing the complexity of finding satisfactory ‘partner’ biological sequences. The actual quality of a candidate biological sequence is then validated by experiments. In some examples, the experimental learnings can be used as feedback to update the model (thus ‘informing’ the model about the quality of its suggestions).
The present disclosure provides a method, performed by an electronic device, for providing a candidate biological sequence. In other words, the method can be a computer-implemented method.
The method comprises obtaining input data indicative of an input biological sequence, e.g., from a biological library with diverse sequences. The input data can be associated with the input biological sequence and/or be representative of it, such as data representative of one or more properties of the input biological sequence. In some examples, the one or more properties include one or more of: a sequence of amino acids, a sequence of nucleic acids, a three-dimensional structure of an input polypeptide sequence (e.g., obtained using AlphaFold2), a folding of the input biological sequence, and a pairing of nucleic acids.
The method comprises determining the candidate biological sequence by applying a model, e.g., a generative model, to the input data. For example, the candidate biological sequence is determined for compatibility with a host cell, e.g., targeting compatibility with a given host cell and/or increasing compatibility with that host cell. In other words, the model applied to the input data may aim at improving one or more expression steps for a polypeptide of interest in a host cell, e.g., increasing or modifying one or more of: transcription, post-transcriptional modification, translation, post-translational modification, folding, secretion, phenotypic trait, and yield of the polypeptide of interest. Yield may be intracellular and/or extracellular, and may be seen as a target performance parameter to optimize when determining the candidate biological sequence. Yield may be optimized via various steps, such as modified secretion, modified transcription, or modified translation, separately or jointly.
In some examples, the model generates the candidate biological sequence based on the input data. In some examples, the candidate biological sequence may be determined based on one or more of: the score generated in step f), host cell data, input data, and information indicating the type of biological sequence to be determined as the candidate biological sequence.
The present invention is further described by the following examples that should not be construed as limiting the scope of the invention.
Examples
Example 1: Multi-channel sorting & subsequent DNA sequencing
The strain library comprises 102 different signal peptide-encoding polynucleotides which were ordered as synthetic DNA, and fused upstream to a protease-encoding DNA sequence (encoding a serine endopeptidase). The library was transformed into Bacillus licheniformis strain MOL3320 as described in patent US 2019/0185847 A1. Selection was done on ERM. The resulting strains expressed the protease with different signal peptide variants.
The generated strains were then fermented in compartmentalized 50 pL droplets made from nutrient-controlled medium in fluorinated oil (HFE 7500) on a microfluidic droplet production chip. The droplets were stabilized with 2 wt% fluorosurfactant (008-Fluorosurfactant, RanBiotech). The resulting emulsion was incubated in a collection vial at 37°C for 4 days.
The serine endopeptidase secreted by the host cells hydrolyzes a proprietary fluorogenic rhodamine substrate. Commercially available fluorogenic rhodamine substrates include rhodamine 110-bis-(succinoyl-L-alanyl-L-alanyl-L-prolyl-L-phenylalanyl amide) (CPC Scientific Inc., San Jose, CA). The substrate was added to each droplet on a microfluidic chip, and after 4 minutes of incubation the fluorescent assay response was measured, as shown in Fig. 2. The release of rhodamine 110 resulted in an increase in fluorescence at 520 nm, which is proportional to the enzymatic activity measured against a standard. The measured fluorescence signal is therefore directly related to the concentration of protease in each droplet. Using the device shown in Fig. 1, each droplet was sorted into one of five output channels depending on its measured fluorescence level. Here, we used two electrodes on either side of the input channel to direct the droplets into one of the five output channels via dielectrophoretic droplet sorting. The five output channels were connected to five collection tubes, and after at least 1000 droplets had been collected in each tube, we separated the collection tubes from the microfluidic device. The collected cell pools were named Pool 1, Pool 2, Pool 3, Pool 4, and Pool 5 (see Fig. 1). The signal peptide sequences upstream of the protease-encoding DNA sequence contained in each pool were amplified via PCR and subsequently sent for DNA sequencing.
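The decision that routes each droplet to one of the five output channels amounts to a threshold lookup on its fluorescence reading. This sketch covers only that assignment step, not the dielectrophoretic actuation; the RFU boundaries in `THRESHOLDS` are hypothetical, since the thresholds used in this example are not stated.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical gate boundaries (RFU) between the five output channels.
THRESHOLDS = [3000.0, 6000.0, 12000.0, 24000.0]


def assign_pool(rfu: float, thresholds=THRESHOLDS) -> int:
    """Map a droplet's fluorescence to Pool 1 (lowest) .. Pool 5 (highest).

    A reading exactly on a boundary goes to the higher pool.
    """
    return bisect_right(thresholds, rfu) + 1


# Made-up droplet readings, including empty droplets near 2500 RFU.
readings = [2500.0, 2600.0, 4500.0, 7000.0, 15000.0, 30000.0]
print(Counter(assign_pool(r) for r in readings))
```

Using a sorted boundary list keeps the number of channels configurable: adding a threshold adds a pool without changing the lookup.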
As shown in Fig. 2, Pool 1 comprised empty droplets (peak at around 2500 RFU) and droplets with no or very weak activity. As also shown in Fig. 2, sorting the library into five pools makes it possible to investigate each signal peptide variant according to the protease activity of the corresponding droplet. In other words, every library member is analyzed, providing an analysis of the complete library without losing data on any library member, as each signal peptide coding sequence is sequenced in the step following sorting (see Example 2).
Example 2: Signal peptide library analysis & MTP cultivation
For the scoring of the individual signal peptide sequences in MTP-format, the strains were fermented for approximately 120 hours, and protease activity was measured at the end of fermentation.
For the scoring of the individual signal peptide sequences using the outlined droplet sorting method, the abundance of each signal peptide sequence in each of the five pools was analyzed. The abundance of 102 signal peptide sequences in each pool can be seen in Figure 3 (black = high abundance; white = low abundance).
In Figure 3, the signal peptide sequences are ordered from “1” to “102” according to their protease activity measured in MTP-format (“1” = highest activity, “102” = lowest activity). The five droplet sorting pools are ordered according to the fluorescence thresholds used for separation (Pool 5 contains the droplets measured with the highest fluorescence signals, and Pool 1 those with the lowest).
As can be seen from Figure 3, Pool 1 (lowest fluorescence signal) predominantly contained signal peptides that showed very poor protease activity in MTP-format. Pool 3 (medium fluorescence signal) predominantly contained signal peptides that showed moderate protease activity in MTP-format. Pool 5 (highest fluorescence signal) predominantly contained signal peptides that resulted in high protease activity in MTP-format.
Further, as can be seen from Fig. 3, the proportion of signal peptide (SP) sequences with high protease activity increased from Pool 1 to Pool 2, from Pool 2 to Pool 3, and from Pool 3 to Pool 4, and was highest in Pool 5.
SP sequences with intermediate protease activities were found in Pool 2 (mid-low protease activities) and in Pool 4 (mid-high protease activities). Sorting into more than two pools, e.g., into five pools as in this example, increased the output resolution and makes it possible to identify not only the very best or worst performers but also sequences that lie in between. With regard to signal peptides, for example, such an approach is particularly beneficial when aiming for fine-tuned expression of a polypeptide of interest.
In conclusion, the screening method made it possible to efficiently screen the complete library whilst sorting the library members into five pools, allowing a detailed analysis and understanding of each library member.
Example 3: Validation against MTP screening method
This example validates the results of the multi-channel sorted SP library (Examples 1 and 2) against the results obtained from cultivating the same SP library in an MTP-format.
Based on the abundance of a given sequence in each of the sorting pools obtained by multi-channel sorting, a score is calculated for that sequence. First, for the given sequence, the fraction of the corresponding reads in a pool is determined by dividing the number of reads of the given sequence by the total number of reads obtained when sequencing the entire pool. This is done for every pool generated in the experiment. Next, these fractions are normalized across all pools to give the relative proportion of the given sequence in each pool. Finally, the score of the given sequence is calculated by summing, over all pools, the product of the relative proportion in each pool and the corresponding selection threshold of that pool. Similarly, a score was obtained for each signal peptide sequence cultivated in MTP-format, based on the protease activity shown for each sequence.
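Written as a formula, with f_p the read fraction of the sequence in pool p, r_p = f_p / (f_1 + ... + f_n) its relative proportion, and T_p the selection threshold of pool p, the score is the sum over all pools of r_p × T_p. A minimal sketch of this calculation follows; the read counts and threshold values in the usage line are hypothetical.

```python
def sequence_score(reads, pool_totals, pool_thresholds):
    """Score one sequence from its read counts across the sorting pools.

    reads[p]: reads of this sequence in pool p
    pool_totals[p]: total reads obtained when sequencing pool p
    pool_thresholds[p]: selection threshold associated with pool p
    """
    if not (len(reads) == len(pool_totals) == len(pool_thresholds)):
        raise ValueError("one entry per pool is required in every list")
    # Step 1: fraction of each pool's reads that belong to this sequence.
    fractions = [r / t for r, t in zip(reads, pool_totals)]
    total = sum(fractions)
    if total == 0:
        raise ValueError("sequence was not observed in any pool")
    # Step 2: relative proportions across pools (they sum to 1).
    proportions = [f / total for f in fractions]
    # Step 3: sum of proportion x threshold over all pools.
    return sum(p * thr for p, thr in zip(proportions, pool_thresholds))


# Hypothetical read counts for one signal peptide in Pools 1-5 and
# hypothetical per-pool thresholds (e.g. in RFU).
print(sequence_score([2, 5, 40, 30, 10], [1e5, 1.2e5, 0.9e5, 1e5, 0.8e5],
                     [2500, 5000, 10000, 20000, 40000]))
```

Normalizing by each pool's total reads first means that a pool sequenced more deeply does not inflate the score of the sequences it contains.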
The scores derived from the multi-channel droplet sorting experiment showed a high level of correlation with the scores derived from measurements in MTP format (R² = 0.85, Figure 4). Figure 4 plots the scores from microtiter plates (MTP format, y-axis) against the scores from multi-channel sorted signal peptide sequences (x-axis).
Due to the high correlation between the scores of both methods, we conclude that the multi-channel droplet sorting of the invention represents an improved and substantially cheaper screening method, saving both time and sample volume, whilst providing a high resolution output when screening large biological libraries (see Fig. 3).
Example 4: Microdroplet method reduces the standard deviation of the assigned scores
This example investigates the variability of measurements of clonal cells cultivated either in MTP format or in the microdroplet format of the invention. Cells used for this experiment are B. licheniformis cells with 6 copies of a secreted protease (protein of interest, POI).
We compared the variability of POI concentration measurements in the 1 mL format (96-deepwell MTP) and in the 50 pL microdroplet format. We conducted parallel fermentations of a single POI-producing strain and measured the standard deviation of the amount of POI released.
The results are shown in Figure 5, with MTP fermentation in Fig. 5A and droplet fermentation in Fig. 5B (y-axis: counts; x-axis: relative POI yield). The standard deviation (STD) was 7.3% for the 96-deepwell MTP format (Figure 5A) and 4.6% for the droplet format (Figure 5B), a relative reduction of roughly 37%. This result demonstrates a significantly lower variability in the droplet format compared to the traditional MTP format. The decreased STD directly affects the score calculation and results in more precise scoring of the library members. When the scores are used for a computational model, more precise scores ultimately improve the quality of the input data, allowing a high-confidence machine learning model to be built. Thus, a model trained on data obtained from the microdroplet method of the invention has a higher confidence than a model trained on MTP data.
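The relative STD values above can be derived from replicate yields with a short calculation. This sketch assumes that the reported STD is the sample standard deviation divided by the mean (the coefficient of variation), which is consistent with the x-axis showing relative POI yield. The function name and inputs are hypothetical.

```python
from statistics import mean, stdev


def relative_std_percent(yields: list[float]) -> float:
    """Sample standard deviation of replicate yields as a percentage of their mean."""
    if len(yields) < 2:
        raise ValueError("need at least two replicate measurements")
    return 100.0 * stdev(yields) / mean(yields)
```

Applying it separately to MTP replicates and droplet replicates gives one value per format, which is the kind of comparison made in Figure 5.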
Example 5: Processing the screening results with a computational model
This example validates the results of the multi-channel microdroplet sorted SP library (Examples 1 and 2) against the results obtained from cultivating the same SP library in an MTP format.
We explicitly focused on a subset of signal peptides found with both the MTP method and the microdroplet method of the invention. This subset made it possible to compare the findings of the two methods with one another. For this subset, we ranked the signal peptides according to yield explicitly at the amino acid level, i.e., disentangled from the contributions to yield coming from the codon level. To this end, we considered the codon variants of every individual signal peptide. We defined the maximum yield achievable over these codon variants as the yield potential coming exclusively from the amino acid sequence. As a robust estimate of this maximum, we used the 75th percentile of the yield measurements over the codon variants of each signal peptide. We refer to this value as ‘yield at the peptide level’.
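The ‘yield at the peptide level’ defined above can be sketched as grouping by amino acid sequence, followed by a 75th percentile per group. The input layout (pairs of amino acid sequence and measured yield, one per codon variant) and the function name are hypothetical. The percentile uses linear interpolation between order statistics. This is one common convention and may differ slightly from the one used in the examples.

```python
from collections import defaultdict
from statistics import quantiles


def yield_at_peptide_level(
    measurements: list[tuple[str, float]],
) -> dict[str, float]:
    """75th percentile of yield over codon variants, per signal peptide.

    measurements: (amino_acid_sequence, yield) pairs, one per DNA (codon)
    variant; variants encoding the same peptide share the same key.
    """
    by_peptide: dict[str, list[float]] = defaultdict(list)
    for peptide, value in measurements:
        by_peptide[peptide].append(value)

    result = {}
    for peptide, values in by_peptide.items():
        if len(values) == 1:
            # Only one codon variant: its yield is the only estimate.
            result[peptide] = values[0]
        else:
            # method="inclusive" interpolates linearly between order
            # statistics; index 2 of the quartile cut points is the 75th percentile.
            result[peptide] = quantiles(values, n=4, method="inclusive")[2]
    return result
```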
In the following, we evaluated whether the rankings of ‘yield at the peptide level’ in MTP and in microdroplets are similar in terms of the underlying features giving rise to the ranking. We also evaluated whether a machine learning model trained with either MTP data or microdroplet data gives similar output.
As shown in Figure 6, we calculated the ‘yield at the peptide level’ from the MTP data. Based on this value, we divided the signal peptides into two groups (‘good’ and ‘bad’) by cutting them in half (dotted line in Fig. 6B; the y-axis indicates relative peptide yield). “Good” signal peptides are shown in the right half of Fig. 6B, and “bad” ones in the left half. We then trained a computational model with these “good” and “bad” subsets. We asked whether any amino acid is over- or underrepresented within one subset relative to the sequences of the other subset. A Random Forest (RF) model was used for this task.
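The split into ‘good’ and ‘bad’ halves can be sketched as a rank-based cut. This is a minimal illustration with a hypothetical function name. Ties are broken by input order, and with an odd number of peptides the extra one goes to the ‘good’ half; the text does not specify either choice. The Random Forest analysis itself is not reproduced here.

```python
def split_good_bad(peptide_yields: dict[str, float]) -> tuple[list[str], list[str]]:
    """Cut peptides ranked by yield into a 'good' (upper) and 'bad' (lower) half."""
    # sorted() is stable, so peptides with equal yield keep their input order.
    ranked = sorted(peptide_yields, key=peptide_yields.get)
    half = len(ranked) // 2
    bad, good = ranked[:half], ranked[half:]
    return good, bad
```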
The model indicated that the amino acid proline is relatively more frequent in “good” sequences than in “bad” sequences. In Fig. 6A, the y-axis indicates the fraction of proline-containing sequences. As shown in Fig. 6, the fraction of signal peptides containing proline differs clearly depending on the yield of the signal peptide. Approximately two-thirds of the good sequences contain at least one proline, whereas only approximately one-third of the bad sequences do. Thus, the model output indicates that the presence of proline in a signal peptide is a strong indicator of good expression of the investigated POI.
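The quantity on the y-axis of Fig. 6A, the fraction of sequences containing at least one proline, is straightforward to compute per group. The sketch below extends it to all 20 standard amino acids as a simple descriptive comparison. It is not the Random Forest model used in the example, and the sequences in the usage are invented.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def presence_fraction(sequences: list[str], residue: str) -> float:
    """Fraction of sequences containing at least one copy of `residue`."""
    if not sequences:
        raise ValueError("no sequences given")
    return sum(residue in seq for seq in sequences) / len(sequences)


def presence_differences(good: list[str], bad: list[str]) -> dict[str, float]:
    """Per-amino-acid presence fraction in 'good' minus that in 'bad'."""
    return {
        aa: presence_fraction(good, aa) - presence_fraction(bad, aa)
        for aa in AMINO_ACIDS
    }


good = ["MKPLL", "MPAAL", "MKKLA"]  # invented example sequences
bad = ["MKKAL", "MRPLA", "MKLAA"]
diffs = presence_differences(good, bad)
```

A positive difference means that the residue occurs in a larger share of ‘good’ than of ‘bad’ sequences, which is the pattern reported for proline.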
The same experiment as shown for MTP in Figure 6 was repeated with the microdroplet method of the invention. The results are shown in Fig. 7. In Fig. 7A, the y-axis indicates the fraction of proline-containing sequences as output from the computational model. In Fig. 7B, the y-axis indicates the relative peptide yield of the different SP sequences. Comparing Fig. 6A with Fig. 7A, we conclude that training a computational model with results obtained with the microdroplet method of the invention gives results comparable to training it with results of a conventional MTP method.
From Figures 6 and 7 we conclude that the microdroplet screening of the invention, although having lower running costs than MTP screening, can provide high-quality data in a shorter time than MTP methods. These results further confirm the unexpected efficiency of the method of the invention.
Example 6: Increased Training Data Size from Microdroplets improves performance of machine learning model
We investigated the importance of training data size for the performance of machine learning models on the task of predicting ‘yield at the peptide level’ in microdroplets, as explained above.
Here, we calculated ‘yield at the peptide level’ for all signal peptides obtained from the microdroplet screen of the invention. The earlier examples used only the signal peptides also found in MTP.
Here, 30% of the microdroplet data was held out as a test set, and the remaining 70% was used to train a Random Forest regressor model. Within this 70% of training data, we varied the fraction that the model was allowed to see. Each time, we trained the model on the available data. We then evaluated its performance against the initially allocated test set by calculating the correlation coefficient between predictions and actual observed values.
The results are shown in Figure 8. We repeated the entire process three times to minimize the contribution of stochastic components of the method. We plotted all correlation coefficients (y-axis of Fig. 8) as a function of the actual number of training examples that the model was allowed to see in each round.
As shown in Fig. 8, the performance of the model clearly increases as more data are added for training. This indicates that data size, an attribute of the microdroplet method of the invention, is highly valuable for downstream machine learning methods.
Example 7: Microdroplet method allows identification of superior library members
This example compares the results of the multi-channel sorted SP libraries from Examples 1 and 2 to the results obtained from screening the same SP library in an MTP format. In contrast to the previous examples, which used 5 pools, this example sorted the library into 7 pools. Each library member is given a unique signal peptide identifier and consists of a different DNA sequence encoding a signal peptide. Using the microdroplet method of the invention, the SP library was sorted into 7 pools, i.e., pool 1 to pool 7. Droplets with the lowest signal were sorted into pool 1, whereas droplets with the highest signal were sorted into pool 7. Pools 2-6 comprised cells with library members that showed signals lower than the threshold for pool 7 and higher than the threshold for pool 1. In other words, signal thresholds increased from pool 1 to pool 7.
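Assigning a droplet to one of the seven pools from its measured signal amounts to locating the signal among six ascending thresholds. The sketch below assumes that a signal exactly equal to a threshold goes to the higher pool. The threshold values in the usage are hypothetical, since the example does not report them.

```python
from bisect import bisect_right


def assign_pool(signal: float, thresholds: list[float]) -> int:
    """Return a 1-based pool index for a droplet signal.

    thresholds: strictly increasing boundaries between consecutive pools;
    six boundaries give seven pools, with pool 1 holding the lowest
    signals and pool 7 the highest.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # bisect_right counts the boundaries <= signal, so ties go upward.
    return bisect_right(thresholds, signal) + 1


bounds = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]  # hypothetical values
```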
Table 1 shows the results for 304 library members identified using the microdroplet method. Sequences identified both in the MTP screen and in the microdroplet screen are shaded gray (e.g., SP_GAN_208). For each sequence, Table 1 shows the number of droplets comprising that sequence in each pool. For example, sequence SP_GAN_205 appeared in 61 droplets of pool 7 and in 2 droplets of pool 1. Additionally, for the sequences identified with the microdroplet method, the table shows a score calculated as described in Example 3. For sequences also identified in MTP, Table 1 shows the relative activity determined during MTP cultivation. Importantly, the sequences in Table 1 are ranked by descending droplet score, i.e., the highest droplet scores are at the top of Table 1, whereas the lowest droplet scores are at the bottom.
As can be seen in Table 1, the method of the invention makes it possible to identify library members that would otherwise have been overlooked and/or not found using conventional methods such as MTP. These library members are shown as rows with a clear background in Table 1. Rows with a gray background represent library members that were identified using MTP.
As shown in Table 1, the sequence SP_GAN_208 (marked in gray) was, in terms of score, the best-performing library member identified in the MTP screen. The same library member was also identified as a well-performing sequence in the microdroplet screen. However, the microdroplet method of the invention identified 6 additional library members with a higher score than SP_GAN_208, namely SP_GAN_205, SP_GAN_206, SP_GAN_217, SP_GAN_126, SP_GAN_47 and SP_GAN_232. Thus, the method of the invention is highly beneficial for addressing biotechnological challenges, e.g., increasing the expression of a POI with a new signal peptide sequence.
The invention described and claimed herein is not to be limited in scope by the specific aspects herein disclosed, since these aspects are intended as illustrations of several aspects of the invention. Any equivalent aspects are intended to be within the scope of this invention. Indeed,
various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims. In the case of conflict, the present disclosure including definitions will control.
The invention is further defined by the following numbered paragraphs:
1. A method for screening a biological library, the method comprising the steps of: a) providing a microfluidic device comprising a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303), b) providing an emulsion of droplets comprising a library of polynucleotides of interest and a screenable product, c) determining the amount of screenable product of one or more droplets in the microfluidic device, d) sorting the one or more droplets with the droplet sorter (200) into a receiving output channel of the at least three output channels (301, 302, 303), wherein the receiving output channel is determined based on the amount of screenable product per droplet, and wherein at least three receiving output channels receive a plurality of droplets comprising an amount of screenable product above and/or below one or more predetermined threshold levels, e) identifying one or more polynucleotides of interest present in the at least three output channels (301, 302, 303) and obtaining sequence data for the one or more polynucleotides of interest, and f) for one or more output channels (301, 302, 303), assigning a score to each of the one or more polynucleotides of interest, wherein the score is calculated based on the abundance of each of the one or more polynucleotides of interest in one of the one or more output channels (301, 302, 303).
2. The method of paragraph 1, wherein the emulsion of droplets comprises one or more host cells.
3. The method of paragraph 2, wherein each host cell comprises one or more polynucleotide of interest of the library of polynucleotides of interest.
4. The method of any one of the preceding paragraphs, wherein in step b) each droplet comprises at most one host cell, or a plurality of host cells derived from the same parent host cell.
5. The method of any one of the preceding paragraphs, wherein in step b) each droplet comprises at most one polynucleotide of interest.
6. The method of any one of the preceding paragraphs, wherein the screenable product is produced by the host cells.
6a. The method of any one of the preceding paragraphs, wherein the formation of the screenable product is catalyzed by an enzyme, preferably wherein the enzyme is encoded by the polynucleotide of interest.
7. The method of any one of the preceding paragraphs, wherein the screenable product is encoded by the one or more polynucleotide of interest.
8. The method of any one of the preceding paragraphs, wherein the screenable product is produced by a polypeptide expressed by the host cells.
9. The method of any one of the preceding paragraphs, wherein the screenable product is produced by a polypeptide encoded by the one or more polynucleotide of interest.
10. The method of any one of the preceding paragraphs, wherein the screenable product is a polypeptide expressed by the host cells.
11. The method of any one of the preceding paragraphs, wherein the screenable product is an enzyme.
12. The method of any one of the preceding paragraphs, wherein the enzyme is expressed by the host cells.
13. The method of any one of the preceding paragraphs, wherein the enzyme is selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phytase, polyphenoloxidase, proteolytic enzyme, ribonuclease, transglutaminase, xylanase, or beta-xylosidase.
14. The method of any one of the preceding paragraphs, wherein the screenable product is degraded by the host cells.
15. The method of any one of the preceding paragraphs, wherein the screenable product is degraded by the polypeptide encoded by the one or more polynucleotide of interest.
16. The method of any one of the preceding paragraphs, wherein the screenable product is degraded by a polypeptide expressed by the host cells.
17. The method of any one of the preceding paragraphs, wherein the screenable product is an enzyme substrate, preferably for an enzyme selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phytase, polyphenoloxidase, proteolytic enzyme, ribonuclease, transglutaminase, xylanase, or beta-xylosidase.
18. The method of any one of the preceding paragraphs, wherein the screenable product is a fluorescent product.
19. The method of any one of the preceding paragraphs, wherein the fluorescent product is converted from a fluorogenic substrate by an enzyme encoded by the polynucleotide of interest.
20. The method of any one of the preceding paragraphs, wherein the amount of screenable product is inversely proportional to one or more of cell number, cell growth, cell division, cell viability, or cell growth rate.
21. The method of any one of the preceding paragraphs, wherein the amount of screenable product is proportional to one or more of cell number, cell growth, cell division, cell viability, or cell growth rate.
22. The method of any one of the preceding paragraphs, wherein the screenable product comprises or consists of one or more host cells.
23. The method of any one of the preceding paragraphs, wherein the screenable product comprises or consists of substantially all the host cells in a droplet.
24. The method of any one of the preceding paragraphs, wherein the screenable product is a product of an enzymatic reaction, preferably of a reaction catalyzed by an enzyme selected from the list of a hydrolase, isomerase, ligase, lyase, oxidoreductase, or transferase, e.g., an aminopeptidase, amylase, carbohydrase, carboxypeptidase, catalase, cellobiohydrolase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase, deoxyribonuclease, endoglucanase, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase, alpha-glucosidase, beta-glucosidase, invertase, laccase, lipase, mannosidase, mutanase, oxidase, pectinolytic enzyme, peroxidase, phytase, polyphenoloxidase, proteolytic enzyme, ribonuclease, transglutaminase, xylanase, or beta-xylosidase.
25. The method of any one of the preceding paragraphs, wherein the score is proportional, e.g., normalized, to the number of identical DNA sequences for a first polynucleotide of interest present in an output channel.
26. The method of any one of the preceding paragraphs, wherein the score is the total number of identical DNA sequences for a first polynucleotide of interest present in an output channel.
27. The method of any one of the preceding paragraphs, wherein the score is proportional, e.g., normalized, to the number of identical DNA sequences for a second polynucleotide of interest present in an output channel.
28. The method of any one of the preceding paragraphs, wherein the score is the total number of identical DNA sequences for a second polynucleotide of interest present in an output channel.
29. The method of any one of the preceding paragraphs, wherein the microfluidic device comprises an incubation zone (500).
30. The method of any one of the preceding paragraphs, wherein the incubation zone (500) is located upstream of the droplet sorter (200) and/or upstream of one or more sorting means (401, 402).
31. The method of any one of the preceding paragraphs, comprising incubation of the emulsion of droplets under conditions allowing cell growth, and/or allowing transcription of DNA into RNA, and/or allowing translation of RNA into a polypeptide, preferably wherein the incubation takes place prior to step c).
32. The method of any one of the preceding paragraphs, wherein the incubation does not take place in the microfluidic chip.
33. The method of any one of the preceding paragraphs, wherein the incubation takes place on and/or in the microfluidic device.
34. The method of any one of the preceding paragraphs, wherein after incubation, the cells comprised in one droplet are genetically identical, i.e., the cells are derived from one parental host cell, preferably the same parental host cell.
35. The method of any one of the preceding paragraphs, wherein the droplet sorter comprises one or more sensing means (600), preferably located downstream of the incubation zone (500), and/or upstream of the sorting means (401, 402).
36. The method of any one of the preceding paragraphs, wherein the one or more sensing means (600) comprises a fluorescence sensor.
37. The method of any one of the preceding paragraphs, wherein the one or more sensing means (600) comprises an absorption sensor.
38. The method of any one of the preceding paragraphs, wherein the one or more sensing means (600) comprises an image sensor, e.g., a CMOS sensor, or a CCD sensor, or a PMT sensor.
39. The method of any one of the preceding paragraphs, wherein the one or more sensing means (600) comprises a NEMS (nanoelectromechanical system) sensor.
40. The method of any one of the preceding paragraphs, wherein the one or more sensing means (600) comprises a mass analyzer suitable for mass spectrometry, e.g., a quadrupole mass analyzer, a TOF mass analyzer, an ion trap mass analyzer, an orbitrap mass analyzer, a magnetic sector mass analyzer, a Q-TOF mass analyzer, or a FT-ICR mass analyzer.
41. The method of any one of the preceding paragraphs, wherein step e) comprises DNA amplification of the one or more polynucleotide of interest within each output channel.
42. The method of any one of the preceding paragraphs, wherein the DNA amplification is a PCR method.
43. The method of any one of the preceding paragraphs, wherein the DNA amplification is a ddPCR (droplet digital PCR) method.
44. The method of any one of the preceding paragraphs, wherein step e) comprises DNA sequencing of the one or more polynucleotide of interest, e.g., after PCR amplification, or by nanopore sequencing.
45. The method of any one of the preceding paragraphs, wherein during step e) the one or more polynucleotide of interest is identified by a DNA barcode.
46. The method of any one of the preceding paragraphs, comprising step g) training a computational model, e.g., a machine learning algorithm, with sequence data obtained from step e) and/or score data obtained from step f).
47. The method of paragraph 46, wherein the computational model of step g) is selected from the list of a linear regression, a decision tree, a random forest model, a support vector machine (SVM), a neural network, a K-means clustering, a naive Bayes, a Gaussian mixture model (GMM), or a generative model.
48. The method of any one of the preceding paragraphs, wherein the computational model is implemented on an electronic device for providing a candidate biological sequence, the method comprising:
- obtaining input data indicative of an input biological sequence;
- determining the candidate biological sequence by applying a model, e.g., a generative model, to the input data, preferably wherein the generative model is non-unidirectional; and
- providing biological sequence data indicative of the candidate biological sequence.
49. The method according to any one of the preceding paragraphs, wherein the input biological sequence comprises one or more polynucleotide of interest identified in step e).
50. The method according to any one of the preceding paragraphs, wherein the input biological sequence is one or more of: an amino acid sequence of a polypeptide of interest, a nucleic acid sequence encoding a polypeptide of interest, a control sequence, e.g., an expression control sequence, and a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
51. The method according to any of the previous paragraphs, wherein the candidate biological sequence is one or more of: a control sequence, e.g., an expression control sequence, a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, an amino acid sequence of a polypeptide of interest, and a nucleic acid sequence encoding a polypeptide of interest.
52. The method according to any of the previous paragraphs, wherein the input biological sequence is an amino acid sequence of a polypeptide of interest and/or a nucleic acid sequence encoding a polypeptide of interest, and wherein the candidate biological sequence is a nucleic acid sequence increasing compatibility with a host cell.
53. The method according to any of the previous paragraphs, wherein the input biological sequence is an amino acid sequence of a polypeptide of interest and/or a nucleic acid sequence encoding a polypeptide of interest, and wherein the candidate biological sequence is a control sequence, e.g., an expression control sequence, and/or a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
54. The method according to any of the previous paragraphs, wherein the input biological sequence is a control sequence, e.g., an expression control sequence, and/or a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, and wherein the candidate biological sequence is a nucleic acid sequence encoding a polypeptide of interest.
55. The method according to any of the previous paragraphs, wherein the generative model is one or more of: a generative adversarial network (GAN) model, a Wasserstein generative adversarial network model, a diffusion model, and a variational autoencoder.
56. The method according to any of the previous paragraphs, wherein applying the generative non-unidirectional model to the input data comprises partitioning the generative non-unidirectional model into a plurality of generators, wherein each generator of the plurality of generators is configured to determine, based on the input data, one or more candidate biological sequences for a subset of nucleotides and/or a subset of amino acids and a predetermined criterion.
57. The method according to any of the previous paragraphs, wherein determining the candidate biological sequence by applying the model to the input data comprises:
- predicting, using the generator, a compatibility of the candidate biological sequence with the host cell; and
- determining the candidate biological sequence having a predicted compatibility meeting the predetermined criterion.
57a. The method according to paragraph 57 wherein the model is a generative model.
58. The method according to any of the previous paragraphs, wherein the predetermined criterion is based on one or more of:
- a proportion of the set of nucleotides in the candidate biological sequence;
- a class of host cell;
- a host cell genus or species;
- a GC content of a host cell genome;
- a GC content of the candidate biological sequence; and
- a parameter associated with a property of the candidate biological sequence.
59. The method according to any of the previous paragraphs, the method comprising training the model based on a training set of biological sequences, wherein the training set of biological sequences includes training data indicative of one or more biological sequences related to the host cell.
60. The method according to any of the previous paragraphs, wherein the training set of biological sequences is heterologous to the genus of the host cell, preferably heterologous to one or more species of the host cell.
61. The method according to any of the previous paragraphs, wherein the training data comprises training input data indicative of one or more of: an amino acid sequence of a polypeptide of interest, a nucleic acid sequence encoding a polypeptide of interest, a control sequence, e.g., an expression control sequence, and a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence.
62. The method according to any of the previous paragraphs, wherein the training data comprises training output data indicative of one or more of: a control sequence, e.g., an expression control sequence, a nucleic acid sequence encoding a control sequence, e.g., an expression control sequence, an amino acid sequence of a polypeptide of interest, and a nucleic acid sequence encoding a polypeptide of interest.
63. The method according to any of the previous paragraphs, wherein training the model comprises predicting, using a discriminator taking as input the training set of biological sequences, and a training candidate biological sequence, a score indicative of the training candidate biological sequence being a referenced biological sequence.
64. The method according to any of the previous paragraphs, the method comprising obtaining, from a test environment data repository, experimental data associated with the candidate biological sequence and the host cell; wherein the experimental data indicates a yield performance of the candidate biological sequence associated with the host cell.
65. The method according to any of the previous paragraphs, the method comprising validating the candidate biological sequence based on the experimental data.
66. The method according to any of the previous paragraphs, the method comprising selecting one or more generators based on the experimental data.
67. The method according to any of the previous paragraphs, the method comprising adapting the model based on the experimental data.
68. The method according to any of the previous paragraphs, wherein obtaining input data indicative of an input biological sequence comprises obtaining the input data for the input biological sequence from a database and/or a memory of the electronic device.
69. An electronic device comprising a memory circuitry, a processor circuitry, and an interface, wherein the electronic device is configured to perform any of the methods according to any one of the preceding paragraphs.
70. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device cause the electronic device to perform any of the methods of any one of the preceding paragraphs.
71. The method of any one of the preceding paragraphs, comprising an additional step h) of generating one or more synthetic polynucleotides of interest based on the output of the computational model.
72. The method of any one of the preceding paragraphs, wherein the one or more synthetic polynucleotide generated in step h) comprises or consists of a candidate biological sequence.
73. The method of any one of the preceding paragraphs, wherein the one or more synthetic polynucleotide of interest generated in step h) is codon-optimized.
74. The method of any one of the preceding paragraphs, wherein the one or more synthetic polynucleotide of interest generated in step h) encodes a polypeptide with increased substrate binding, increased receptor binding, increased substrate specificity, increased specific activity, and/or increased stability.
75. The method of any one of the preceding paragraphs, wherein the one or more synthetic polynucleotide of interest generated in step h) results in increased expression of a polypeptide of interest.
76. The method of any one of the preceding paragraphs, wherein the one or more synthetic polynucleotide of interest generated in step h) comprises a control sequence.
77. The method of any one of the preceding paragraphs, wherein the droplet sorter (200) comprises one or more sorting means (401, 402).
78. The method of any one of the preceding paragraphs, wherein the one or more sorting means comprises or consists of one or more electrodes, one or more acoustic wave generators, one or more valves, and/or one or more pressure-controlled outlets.
79. The method of any one of the preceding paragraphs, wherein the one or more sorting means comprises at least two electrodes.
80. The method of any one of the preceding paragraphs, wherein the one or more sorting means consists of one electrode.
81. The method of any one of the preceding paragraphs, wherein the one or more sorting means consists of two electrodes.
82. The method of any one of the preceding paragraphs, wherein the biological library comprises or consists of wild-type cells with different genotype and/or different phenotype.
83. The method of any one of the preceding paragraphs, wherein the biological library comprises or consists of recombinant cells.
84. The method of any one of the preceding paragraphs, wherein the biological library encodes different variants of the same polypeptide of interest, preferably the polypeptide of interest is an enzyme.
85. The method of any one of the preceding paragraphs, wherein the biological library encodes different signal peptide variants.
86. The method of any one of the preceding paragraphs, wherein the biological library encodes different promoter variants.
87. The method of any one of the preceding paragraphs, wherein the biological library comprises different codon-optimized DNA sequences encoding the same amino acid sequence of a polypeptide of interest, e.g., a signal peptide, and/or an enzyme.
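Paragraph 87 contemplates a library of different codon-optimized DNA sequences that all encode the same amino acid sequence. Whether a set of candidate sequences really is synonymous can be verified by translation. The following is a minimal Python sketch of such a check, assuming the standard genetic code and complete, in-frame coding sequences; the names `CODON_TABLE`, `translate` and `all_synonymous` are hypothetical and not part of the described method.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def all_synonymous(variants):
    """True if every DNA variant encodes the same amino acid sequence."""
    return len({translate(v) for v in variants}) == 1
```

For example, `ATGGCTAAA` and `ATGGCAAAG` both translate to `MAK` and are therefore synonymous, whereas `ATGGCTAGA` translates to `MAR`.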
88. The method of any one of the preceding paragraphs, wherein the polynucleotide of interest encodes a polypeptide of interest.
89. The method of any one of the preceding paragraphs, wherein the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a polypeptide of interest.
90. The method of any one of the preceding paragraphs, wherein the polynucleotide of interest comprises a first polynucleotide of interest encoding a control sequence, and a second polynucleotide of interest encoding a polypeptide of interest.
91. The method of any one of the preceding paragraphs, wherein the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a control sequence.
92. The method of any one of the preceding paragraphs, wherein the control sequence is a promoter sequence, a signal peptide, a leader sequence, a polyadenylation sequence, a propeptide sequence, or a transcription terminator.
93. The method of any one of the preceding paragraphs, wherein the polynucleotide of interest comprises a first polynucleotide of interest encoding a signal peptide, and a second polynucleotide of interest encoding a polypeptide of interest, wherein the first polynucleotide of interest is operatively linked to the second polynucleotide of interest and located upstream of the second polynucleotide of interest.
94. The method of any one of the preceding paragraphs, wherein the polynucleotide of interest comprises a first polynucleotide of interest comprising a promoter sequence, and a second polynucleotide of interest encoding a polypeptide of interest, wherein the first polynucleotide of interest is operatively linked to the second polynucleotide of interest and located upstream of the second polynucleotide of interest.
95. The method of any one of the preceding paragraphs, wherein the biological library comprises identical second polynucleotides of interest, and a plurality of variants of the first polynucleotides of interest.
96. The method of any one of the preceding paragraphs, wherein the biological library comprises identical first polynucleotides of interest, and a plurality of variants of the second polynucleotides of interest.
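Paragraphs 93 to 96 describe libraries in which an upstream first polynucleotide of interest (for example a promoter or a signal-peptide coding sequence) is operatively linked to a downstream second polynucleotide of interest, with one of the two parts varied and the other held constant. The in-silico layout of such a library can be sketched in Python as below. Plain concatenation is an assumption made for illustration only: real constructs typically also need junction, ribosome-binding or linker sequences, and a signal peptide must be fused in frame. The function name and the part sequences are hypothetical.

```python
from itertools import product


def build_construct_library(first_parts, second_parts, require_in_frame=False):
    """Join each upstream part to each downstream part.

    first_parts, second_parts: dicts mapping a part name to a DNA sequence.
    Returns a dict mapping "first|second" to the concatenated sequence.
    """
    library = {}
    for first_name, second_name in product(first_parts, second_parts):
        first_seq = first_parts[first_name]
        if require_in_frame and len(first_seq) % 3 != 0:
            # A signal peptide fused to the polypeptide must keep the reading frame.
            raise ValueError(f"{first_name} would shift the reading frame")
        library[f"{first_name}|{second_name}"] = first_seq + second_parts[second_name]
    return library


# Paragraph 95 layout: one fixed second part, several variants of the first part.
promoter_library = build_construct_library(
    {"P1": "TTGACA", "P2": "TTGACT"},  # hypothetical promoter fragments
    {"geneX": "ATGAAATAA"},            # hypothetical coding sequence
)
```

Swapping the roles (one entry in `first_parts`, many in `second_parts`) gives the paragraph 96 layout.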
97. The method of any one of the preceding paragraphs, wherein the first polynucleotide of interest is heterologous to the second polynucleotide of interest.
98. The method of any one of the preceding paragraphs, wherein the first polynucleotide of interest is endogenous to the second polynucleotide of interest.
99. The method of any one of the preceding paragraphs, wherein the one or more polynucleotide of interest comprises a promoter, a polynucleotide encoding a signal peptide, a polynucleotide encoding a polypeptide of interest, or a native host cell gene.
100. The method of any one of the preceding paragraphs, wherein the polynucleotide of interest is substantially the whole genome of the host cell.
101. The method of any one of the preceding paragraphs, wherein the one or more polynucleotide of interest is heterologous to the host cell.
102. The method of any one of the preceding paragraphs, wherein the one or more polynucleotide of interest is endogenous to the host cell.
103. The method of any one of the preceding paragraphs, wherein the first polynucleotide of interest is heterologous to the host cell.
104. The method of any one of the preceding paragraphs, wherein the first polynucleotide of interest is endogenous to the host cell.
105. The method of any one of the preceding paragraphs, wherein the second polynucleotide of interest is heterologous to the host cell.
106. The method of any one of the preceding paragraphs, wherein the second polynucleotide of interest is endogenous to the host cell.
107. The method of any one of the preceding paragraphs, wherein the first and second polynucleotide of interest are heterologous to the host cell.
108. The method of any one of the preceding paragraphs, wherein the first and second polynucleotide of interest are endogenous to the host cell.
109. The method of any one of the preceding paragraphs, wherein the one or more polynucleotide of interest encodes a polypeptide of interest.
110. The method of any one of the preceding paragraphs, wherein the polypeptide of interest is an enzyme, a nanobody, an antibody, an antibody-fragment, a fluorescent polypeptide, e.g., GFP, or an alpha-lactalbumin.
111. The method of any one of the preceding paragraphs, wherein the amount of screenable product in the droplet is proportional to the amount of a polypeptide encoded by the one or more polynucleotide of interest.
112. The method of any one of the preceding paragraphs, wherein the amount of screenable product in the droplet is inversely proportional to the amount of a polypeptide encoded by the one or more polynucleotide of interest.
113. The method of any one of the preceding paragraphs, wherein the biological library comprises at least 100 different one or more polynucleotides of interest, at least 200 different one or more polynucleotides of interest, at least 500 different one or more polynucleotides of interest, at least 1 000 different one or more polynucleotides of interest, at least 2 000 different one or more polynucleotides of interest, at least 3 000 different one or more polynucleotides of interest, at least 5 000 different one or more polynucleotides of interest, at least 10 000 different one or more polynucleotides of interest, at least 100 000 different one or more polynucleotides of interest, at least 1 000 000 different one or more polynucleotides of interest, at least 10 000 000 different one or more polynucleotides of interest, at least 50 000 000 different one or more polynucleotides of interest, or at least 100 000 000 different polynucleotides of interest.
114. The method of any one of the preceding paragraphs, wherein the biological library comprises at least 100 different host cells, at least 200 different host cells, at least 500 different host cells, at least 1 000 different host cells, at least 2 000 different host cells, at least 3 000 different host cells, at least 5 000 different host cells, at least 10 000 different host cells, at least 100 000 different host cells, at least 200 000 different host cells, at least 500 000 different host cells, at least 1 000 000 different host cells, at least 5 000 000 different host cells, at least 10 000 000 different host cells, or at least 100 000 000 different host cells.
115. The method of any one of the preceding paragraphs, wherein the amount of screenable product in the droplet is proportional to one or more of: stability of the polypeptide of interest, transcription of the polypeptide of interest, translation of the polypeptide of interest, secretion of the polypeptide of interest, yield of the polypeptide of interest, binding strength of the polypeptide of interest to a target molecule, and activity of the polypeptide of interest.
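The library sizes recited for the polynucleotides of interest and the host cells, up to 100 000 000 members, raise the practical question of how many occupied droplets must be analysed to sample each member at least once. Under the simplifying assumptions that all variants are equally abundant and that each occupied droplet carries one variant drawn at random, the expected fraction of a library of size L observed after N occupied droplets is 1 - (1 - 1/L)^N, approximately 1 - e^(-N/L). A Python sketch follows; the function names are hypothetical, and skewed real libraries will need more oversampling than this estimate suggests.

```python
import math


def expected_coverage(library_size, droplets_screened):
    """Expected fraction of variants seen at least once, i.e. 1 - (1 - 1/L)^N,
    computed with log1p/expm1 for accuracy when L is very large."""
    if library_size < 2:
        raise ValueError("library_size must be at least 2")
    return -math.expm1(droplets_screened * math.log1p(-1.0 / library_size))


def droplets_for_coverage(library_size, target):
    """Smallest number of occupied droplets giving expected coverage >= target."""
    if library_size < 2:
        raise ValueError("library_size must be at least 2")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    return math.ceil(math.log1p(-target) / math.log1p(-1.0 / library_size))
```

With L = 1 000 and a 95% target, `droplets_for_coverage` returns 2 995, i.e. roughly three-fold oversampling, consistent with 1 - e^(-3) ≈ 0.95. Because most droplets are empty at low average occupation, the total number of droplets generated is correspondingly larger.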
116. The method of any one of the preceding paragraphs, wherein the amount of screenable product in the droplet is inversely proportional to one or more of: stability of the polypeptide of interest, transcription of the polypeptide of interest, translation of the polypeptide of interest, secretion of the polypeptide of interest, yield of the polypeptide of interest, binding strength of the polypeptide of interest to a target molecule, and activity of the polypeptide of interest.
117. The method of any one of the preceding paragraphs, wherein the amount of screenable product in the droplet is proportional to one or more of: cell number, viability of the host cell, cell division rate of the host cell, cell growth rate of the host cell, cell size of the host cell, and protein secretion of the host cell.
118. The method of any one of the preceding paragraphs, wherein the amount of screenable product in the droplet is inversely proportional to one or more of: cell number, viability of the host cell, cell division rate of the host cell, cell growth rate of the host cell, cell size of the host cell, and protein secretion of the host cell.
119. The method of any one of the preceding paragraphs, wherein the one or more droplets comprise a substrate.
120. The method of any one of the preceding paragraphs, wherein the substrate comprises or consists of the screenable product.
121. The method of any one of the preceding paragraphs, wherein the substrate is a fluorescent substrate.
122. The method of any one of the preceding paragraphs, wherein the substrate is a fluorogenic rhodamine.
123. The method of any one of the preceding paragraphs, wherein the substrate is a fluorochrome.
124. The method of any one of the preceding paragraphs, wherein the substrate is a fluorogenic substrate.
125. The method of any one of the preceding paragraphs, wherein the substrate comprises a fluorophore, e.g., fluorescein, or fluorescein-labelled starch.
126. The method of any one of the preceding paragraphs, wherein the substrate is Nile red.
127. The method of any one of the preceding paragraphs, wherein the substrate is DAPI (4’,6-diamidino-2-phenylindole).
128. The method of any one of the preceding paragraphs, wherein each droplet, before the optional incubation, comprises an average occupation of at most 0.01 cells, at most 0.02 cells, at most 0.03 cells, at most 0.04 cells, at most 0.05 cells, at most 0.06 cells, at most 0.07 cells, at most 0.08 cells, at most 0.09 cells, at most 0.1 cells, at most 0.2 cells, at most 0.3 cells, at most 0.4 cells, at most 0.5 cells, at most 0.6 cells, or at most 0.7 cells; preferably at most 0.1 cells.
129. The method of any one of the preceding paragraphs, wherein each droplet comprises an average occupation of at most 0.01 polynucleotide of interest, at most 0.02 polynucleotide of interest, at most 0.03 polynucleotide of interest, at most 0.04 polynucleotide of interest, at most 0.05 polynucleotide of interest, at most 0.06 polynucleotide of interest, at most 0.07 polynucleotide of interest, at most 0.08 polynucleotide of interest, at most 0.09 polynucleotide of interest, at most 0.1 polynucleotide of interest, at most 0.2 polynucleotide of interest, at most 0.3 polynucleotide of interest, at most 0.4 polynucleotide of interest, at most 0.5 polynucleotide of interest, at most 0.6 polynucleotide of interest, or at most 0.7 polynucleotide of interest; preferably at most 0.1 polynucleotide of interest.
130. The method of any one of the preceding paragraphs, wherein the droplet sorting is facilitated by an electric field generated by one or more electrodes (401, 402) adjacent to the droplet sorter.
131. The method of any one of the preceding paragraphs, wherein the droplet sorting is facilitated by an acoustic wave generated by one or more acoustic wave generators (401, 402) adjacent to the droplet sorter.
132. The method of any one of the preceding paragraphs, wherein the droplet sorting is facilitated by a local pressure change generated by one or more pressure-controlled outlets (401, 402) adjacent to the droplet sorter, e.g., wherein the one or more pressure-controlled outlets are comprised in one or more output channels.
133. The method of any one of the preceding paragraphs, wherein the amount of screenable product in step c) is determined using a fluorescence-based signal, absorbance, Raman spectroscopy, mass spectrometry (MS), or MALDI-MS.
134. The method of any one of the preceding paragraphs, wherein a relative and/or an absolute amount of the screenable product per droplet is determined by the one or more sensing means (600).
135. The method of any one of the preceding paragraphs, wherein after step d), one or more output channels comprise at least 10 000 droplets, at least 50 000 droplets, at least 100 000 droplets, at least 500 000 droplets, at least 1 000 000 droplets, at least 2 000 000 droplets, at least 5 000 000 droplets, at least 10 000 000 droplets, or at least 100 000 000 droplets.
136. The method of any one of the preceding paragraphs, wherein the droplet sorter comprises at least four output channels, at least five output channels, at least six output channels, at least seven output channels, at least 8 output channels, at least 9 output channels, or at least 10 output channels.
137. The method of any one of the preceding paragraphs, wherein the host cell is a yeast host cell, e.g., a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell, such as a Kluyveromyces lactis, Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, Saccharomyces oviformis, or Yarrowia lipolytica cell.
138. The method of any one of the preceding paragraphs, wherein the host cell is a filamentous fungal host cell, e.g., an Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Chrysosporium, Coprinus, Coriolus, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Phlebia, Piromyces, Pleurotus, Schizophyllum, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trametes, or Trichoderma cell, in particular, an Aspergillus awamori, Aspergillus foetidus, Aspergillus fumigatus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Bjerkandera adusta, Ceriporiopsis aneirina, Ceriporiopsis caregiea, Ceriporiopsis gilvescens, Ceriporiopsis pannocinta, Ceriporiopsis rivulosa, Ceriporiopsis subrufa, Ceriporiopsis subvermispora, Chrysosporium inops, Chrysosporium keratinophilum, Chrysosporium lucknowense, Chrysosporium merdarium, Chrysosporium pannicola, Chrysosporium queenslandicum, Chrysosporium tropicum, Chrysosporium zonatum, Coprinus cinereus, Coriolus hirsutus, Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fusarium trichothecioides, Fusarium venenatum, Humicola insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila, Neurospora crassa, Penicillium purpurogenum, Phanerochaete chrysosporium, Phlebia radiata, Pleurotus eryngii, Talaromyces emersonii, Thielavia terrestris, Trametes villosa, Trametes versicolor, Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, or Trichoderma viride cell.
139. The method of any one of the preceding paragraphs, wherein the host cell is a prokaryotic host cell, e.g., a Gram-positive cell selected from the group consisting of Bacillus, Clostridium, Enterococcus, Geobacillus, Lactobacillus, Lactococcus, Oceanobacillus, Staphylococcus, Streptococcus, and Streptomyces cells, or a Gram-negative cell selected from the group consisting of Campylobacter, E. coli, Flavobacterium, Fusobacterium, Helicobacter, Ilyobacter, Neisseria, Pseudomonas, Salmonella, and Ureaplasma cells, such as Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, Bacillus thuringiensis, Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, Streptococcus equi subsp. zooepidemicus, Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, or Streptomyces lividans cells.
140. The method of any one of the preceding paragraphs, wherein the host cell is Bacillus subtilis.
141. The method of any one of the preceding paragraphs, wherein the host cell is Bacillus licheniformis.
142. The method of any one of the preceding paragraphs, wherein the host cell is Trichoderma reesei.
143. The method of any one of the preceding paragraphs, wherein the host cell is Aspergillus niger.
144. The method of any one of the preceding paragraphs, wherein the host cell is Aspergillus oryzae.
145. The method of any one of the preceding paragraphs, wherein the host cell is a Bifidobacterium, e.g., Bifidobacterium animalis, or Bifidobacterium animalis subsp. lactis.
146. A host cell comprising in its genome a polynucleotide sequence of interest generated in step h), and/or a polynucleotide sequence identified in step e).
147. The host cell of any one of the preceding paragraphs, which is isolated.
148. The host cell of any one of the preceding paragraphs, which is purified.
149. The host cell of any one of the preceding paragraphs, which comprises at least two copies, e.g., three, four, five, or more copies of the polynucleotide sequence of interest.
150. A method of producing a polypeptide of interest, the method comprising the steps of cultivating the cell according to any one of the preceding paragraphs, under conditions conducive for production of the polypeptide.
151. The method of paragraph 150, further comprising recovering the polypeptide.
152. A nucleic acid construct or expression vector comprising a polynucleotide of interest identified in step e), and/or a polynucleotide sequence generated in step h).
Claims
1. A method for screening a biological library, the method comprising the steps of:
a) providing a microfluidic device comprising a droplet sorter (200), the droplet sorter comprising at least three output channels (301, 302, 303),
b) providing an emulsion of droplets comprising a library of polynucleotides of interest and a screenable product,
c) determining the amount of screenable product of one or more droplets in the microfluidic device,
d) sorting the one or more droplets with the droplet sorter (200) into a receiving output channel of the at least three output channels (301, 302, 303), wherein the receiving output channel is determined based on the amount of screenable product per droplet, and wherein at least three receiving output channels receive a plurality of droplets comprising an amount of screenable product above and/or below one or more predetermined threshold levels,
e) identifying one or more polynucleotides of interest present in the at least three output channels (301, 302, 303) and obtaining sequence data for the one or more polynucleotides of interest, and
f) for one or more output channels (301, 302, 303), assigning a score to each of the one or more polynucleotides of interest, wherein the score is calculated based on the abundance of each of the one or more polynucleotides of interest in one of the one or more output channels (301, 302, 303).
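Step f) of claim 1 scores each polynucleotide of interest by its abundance in an output channel, but the claim does not fix a formula. The Python sketch below shows two common choices under stated assumptions: the relative abundance of a variant among the sequencing reads from one channel and, as an optional alternative, a log2 enrichment of that share over a reference such as the unsorted library. It assumes reads have already been mapped to variant identifiers; the function names and the pseudocount of 1 are illustrative assumptions.

```python
import math
from collections import Counter


def relative_abundance(reads):
    """Score each variant in one output channel by its share of the reads."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def log2_enrichment(channel_reads, reference_reads, pseudocount=1.0):
    """log2 of a variant's frequency in a channel relative to a reference
    (e.g. the unsorted library); the pseudocount avoids division by zero."""
    channel, reference = Counter(channel_reads), Counter(reference_reads)
    variants = set(channel) | set(reference)
    channel_total = sum(channel.values()) + pseudocount * len(variants)
    reference_total = sum(reference.values()) + pseudocount * len(variants)
    return {
        v: math.log2(((channel[v] + pseudocount) / channel_total)
                     / ((reference[v] + pseudocount) / reference_total))
        for v in variants
    }
```

A variant overrepresented in a high-signal channel relative to the input thus receives a positive enrichment score, and an underrepresented one a negative score.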
2. The method of claim 1, wherein the droplet sorter comprises at least four output channels, at least five output channels, at least six output channels, at least seven output channels, at least 8 output channels, at least 9 output channels, or at least 10 output channels.
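Step d) of claim 1, generalised in claim 2 to four or more output channels, routes each droplet according to where its measured amount of screenable product falls relative to predetermined thresholds. A minimal Python sketch of that decision logic is shown below, assuming k strictly increasing thresholds that define k + 1 channels. Which index maps to which physical channel (for example 301, 302, 303) and how a signal exactly at a threshold is resolved are assumptions, and the actuation by electrodes or other sorting means is not modelled.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_channel(signal, thresholds):
    """Return the 0-based index of the receiving output channel.

    Index 0 holds droplets below thresholds[0]; the last index holds droplets
    at or above thresholds[-1]. A signal equal to a threshold goes up.
    """
    return bisect_right(thresholds, signal)


def sort_droplets(signals, thresholds):
    """Group droplet indices by receiving channel."""
    if len(thresholds) < 2:
        raise ValueError("at least two thresholds are needed for three channels")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    channels = defaultdict(list)
    for index, signal in enumerate(signals):
        channels[assign_channel(signal, thresholds)].append(index)
    return dict(channels)
```

Where the screenable product is inversely proportional to the property of interest (e.g., paragraphs 112 and 116), the same thresholds apply and the channel order is simply read in reverse.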
3. The method of any one of claims 1-2, wherein the emulsion of droplets comprises one or more host cells.
4. The method of any one of claims 1-3, wherein the microfluidic device comprises an incubation zone (500), and wherein the method comprises incubation of the emulsion of droplets under conditions allowing host cell growth, and/or allowing transcription of DNA to RNA, and/or allowing translation of RNA to a polypeptide, preferably the incubation takes place prior to step c).
5. The method of claim 4, wherein each of the one or more droplet, before the incubation, comprises an average occupation of at most 0.01 cells, at most 0.02 cells, at most 0.03 cells, at most 0.04 cells, at most 0.05 cells, at most 0.06 cells, at most 0.07 cells, at most 0.08 cells, at most 0.09 cells, at most 0.1 cells, at most 0.2 cells, at most 0.3 cells, at most 0.4 cells, at most 0.5 cells, at most 0.6 cells, or at most 0.7 cells; preferably at most 0.1 cells.
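Claim 5 specifies an average occupation of at most 0.01 to 0.7 cells per droplet, preferably at most 0.1. If cells are assumed to be encapsulated at random, the number of cells per droplet follows a Poisson distribution with mean λ, so a low λ trades many empty droplets for few multiply occupied ones. The Poisson model is a common assumption for random loading and is not stated in the claims; the helper name in the Python sketch below is hypothetical.

```python
import math


def poisson_loading(mean_cells_per_droplet):
    """Fractions of droplets with 0, 1 and >1 cells under random (Poisson)
    encapsulation, plus the share of occupied droplets holding >1 cell."""
    lam = mean_cells_per_droplet
    if lam <= 0:
        raise ValueError("mean occupancy must be positive")
    empty = math.exp(-lam)            # P(0 cells)
    single = lam * math.exp(-lam)     # P(exactly 1 cell)
    multiple = 1.0 - empty - single   # P(2 or more cells)
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        "multiple_among_occupied": multiple / (1.0 - empty),
    }
```

At λ = 0.1 about 90.5% of droplets are empty, about 9.0% contain exactly one cell, and about 4.9% of the occupied droplets contain more than one cell.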
6. The method of any one of claims 3-5, wherein each host cell comprises one or more polynucleotide of interest of the library of polynucleotides of interest.
7. The method of any one of the preceding claims, wherein the formation of the screenable product is catalyzed by an enzyme, preferably the enzyme is encoded by the polynucleotide of interest.
8. The method of any one of the preceding claims, wherein the screenable product is produced by the host cells.
9. The method of any one of the preceding claims, wherein the biological library comprises or consists of a plurality of polynucleotides of interest, each polynucleotide of interest encoding a variant of a polypeptide of interest.
10. The method of any one of the preceding claims, wherein the polynucleotide of interest comprises a first polynucleotide of interest encoding a control sequence, and a second polynucleotide of interest encoding a polypeptide of interest.
11. The method of any one of the preceding claims, comprising step g) training a computational model, e.g., a machine learning algorithm, with sequence data obtained from step e) and/or score data obtained from step f).
12. The method of any one of the preceding claims, wherein the computational model provides a candidate biological sequence by a method comprising:
- obtaining input data indicative of an input biological sequence;
- determining the candidate biological sequence by applying the model to the input data; and
- providing biological sequence data indicative of the candidate biological sequence.
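Claims 11 to 14 leave the computational model open (e.g., a machine learning algorithm). Purely to illustrate the obtain-input, apply-model and provide-candidate steps of claim 12, the Python sketch below uses a deliberately simple stand-in that is not the model of the application: an additive table of mean scores per (position, residue) pair, learned from equal-length sequences from step e) and their scores from step f), from which a candidate is built by picking the best-scoring residue at each position. The function names are hypothetical, and an additive model ignores interactions between positions.

```python
from collections import defaultdict
from statistics import mean


def fit_additive_model(scored_variants):
    """Toy model: mean score observed for each (position, residue) pair.

    scored_variants: iterable of (sequence, score) pairs of equal length.
    """
    observations = defaultdict(list)
    for sequence, score in scored_variants:
        for position, residue in enumerate(sequence):
            observations[(position, residue)].append(score)
    return {key: mean(values) for key, values in observations.items()}


def propose_candidate(model, input_sequence):
    """At every position pick the residue with the highest mean score,
    keeping the input residue where the model has no data."""
    candidate = []
    for position, current in enumerate(input_sequence):
        options = {res: value for (pos, res), value in model.items() if pos == position}
        candidate.append(max(options, key=options.get) if options else current)
    return "".join(candidate)
```

For the toy training set `[("AAA", 1.0), ("CAA", 3.0), ("ACA", 0.5), ("AAG", 2.0)]`, applying the model to the input `"AAA"` yields the candidate `"CAG"`.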
13. The method according to claim 12, wherein the input biological sequence comprises one or more polynucleotide of interest identified in step e).
14. The method of any one of the preceding claims, comprising additional step h) generating one or more synthetic polynucleotide of interest with the computational model.
15. A host cell comprising in its genome
(i) the synthetic polynucleotide of interest generated in step h) of claim 14, and/or a polynucleotide of interest identified in step e) of claim 1; and
(ii) a polynucleotide encoding a polypeptide of interest.
16. A method of producing a polypeptide of interest, the method comprising the steps of cultivating the cell according to claim 15, under conditions conducive for production of the polypeptide.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23200852 | 2023-09-29 | | |
| EP23200852.4 | 2023-09-29 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2024240965A2 | 2024-11-28 |
| WO2024240965A3 | 2025-01-09 |
Family ID: 88237933
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | Status |
|---|---|---|---|---|
| PCT/EP2024/077083 | Droplet-based screening method | 2023-09-29 | 2024-09-26 | Pending |
Citations (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0238023A2 (en) | 1986-03-17 | 1987-09-23 | Novo Nordisk A/S | Process for the production of protein products in Aspergillus oryzae and a promoter for use in Aspergillus |
| WO1992006204A1 (en) | 1990-09-28 | 1992-04-16 | Ixsys, Inc. | Surface expression libraries of heteromeric receptors |
| US5223409A (en) | 1988-09-02 | 1993-06-29 | Protein Engineering Corp. | Directed evolution of novel binding proteins |
| WO1994025612A2 (en) | 1993-05-05 | 1994-11-10 | Institut Pasteur | Nucleotide sequences for the control of the expression of dna sequences in a cellular host |
| WO1995017413A1 (en) | 1993-12-21 | 1995-06-29 | Evotec Biosystems Gmbh | Process for the evolutive design and synthesis of functional polymers based on designer elements and codes |
| WO1995022625A1 (en) | 1994-02-17 | 1995-08-24 | Affymax Technologies N.V. | Dna mutagenesis by random fragmentation and reassembly |
| WO1995033836A1 (en) | 1994-06-03 | 1995-12-14 | Novo Nordisk Biotech, Inc. | Phosphonyldipeptides useful in the treatment of cardiovascular diseases |
| WO2007061448A2 (en) | 2005-05-18 | 2007-05-31 | President And Fellows Of Harvard College | Fabrication of conductive pathways, microcircuits and microstructures in microfluidic networks |
| WO2010151776A2 (en) | 2009-06-26 | 2010-12-29 | President And Fellows Of Harvard College | Fluid injection |
| WO2017144177A1 (en) | 2016-02-26 | 2017-08-31 | Keskin Hüseyin | Driving and/or flight simulator |
| US20190185847A1 (en) | 2016-07-06 | 2019-06-20 | Novozymes A/S | Improving a Microorganism by CRISPR-Inhibition |
| WO2024133344A1 (en) | 2022-12-20 | 2024-06-27 | Novozymes A/S | A method for providing a candidate biological sequence and related electronic device |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| DK3271477T3 (en) * | 2015-03-20 | 2020-09-14 | Novozymes As | Drop-based selection by injection |
| EP3289362B1 (en) * | 2015-04-30 | 2022-04-13 | European Molecular Biology Laboratory | Microfluidic droplet detection and sorting |
| WO2021072306A1 (en) * | 2019-10-10 | 2021-04-15 | 1859, Inc. | Methods and systems for microfluidic screening |
Non-Patent Citations (54)
| Title |
|---|
| "Biology and Activities of Yeast", 1980, SOC. APP. BACTERIOL. SYMPOSIUM SERIES |
| BALLEZA ET AL., FEMS MICROBIOL. REV, vol. 33, no. 1, 2009, pages 133 - 151 |
| BOWIE AND SAUER, PROC. NATL. ACAD. SCI. USA, vol. 86, 1989, pages 2152 - 2156 |
| BURKE ET AL., PROC. NATL. ACAD. SCI. USA, vol. 98, 2001, pages 6289 - 6294 |
| CARTER ET AL., PROTEINS: STRUCTURE, FUNCTION, AND GENETICS, vol. 6, 1989, pages 240 - 248 |
| CHOI ET AL., J. MICROBIOL. METHODS, vol. 64, 2006, pages 391 - 397 |
| CHRISTENSEN ET AL., BIO/TECHNOLOGY, vol. 6, 1988, pages 1419 - 1422 |
| COLLINS-RACIE ET AL., BIOTECHNOLOGY, vol. 13, 1995, pages 982 - 987 |
| CONTRERAS ET AL., BIOTECHNOLOGY, vol. 9, 1991, pages 378 - 381 |
| COOPER ET AL., EMBO J., vol. 12, 1993, pages 2575 - 2583 |
| CUNNINGHAM AND WELLS, SCIENCE, vol. 244, 1989, pages 1081 - 1085 |
| DAVIS ET AL.: "Basic Methods in Molecular Biology", 2012, ELSEVIER |
| DERBYSHIRE ET AL., GENE, 1986, pages 145 |
| DONALD ET AL., J. BACTERIOL, vol. 195, no. 11, 2013, pages 2612 - 2620 |
| EATON ET AL., BIOCHEMISTRY, vol. 25, 1986, pages 505 - 512 |
| FORD ET AL., PROTEIN EXPRESSION AND PURIFICATION, vol. 2, 1991, pages 95 - 107 |
| FREUDL, MICROBIAL CELL FACTORIES, vol. 17, 2018, pages 52 |
| GEISBERG ET AL., CELL, vol. 156, no. 4, 2014, pages 812 - 824 |
| GUO AND SHERMAN, MOL. CELLULAR BIOL, vol. 15, 1995, pages 5983 - 5990 |
| HAMBRAEUS ET AL., MICROBIOLOGY, vol. 146, no. 12, 2000, pages 3051 - 3059 |
| HAWKSWORTH ET AL.: "Ainsworth and Bisby's Dictionary of The Fungi", 1995, CAB INTERNATIONAL, UNIVERSITY PRESS |
| HEINZE ET AL., BMC MICROBIOLOGY, vol. 18, 2018, pages 56 |
| HILTON ET AL., J. BIOL. CHEM., vol. 271, 1996, pages 4699 - 4708 |
| HUE ET AL., J. BACTERIOL, vol. 177, 1995, pages 3465 - 3471 |
| JUMPER ET AL.: "Highly accurate protein structure prediction with AlphaFold", NATURE, vol. 596, 2021, pages 583 - 589, XP055888904, DOI: 10.1038/s41586-021-03819-2 |
| KABERDIN AND BLASI, FEMS MICROBIOL. REV, vol. 30, no. 6, 2006, pages 967 - 979 |
| LABROU, PROTEIN DOWNSTREAM PROCESSING, vol. 1129, 2014, pages 3 - 10 |
| LI ET AL., MICROBIAL CELL FACTORIES, vol. 16, 2017, pages 168 |
| LOWMAN ET AL., BIOCHEMISTRY, vol. 30, 1991, pages 10832 - 10837 |
| LUBERTOZZI AND KEASLING, BIOTECHN. ADVANCES, vol. 27, 2009, pages 53 - 75 |
| MARTIN ET AL., J. IND. MICROBIOL. BIOTECHNOL, vol. 3, 2003, pages 568 - 576 |
| MOROZOV ET AL., EUKARYOTIC CELL, vol. 5, no. 11, pages 1838 - 1846 |
| MUKHERJEE ET AL., TRICHODERMA: BIOLOGY AND APPLICATIONS, 2013 |
| NEEDLEMAN AND WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443 - 453 |
| NER ET AL., DNA, vol. 7, 1988, pages 127 |
| NESS ET AL., NATURE BIOTECHNOLOGY, vol. 17, 1999, pages 893 - 896 |
| PATEL AND GUPTA, INT. J. SYST. EVOL. MICROBIOL, vol. 70, 2020, pages 406 - 438 |
| RASMUSSEN-WILSON ET AL., APPL. ENVIRON. MICROBIOL, vol. 63, 1997, pages 3488 - 3493 |
| REIDHAAR-OLSON AND SAUER, SCIENCE, vol. 241, 1988, pages 53 - 57 |
| RICE ET AL.: "Trends Genet", vol. 16, 2000, article "EMBOSS: The European Molecular Biology Open Software Suite", pages: 276 - 277 |
| ROMANOS ET AL., YEAST, vol. 8, 1992, pages 423 - 488 |
| SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LAB |
| SCHMOLL AND DATTENBÖCK: "Gene Expression Systems in Fungi: Advancements and Applications", FUNGAL BIOLOGY, 2016 |
| SESHASAYEE ET AL., SUBCELLULAR BIOCHEMISTRY, vol. 52, 2011, pages 7 - 23 |
| SMITH ET AL., J. MOL. BIOL., vol. 224, 1992, pages 899 - 904 |
| SMOLKE ET AL., SYNTHETIC BIOLOGY: PARTS, DEVICES AND APPLICATIONS, 2018 |
| SONG ET AL., PLOS ONE, vol. 11, no. 7, 2016, pages 0158447 |
| STEVENS, DRUG DISCOVERY WORLD, vol. 4, 2003, pages 35 - 48 |
| SVETINA ET AL., J. BIOTECHNOL, vol. 76, 2000, pages 245 - 251 |
| V\AODAVER ET AL., FEBS LETT, vol. 309, 1992, pages 59 - 64 |
| VOS ET AL., SCIENCE, vol. 255, 1992, pages 306 - 312 |
| WINGFIELD, CURRENT PROTOCOLS IN PROTEIN SCIENCE, vol. 80, no. 1, 2015, pages 1 - 35 |
| XU ET AL., BIOTECHNOLOGY LETTERS, vol. 40, 2018, pages 949 - 955 |
| YELTON ET AL., PROC. NATL. ACAD. SCI. USA, vol. 81, pages 1470 - 1474 |