
WO2025219709A1 - Identifying allosteric sites in enzymes - Google Patents

Identifying allosteric sites in enzymes

Info

Publication number
WO2025219709A1
Authority
WO
WIPO (PCT)
Prior art keywords
residues
activity
variant
src
mutations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/GB2025/050815
Other languages
French (fr)
Inventor
Antoni BELTRAN MARQUÉS
Benjamin LEHNER
André Jean FAURE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fundacio Privada Centre de Regulacio Genomica CRG
Genome Research Liimited
Original Assignee
Fundacio Privada Centre de Regulacio Genomica CRG
Genome Research Liimited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GBGB2407633.3A (GB202407633D0)
Application filed by Fundacio Privada Centre de Regulacio Genomica CRG and Genome Research Liimited
Publication of WO2025219709A1
Pending (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis

Definitions

  • the invention relates to the field of enzymes.
  • Enzymes catalyse a diverse array of reactions that are critical to life. Due to the vast number of biological processes in which enzymes are implicated, they represent important targets for therapeutic or prophylactic interventions.
  • Most drugs inhibit enzyme activity by binding to active sites within the protein that are responsible for catalytic activity.
  • the active sites of enzymes are often structurally conserved amongst related enzymes, e.g., those of an enzyme family, which makes it difficult to target a specific enzyme without also inducing off-target effects as a result of interactions with enzymes other than the intended target.
  • the human genome encodes 538 protein kinases; orthosteric inhibitors of such kinases may inhibit tens or even hundreds of different kinases. This lack of target specificity often results in undesirable off-target effects and toxicity.
  • One means by which drug specificity can be increased and/or toxicity reduced is by targeting allosteric sites within an enzyme.
  • Enzyme activity can be modulated, naturally or artificially, by perturbations of the enzyme at sites that are distant to the active site but which nonetheless influence the catalytic activity of the enzyme, i.e., allosteric sites. Since allosteric sites are typically less well conserved amongst related proteins, targeting these sites may lower cross-reactivity with other off-target enzymes and thereby improve the target specificity and/or reduce toxicity of a drug.
  • the ability to target allosteric sites, or indeed multiple allosteric sites, on a protein may allow interventions that overcome drug resistance mutations that arise, e.g., in cancer.
  • Protein kinases represent an important class of therapeutic drug targets that are implicated in numerous diseases, including cancer. Most known kinase inhibitors are orthosteric inhibitors that target conserved ATP binding pockets, which results in poor specificity and leaves them susceptible to resistance mutations.
  • Src is an oncogenic protein kinase that is a particularly interesting drug target due to its role in cell regulation, cell growth, cell migration, and angiogenesis.
  • As is the case for all proteins, Src possesses myriad pockets, or sites, that could potentially mediate allosteric modulation of Src activity. However, there is a paucity of information as to which, if any, of these pockets/sites are capable of allosteric modulation of Src activity.
  • the present inventors therefore sought to determine a complete allosteric map of Src kinase in order to identify allosteric sites on the protein that may be targeted to modulate Src activity.
  • a first aspect of implementations described herein relates to a computer-implemented method of training a machine learning model.
  • the method comprises obtaining training data specifying, for a wild type variant of a target enzyme and each of a plurality of mutant variants of the target enzyme, each mutant variant having a different set of one or more mutations, an activity measure for the respective variant and a folding measure for the respective variant; and based on the training data, training model parameters of a machine learning model to output, from input data specifying the set of one or more mutations in a given variant of the target enzyme, a predicted activity measure and a predicted folding measure for the given variant.
  • An activity measure of a given variant is a representation of that variant’s catalytic activity level and may, for example, be derived from a frequency of being in a folded active state.
  • a folding measure of a given variant is a representation of that variant’s solubility and may, for example, be derived from a frequency of folding. The folding measure may also be referred to as a solubility measure.
  • the input data may comprise a set of input elements corresponding to a given site of the given variant of the target enzyme, each input element specifying whether or not a specific mutation is present at the given site.
  • a site is an amino acid position defined with respect to the wild type variant, and a mutation may comprise an amino acid substitution, an amino acid omission, or an amino acid insertion at that site.
  • a site might be the xth amino acid position and there might be y possible mutations at that site: various amino acid substitutions, omission of the amino acid, or insertion of an amino acid immediately after the amino acid at that site.
  • the input elements that correspond to the given site may be one-hot encoded for the specific mutations they specify.
  • each element of the vector represents a specific mutation at a specific site (e.g. omission of the 213th amino acid) and takes a value of 1 if that mutation at that site is present and a value of 0 if it is not present.
  • the input data may comprise a non-zero bias term which is constant for the wild type variant and each mutant variant; otherwise, the one-hot encoded vector for the wild type would contain only zeros and would not influence the training of the machine learning model.
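A minimal Python sketch of the one-hot input encoding with a constant bias term. The mutation alphabet (20 amino acids plus a hypothetical deletion symbol "-"), the 1-based site numbering and the helper names `feature_names` and `encode` are illustrative assumptions, not taken from the source; insertions are omitted for brevity.

```python
from typing import List, Sequence, Tuple

# Hypothetical mutation alphabet: 20 amino acids plus "-" for an amino acid omission.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

Mutation = Tuple[int, str]  # (1-based site, mutant residue), an illustrative format


def feature_names(wild_type: str) -> List[Mutation]:
    """One input element per (site, specific mutation), skipping the wild-type residue."""
    return [
        (pos, aa)
        for pos, wt_aa in enumerate(wild_type, start=1)
        for aa in ALPHABET
        if aa != wt_aa
    ]


def encode(wild_type: str, mutations: Sequence[Mutation], bias: float = 1.0) -> List[float]:
    """Return [bias, x_1, ..., x_n], where x_i = 1 if the i-th specific mutation is present."""
    names = feature_names(wild_type)
    index = {name: i for i, name in enumerate(names)}
    x = [0.0] * len(names)
    for mutation in mutations:
        x[index[mutation]] = 1.0  # raises KeyError for a mutation outside the alphabet
    # The constant non-zero bias keeps the all-zero wild-type vector informative.
    return [bias] + x
```

For example, `encode("MGS", [(2, "A")])` gives a 61-element vector (bias plus 3 × 20 mutation elements) with ones at the bias and at the G2A element; the wild type is `[1.0]` followed by 60 zeros.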
  • Training the machine learning model may comprise fitting a thermodynamic model of the target enzyme to the training data. This enables the machine learning model to extract from the training data information characterising which mutations affect specific transitions of the target enzyme between thermodynamic states.
  • the thermodynamic model may comprise a three-state model of the target enzyme with unfolded, folded inactive, and folded active states. In this case, the enzyme can transition from the unfolded state to the folded inactive state and vice versa, and from the folded inactive state to the folded active state and vice versa.
  • the folding measure may be related to a probability of the variant being in either the folded inactive state or the folded active state.
  • the folding measure may be dependent on a Gibbs free energy of folding which quantifies the partitioning of the enzyme molecules between the unfolded state and the inactive folded state.
  • the activity measure may be related to a probability of being in the folded active state.
  • the activity measure may be dependent on a Gibbs free energy of activity which quantifies the partitioning of the enzyme molecules between the inactive folded state and the active folded state.
  • the Gibbs free energy of activity suitably comprises a pseudo free energy which quantifies all biophysical changes other than enzyme folding that alter enzyme activity.
  • the folding measure is independent of the Gibbs free energy of activity.
  • the activity measure is dependent on both the Gibbs free energy of activity and the Gibbs free energy of folding.
  • the model parameters may comprise a first set of weights and a second set of weights.
  • the predicted activity measure may depend on both the first set of weights and the second set of weights.
  • the predicted folding measure may depend on the second set of weights but be independent of the first set of weights.
  • This network architecture enables the first set of weights to represent the influence of a target site on enzyme activity for reasons other than enzyme folding and the second set of weights to represent the influence of a target site on enzyme activity for reasons of folding.
  • the machine learning model may comprise a neural network.
  • a first neuron of the neural network may generate a first neuron output value by processing the input data using a first set of weights.
  • a second neuron of the neural network may generate a second neuron output value by processing the input data using a second set of weights.
  • the predicted activity measure may depend on both the first neuron output value and the second neuron output value. This reflects the three-state thermodynamic model, since the target enzyme must (1) transition from unfolded to folded inactive and (2) transition from folded inactive to folded active in order to reach the folded active state.
  • the predicted activity measure may depend on at least one first activation function applied to the first neuron output value and the second neuron output value.
  • the first activation function may be non-linear.
  • the first activation function may be based on the Boltzmann distribution.
  • the first activation function may have parameters that are trained during training of the neural network.
  • the predicted folding measure may depend on the second neuron output value and be independent of the first neuron output value. This reflects the three-state thermodynamic model, since the target enzyme only needs to transition from the unfolded to the folded inactive state to arrive in a folded state; whether it additionally transitions from the folded inactive state to the folded active state is irrelevant to the folding measure.
  • the predicted folding measure may depend on at least one second activation function applied to the second neuron output value.
  • the second activation function may be non-linear.
  • the second activation function may be based on the Boltzmann distribution.
  • the second activation function may have parameters that are trained during training of the neural network.
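The two-neuron architecture described above can be sketched as a forward pass. This is a hypothetical illustration, not the source's implementation: it assumes each neuron is a plain weighted sum of the encoded input, that the Boltzmann-based activations take the three-state forms with ΔG < 0 favouring the folded or active state, and that the trainable activation parameters are a linear scale and offset mapping fractions onto measured fitness. The temperature T and all function names are assumptions.

```python
import math
from typing import Sequence, Tuple

R = 1.987e-3  # gas constant, kcal/(mol*K)
T = 303.15    # hypothetical assay temperature in kelvin (30 °C)
RT = R * T


def neuron(weights: Sequence[float], x: Sequence[float]) -> float:
    """Linear neuron: weighted sum of the encoded mutations (bias element included in x)."""
    return sum(w * xi for w, xi in zip(weights, x))


def fraction_folded(dg_f: float) -> float:
    """Boltzmann-based activation for folding; depends on dG_f only."""
    return 1.0 / (1.0 + math.exp(dg_f / RT))


def fraction_active(dg_f: float, dg_a: float) -> float:
    """Fraction folded and active in the unfolded <-> folded inactive <-> folded active model."""
    return 1.0 / (1.0 + math.exp(dg_a / RT) * (1.0 + math.exp(dg_f / RT)))


def predict(
    w_first: Sequence[float],
    w_second: Sequence[float],
    x: Sequence[float],
    scale_a: float = 1.0,
    offset_a: float = 0.0,
    scale_f: float = 1.0,
    offset_f: float = 0.0,
) -> Tuple[float, float]:
    dg_a = neuron(w_first, x)   # first neuron output: first set of weights
    dg_f = neuron(w_second, x)  # second neuron output: second set of weights
    # Predicted activity uses both neuron outputs; predicted folding uses only the second.
    activity = scale_a * fraction_active(dg_f, dg_a) + offset_a
    folding = scale_f * fraction_folded(dg_f) + offset_f
    return activity, folding
```

Because `predict` routes the first set of weights only into the activity output, changing `w_first` never changes the predicted folding measure, which is what lets the first set of weights capture effects on activity other than folding. Very large |ΔG|/RT values would overflow `math.exp`; a production model would need a numerically safer form.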
  • a second aspect of implementations described herein relates to a computer-implemented method of identifying one or more target sites of a target enzyme.
  • the method comprises obtaining model parameters from a machine learning model trained in accordance with the method of the first aspect; and based on the model parameters, identifying the one or more target sites of the target enzyme.
  • the machine learning model may disambiguate whether a given site’s influence on a probability of the target enzyme being active is due to the target site influencing a probability of correct folding of the target enzyme or is due to other factors. This makes it possible to extract information relating to the target site’s influence on the activity of the already folded enzyme.
  • the one or more target sites that are identified may be those predicted to influence the probability of the target enzyme being active due to reasons other than influencing a probability of correct folding.
  • the one or more target sites may be selected based on a subset of the model parameters learnt in the training that express the contribution of each mutation towards the activity measure due to reasons other than influencing a probability of correct folding. This enables target sites influencing the activity of the already folded enzyme to be identified.
  • the one or more target sites may be selected based on druggability. The selection of surface sites - i.e.
  • the model parameters may comprise a first set of weights and a second set of weights.
  • the predicted activity measure may depend on both the first set of weights and the second set of weights; and the predicted folding measure may depend on the second set of weights but be independent of the first set of weights.
  • This network architecture enables the first set of weights to represent changes in activity not caused by folding and the second set of weights to represent changes in activity caused entirely by folding.
  • the first set of weights can be used to identify potentially allosteric sites that influence activity of the already folded target enzyme.
  • the computer-implemented method may comprise generating an aggregate measure of the first set of weights for each of a plurality of given target sites.
  • a given target site may be represented by a subset of the first set of weights.
  • an aggregate measure such as an average or total of the subset of the first set of weights may be generated to represent that site’s overall influence on the activity of the already folded target enzyme.
  • the computer-implemented method may comprise selecting the one or more target sites based on ranking their aggregate measures. This may help to identify the most influential sites on the activity of the folded enzyme.
  • the computer-implemented method may comprise selecting the one or more target sites by comparing their aggregate measures to a predefined threshold. This may help to identify sites with at least a minimum level of influence on the activity of the folded enzyme.
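The per-site aggregation, ranking and thresholding steps might look as follows. This is a sketch under stated assumptions: the first-set weights are keyed by hypothetical (site, mutant residue) pairs, the aggregate measure is a plain mean, and taking absolute values (so activating and inactivating mutations at one site do not cancel) is an illustrative choice rather than something the source specifies.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Mutation = Tuple[int, str]  # (site, mutant residue): hypothetical key format


def site_scores(first_weights: Dict[Mutation, float], use_abs: bool = True) -> Dict[int, float]:
    """Aggregate first-set weights (non-folding effects on activity) into one mean per site."""
    per_site: Dict[int, List[float]] = defaultdict(list)
    for (site, _aa), weight in first_weights.items():
        per_site[site].append(abs(weight) if use_abs else weight)
    return {site: mean(values) for site, values in per_site.items()}


def rank_sites(scores: Dict[int, float], top_k: int) -> List[int]:
    """Sites with the largest aggregate measures first."""
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def sites_above(scores: Dict[int, float], threshold: float) -> List[int]:
    """Sites whose aggregate measure meets a predefined threshold."""
    return sorted(site for site, value in scores.items() if value >= threshold)
```

A total (sum) or an error-weighted mean could replace `mean` without changing the ranking and threshold helpers.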
  • a third aspect of implementations described herein relates to a computer-implemented method of identifying a mutated variant of interest of an enzyme.
  • the method comprises, for each of a plurality of mutated variants of a wild type enzyme, providing an input specifying mutations in the respective mutated variant to a machine learning model trained to output a predicted activity measure and a predicted folding measure for the mutated variant; receiving from the machine learning model a predicted activity measure and a predicted folding measure for each of the plurality of mutated variants; and based on the predicted activity measures and the predicted folding measures, selecting from the plurality of mutated variants at least one mutated variant of interest.
  • This approach may be useful for identifying one or more mutated variants that have not been tested in wet lab experiments but whose predicted folding measures and predicted activity measures take desirable values, such as high activity measures for example.
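As an illustration of the third aspect, the sketch below enumerates double mutants that were not measured and keeps those predicted to fold well, ranked by predicted activity. The selection rule (a minimum folding measure, then top-k activity), the variant representation and the `predictor` callable (standing in for a trained model) are all hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Tuple

Mutation = Tuple[int, str]          # (site, mutant residue): hypothetical format
Variant = FrozenSet[Mutation]
Predictor = Callable[[Variant], Tuple[float, float]]  # returns (activity, folding)


def untested_doubles(singles: Iterable[Mutation], tested: Iterable[Variant]) -> List[Variant]:
    """All double mutants built from the given single mutations that were not tested."""
    tested_set = set(tested)
    doubles: List[Variant] = []
    for a, b in combinations(sorted(set(singles)), 2):
        if a[0] == b[0]:
            continue  # two changes at the same site do not form a valid double mutant
        variant = frozenset((a, b))
        if variant not in tested_set:
            doubles.append(variant)
    return doubles


def select_variants(
    variants: Iterable[Variant], predictor: Predictor, min_folding: float, top_k: int
) -> List[Variant]:
    """Keep variants predicted to fold adequately, then rank them by predicted activity."""
    scored = []
    for variant in variants:
        activity, folding = predictor(variant)
        if folding >= min_folding:
            scored.append((activity, variant))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [variant for _, variant in scored[:top_k]]
```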
  • a fourth aspect of implementations described herein relates to a method of generating training data for training a machine learning model.
  • the method comprises performing wet lab experiments to obtain data for deriving an activity measure and a folding measure for a wild type variant of a target enzyme and each of a plurality of mutant variants of the target enzyme.
  • the method may comprise performing a solubility assay that provides a measure of the frequency of folding for each variant.
  • the method may comprise deriving the folding measure from the frequency of folding.
  • the method may comprise performing an activity assay that provides a measure of frequency of occurrence of mutated variants in an active thermodynamic state.
  • the method may comprise deriving the activity measure from the frequency.
  • the method may comprise performing an in vivo or in vitro activity assay that provides data for deriving the activity measure.
  • Enzymatic activity and protein solubility can be quantified using any suitable assay.
  • the approach can be used to quantify allosteric regulation in any enzyme, provided that both activity and solubility can be quantified at scale.
  • the target sites may be allosteric sites of the target enzyme.
  • the active sites of enzymes are often structurally conserved amongst related enzymes. Thus, targeting the less well conserved allosteric sites within an enzyme may reduce off-target effects and/or toxicity.
  • the target sites may be located within (or form all or part of) a structurally accessible surface pocket on the target enzyme. Identification of allosterically active surface pockets enables prioritization of said pockets for drug development.
  • the target enzyme may be a protein kinase.
  • the protein kinase may be Src kinase.
  • the target enzyme is Src kinase having the amino acid sequence SEQ ID NO: 1.
  • Figure 1A is a flowchart showing a method of identifying allosteric sites in enzymes in accordance with techniques described herein.
  • Figure 1B is a flowchart showing a method of identifying a mutated variant of interest of a target enzyme in accordance with techniques described herein.
  • Figure 2A is an overview of the toxicity selection assay used to measure the protein kinase activity of Src kinase domain variants at scale (yes: yeast growth; no: yeast growth defect).
  • Figure 2B is an overview of the abundancePCA (aPCA) selection assay used to measure the in vivo abundance of Src kinase domain variants at scale (yes: yeast growth; no: yeast growth defect; DHF: dihydrofolate; THF: tetrahydrofolate).
  • Figures 2C and 2D show the correlation of activity fitness measurements to in vivo phosphotyrosine levels, and abundance fitness measurements to in vivo Src levels, respectively (Ahler et al, Mol. Cell 74, 393-408.e20 (2019)).
  • Figure 3 is a flow chart showing a method of training a machine learning model in accordance with techniques described herein.
  • Figure 4A shows the three-state equilibrium and corresponding thermodynamic model.
  • ΔG_f: Gibbs free energy of folding
  • ΔG_a: Gibbs free energy of the active state
  • K_f: folding equilibrium constant
  • K_a: inactive-active state equilibrium constant
  • p_f: fraction folded
  • p_fa: fraction folded and active
  • f_f: nonlinear function of ΔG_f
  • f_fa: nonlinear function of ΔG_f and ΔG_a
  • R: gas constant
  • T: temperature in kelvin.
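The equations of Figure 4A are not reproduced in this text. As a sketch only, the legend's symbols can be related through the usual Boltzmann relations, assuming ΔG < 0 favours the folded or active state and assuming (consistent with the statement above that the folding measure is independent of ΔG_a) that p_f is taken from the unfolded/folded-inactive equilibrium alone. The exact three-state folded fraction would instead be (K_f + K_f K_a)/(1 + K_f + K_f K_a), which does depend on ΔG_a.

```latex
\begin{aligned}
K_f &= e^{-\Delta G_f / RT}, \qquad K_a = e^{-\Delta G_a / RT} \\
p_f &= f_f(\Delta G_f) = \frac{K_f}{1 + K_f} = \frac{1}{1 + e^{\Delta G_f / RT}} \\
p_{fa} &= f_{fa}(\Delta G_f, \Delta G_a) = \frac{K_f K_a}{1 + K_f + K_f K_a}
        = \frac{1}{1 + e^{\Delta G_a / RT}\left(1 + e^{\Delta G_f / RT}\right)}
\end{aligned}
```

The last equality follows from dividing numerator and denominator by K_f K_a, using 1/K_a = e^{ΔG_a/RT} and 1/(K_f K_a) = e^{(ΔG_a + ΔG_f)/RT}.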
  • Figure 4B is a schematic diagram showing a neural network that has been trained in accordance with techniques described herein.
  • Figure 5 is a flow chart showing a method of identifying one or more target sites of a target enzyme in accordance with techniques described herein.
  • Figures 6A to 6D are heat maps showing inferred changes in activity free energies (ΔΔG_a) (A and B) and folding free energies (ΔΔG_f) (C and D) for all 5,111 possible single substitution variants in the Src kinase domain (KD).
  • Figures 6E and 6F show the structure of the Src KD coloured by the per-site weighted mean ΔΔG_f (PDB ID: 2SRC). The secondary structure elements most enriched in destabilizing mutations are annotated.
  • Figure 6G shows the enrichment of destabilizing and stabilizing mutations in secondary structure elements of Src (Fisher’s exact test).
  • Figures 7A and 7B show the structure of the Src KD coloured by the per-site weighted mean ΔΔG_a (PDB ID: 2SRC).
  • Figure 7C shows the enrichment of inactivating and activating mutations in secondary structure elements (Fisher’s exact test).
  • Figure 7D is a heatmap showing predicted activity measures for mutant variants of Src kinase having mutations at the active site.
  • Figure 7E is a heatmap showing predicted activity measures for mutant variants of Src kinase having mutations at known allosteric sites.
  • Figure 8A is a summary of the regulatory impact and druggability properties of Src surface pockets.
  • Mean ΔΔG_a: average of the per-site averaged ΔΔG_a of all residues in the pocket.
  • Max ΔΔG_a: maximum of the per-site averaged ΔΔG_a of all residues in the pocket.
  • Min ΔΔG_a: minimum of the per-site averaged ΔΔG_a of all residues in the pocket.
  • Pockets significantly enriched or depleted in activating and inactivating mutations (Fisher’s exact test, FDR < 0.05) are labeled with stars.
  • Figures 8B to 8D are heatmaps showing ΔΔG_a of mutations in previously targeted Src pockets (B), in Src pockets homologous to pockets known to be allosteric and/or targeted by drugs in other kinases (C), and in novel Src pockets (D).
  • Figure 9 illustrates a comparison of predicted activity of allosteric and non-allosteric sites.
  • Figure 10 is a flowchart showing a method of identifying a mutated variant of interest of a target enzyme in accordance with techniques described herein.
  • FIG. 11 is a block diagram showing a computer suitable for implementing techniques described herein.
  • the approach uses data from wet lab experiments relating to a target enzyme and uses these data to extract insights relating to the allosteric landscape of the target enzyme.
  • a comprehensive allosteric map of an enzyme can be produced in this way that identifies regions of allosteric activity, whether these regions exert detrimental control to the enzyme’s catalytic activity or whether they increase it, and the extent to which these regions affect the enzyme’s catalytic activity.
  • the approach has been verified by accurately predicting previously known allosteric sites and can be used to identify previously unknown allosteric sites of a target enzyme.
  • the target enzyme may suitably be a protein kinase such as Src kinase.
  • the target enzyme may be one of the more than 500 protein kinases encoded in the human genome, including kinases having a role in cancer (such as Src/Abl family kinases, Raf kinases and PI3K kinases) or other diseases (such as DYRK family kinases).
  • the enzyme may be from a species other than human.
  • Wet lab experiments are performed 102 on a wild type of the target enzyme and on a set of variants of the enzyme. The variants have mutations at a range of sites of interest of the enzyme and the effects of the mutations on the enzyme’s catalytic activity and solubility are measured.
  • Preliminary data processing is performed 104 on the results from the wet lab experiments to produce an activity measure and a folding measure for the wild type and each mutated variant of the enzyme.
  • the activity measure represents the variant’s catalytic activity
  • the folding measure represents the variant’s ability to fold and is derived from measurements from solubility assays.
  • a machine learning model is then trained 106 to predict the activity and folding measures in order to fit a model of the enzyme to the experimental data. This training step is significantly facilitated by using training data relating not only to single amino acid mutations but also to double or combinatorial mutants.
  • when the machine learning model has been trained with sufficient data, it is used to quantify the extent to which the mutations affect the enzyme’s catalytic activity independently of their effects on enzyme folding. As discussed further below, this enables target sites of the enzyme that may be allosteric to be identified 108.
  • in FIG. 1B, a method 100B of identifying a mutated variant of interest of a target enzyme is shown.
  • the steps 102, 104 and 106 are in common with those of method 100A of Figure 1A.
  • the method 100B comprises identifying 110 mutated variants that were not tested in the wet lab experiments but which may have allosteric sites of interest. This provides another application of the techniques described below.
  • the approach may be repeated on different genetic backgrounds in order to achieve doubly or multiply mutated variants.
  • the genetic backgrounds may be chosen to provide a range of different enzyme activities due to changes in stability or catalytic activity. This aims to resolve ambiguities where a number of causal biophysical changes could account for an observed mutational effect and allows the inference of the in vivo biophysical effects of mutations.
  • Enzymatic activity and protein solubility can be quantified using any suitable assay provided that both activity and solubility can be quantified at scale. Such assays are known in the art.
  • enzymatic activity may be quantified using a cellular toxicity assay, where the inhibition of cellular growth is directly proportional to the amount of protein phosphorylation induced by the enzyme (as shown in Figure 2A).
  • Enzymatic activity may be determined by complementation of thermosensitive alleles. This strategy uses yeast/bacteria strains containing mutations that affect fitness only at high temperatures (e.g. 42°C). The function is then rescued using the endogenous (or orthologous) WT gene, which is then mutated to select for function.
  • An alternative strategy makes use of the numerous auxotrophic markers or antibiotic/chemical resistance genes present in yeast and bacteria.
  • mutagenesis of the enzymes necessary for survival and/or growth in the absence or presence of specific components in the growth medium allows for functional selection.
  • the solubility of the enzyme may be quantified in the same cells using a protein abundance selection assay that uses protein fragment complementation to quantify soluble protein concentration over at least three orders of magnitude (e.g. AbundancePCA, as shown in Figure 2B).
  • Sequencing data from wet lab experiments is subject to preliminary processing in order to produce activity and folding measures that can be used as training data for the machine learning model.
  • Sequencing data from wet lab experiments may suitably be provided in the form of FASTQ files from paired-end sequencing of aPCA and toxicity experiments.
  • the sequencing data is processed to generate fitness scores which can be used as the activity and folding measures and also to calculate errors in the fitness scores.
  • a suitable approach for processing the sequencing data is described in: DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies, Faure et al., Genome Biology (2020) 21:207, https://doi.org/10.1186/s13059-020-02091-3.
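DiMSum also models count errors and combines replicates, none of which is reproduced here. As a much simpler illustration of what a fitness score is, the sketch below computes a log enrichment of output over input read counts relative to the wild type, a common deep-mutational-scanning definition assumed here for illustration. The count dictionaries and the `"WT"` key are hypothetical.

```python
import math
from typing import Dict


def fitness_scores(
    input_counts: Dict[str, int], output_counts: Dict[str, int], wt_key: str = "WT"
) -> Dict[str, float]:
    """Natural-log enrichment of each variant relative to the wild type (WT scores 0)."""
    wt_ratio = output_counts[wt_key] / input_counts[wt_key]
    scores: Dict[str, float] = {}
    for variant, n_in in input_counts.items():
        n_out = output_counts.get(variant, 0)
        if n_in > 0 and n_out > 0:  # zero counts are skipped here, not pseudo-counted
            scores[variant] = math.log((n_out / n_in) / wt_ratio)
    return scores
```

Applied separately to the toxicity and aPCA selections, scores of this kind would play the role of the activity and folding measures, respectively.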
  • the resulting activity and folding measures are used to train a machine learning model.
  • This process fits a model of the enzyme to the training data and enables the effects of the mutations on the enzyme’s catalytic activity to be determined independently of their effects on folding.
  • a suitable approach for training the machine learning model is described in: MoCHI: neural networks to fit interpretable models and quantify energies, energetic couplings, epistasis and allostery from deep mutational scanning data, Faure and Lehner, bioRxiv (2024) https://doi.org/10.1101/2024.01.21.575681.
  • a method 400 of training a machine learning model comprises obtaining 402 training data for a wild type variant of a target enzyme and for each of a plurality of mutant variants of the target enzyme.
  • the training data specifies, for each variant, an activity measure and a folding measure for that variant.
  • Each mutated variant has a different set of one or more mutations.
  • Model parameters of the machine learning model are trained 404, based on the training data, to output a predicted activity measure and a predicted folding measure for a given variant from input data specifying the set of one or more mutations of the given variant.
  • the input data may suitably specify, for a given site of a given variant of the target enzyme, whether a specific mutation of an amino acid is present at that site. For example, a mutation at that site may comprise an amino acid substitution, an amino acid omission or an amino acid insertion.
  • the input data may comprise a set of input elements corresponding to a given site of the given variant of the target enzyme, each input element specifying whether or not a specific mutation is present at the given site.
  • one-hot encoding may suitably be used to specify the mutated variants.
  • a mutated variant may be represented by a vector having a series of elements that take values of 0 or 1.
  • Each element in the series represents whether a specific mutation (such as a specific amino acid substitution) is present at that site by taking a value of 1 to indicate the presence of that mutation and by taking a value of 0 to indicate the absence of that mutation.
  • the input elements corresponding to the given site may be one-hot encoded for the specific mutations they specify.
  • for the wild type variant, every element in the one-hot encoded vector takes a value of 0. It is therefore suitable to provide a non-zero bias term as an additional element in the input data that is constant for the wild type variant and each mutant variant, to ensure that the wild type variant is taken into account during training of the machine learning model.
  • a thermodynamic model of the enzyme may be fitted to the training data.
  • a suitable thermodynamic model may treat the enzyme as having distinct states depending on whether the enzyme is folded and whether, if folded, the enzyme is active.
  • the enzyme has three distinct states: unfolded, folded inactive and folded active.
  • the enzyme can transition from the unfolded state to the folded inactive state and vice versa, and can transition from the folded inactive state to the folded active state and vice versa:
  • the energy change for the transition from the unfolded state to the folded inactive state is a Gibbs free energy of folding, $\Delta G_f$, and similarly the energy change for the transition from the folded inactive state to the folded active state is a Gibbs free energy of activity, $\Delta G_a$.
  • $\Delta G_f$: Gibbs free energy of folding
  • $\Delta G_a$: Gibbs free energy of activity
  • the probability of a variant of the enzyme being in a folded state is related to the predicted folding measure in the machine learning model, and the probability of the variant being in the active state is related to the predicted activity measure in the machine learning model.
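Assuming the Boltzmann distribution and the convention that each free energy is that of the product state minus that of the starting state, the three-state scheme gives the probabilities below. This is a sketch, not the disclosed formula; $R$ (gas constant) and $T$ (temperature) are symbols introduced here. Folding is treated as a two-state equilibrium that depends on $\Delta G_f$ alone, mirroring the statement that folding is independent of whether the variant is active.

```latex
\begin{aligned}
Z &= 1 + e^{-\Delta G_f/RT} + e^{-(\Delta G_f + \Delta G_a)/RT} \\
p_{\text{active}} &= \frac{e^{-(\Delta G_f + \Delta G_a)/RT}}{Z} \\
p_{\text{folded}} &= \frac{e^{-\Delta G_f/RT}}{1 + e^{-\Delta G_f/RT}} = \frac{1}{1 + e^{\Delta G_f/RT}}
\end{aligned}
```

With these signs, a mutation that increases $\Delta G_a$ lowers $p_{\text{active}}$ without changing $p_{\text{folded}}$.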
  • an example machine learning model comprises a neural network 500 that is configured to fit a thermodynamic model to the training data.
  • the neural network 500 is trained to output, based on input data 502 that specifies one or more mutations of a given mutant variant of a target enzyme, a predicted activity measure 504 and a predicted folding measure 506 for the given mutant variant.
  • the neural network 500 may be trained by any suitable method such as using a back propagation algorithm.
  • the input data 502 encodes the one or more mutations of the mutant variant using one-hot encoded amino acid sequences 508.
  • Each element, $x_i$, of the input data represents the $i$th specific mutation of the variant, i.e. a specific type of mutation at a specific position in the wild type amino acid sequence.
  • the element $x_i$ takes a value of 0 or 1 depending on whether the specific mutation is present at that position: a value of 0 indicates that the specific mutation is absent, while a value of 1 indicates that it is present.
  • a given position in the amino acid sequence may correspond to one, two or more elements corresponding to different specific mutations that could be provided at that position (e.g.
  • the input data also includes a non-zero bias term 510 that is constant for the wild type variant and each mutant variant in order that the wild type variant be included in the training data.
  • the non-zero bias term 510 may be 1. Without such a term, the neural network could not take the wild type into account, because all of the wild type's elements, $x_i$, in the one-hot encoded vector take values of zero.
  • the neural network 500 learns model parameters using the training data.
  • the architecture of the neural network is such that the model parameters comprise a first set of weights 512 and a second set of weights 514, and the predicted activity measure 504 depends on both the first set of weights 512 and the second set of weights 514, whereas the predicted folding measure 506 depends only on the second set of weights 514 and is independent of the first set of weights 512.
  • This reflects the fact that, in the thermodynamic model, folding is independent of whether the variant is active or inactive, whereas for the variant to be active it must transition from unfolded to folded and additionally transition from folded inactive to folded active.
  • the architecture of the neural network 500 is suitable for fitting a thermodynamic model to the training data. In this arrangement, it is assumed that the mutation effects of double or combinatorial mutants combine additively in latent space which represents free energies.
  • the first set of weights 512 and the second set of weights 514 correspond respectively to a first neuron and a second neuron of the neural network 500.
  • the first neuron processes the input data 502 using the first set of weights 512 to generate a first neuron output 516
  • the second neuron processes the input data 502 using the second set of weights 514 to generate a second neuron output 518.
  • the first neuron output 516 is a sum of the products of the weights of the first set of weights 512 and the corresponding elements of the input data 502.
  • the product of the non-zero bias term 510 (which takes a value of 1 in this example) and the weight $\Delta G_{b,0}$ is added to the product of the element $x_1$ and the weight $\Delta G_{b,1}$, and so on: $\text{first neuron output} = 1 \cdot \Delta G_{b,0} + \sum_{i \ge 1} \Delta G_{b,i}\, x_i$
  • the second neuron output 518 is a sum of the products of the weights of the second set of weights 514 and the corresponding elements of the input data 502.
  • the product of the non-zero bias term 510 (which takes a value of 1 in this example) and the weight $\Delta G_{f,0}$ is added to the product of the element $x_1$ and the weight $\Delta G_{f,1}$, and so on: $\text{second neuron output} = 1 \cdot \Delta G_{f,0} + \sum_{i \ge 1} \Delta G_{f,i}\, x_i$
  • the predicted activity measure 504 depends on both the first neuron output 516 and the second neuron output 518, and by contrast the predicted folding measure 506 depends only on the second neuron output 518 and is independent of the first neuron output 516.
  • the predicted activity measure 504 depends on at least one first activation function 520 applied to the first neuron output 516 and the second neuron output 518.
  • the predicted folding measure 506 depends on at least one second activation function 522 applied to the second neuron output 518.
  • one or both of the at least one first activation function 520 and the at least one second activation function 522 may comprise a non-linear function.
  • this may be any arbitrary non-linear function inferred from the training data.
  • one or both of the at least one first activation function 520 and the at least one second activation function 522 may suitably be based on the Boltzmann distribution.
  • one or both of the at least one first activation function 520 and the at least one second activation function 522 may have parameters that are trained during training of the neural network 500.
  • the non-linear activation functions 520, 522 could be defined by an equation or by a lookup table.
  • Coefficients of the equation or the lookup table could be static (e.g. determined with reference to the Boltzmann distribution) or dynamically trained based on the training data.
  • a first linear activation function 524 receives an output from the first activation function 520 and outputs the predicted activity measure 504.
  • a second linear activation function 526 receives an output from the second activation function 522 and outputs the predicted folding measure 506.
  • in some examples, these linear transformations 524, 526 are omitted, as they are not needed for generating the predicted activity measure 504 and the predicted folding measure 506.
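A forward pass through the two-neuron architecture might look like the following. It is a sketch under stated assumptions: the activation functions are fixed Boltzmann expressions (the three-state model for activity, a two-state expression for folding) rather than trained non-linearities; `RT` is an assumed constant in kcal/mol; and the optional linear output transformations are represented by hypothetical `scale`/`offset` parameters. The property it demonstrates is that the folding output uses only the second set of weights, while the activity output uses both. Training (for example by backpropagation in an autodiff framework) is not shown.

```python
import math

RT = 0.593  # assumption: RT in kcal/mol at roughly 298 K


def neuron(weights, x):
    """Weighted sum of the inputs; weights[0] multiplies the bias element x[0] = 1."""
    return sum(w * xi for w, xi in zip(weights, x))


def p_folded(dg_f):
    """Two-state Boltzmann probability of being folded; depends on dG_f only."""
    return 1.0 / (1.0 + math.exp(dg_f / RT))


def p_active(dg_a, dg_f):
    """Boltzmann probability of the folded active state in the three-state model."""
    w_inactive = math.exp(-dg_f / RT)         # folded inactive, relative to unfolded
    w_active = math.exp(-(dg_f + dg_a) / RT)  # folded active, relative to unfolded
    return w_active / (1.0 + w_inactive + w_active)


def forward(x, activity_weights, folding_weights,
            scale_a=1.0, offset_a=0.0, scale_f=1.0, offset_f=0.0):
    """Return (predicted activity measure, predicted folding measure) for input vector x."""
    dg_a = neuron(activity_weights, x)  # first neuron: first set of weights
    dg_f = neuron(folding_weights, x)   # second neuron: second set of weights
    activity = scale_a * p_active(dg_a, dg_f) + offset_a  # uses both neuron outputs
    folding = scale_f * p_folded(dg_f) + offset_f         # uses the second output only
    return activity, folding


# Toy example: bias plus two possible mutations (weights are illustrative, not fitted)
w_act = [-1.0, 2.0, 0.0]
w_fold = [-2.0, 0.5, 0.0]
wild_type = [1, 0, 0]
mutant = [1, 1, 0]
a_wt, f_wt = forward(wild_type, w_act, w_fold)
a_mut, f_mut = forward(mutant, w_act, w_fold)
```

In this toy case the first mutation raises the activity-neuron output, so `a_mut` falls below `a_wt`, while changing `w_act` alone would leave `f_mut` untouched.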
  • a method 600 of identifying one or more target sites of a target enzyme comprises obtaining 602 model parameters from a machine learning model trained in accordance with the methods of this disclosure and, based on the model parameters, identifying 604 one or more target sites, which may comprise allosteric sites, of the target enzyme.
  • a site may be defined as a specific residue, for example the ith amino acid, in an amino acid sequence of the target enzyme.
  • a target site may be defined as a site of interest, such as a specific residue in an amino acid sequence of the target enzyme that has an allosteric effect on the target enzyme’s catalytic activity.
  • the machine learning model may suitably disambiguate whether a given site’s influence on a probability of the target enzyme being active is due to the target site influencing a probability of correct folding of the target enzyme or is due to other factors. For example, this may be achieved by a machine learning model that comprises the neural network 500 of Figure 4B.
  • the target sites may be those predicted to influence the probability of the target enzyme being active due to reasons other than influencing a probability of correct folding.
  • the sites that affect only the enzyme’s transition from the folded inactive state to the folded active state in the thermodynamic model may be identified.
  • the sites that affect this particular transition are more likely to be allosteric sites.
  • the target sites may be selected based on a subset of the model parameters learnt in the training that express the contribution of each mutation towards the activity measure due to reasons other than influencing a probability of correct folding. For example, in the neural network 500 it is the first set of weights 512 that expresses the contribution of each mutation towards the activity measure 504 independently of the mutation’s effect on folding.
  • model parameters of the machine learning model may in various examples comprise a first set of weights and a second set of weights, and the predicted activity measure may depend on both the first set of weights and the second set of weights, while the predicted folding measure may depend on the second set of weights but be independent of the first set of weights.
  • identifying the one or more target sites may comprise generating an aggregate measure of the first set of weights for each of a plurality of target sites.
  • each target site is represented by a sub-set of the first set of weights 512. If there are y possible mutations of the amino acid at the target site, then there will be y weights that represent that target site.
  • An aggregate measure of the y weights, such as a total or average weight, may be generated to provide a representation of the overall effect of that site on the target enzyme’s catalytic activity.
  • the aggregate measure may comprise the following average, where $y$ is the number of possible mutations of that site and $\Delta G_{b,j}$ is the weight for the $j$th mutation at that site: $\overline{\Delta G_b} = \frac{1}{y} \sum_{j=1}^{y} \Delta G_{b,j}$
  • the aggregate measures may be ranked to produce a list of target sites in order of their effect on catalytic activity. Additionally or alternatively, the aggregate measures may be compared to a predefined threshold to make a determination of whether the target sites are of interest.
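Per-site aggregation, ranking and thresholding can be sketched as follows. This assumes the learnt first set of weights has been exported as a mapping from (position, mutant residue) to weight; the function names and the toy weights are hypothetical. A plain average, as here, lets activating and inhibiting mutations at the same site partly cancel; averaging absolute values would be an alternative, but the text does not specify that choice.

```python
from collections import defaultdict
from statistics import mean


def site_scores(activity_weights):
    """Average the first-set weights over the y mutations recorded at each site.

    activity_weights maps (position, mutant residue) -> learnt weight.
    """
    by_site = defaultdict(list)
    for (pos, _mutant), weight in activity_weights.items():
        by_site[pos].append(weight)
    return {pos: mean(weights) for pos, weights in by_site.items()}


def rank_sites(scores, threshold=None):
    """Order sites from largest to smallest aggregate weight; optionally keep only those above threshold."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if threshold is not None:
        ranked = [pos for pos in ranked if scores[pos] > threshold]
    return ranked


# Illustrative weights only (positive = larger predicted increase in the activity free energy)
weights = {
    (10, "A"): 1.2, (10, "G"): 0.8,
    (11, "A"): 0.1, (11, "G"): -0.1,
    (12, "A"): 0.5, (12, "G"): 0.7,
}
scores = site_scores(weights)
```

Here `rank_sites(scores)` orders the sites 10, 12, 11, and a threshold of 0.5 keeps only sites 10 and 12.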
  • other properties of the target sites may also be taken into account in their selection, for example druggability and/or whether the target sites are surface sites of the enzyme.
  • the machine learning model of the present disclosure generates reliable predictions for target enzymes.
  • a machine learning model in accordance with the present disclosure predicts, in line with expectations, that active site mutations of the target enzyme Src kinase reduce catalytic activity.
  • each column of the heatmap represents a residue in the amino acid sequence of the enzyme.
  • a subset of the enzyme is represented, in particular the ATP binding site, the catalytic loop, the Mg2+ positioning loop and the substrate positioning loop.
  • the active site comprises amino acids that directly contact ATP, Mg2+ or the substrate peptide phosphosite, and these residues are marked with an asterisk at the top of their column.
  • Each row of the heatmap represents a substitution mutation of the residue at that site.
  • Each column therefore has one element that does not represent a mutation because the substitute amino acid is the same as the wild type residue at that site. These elements are marked with a dot.
  • the heatmap represents predicted activity measures for mutations that are detrimental to catalytic activity.
  • the darker the element in the heatmap (corresponding to a larger increase in the Gibbs free energy of activity, representing, in the thermodynamic model, a greater barrier to the mutated variant becoming active), the more detrimental that specific mutation at that site is to catalytic activity.
  • Mutations in the active site are overwhelmingly detrimental to kinase activity, with 225 out of 247 decreasing catalytic activity. Mutations in the active site are nearly 40 times more likely to decrease enzymatic activity than mutations elsewhere in the kinase.
  • the predicted activity measure is a predicted change in the Gibbs free energy of activation, $\Delta G_a$, so for these detrimental mutations the values of the predicted change, $\Delta\Delta G_a$, are positive.
  • the heatmap also represents predicted activity measures that increase catalytic activity. The lighter the element in the heatmap, the more that specific mutation at that site increases catalytic activity. Only a minority of mutations increase catalytic activity, and none of these are active site mutations.
  • a machine learning model in accordance with the present disclosure predicts, in line with previous predictions, that mutations in eleven non-active site residues of Src kinase reduce catalytic activity. These residues have previously been predicted to be part of an allosteric network that communicates between substrate and ATP binding sites.
  • Figure 7E shows predicted activity measures for mutations that are detrimental to catalytic activity. Mutations at these sites are almost all detrimental to catalytic activity, taking positive values of a Gibbs free energy of activity, $\Delta G_a$. Thus, the predictions of the machine learning model are consistent with previous predictions.
  • Figure 7E also shows predicted activity measures that increase catalytic activity. Only one mutation at the sites shown increases catalytic activity, taking a negative value of a Gibbs free energy of activity, $\Delta G_a$.
  • Figure 9 shows a series of boxplots that summarise predictions made by a machine learning model in accordance with the present disclosure.
  • the boxplots represent average predicted activity measures for various sites on the enzyme and summarise the types of predictions that are shown in more detail in the heatmaps described above.
  • the activity measures comprise predicted changes in a Gibbs free energy of activation, $\Delta G_a$.
  • Sites that are part of the active site are represented by box 902.
  • sites predicted to be allosteric are represented by box 904.
  • sites predicted to be non-allosteric are represented by box 906.
  • average predicted changes in the Gibbs free energy of activation, $\Delta G_a$, for previously described allosteric sites are represented by box 908.
  • the present disclosure extends to a method 1000 of identifying a mutated variant of interest of a target enzyme.
  • This approach uses a machine learning model that has been trained to predict an activity measure and a folding measure for a variant of a target enzyme in accordance with techniques disclosed herein.
  • the machine learning model is used to identify mutant variants of interest, which may be untested in wet lab experiments.
  • the method 1000 comprises, for each of a plurality of mutated variants of a wild type enzyme, providing 1002 an input specifying mutations in the mutated variant to a machine learning model trained to output a predicted activity measure and a predicted folding measure for the mutated variant.
  • the input may for example be one-hot encoded as described elsewhere herein.
  • the method comprises receiving 1004 from the machine learning model a predicted activity measure and a predicted folding measure for each of the plurality of mutated variants.
  • the method further comprises, based on the predicted activity measures and the predicted folding measures, selecting 1006 from the plurality of mutated variants at least one mutated variant of interest.
  • the at least one mutated variant of interest may be selected based on predefined thresholds for the predicted folding measure and the predicted activity measure.
  • the selected variants may be untested in the lab, and thus the machine learning model may help identify mutated variants of interest that are suitable for wet lab experimentation.
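Threshold-based selection of variants of interest could look like this. The thresholds, the `select_variants` name and the example predictions are hypothetical. This sketch keeps variants predicted to remain folded while their predicted activity falls inside a chosen window, which is one way of applying predefined thresholds to both measures.

```python
def select_variants(predictions, min_folding, activity_window):
    """Return variants predicted to fold well and to have activity within activity_window.

    predictions maps a variant label to (predicted activity measure, predicted folding measure).
    """
    low, high = activity_window
    return sorted(
        label
        for label, (activity, folding) in predictions.items()
        if folding >= min_folding and low <= activity <= high
    )


# Hypothetical model outputs for three untested variants
predictions = {
    "variant_1": (0.90, 0.95),  # folded and active
    "variant_2": (0.10, 0.20),  # inactive and poorly folded
    "variant_3": (0.15, 0.90),  # folded but inactive
}
candidates = select_variants(predictions, min_folding=0.8, activity_window=(0.0, 0.3))
```

Requiring good folding separates `variant_3` (inactive for reasons other than folding) from `variant_2`, in the same spirit as the disambiguation between folding and activity effects described above.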
  • Figure 11 shows a computer apparatus 1100 suitable for implementing methods according to the present disclosure.
  • the apparatus 1100 comprises a processor 1102, an input-output device 1104, a communications portal 1106 and computer memory 1108.
  • the memory 1108 may store code that, when executed by the processor 1102, causes the apparatus 1100 to perform any of the computer-implemented methods disclosed herein.
  • Proto-oncogene tyrosine-protein kinase Src (aka proto-oncogene c-Src, or c-Src), herein referred to as Src or Src kinase, is a non-receptor tyrosine kinase that belongs to the Src family of kinases.
  • Wild-type Src kinase comprises an SH2 domain, SH3 domain, and tyrosine kinase domain, the latter of which is responsible for catalysing the phosphorylation of specific target tyrosine residues in other tyrosine kinases.
  • Wild-type human Src kinase comprises the amino acid sequence set forth in SEQ ID NO: 1 herein.
  • any and all references to positions within Src kinase are intended to encompass the equivalent positions/residues in an analogous sequence, even if said sequence is not identical to SEQ ID NO: 1.
  • for example, E283 refers to the glutamic acid present at the equivalent position, regardless of whether that residue is number 283 when counted from the N-terminus.
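Locating the equivalent position in a related sequence is commonly done through a pairwise alignment. The sketch below assumes such an alignment has already been produced by some tool (the text does not specify one) and is given as two gapped strings of equal length; `map_position` and the toy sequences are hypothetical.

```python
def map_position(ref_aligned, query_aligned, ref_position):
    """Map a 1-based reference residue number to the equivalent query residue.

    Returns (query_position, query_residue), or None if the reference residue
    is aligned to a gap in the query.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have the same length")
    ref_count = query_count = 0
    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_char != "-":
            ref_count += 1
        if query_char != "-":
            query_count += 1
        if ref_char != "-" and ref_count == ref_position:
            return None if query_char == "-" else (query_count, query_char)
    raise ValueError("reference position lies beyond the end of the reference sequence")


# Toy alignment: the query lacks the reference's second residue and has one extra residue
ref_aligned = "MKAE-LV"
query_aligned = "M-AEGLV"
print(map_position(ref_aligned, query_aligned, 4))  # (3, 'E'): reference E4 is query E3
```

The same logic would map a residue such as E283 onto a homologue whose numbering is shifted by insertions or deletions.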
  • a method of modulating the activity of Src kinase comprising mutating one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514.
  • By modulating the activity of Src kinase, it is meant the alteration of kinase activity arising from a variant of Src as compared with the kinase activity of the unmodified, wild-type Src. It will be understood that encompassed within the present invention are variants of the wild-type and variant Src proteins that include, for example, tags for enabling purification or identification. The person skilled in the art would understand that a comparison of activity could be carried out using equivalent constructs that differ only in the amino acids present within the Src sequence region defined by SEQ ID NO: 1.
  • a Src kinase comprising SEQ ID NO: 1 in addition to a protein tag may be compared with a variant Src kinase comprising a modified version of SEQ ID NO: 1 in addition to the same tag.
  • Any suitable assay may be used to reliably compare the activity of Src kinase.
  • the method of modulating the activity of Src kinase comprises mutating one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, forty seven, forty eight, forty nine, fifty, fifty one, fifty two, fifty three, fifty four, fifty five, fifty six, fifty seven, or fifty eight residues.
  • the method of modulating the activity of Src kinase comprises mutating one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, or forty seven residues.
  • the method of modulating the activity of Src kinase comprises mutating one, two, three, four, five, six, seven, eight, nine, ten, or eleven residues.
  • the method of modulating the activity of Src kinase comprises mutating one residue.
  • the present inventors have generated an array of single point mutants of Src kinase, which enabled determination of the allosteric effects of said residue/site.
  • the term site is taken to mean a region of Src kinase; the site may comprise or consist of a single amino acid residue or more than one residue (e.g., a group of residues). Where the site comprises more than one amino acid, the amino acids may be continuous or discontinuous in the primary sequence or on the surface of the folded protein (i.e., comprising a continuous or discontinuous patch or site).
  • an inhibitory effect refers to a reduction in kinase activity; an activating effect to an increase in kinase activity; and no effect to no change in activity, all as compared with the wild-type Src kinase.
  • the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509.
  • the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509, and wherein mutation of said residue(s) results in the inhibition or reduction of Src kinase activity.
  • the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
  • the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514, and wherein mutation of said residue(s) results in an increase of Src kinase activity.
  • two residues from the groups are mutated.
  • Src kinase may be modified such that groups of residues are mutated or modified; these groups may form regions on the protein surface and may represent pockets or patches that have an allosteric effect on Src activity.
  • the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 &
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: R388, P428, I429, K430, T432, A436, A437 & F442.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: M317, K318, K319, L328, V331 & I339.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V386, R388, R412, L413, I414, E415, N417, E418, Y419, Y439, G440, R441, F442 & T443.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: W289, R321, Q327, L328, Y329, A330 & V331.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L270, W289, T293, R294, V295, Y329, A330 & V340.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L272, E273, V274, K275, L276, V284, R294, V295, A296, I297 & Y343.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: R422, Q423, I429, V470, N471, R472, E473, V474, L475 & D476.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: T293, R294, V295, Y329, T341, E342, Y343, M344 & S345.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: F352, L353, K354, G358, K359, Y360, L361, R362, L366, E457, L458, T459, T460, K461, G462, P488 & E489.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: E268, S269, L270, R271, L272 & Y338.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: G481, R483, D496, M498, C499, Q500, C501, W502, R503, K504, E505 & E508.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L361, R362, L363, L366, L458, T459, T460, K461, R463, P487, P488, E489, C490, P491 & L494.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: P364, V367, D368, A371, F518, L519, E520, D521, Y522, F523 & T524.
  • more than one residue within a group of residues is mutated.
  • any number of residues within the groups defined herein may be modified or mutated.
  • the one or more residues is selected from one group of residues, as defined herein.
  • the one or more residues is selected from more than one group of residues, as defined herein.
  • the one or more residues is selected from two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven or twenty eight groups of residues.
  • modified or mutated residues may be selected from any number of groups defined herein.
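Because the groups overlap, a residue may fall within several of them (for example, M317 is listed in both groups a and b above). A small index makes such overlaps explicit. The sketch uses only groups a to c, whose full membership is given in the text; the names `GROUPS` and `residue_to_groups` are hypothetical, and the overlaps are simple set operations on the listed residues, not a structural analysis.

```python
# Groups a to c as listed in the text (I339 and I429 are isoleucine residues).
GROUPS = {
    "a": "V316 M317 L320 L325 V380 Y385 V386 H387 R388 V405 A406 D407 D447".split(),
    "b": "K298 M305 F310 E313 A314 M317 L328 V331 I339 T341 Y385 D407 F408 G409 L410 A411".split(),
    "c": "R388 P428 I429 K430 T432 A436 A437 F442".split(),
}


def residue_to_groups(groups):
    """Invert {group: [residues]} into {residue: [groups containing it]}."""
    index = {}
    for name, residues in groups.items():
        for residue in residues:
            index.setdefault(residue, []).append(name)
    return index


index = residue_to_groups(GROUPS)
shared_ab = set(GROUPS["a"]) & set(GROUPS["b"])  # residues common to groups a and b
```

For these three groups, M317, Y385 and D407 are shared by a and b, and R388 is shared by a and c.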
  • the one or more residues is an allosteric site.
  • the one or more residues is located within an allosteric site.
  • the activity of Src kinase may be modulated.
  • Said modulation refers to the increasing (activating) or decreasing (inactivating) of Src kinase activity as compared with a wild-type control.
  • the modulating is activating or inactivating.
  • the modulating is inactivating.
  • the method of the invention decreases the kinase activity of Src kinase, relative to unmodified Src kinase.
  • the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A
  • a method of decreasing the activity of Src kinase comprising mutating one or more residues selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 &
  • the modulating is activating.
  • the method of the invention increases the kinase activity of Src kinase, relative to unmodified Src kinase.
  • the one or more residues is selected from one or more of the following groups of residues: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
  • a method of increasing the activity of Src kinase comprising mutating one or more residues selected from one or more of the following groups of residues: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
  • Any of the residues herein may be mutated to constitute a variant comprising any other naturally or non-naturally occurring amino acids.
  • said mutation is a conservative mutation.
  • said mutation is a non-conservative mutation.
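Applying point mutations in the residue notation used here (wild-type residue, position, new residue) can be sketched as follows. The helper name and the toy sequence are hypothetical; checking that the stated wild-type residue matches the sequence guards against numbering offsets. The sketch accepts only the 20 standard one-letter codes, not the non-naturally occurring amino acids the text also contemplates, and it does not judge whether a substitution is conservative.

```python
import re

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MUTATION_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(sequence, mutations):
    """Return a copy of sequence with substitutions such as 'E2A' applied (1-based positions)."""
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION_PATTERN.fullmatch(mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type} at {position}, found {residues[position - 1]}"
            )
        if new not in STANDARD_AMINO_ACIDS:
            raise ValueError(f"{mutation}: unsupported residue {new}")
        residues[position - 1] = new
    return "".join(residues)


print(apply_substitutions("MEKV", ["E2A", "V4I"]))  # MAKI
```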
  • a method of modulating the activity of Src kinase comprising mutating one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514; to one or more of the amino acids selected from the group consisting of: A, R, N, D,
  • the method is an in vitro method.
  • the method is an in vivo method.
  • the method is an ex vivo method.
  • mutating one or more residues may refer to mutational changes made directly at the amino acid level or mutational changes made at a nucleic acid level, e.g., by altering the nature of the codon encoding the corresponding Src kinase residue.
  • Said mutating may result in a stable, heritable change, e.g., as a result of mutation of genomic DNA, such as in a cell; or it may result in a transient change, e.g., as a result of the introduction of mRNA encoding a Src kinase comprising said mutated residues.
  • the mutating is carried out by means of mutating the nucleic acid sequence encoding the one or more residues.
  • a method of modulating the activity of Src kinase comprising mutating one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, 1337, 1339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, 1429, K430, W431, T432, A433, E435, A436, D447, 1453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514, wherein said mutating is comprises the step of altering the nucleic acid sequence encoding said
  • Said mutation may be carried out, at the nucleic acid level, by PCR-based techniques, TALEN-based gene editing, CRISPR/Cas-based gene editing, etc.
  • the mutating is carried out using CRISPR/Cas-based gene editing technologies, or variants thereof.
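At the nucleic acid level, substituting one residue comes down to replacing the codon that encodes it. The Python sketch below is illustrative only. It assumes an in-frame open reading frame indexed from residue 1, which need not match the Src residue numbering used above, so a real construct would need an offset. The helper `substitute_codon` and its rule of choosing the codon with the fewest nucleotide changes are hypothetical choices made for the example and are not stated in this document. Practical designs also weigh codon usage and constraints imposed by the editing method.

```python
from typing import Optional

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate an in-frame DNA open reading frame (length a multiple of 3)."""
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def substitute_codon(orf: str, position: int, new_aa: str,
                     expected_wt: Optional[str] = None) -> str:
    """Replace the codon at 1-based residue `position` with a codon for `new_aa`.

    Hypothetical rule: pick the codon needing the fewest nucleotide changes;
    ties are broken by codon-table order.
    """
    orf = orf.upper()
    start = 3 * (position - 1)
    old_codon = orf[start:start + 3]
    if position < 1 or len(old_codon) != 3:
        raise IndexError("position is outside the ORF")
    # Optional sanity check that the ORF carries the expected wild-type residue.
    if expected_wt is not None and CODON_TABLE[old_codon] != expected_wt:
        raise ValueError(
            f"expected {expected_wt} at {position}, found {CODON_TABLE[old_codon]}")
    candidates = [codon for codon, aa in CODON_TABLE.items() if aa == new_aa]
    if not candidates:
        raise ValueError(f"unknown amino acid {new_aa!r}")
    best = min(candidates,
               key=lambda codon: sum(a != b for a, b in zip(codon, old_codon)))
    return orf[:start] + best + orf[start + 3:]


# Toy ORF encoding M-E-K (hypothetical, not a Src sequence).
mutant = substitute_codon("ATGGAAAAA", 2, "A", expected_wt="E")
print(mutant, translate(mutant))
```

Here GAA (Glu) becomes GCA (Ala), a single-base change. The wild-type check guards against numbering mismatches between the claims and the construct.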
  • the modification of a single amino acid may have a specific effect.
  • the cumulative effect of modifying more than one amino acid or a group of amino acids including said single amino acid may be different from the effect of the single amino acid alone.
  • a single point mutation that allosterically increases Src kinase activity may form part of a region or pocket that, when modified altogether, allosterically decreases Src kinase activity, and vice versa.
  • Src kinase may be modified or mutated to produce variants of Src kinase, as compared to a wild-type Src kinase. Said variants may have modified activities relative to the wild-type Src kinase.
  • a polypeptide encoding a Src kinase variant wherein the polypeptide comprises a mutation, relative to a wild-type Src kinase, at one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y51
  • the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509.
  • the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509, and wherein the polypeptide has reduced activity relative to a wild-type Src kinase.
  • the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
  • the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514, and wherein the polypeptide has increased activity relative to a wild-type Src kinase.
  • the polypeptide comprises a mutation, relative to a wild-type Src kinase, at one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, forty seven, forty eight, forty nine, fifty, fifty one, fifty two, fifty three, fifty four, fifty five, fifty six, fifty seven, or fifty eight residues.
  • the polypeptide comprises a mutation, relative to a wild-type Src kinase, at one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, or forty seven residues.
  • the polypeptide comprises a mutation, relative to a wild-type Src kinase, at one, two, three, four, five, six, seven, eight, nine, ten, or eleven residues.
  • the polypeptide comprises a mutation, relative to a wild-type Src kinase, at only one residue.
  • the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L39
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: R388, P428, I429, K430, T432, A436, A437 & F442.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: M317, K318, K319, L328, V331 & I339.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V386, R388, R412, L413, I414, E415, N417, E418, Y419, Y439, G440, R441, F442 & T443.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: W289, R321, Q327, L328, Y329, A330 & V331.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L270, W289, T293, R294, V295, Y329, A330 & V340.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L272, E273, V274, K275, L276, V284, R294, V295, A296, I297 & Y343.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: R422, Q423, I429, V470, N471, R472, E473, V474, L475 & D476.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: T293, R294, V295, Y329, T341, E342, Y343, M344 & S345.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: F352, L353, K354, G358, K359, Y360, L361, R362, L366, E457, L458, T459, T460, K461, G462, P488 & E489.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: E268, S269, L270, R271, L272 & Y338.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: G481, R483, D496, M498, C499, Q500, C501, W502, R503, K504, E505 & E508.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L361, R362, L363, L366, L458, T459, T460, K461, R463, P487, P488, E489, C490, P491 & L494.
  • the one or more residues is one, more than one, and/or all of the residues in the group consisting of: P364, V367, D368, A371, F518, L519, E520, D521, Y522, F523 & T524.
  • the polypeptide comprises a mutation at one residue within a group of residues.
  • the polypeptide comprises a mutation at more than one residue within a group of residues.
  • the polypeptide comprises a mutation at two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen or nineteen residues within a group of residues.
  • the polypeptide comprises a mutation at all residues within a group of residues.
  • the polypeptide comprises a mutation to one or more residues selected from one group of residues, as defined herein.
  • the polypeptide comprises a mutation to one or more residues selected from more than one group of residues, as defined herein.
  • the polypeptide comprises a mutation to one or more residues selected from two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven or twenty eight groups of residues.
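The embodiments above count mutations within a group of residues (one, several, or all) and across groups. A small bookkeeping helper makes those distinctions concrete. In this Python sketch the group labels `group_1` and `group_2` are hypothetical names. Their residue contents are copied from two of the groups listed above, and `group_coverage` is an illustrative helper, not a method described in this document.

```python
from typing import Dict, Iterable, Set

# Hypothetical labels; residue sets copied from two groups listed above.
GROUPS: Dict[str, Set[str]] = {
    "group_1": {"R388", "P428", "I429", "K430", "T432", "A436", "A437", "F442"},
    "group_2": {"M317", "K318", "K319", "L328", "V331", "I339"},
}


def group_coverage(mutated: Iterable[str],
                   groups: Dict[str, Set[str]] = GROUPS) -> Dict[str, dict]:
    """For each group containing at least one mutated residue, report how many
    of its residues are mutated and whether all of them are."""
    mutated = set(mutated)
    report = {}
    for name, members in groups.items():
        hit = members & mutated
        if hit:
            report[name] = {
                "n_mutated": len(hit),
                "fraction": len(hit) / len(members),
                "all_mutated": hit == members,
            }
    return report


# Two residues from group_1 and one from group_2: more than one group is hit.
print(group_coverage(["I429", "K430", "M317"]))
```

The number of keys in the report gives the number of groups touched, and `all_mutated` flags the "all residues within a group" embodiment.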
  • the one or more residues is located within an allosteric site.
  • the kinase activity of the polypeptide is modulated relative to the kinase activity of wild-type Src kinase.
  • the modulation is an increase or a decrease in kinase activity relative to the kinase activity of wild-type Src kinase.
  • the modulation is a decrease in kinase activity relative to the kinase activity of wild-type Src kinase.
  • the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F40
  • the modulation is an increase in kinase activity relative to the kinase activity of wild-type Src kinase.
  • the one or more residues is selected from one or more of the following groups of residues: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475
  • Any of the residues herein may be mutated to constitute a variant comprising any other naturally or non-naturally occurring amino acids.
  • the variant comprises a mutation that constitutes a conservative mutation.
  • the variant comprises a mutation that constitutes a non-conservative mutation.
  • the polypeptide encoding a Src kinase variant comprises a mutation, relative to a wild-type Src kinase, that constitutes a mutation of any one or more of E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502,
  • a variant may also be referred to as a modified or mutated Src kinase, in which each will be understood to be a variant or modification of, or mutated relative to a wild-type Src kinase, e.g., comprising or consisting of SEQ ID NO: 1.
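Variants here are defined by substitutions written in the form wild-type residue, position, new residue (for example E283A), relative to a wild-type sequence such as SEQ ID NO: 1, which is not reproduced in this section. The Python sketch below applies such substitutions to a protein sequence and checks each wild-type residue before changing it. The function name `apply_mutations`, the `first_residue_number` offset, and the toy sequences are assumptions made for illustration. The strict pattern also rejects OCR-style errors such as "1337A" written for "I337A".

```python
import re
from typing import Iterable

AA = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"^([{AA}])(\d+)([{AA}])$")


def apply_mutations(sequence: str, mutations: Iterable[str],
                    first_residue_number: int = 1) -> str:
    """Return `sequence` with each substitution (e.g. 'E283A') applied.

    `first_residue_number` is the residue number of sequence[0]; it is an
    assumed offset for fragments or alternative numbering schemes.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        wt, number, new = match.group(1), int(match.group(2)), match.group(3)
        index = number - first_residue_number
        if not 0 <= index < len(residues):
            raise IndexError(f"{mutation}: residue {number} is outside the sequence")
        # Compare against the unmodified wild-type sequence.
        if sequence[index] != wt:
            raise ValueError(f"{mutation}: wild-type residue is {sequence[index]}, not {wt}")
        residues[index] = new
    return "".join(residues)


# Toy three-residue fragment numbered from 282 (hypothetical).
print(apply_mutations("GEW", ["E283A"], first_residue_number=282))
```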
  • the present invention also provides binding molecules that target said allosteric sites.
  • Said molecules may modulate the activity of Src kinase, e.g., wild-type Src kinase.
  • a binding molecule which binds to one or more target sites on Src kinase, wherein the one or more target sites comprises one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514.
  • a binding molecule which binds to one or more target sites on Src kinase, wherein the one or more target sites comprises one or more residues selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509.
  • a binding molecule which binds to one or more target sites on Src kinase, wherein the one or more target sites comprises one or more residues selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
  • a binding molecule which binds to one or more target sites on Src kinase, wherein the one or more target sites comprises one or more residues selected from the group consisting of: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 &
  • the one or more target sites is one or more allosteric sites.
  • the one target site is an allosteric site.
  • the one or more target sites is surface exposed.
  • the one or more target sites is solvent accessible.
  • At least one of the one or more target sites is solvent accessible.
  • the one or more target sites is partially solvent accessible.
  • the one or more target sites form a pocket or patch on the surface of Src kinase.
  • the one or more target sites is accessible for binding by one or more binding molecules.
  • the one or more target sites is accessible for binding by one or more binding molecules by a lock and key, or an induced fit mechanism.
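Whether a target-site residue is solvent accessible, partially accessible, or buried is usually judged from its relative solvent accessibility (RSA). RSA is the residue's accessible surface area in a structure divided by a reference maximum for that amino acid type. In this Python sketch, the per-residue SASA values are assumed to come from an external structure-analysis tool. The maximum-ASA scale is assumed to be the theoretical maxima of Tien et al. (2013) and should be checked before use. The 0.05 and 0.25 cut-offs are hypothetical, not values given in this document.

```python
from typing import Dict

# Assumed maximum accessible surface areas (Å^2), taken to be the theoretical
# maxima of Tien et al. (2013); verify or replace with another scale.
MAX_ASA: Dict[str, float] = {
    "A": 129.0, "R": 274.0, "N": 195.0, "D": 193.0, "C": 167.0,
    "Q": 225.0, "E": 223.0, "G": 104.0, "H": 224.0, "I": 197.0,
    "L": 201.0, "K": 236.0, "M": 224.0, "F": 240.0, "P": 159.0,
    "S": 155.0, "T": 172.0, "W": 285.0, "Y": 263.0, "V": 174.0,
}


def classify_accessibility(residue: str, sasa: float,
                           buried_below: float = 0.05,
                           exposed_above: float = 0.25) -> str:
    """Classify a residue such as 'G409' from its SASA (Å^2) using
    hypothetical RSA thresholds."""
    rsa = sasa / MAX_ASA[residue[0]]
    if rsa < buried_below:
        return "buried"
    if rsa >= exposed_above:
        return "solvent accessible"
    return "partially solvent accessible"


# Illustrative SASA values only, not measurements from a Src structure.
for res, sasa in [("G409", 52.0), ("L410", 20.1), ("A314", 1.0)]:
    print(res, classify_accessibility(res, sasa))
```

A pocket or patch could then be taken as a set of spatially neighbouring residues that are at least partially accessible, which this sketch does not attempt.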
  • the target sites herein may be targeted by a binding molecule.
  • Said binding molecules may have therapeutic benefit.
  • the one or more target sites is druggable.
  • the binding molecule is a small molecule or a biologic.
  • the binding molecule is a polypeptide.
  • the binding molecule is an antibody or a derivative thereof, optionally a nanobody, a Fab fragment, a scFv, or the like.
  • the binding molecule is an antibody mimetic, such as an affibody.
  • the binding molecule is a DARPin.
  • the binding molecule is a nucleic acid.
  • the binding molecule is an aptamer.
  • the binding molecule is: a) a DNA b) an RNA c) a DNA/RNA hybrid e) a modified DNA; or f) a modified RNA.
  • By modified DNA or RNA it is meant a DNA or RNA comprising non-naturally occurring modifications (e.g., chemical groups, such as phosphorothioate internucleoside linkages) as compared with DNA and RNA found in vivo.
  • the binding molecule is a small molecule. In another embodiment, the binding molecule is a drug-like small molecule.
  • the binding molecule modulates the activity of Src kinase.
  • the modulating is activating or inactivating.
  • the binding molecule increases the activity of Src kinase relative to the activity of the kinase in the absence of the binding molecule.
  • the binding molecule decreases the activity of Src kinase relative to the activity of the kinase in the absence of the binding molecule.
  • the modulating is inactivating.
  • the one or more target sites comprises a group of residues selected from the group consisting of: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 &
  • the modulating is activating.
  • the one or more target sites comprises a group of residues selected from the group consisting of: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475
  • nucleic acid encoding: a) the binding molecule of the invention; and/or b) the polypeptide of the invention.
  • the nucleic acid encodes a polypeptide according to the invention.
  • the nucleic acid encodes a binding molecule according to the invention, optionally wherein said binding molecule is a polypeptide or a nucleic acid.
  • where the Src kinase binding molecule of the invention is a polypeptide or a nucleic acid-based binding molecule (e.g., DNA or RNA), said binding molecule may be encoded by a nucleic acid molecule according to the foregoing.
  • the nucleic acid is RNA.
  • the nucleic acid is DNA.
  • the nucleic acid is: a) a DNA b) an RNA c) a DNA/RNA hybrid e) a modified DNA; or f) a modified RNA.
  • By modified DNA or RNA it is meant a DNA or RNA comprising non-naturally occurring modifications (e.g., chemical groups, such as phosphorothioate internucleoside linkages) as compared with DNA and RNA found in vivo.
  • RNA and DNA can be considered to encode RNAs, DNAs, and polypeptides.
  • a DNA can be amplified to produce further DNA, or transcribed to produce an RNA, optionally wherein said RNA is then translated to produce a polypeptide; an RNA can be reverse transcribed to form a DNA, translated into protein, or amplified into further RNAs.
  • the nucleic acid is modified, unmodified, naturally occurring or synthetic.
  • an expression cassette comprising the nucleic acid of the invention.
  • a vector comprising the nucleic acid or the expression cassette of the invention.
  • Expression cassettes and/or vectors that enable the transcription and/or translation of nucleotide sequences of interest are known in the art and may be selected by the person skilled in the art dependent upon application.
  • cell-type-specific promoters may be chosen to restrict expression of a payload to certain cell types.
  • a cell comprising the binding molecule, the polypeptide, the nucleic acid, the expression cassette, or the vector of the invention.
  • a cell comprising the binding molecule of the invention and/or a nucleic acid encoding the binding molecule of the invention.
  • a cell comprising the polypeptide of the invention, and/or a nucleic acid encoding the polypeptide of the invention.
  • a cell comprising the nucleic acid of the invention.
  • a cell comprising the expression cassette of the invention.
  • a cell comprising the vector of the invention.
  • a cell comprising any combination of the binding molecule, the polypeptide, the nucleic acid, the expression cassette, and/or the vector of the invention.
  • a cell comprising the nucleic acid, the expression cassette, or the vector of the invention.
  • the cell is a prokaryotic cell, optionally a bacterial cell.
  • the cell is a eukaryotic cell.
  • the cell is a yeast cell.
  • the cell is a mammalian cell, preferably a human cell.
  • the cell is an in vitro cell.
  • the cell may be a cell derived from a human, optionally a human suffering from a disease or disorder, or susceptible to a disease or disorder, or a human with no known disease or susceptibility thereto.
  • the cell is a human cell derived from a subject suffering from or susceptible to a disease.
  • said disease is a disease associated with Src kinase.
  • the cell is an ex vivo cell.
  • the cell may be a cell that is not directly taken from a subject or multicellular organism.
  • the cell is an in vivo cell.
  • the binding molecule, the polypeptide, the nucleic acid, the expression cassette, the vector, or the cell of the invention for use in a method of treating a disease.
  • treatment is intended to include therapeutic interventions that ameliorate a disease as well as curative treatments. Further, prophylactic use is also encompassed, such that said treatment may include treating a subject that is susceptible to a disease or is otherwise showing signs of progression towards a disease state without necessarily having symptoms of the disease or a clinical diagnosis of the disease.
  • the binding molecule of the invention for use in a method of treating a disease.
  • the polypeptide of the invention for use in a method of treating a disease.
  • the nucleic acid of the invention for use in a method of treating a disease.
  • the expression cassette of the invention for use in a method of treating a disease.
  • the vector of the invention for use in a method of treating a disease.
  • a method of treating a disease comprising administering the binding molecule, the polypeptide, the nucleic acid, the expression cassette, the vector, or the cell of the invention to a patient in need thereof.
  • a method of treating a disease comprising administering the binding molecule, the polypeptide, the nucleic acid, the expression cassette, or the vector to a patient in need thereof.
  • a method of treating a disease comprising administering the binding molecule of the invention to a patient in need thereof.
  • a method of treating a disease comprising administering the polypeptide of the invention to a patient in need thereof.
  • a method of treating a disease comprising administering the nucleic acid of the invention to a patient in need thereof.
  • a method of treating a disease comprising administering the expression cassette of the invention to a patient in need thereof.
  • a method of treating a disease comprising administering the vector of the invention to a patient in need thereof.
  • said administering is administering to a cell of said patient.
  • a method of treating a disease comprising administering the cell of the invention to a patient in need thereof.
  • the binding molecule for the manufacture of a medicament for use in the treatment of a disease.
  • the binding molecule of the invention for the manufacture of a medicament for use in the treatment of a disease.
  • the polypeptide of the invention for the manufacture of a medicament for use in the treatment of a disease.
  • the nucleic acid of the invention for the manufacture of a medicament for use in the treatment of a disease.
  • the binding molecule, polypeptide, nucleic acid, vector, expression cassette, or cell of the invention for use in a method of treating a disease, wherein the disease is selected from the group consisting of: cancer, rheumatoid arthritis, chronic kidney disease, central nervous system diseases, viral diseases, aging (including skin aging), pulmonary fibrosis, epilepsy, tuberculosis, cardiovascular disease, macrophage-mediated inflammatory disease and bone homeostasis diseases/disorders.
  • a method of treating a disease comprising administering the binding molecule, polypeptide, nucleic acid, vector, expression cassette, or cell of the invention to a patient in need thereof, wherein said disease is selected from the group consisting of: cancer, rheumatoid arthritis, chronic kidney disease, central nervous system diseases, viral diseases, aging (including skin aging), pulmonary fibrosis, epilepsy, tuberculosis, cardiovascular disease, macrophage-mediated inflammatory disease and bone homeostasis diseases/disorders.
  • the binding molecule, polypeptide, nucleic acid, vector, expression cassette, or cell of the invention for the manufacture of a medicament for use in the treatment of a disease, wherein said disease is selected from the group consisting of: cancer, rheumatoid arthritis, chronic kidney disease, central nervous system diseases, viral diseases, aging (including skin aging), pulmonary fibrosis, epilepsy, tuberculosis, cardiovascular disease, macrophage-mediated inflammatory disease and bone homeostasis diseases/disorders.
  • the disease is selected from the group consisting of: cancer, rheumatoid arthritis, chronic kidney disease, central nervous system diseases, viral diseases, pulmonary fibrosis, epilepsy, tuberculosis, cardiovascular disease, macrophage-mediated inflammatory disease and bone homeostasis diseases/disorders.
  • the disease is kidney disease.
  • the disease is chronic kidney disease.
  • the chronic kidney disease is selected from the group consisting of: renal fibrosis, glomerulonephritis, diabetic nephropathy, HIV-associated nephropathy, polycystic kidney disease and obesity-induced kidney disease.
  • the disease is a central nervous system disease.
  • the central nervous system disease is migraine or neuropathic pain.
  • the disease is a cardiovascular disease.
  • the cardiovascular disease is selected from the group consisting of: hypertension, heart disease, myocardial ischemia reperfusion injury, and arrhythmia.
  • the disease is cancer.
  • the disease is a cancer associated with Src kinase activity.
  • the disease is a cancer in which Src kinase has developed resistance to one or more existing therapies.
  • the disease is an infectious disease.
  • the disease is a viral disease.
  • the disease is a bacterial disease.
  • the disease is tuberculosis (TB).
  • the methods and uses of the invention may be considered therapeutic.
  • the methods and uses of the invention may be considered non-therapeutic or cosmetic.
  • the patient may be considered to be a subject.
  • in the methods and uses of the invention, the subject has no known disease or disorder.
  • the methods and uses of the invention are employed to combat ageing, optionally skin ageing.
  • the term “component” is intended to be broadly construed as hardware, firmware, or a combination of hardware and software. It will be apparent that systems and/or methods described herein may be implemented in different forms of hardware, firmware, and/or a combination of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods is not limiting of the implementations. Thus, the operation and behavior of the systems and/or methods are described herein without reference to specific software code - it being understood that software and hardware can be used to implement the systems and/or methods based on the description herein.
  • “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiple of the same item.
  • the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise. Also, as used herein, the term “or” is intended to be inclusive when used in a series and may be used interchangeably with “and/or,” unless explicitly stated otherwise (e.g., if used in combination with “either” or “only one of”).
  • a computer readable medium may include non-transitory type media such as physical storage media including storage discs and solid state devices.
  • a computer readable medium may also or alternatively include transient media such as carrier signals and transmission media.
  • a computer-readable storage medium is defined herein as a non-transitory memory device.
  • a memory device includes memory space within a single physical storage device or memory space spread across multiple physical storage devices.
  • a computer-implemented method of training a machine learning model comprising: obtaining training data specifying, for a wild type variant of a target enzyme and each of a plurality of mutant variants of the target enzyme, each mutant variant having a different set of one or more mutations, an activity measure for the respective variant and a folding measure for the respective variant; based on the training data, training model parameters of a machine learning model to output, from input data specifying the set of one or more mutations in a given variant of the target enzyme, a predicted activity measure and a predicted folding measure for the given variant.
  • training the machine learning model comprises fitting a thermodynamic model of the target enzyme to the training data.
  • the thermodynamic model comprises a three-state model of the target enzyme with unfolded, folded inactive, and folded active states.
  • the model parameters comprise a first set of weights and a second set of weights.
  • the predicted activity measure depends on both the first set of weights and the second set of weights.
  • the predicted folding measure depends on the second set of weights but is independent of the first set of weights.
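The training and three-state model bullets above can be illustrated with a small thermodynamic fit. The Python sketch below rests on several assumptions not stated in this document. Free energies are additive: a wild-type value plus one weight per mutation, with the activity weights as the first set and the folding weights as the second. The activity measure is the fraction of molecules in the folded-active state of an unfolded ⇌ folded-inactive ⇌ folded-active equilibrium, 1 / (1 + e^(ΔG_act/RT) (1 + e^(ΔG_fold/RT))). The folding measure is modelled as a two-state fraction folded, 1 / (1 + e^(ΔG_fold/RT)), which is what makes it independent of the activity weights. The loss is squared error, minimised by plain gradient descent with finite-difference gradients. Real implementations would use automatic differentiation and an established optimiser.

```python
import math
from typing import Dict, List, Sequence, Tuple

RT = 0.593  # kcal/mol near 25 °C (assumed energy units)

Params = Dict[str, float]
Record = Tuple[Sequence[str], float, float]  # (mutations, folding, activity)


def phenotypes(dg_fold: float, dg_act: float) -> Tuple[float, float]:
    """Folding and activity measures from free energies (negative = favoured)."""
    folded = 1.0 / (1.0 + math.exp(dg_fold / RT))  # two-state; ignores dg_act
    active = 1.0 / (1.0 + math.exp(dg_act / RT) * (1.0 + math.exp(dg_fold / RT)))
    return folded, active


def predict(mutations: Sequence[str], params: Params) -> Tuple[float, float]:
    """Additive energies: wild-type value plus 'act:'/'fold:' weight per mutation."""
    dg_fold = params["wt_fold"] + sum(params.get("fold:" + m, 0.0) for m in mutations)
    dg_act = params["wt_act"] + sum(params.get("act:" + m, 0.0) for m in mutations)
    return phenotypes(dg_fold, dg_act)


def loss(params: Params, data: List[Record]) -> float:
    total = 0.0
    for mutations, obs_fold, obs_act in data:
        pred_fold, pred_act = predict(mutations, params)
        total += (pred_fold - obs_fold) ** 2 + (pred_act - obs_act) ** 2
    return total


def fit(data: List[Record], steps: int = 500, lr: float = 0.1,
        h: float = 1e-5) -> Params:
    """Gradient descent using central finite-difference gradients."""
    params: Params = {"wt_fold": 0.0, "wt_act": 0.0}
    for m in sorted({m for mutations, _, _ in data for m in mutations}):
        params["fold:" + m] = 0.0
        params["act:" + m] = 0.0
    for _ in range(steps):
        grads = {}
        for key in params:
            original = params[key]
            params[key] = original + h
            upper = loss(params, data)
            params[key] = original - h
            lower = loss(params, data)
            params[key] = original
            grads[key] = (upper - lower) / (2 * h)
        for key, grad in grads.items():
            params[key] -= lr * grad
    return params


# Synthetic, noiseless data from made-up energies (not Src measurements).
truth = {"wt_fold": -1.5, "wt_act": -0.5, "fold:mutA": 1.0, "act:mutB": 1.2}
variants = [[], ["mutA"], ["mutB"], ["mutA", "mutB"]]
data = [(v, *predict(v, truth)) for v in variants]
fitted = fit(data)
start = loss({"wt_fold": 0.0, "wt_act": 0.0}, data)
print(f"loss before: {start:.4f}, after: {loss(fitted, data):.4f}")
```

Because the double mutant is predicted from summed single-mutant energies, the fitted weights can be reused to score combinations that were never measured.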
  • a computer-implemented method of identifying one or more target sites of a target enzyme comprising: obtaining model parameters from a machine learning model trained in accordance with the method of any of clauses 1 to 23; based on the model parameters, identifying the one or more target sites of the target enzyme.
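This section does not say how target sites are derived from the model parameters. One plausible reading is sketched below in Python. The activity weights (one energy change per substitution) are summarised per residue by their mean absolute value and thresholded. Residues in a supplied set of active-site positions are excluded, so that what remains are candidate allosteric residues. The threshold, the exclusion rule, and the helper name `candidate_sites` are all assumptions. Structure-based analyses might instead use distance from the active site.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def candidate_sites(act_weights: Dict[str, float], active_site: Iterable[str],
                    min_mean_abs: float = 0.5) -> List[str]:
    """Residues whose substitutions shift the activity energy strongly on
    average, excluding active-site residues (hypothetical criterion)."""
    per_residue = defaultdict(list)
    for mutation, ddg in act_weights.items():
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        per_residue[match.group(1) + match.group(2)].append(abs(ddg))
    excluded = set(active_site)
    hits = [res for res, values in per_residue.items()
            if res not in excluded and mean(values) >= min_mean_abs]
    return sorted(hits, key=lambda res: int(res[1:]))


# Made-up weights; "D999" is a hypothetical active-site placeholder.
weights = {"M317A": 0.9, "M317D": 1.1, "K426A": -0.7, "G424A": 0.1, "D999A": 2.0}
print(candidate_sites(weights, active_site={"D999"}))
```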
  • a computer-implemented method of identifying a mutated variant of interest of an enzyme comprising: for each of a plurality of mutated variants of a target enzyme, providing an input specifying mutations in the respective mutated variant to a machine learning model trained to output a predicted activity measure and a predicted folding measure for the mutated variant; receiving from the machine learning model a predicted activity measure and a predicted folding measure for each of the plurality of mutated variants; and based on the predicted activity measures and the predicted folding measures, selecting from the plurality of mutated variants at least one mutated variant of interest.
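Selecting variants from the two predicted measures might look like the Python sketch below. It keeps variants that are predicted to remain well folded and whose predicted activity moves away from wild type in the desired direction, then ranks them by the size of that change. Requiring good folding is one way to separate activity changes from mere destabilisation. The thresholds, variant labels, and function name are hypothetical.

```python
from typing import Dict, List, Tuple

Prediction = Tuple[float, float]  # (predicted activity, predicted folding)


def select_variants(predictions: Dict[str, Prediction], wt: Prediction,
                    want: str = "decrease", min_folding: float = 0.8,
                    min_change: float = 0.2) -> List[str]:
    """Well-folded variants whose activity changes by at least `min_change`
    in the requested direction, largest change first."""
    if want not in ("increase", "decrease"):
        raise ValueError("want must be 'increase' or 'decrease'")
    wt_act = wt[0]
    chosen = []
    for variant, (activity, folding) in predictions.items():
        if folding < min_folding:
            continue  # skip variants predicted to be poorly folded
        change = activity - wt_act
        if (want == "decrease" and change <= -min_change) or \
           (want == "increase" and change >= min_change):
            chosen.append(variant)
    return sorted(chosen, key=lambda v: abs(predictions[v][0] - wt_act), reverse=True)


# Hypothetical predictions, not model output for Src.
preds = {"v1": (0.2, 0.95), "v2": (0.1, 0.5), "v3": (0.9, 0.9), "v4": (0.35, 0.85)}
print(select_variants(preds, wt=(0.6, 0.9)))
```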
  • a method of generating training data for training a machine learning model comprising: performing wet lab experiments to obtain data for deriving an activity measure and a folding measure for a wild type variant of a target enzyme and each of a plurality of mutant variants of the target enzyme.
  • 37. The method of clause 36 comprising performing a solubility assay that provides a measure of the frequency of folding for each variant.
  • Embodiment 1 A method of modulating the activity of Src kinase comprising mutating one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514.
  • Embodiment 2 The method of embodiment 1, wherein one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, forty seven, forty eight, forty nine, fifty, fifty one, fifty two, fifty three, fifty four, fifty five, fifty six, fifty seven, or fifty eight residues are mutated.
  • Embodiment 3 The method of embodiment 1 or embodiment 2, wherein the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509.
  • Embodiment 4 The method of embodiment 1 or embodiment 2, wherein the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
  • Embodiment 5 … the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 &
  • Embodiment 7 The method of embodiment 5, wherein two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen or nineteen residues within a group of residues are mutated.
  • Embodiment 8 The method of embodiment 5 or embodiment 7, wherein all residues within a group of residues are mutated.
  • Embodiment 9 The method of any one of embodiments 5 to 8, wherein the one or more residues is selected from one group of residues.
  • Embodiment 10 The method of any one of embodiments 5 to 8, wherein the one or more residues is selected from two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven or twenty eight groups of residues.
  • Embodiment 11 The method of any one of the preceding embodiments, wherein the one or more residues is located within an allosteric site.
  • Embodiment 12 The method of any one of the preceding embodiments, wherein the modulating is activating or inactivating.
  • Embodiment 13 The method of embodiment 12, wherein the modulating is inactivating.
  • Embodiment 14 The method of embodiment 13, wherein the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E34
  • Embodiment 15 The method of embodiment 12, wherein the modulating is activating.
  • Embodiment 16 The method of embodiment 15, wherein the one or more residues is selected from one or more of the following groups of residues: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
  • Embodiment 17 A binding molecule which binds to one or more target sites on Src kinase, wherein the one or more target sites comprises one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514.
  • Embodiment 18 The binding molecule of embodiment 17, wherein the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509.
  • Embodiment 19 The binding molecule of embodiment 17, wherein the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
  • Embodiment 20 The binding molecule of embodiment 17, wherein the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
  • Embodiment 22 The binding molecule of any one of embodiments 17 to 21, wherein the one or more target sites is surface exposed.
  • Embodiment 23 The binding molecule of any one of embodiments 17 to 22, wherein the one or more target sites is druggable.
  • Embodiment 24 The binding molecule of any one of embodiments 17 to 23, wherein the binding molecule is a polypeptide, a nucleic acid, an antibody or a small molecule.
  • Embodiment 25 The binding molecule of any one of embodiments 17 to 24, wherein the binding molecule modulates the activity of Src kinase.
  • Embodiment 26 The binding molecule of embodiment 25, wherein the modulating is activating or inactivating.
  • Embodiment 27 The binding molecule of embodiment 26, wherein the modulating is inactivating.
  • Embodiment 28 The binding molecule of embodiment 27, wherein the one or more target sites comprises a group of residues selected from the group consisting of: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 &
  • Embodiment 29 The binding molecule of embodiment 26, wherein the modulating is activating.
  • Embodiment 30 The binding molecule of embodiment 29, wherein the one or more target sites comprises a group of residues selected from the group consisting of: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338;
  • Embodiment 31 A polypeptide encoding a Src kinase variant, wherein the polypeptide comprises a mutation, relative to a wild-type Src kinase, at one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514.
  • Embodiment 32 The polypeptide of embodiment 31, wherein the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509.
  • Embodiment 33 The polypeptide of embodiment 31, wherein the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
  • Embodiment 34 The polypeptide of any one of embodiments 31 to 33, wherein the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339
  • Embodiment 35 The polypeptide of embodiment 34, wherein one residue within a group of residues is mutated.
  • Embodiment 36 The polypeptide of embodiment 34, wherein two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen or nineteen residues within a group of residues are mutated.
  • Embodiment 37 The polypeptide of embodiment 34 or 36, wherein all residues within a group of residues are mutated.
  • Embodiment 38 The polypeptide of any one of embodiments 34 to 37, wherein the one or more residues is selected from one group of residues.
  • Embodiment 39 The polypeptide of any one of embodiments 34 to 38, wherein the one or more residues is selected from two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven or twenty eight groups of residues.
  • Embodiment 40 The polypeptide of any one of embodiments 31 to 39, wherein the one or more residues is located within an allosteric site.
  • Embodiment 41 The polypeptide of any one of the embodiments 31 to 40, wherein the kinase activity of the polypeptide is modulated relative to the kinase activity of wild-type Src kinase.
  • Embodiment 42 The polypeptide of embodiment 41, wherein the modulation is an increase or a decrease in kinase activity relative to the kinase activity of wild-type Src kinase.
  • Embodiment 43 The polypeptide of embodiment 41 or embodiment 42, wherein the modulation is a decrease in kinase activity relative to the kinase activity of wild-type Src kinase.
  • Embodiment 44 The polypeptide of embodiment 43, wherein the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T34
  • Embodiment 45 The polypeptide of embodiment 41 or 42, wherein the modulation is an increase in kinase activity relative to the kinase activity of wild-type Src kinase.
  • Embodiment 46 The polypeptide of embodiment 45, wherein the one or more residues is selected from one or more of the following groups of residues: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338;
  • Embodiment 47 A nucleic acid encoding: a) the binding molecule of any one of embodiments 17 to 30; and/or b) the polypeptide of any one of embodiments 31 to 46.
  • Embodiment 48 The nucleic acid of embodiment 47, wherein the nucleic acid is RNA or DNA.
  • Embodiment 49 The nucleic acid of embodiment 47 or embodiment 48, wherein the nucleic acid is modified, unmodified, naturally occurring or synthetic.
  • Embodiment 50 An expression cassette comprising the nucleic acid of any one of embodiments 47 to 49.
  • Embodiment 51 A vector comprising the nucleic acid of any one of embodiments 47 to 49 or the expression cassette of embodiment 50.
  • Embodiment 52 A cell comprising the binding molecule of any one of embodiments 17 to 30, the polypeptide of any one of embodiments 31 to 46, the nucleic acid of any one of embodiments 47 to 49, the expression cassette of embodiment 50, or the vector of embodiment 51.
  • Embodiment 53 The binding molecule of any one of embodiments 17 to 30, the polypeptide of any one of embodiments 31 to 46, the nucleic acid of any one of embodiments 47 to 49, the expression cassette of embodiment 50, the vector of embodiment 51, or the cell of embodiment 52, for use in a method of treating a disease.
  • Embodiment 54 A method of treating a disease comprising administering the binding molecule of any one of embodiments 17 to 30, the polypeptide of any one of embodiments 31 to 46, the nucleic acid of any one of embodiments 47 to 49, the expression cassette of embodiment 50, the vector of embodiment 51 or the cell of embodiment 52, to a patient in need thereof.
  • Embodiment 55 Use of the binding molecule of any one of embodiments 17 to 30, the polypeptide of any one of embodiments 31 to 46, the nucleic acid of any one of embodiments 47 to 49, the expression cassette of embodiment 50, the vector of embodiment 51 or the cell of embodiment 52, for the manufacture of a medicament for use in the treatment of a disease.
  • Embodiment 56 The binding molecule, polypeptide, nucleic acid, expression cassette, vector, or cell for use of embodiment 53, the method of embodiment 54 or the use of embodiment 55, wherein the disease is selected from the group consisting of: cancer, rheumatoid arthritis, chronic kidney disease, central nervous system diseases, viral diseases, aging (including skin aging), pulmonary fibrosis, epilepsy, tuberculosis, cardiovascular disease, macrophage-mediated inflammatory disease and bone homeostasis.
  • Embodiment 57 The binding molecule, polypeptide, nucleic acid, expression cassette, vector, or cell for use, the method or the use of embodiment 56, wherein the chronic kidney disease is selected from renal fibrosis, glomerulonephritis, diabetic nephropathy, HIV-associated nephropathy, polycystic kidney disease and obesity-induced kidney disease.
  • Embodiment 58 The binding molecule, polypeptide, nucleic acid, expression cassette, vector, or cell for use, the method or the use of embodiment 56, wherein the central nervous system disease is selected from migraine and neuropathic pain.
  • Embodiment 59 The binding molecule, polypeptide, nucleic acid, expression cassette, vector, or cell for use, the method or the use of embodiment 56, wherein the cardiovascular disease is selected from hypertension, heart disease, myocardial ischemia reperfusion injury, and arrhythmia.
  • the approach was applied to human oncoprotein Src.
  • the enzymatic activity of variants of Src is easy to quantify using a highly-validated cellular toxicity assay where the inhibition of cellular growth is directly proportional to the amount of Src-induced protein phosphorylation.
  • the solubility of Src can also be quantified in the same cells using a highly-validated protein abundance selection assay, abundancePCA (aPCA), that uses protein fragment complementation to quantify soluble protein concentration over at least three orders of magnitude.
  • the active state modeled here is phenomenological and is designed to quantify all changes in activity not accounted for by changes in total soluble protein abundance. Although shifts in the equilibrium between inactive and active kinase conformations will be captured as changes in ΔGa, so too will other molecular mechanisms that affect catalytic activity (kcat) and substrate affinity (Km) independently of the conformational state of Src.
  • the data provides the first comprehensive measurement of how mutations affect the stability of the protein kinase fold in vivo and one of the largest sets of solubility measurements for any protein.
  • the Src KD is composed of two structurally and functionally distinct subdomains - the N- and C-lobes - with the active site located in the cleft between the two.
  • the N-lobe is mostly composed of beta strands and contains the ATP binding site, whereas the C-lobe is mostly alpha helical and ends in a disordered C-terminal tail that regulates the conformational state of the kinase.
  • Mutations have a wide range of effects, with many destabilizing (703 strongly destabilizing mutations with ΔΔGf > 0.5, p < 0.05, z-test) and a large number of moderately stabilizing variants (468 with ΔΔGf < 0, p < 0.05, z-test).
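The source does not specify the exact form of the z-test. The minimal sketch below assumes that each inferred ΔΔGf comes with a standard error and is tested against zero using a two-sided normal approximation. The estimate is then binned with the thresholds quoted above. The category labels and the example values are hypothetical.

```python
import math

def z_test_pvalue(estimate: float, std_err: float) -> float:
    """Two-sided p-value for H0: true value = 0 (normal approximation)."""
    z = estimate / std_err
    return math.erfc(abs(z) / math.sqrt(2))

def classify_folding_effect(ddg_f: float, std_err: float, alpha: float = 0.05) -> str:
    """Bin a ΔΔGf estimate using the thresholds quoted in the text."""
    if z_test_pvalue(ddg_f, std_err) >= alpha:
        return "not significant"
    if ddg_f > 0.5:
        return "strongly destabilizing"
    if ddg_f < 0:
        return "stabilizing"
    return "destabilizing (ΔΔGf <= 0.5)"

print(classify_folding_effect(1.0, 0.1))   # strongly destabilizing
print(classify_folding_effect(-0.3, 0.1))  # stabilizing
print(classify_folding_effect(0.3, 0.5))   # not significant
```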
  • the two structurally distinct lobes of the Src kinase domain thus also contribute differentially to the in vivo solubility of the domain, with the more dynamic ATP-binding N-lobe displaying a higher tolerance to mutagenesis than the larger and more compact C-lobe.
  • the data provides the first complete map of the effects of mutations on protein kinase activity independently of their effects on protein abundance.
  • Mutations in the beta sheet that forms the top surface of the ATP binding pocket show a striking alternating pattern of effects: substitutions of side chains pointing towards the nucleotide are detrimental for activity, whereas substitutions of side chains facing away from the active site do not reduce activity or, for 3 residues in the beta strands flanking the G-loop, actually increase it.
  • the major allosteric sites include all 11 non-active-site residues previously predicted to be part of an allosteric network that communicates between the substrate and ATP binding sites of Src (Figure 7E; Foda et al., Nat. Commun. 6, 5939 (2015)). This network was predicted via analysis of changes in electrostatic and hydrophobic contacts between active and inactive conformations in molecular dynamics simulations. Of these 11 previously predicted allosteric positions, 8 are second-shell residues.
  • Inhibitory mutations are concentrated in residues on the inner surface of αC, including E313, which engages in a salt bridge with K298 in the active state (Figure 7E), the R-spine residue M317, and the hydrophobic residues F310, A314 and L320.
  • the β4 and β5 strands located between αC and the active site are also enriched for allosteric mutations, but to a lesser extent (Figure 7C).
  • Inhibitory allosteric mutations are also abundant in the activation loop (Figure 7C), including in Y419, which locks Src in the active state when phosphorylated.
  • inhibitory allosteric mutations are enriched in the αF helix, which acts as an anchor for the catalytic (C) and regulatory (R) ‘spines’.
  • the C- and R-spines are two groups of residues that are not contiguous in the primary sequence of kinases but form a bipartite hydrophobic core in catalytically active kinases. Mutations in all sites of the R-spine have strong inactivating effects. In contrast, only the C-spine sites in direct contact with ATP (A296, V284, L396) are enriched in inactivating mutations. Mutations at the remaining C-spine sites have small effects on ΔGa, and their strong effects on kinase activity at the fitness level are almost fully explained by a loss of fold stability.
  • Activating allosteric mutations: in total, 11 residues outside of the active site are enriched for activating mutations and were defined as major activating allosteric sites (OR > 1, p < 0.05, FET). In the N-lobe, major activating allosteric sites include the gatekeeper residue T341 and its neighboring residue Y343. E283, which flanks the G-loop and forms a salt bridge with K275 to constrain the conformation of the G-loop, and E335 in the β4-β5 loop are also major activating allosteric sites.
  • Inhibitory allosteric communication in the Src KD is thus strongly distance dependent, but also anisotropic: transmission efficiency is dependent on the direction of propagation, with at least a 6-fold difference in decay rates between the most and least efficient directions.
  • the structure of the Src KD differs between its active and inactive states, with changes in the positioning of helix αC and the activation loop and multiple residue contact rearrangements in the active site and throughout the kinase domain.
  • structures used to compare contact patterns: active state, PDB ID: 1Y57; inactive state, PDB ID: 2SRC.
  • 4 types of residues were defined: active-only (engaging in contacts only in the active state), inactive-only (only in the inactive state), swapping (residues that have different contacts in the two states), and static (residues with the same contacts in both states).
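The four residue classes can be sketched as a comparison of each residue's contact partners in the two structures. This minimal sketch assumes that contacts have already been reduced to (residue, residue, interaction type) triples. The residue labels in the example are hypothetical placeholders, not Src data.

```python
from typing import Dict, FrozenSet, Set, Tuple

Contact = Tuple[str, str, str]  # (residue_a, residue_b, interaction type)

def partners(contacts: Set[Contact], residue: str) -> FrozenSet[Tuple[str, str]]:
    """(partner residue, interaction type) pairs for one residue, in either orientation."""
    out = set()
    for a, b, kind in contacts:
        if a == residue:
            out.add((b, kind))
        elif b == residue:
            out.add((a, kind))
    return frozenset(out)

def classify_residues(active: Set[Contact], inactive: Set[Contact]) -> Dict[str, str]:
    """Label each contacting residue as active-only, inactive-only, static or swapping."""
    residues = {r for a, b, _ in active | inactive for r in (a, b)}
    labels = {}
    for r in sorted(residues):
        act, inact = partners(active, r), partners(inactive, r)
        if not inact:
            labels[r] = "active-only"
        elif not act:
            labels[r] = "inactive-only"
        elif act == inact:
            labels[r] = "static"
        else:
            labels[r] = "swapping"
    return labels

active = {("R1", "R2", "sb"), ("R5", "R6", "sb")}
inactive = {("R1", "R3", "sb"), ("R6", "R5", "sb")}
print(classify_residues(active, inactive))
```

In this toy input, R1 changes partner between states and is therefore labelled swapping. The R5-R6 contact is present in both states, so both residues are labelled static.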
  • Swapping residues include those forming Src’s ‘electrostatic switch network’ of contacts that change during activation: D407-K298, E313-R412 and D389-Y419 in the inactive state, which break and rearrange into E313-K298 and R412-Y419 in the active state. Mutations in these residues are extremely detrimental for Src activity.
  • the allosteric map thus shows that residues with contacts that change upon activation are particularly important for Src activation and enables the prioritization of which of these dynamic contacts are most important for activation.
  • Predicting allostery (ΔΔGa) from sequence and structural features: linear modeling was used to predict ΔΔGa from simple features, namely the minimum heavy-atom distance of the mutated residue to the nucleotide (AMP-PNP in PDB structure 2SRC) and to the catalytic D389, the identity of the wild-type and mutant amino acids, solvent accessibility, contact type and dynamics (active-only, inactive-only, swapping, and static), and secondary structure element type. Distance to the catalytic site and distance to the nucleotide are the most predictive features when tested individually.
  • a linear model combining all predictors explains 46% of the variance in ΔΔGa (tested on held-out data, 10-fold cross-validation), which increases further to 51% when specific secondary structure elements are incorporated as a feature (tested on held-out data, 10-fold cross-validation). Mutation effects on activity are thus reasonably well predicted from simple structural features alone.
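The distance features above can be sketched as a minimum over atom pairs. This minimal sketch assumes that atoms are given as (element, x, y, z) tuples in Å and that hydrogens are the only non-heavy atoms. In practice, coordinates would be read from the PDB entries named above with a structure parser, which is not shown here. The coordinates in the example are invented.

```python
import math
from typing import Iterable, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z), coordinates in Å

def min_heavy_atom_distance(residue_atoms: Iterable[Atom], ligand_atoms: Iterable[Atom]) -> float:
    """Smallest distance between any non-hydrogen atoms of the two groups."""
    heavy_res = [a[1:] for a in residue_atoms if a[0] != "H"]
    heavy_lig = [b[1:] for b in ligand_atoms if b[0] != "H"]
    return min(math.dist(p, q) for p in heavy_res for q in heavy_lig)

residue = [("C", 0.0, 0.0, 0.0), ("H", 0.5, 0.0, 0.0)]
ligand = [("N", 3.0, 4.0, 0.0), ("H", 1.0, 0.0, 0.0)]
d = min_heavy_atom_distance(residue, ligand)
print(d, math.log(d))  # 5.0 (the hydrogens, 0.5 Å apart, are ignored); log distance is a model predictor
```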
  • Structural analysis of Src identifies 28 unique potentially druggable surface pockets present in at least one of 15 different Src structures. To prioritize these pockets for drug development, the comprehensive atlas of mutational effects was used to annotate each of these surface pockets by testing for the enrichment of inhibitory and activating allosteric mutations (Figure 8A). In total, 17 Src pockets are enriched for inhibitory mutations (Fisher’s exact test, FDR < 0.05). These inhibitory pockets include the orthosteric ATP binding site targeted by competitive inhibitors. Beyond the orthosteric site, two other surface pockets of Src have been targeted by small molecule inhibitors: the DFG pocket and P7.
  • both of these pockets are enriched for allosteric inhibitory mutations in the allosteric map, genetically validating their regulatory potential (Figure 8B).
  • Pockets that are allosteric in other kinases and also strongly allosteric in Src include P11, homologous to the MT3 pocket in MEK1/2 targeted by type III allosteric inhibitors; P22, homologous to the AAS site in Aurora A, where one KD activates another through binding of its activation segment to this site; and P6, homologous to the PDIG pocket in CHK1 close to the substrate binding site that is bound by small molecule inhibitors (Figure 8A, C).
  • novel allosteric pockets include P4, P16 and P25, located between the allosteric αC helix and the active site, P1 formed by residues in the αC-β4 loop, αE and P8, P18 and P21 on the surface of the N-lobe beta sheet, and P2, P15 and P5, located on both sides of αEF and the substrate positioning loop (see Figure 8D for examples).
  • the comprehensive mutational effect data therefore serves to genetically prioritize which of the many potentially druggable surface pockets of Src should be the focus for inhibitory and activatory drug discovery.
  • the allosterically active pockets in Src include highly druggable novel pockets not previously demonstrated as allosteric in any kinases (Figure 8D).
  • Src, like most kinases and eukaryotic proteins, is a multi-domain protein. In addition to the catalytic KD, Src contains two additional globular domains, SH2 and SH3, disordered linkers and the dynamic SH4 region. The non-catalytic domains of Src physically interact with the KD in its inactive conformation and inhibit activity. The abundance and activity selections for the same 54,455 Src variants were repeated in the context of the full-length protein to investigate how the regulatory domains of Src affect allosteric communication in the catalytic domain.
  • mutations in the inter-domain surfaces with the SH2 domain and the SH2-KD linker have stronger activating effects in the full-length kinase, consistent with these intra-molecular interactions inhibiting kinase activity.
  • Mutations in the αF helix pocket, proposed to bind the SH4 region for additional inhibition of Src activity, also more strongly activate full-length Src.
  • Mutations in the dynamic C-terminal tail of Src also differ in their effects between full-length Src and the KD alone. Mutations in Y530, the inhibitory phosphosite directly involved in the interaction with the SH2 domain, have stronger activating effects in full-length Src (ΔΔΔGa < -1, FDR < 0.1), consistent with a release of the inhibitory interaction.
  • the adjacent E527, P528, and Q531 are similarly enriched for mutations with stronger activating effects in full-length Src.
  • Q529, P532, G533, and N535 are enriched for mutations with stronger inhibitory effects in full-length Src (ΔΔΔGa > 1, FDR < 0.1).
  • Inhibitory mutations in the C-terminal tail’s interface with the SH2 domain include many changes to hydrophobic and aromatic residues, which may act by increasing the affinity of the tail for the Src SH2 domain.
  • pTB043 is based on the same backbone as the aPCA plasmids and contains a construct in which full-length Src is fused to the DHFR3 fragment at its N-terminus and to the DHFR1,2 fragment at its C-terminus.
  • to assay activity-dependent toxicity of Src, pTB022 was used: a plasmid based on the same backbone as the aPCA plasmids but containing a yeast GAL promoter to drive the expression of NheI-HindIII inserts not fused to any DHFR fragment or linker.
  • the KD fragment and Src gene block were cloned into pTB022, resulting in pTB112 and pTB023, respectively.
  • the library was ordered as two IDT oPools (Pool 1 with block 1 and Pool 2 with blocks 2-5), containing all NNK single mutants in each of the 10 backgrounds of each block.
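NNK denotes a degenerate codon in which N is A, C, G or T and K is G or T. The short sketch below uses the standard genetic code and is not taken from the source. It shows why NNK saturation covers all 20 amino acids using 32 codons and a single stop codon (TAG).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def nnk_codons():
    """All codons matching NNK (N = any base, K = G or T)."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]

codons = nnk_codons()
encoded = {CODON_TABLE[c] for c in codons}
print(len(codons))                                   # 32
print(len(encoded - {"*"}))                          # 20
print([c for c in codons if CODON_TABLE[c] == "*"])  # ['TAG']
```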
  • 2.5 µl of 0.1 µM oPool material was used as template in a 100 µl Q5 high-fidelity 10-cycle PCR reaction.
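For orientation, the template amount in this PCR is worked out below. This is simple unit arithmetic and is not stated in the source. It assumes that 0.1 µM is the total oligo concentration of the pool.

```latex
n = 2.5\,\mu\text{l} \times 0.1\,\mu\text{M}
  = (2.5\times10^{-6}\,\text{l})(1\times10^{-7}\,\text{mol\,l}^{-1})
  = 2.5\times10^{-13}\,\text{mol} = 0.25\,\text{pmol}

N = n\,N_A \approx 2.5\times10^{-13}\,\text{mol} \times 6.022\times10^{23}\,\text{mol}^{-1}
  \approx 1.5\times10^{11}\ \text{molecules}
```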
  • Primers specific to the constant regions of each block were used.
  • the library was assembled on pTB112 (KD) and on pTB023 (full-length). To do so, the plasmids were linearized with primers pointing outwards from the constant regions of each block, so that each linearized vector had at least 20 nt of homology to the amplified oligo pool containing the variants.
  • Each library corresponding to each of the 5 blocks was transformed in triplicate, with a coverage of ~100x or greater.
  • Three 500 ml YPDA cultures of late-log-phase S. cerevisiae BY4741 cells (OD ~0.8-1) were harvested in 50 ml Falcon tubes, each resuspended in 22 ml SORB medium and incubated for 30 min on a shaker at room temperature. 437.5 µl of 10 mg/ml ssDNA, previously boiled (5 min, 100 °C), was added to the cells, and the mix was split into 5 aliquots of 4.3 ml in 50 ml Falcon tubes, one for each library block.
  • Cells from this culture were inoculated in 100 ml SC -URA/ADE +200 µg/ml MTX to select stably expressed Src variants.
  • the remaining input cells grown in SC -URA/ADE were harvested and frozen for DNA extraction.
  • the reactions were column-purified (QIAquick PCR purification kit, QIAGEN), and 40 ng DNA were used as template for a PCR2 reaction with the standard i5 and i7 primers to add the remainder of the Illumina adapter sequences and the demultiplexing indices (dual indexing) unique to each sample.
  • This PCR2 was run for 8 cycles, and the resulting amplicons were run on a 2% agarose gel to quantify and pool the samples for joint purification, and to ensure the specificity of the amplification and check for any potential excess amplification problems.
  • the final libraries were size selected by gel electrophoresis.
  • the amplicons were subjected to Illumina paired end 2x150 sequencing on a NextSeq2000 instrument at the CRG Genomics facility.
  • MoCHI was used to fit two global mechanistic models, one for the Src KD and one for full-length Src, using the corresponding 10 aPCA and toxicity assay datasets (2 molecular phenotypes x 5 blocks) simultaneously, as described above.
  • the msir package was used to fit a loess smoothing curve and the residuals to the fit across different secondary structure element types were quantified.
  • the x,y,z directions as defined in the 2SRC pdb entry were used.
  • residues at a distance of 10 Å or less from the active site in the two remaining orthogonal directions were considered.
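The source reports decay rates but does not give the functional form. This minimal sketch assumes that the residue-averaged effect |ΔΔGa| decays exponentially with distance along each axis and fits the decay by least squares on the log scale. For brevity, the positive and negative directions along an axis are pooled, and effects must be strictly positive. The names and the synthetic data are hypothetical.

```python
import math
from typing import List, Tuple

Residue = Tuple[Tuple[float, float, float], float]  # (offset from active site in Å, mean |ΔΔGa| > 0)

def residues_along_axis(residues: List[Residue], axis: int,
                        max_offaxis: float = 10.0) -> List[Tuple[float, float]]:
    """(distance along axis, effect) for residues within max_offaxis Å on the other two axes."""
    others = [i for i in range(3) if i != axis]
    return [(abs(pos[axis]), effect) for pos, effect in residues
            if all(abs(pos[i]) <= max_offaxis for i in others)]

def decay_rate(points: List[Tuple[float, float]]) -> float:
    """Least-squares slope of log(effect) against distance, returned as a positive rate (1/Å)."""
    xs = [d for d, _ in points]
    ys = [math.log(e) for _, e in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx

synthetic = [((d, 0.0, 0.0), 2.0 * math.exp(-0.2 * d)) for d in (2.0, 4.0, 6.0, 8.0)]
synthetic.append(((5.0, 20.0, 0.0), 1.0))  # excluded from the x-axis fit: 20 Å off-axis in y
print(round(decay_rate(residues_along_axis(synthetic, axis=0)), 3))  # 0.2
```

Comparing rates fitted along the x, y and z axes gives a simple measure of the anisotropy described for the Src KD.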
  • getcontacts (https://getcontacts.github.io/) was used to define contacts in the inactive (2SRC) and active (1Y57) states. Prior to defining contacts, hydrogen atoms were added to the structures using the PyMOL h_add command. get_static_contacts.py was then run with the parameter --itypes all.
  • salt bridges, pi-cation interactions, side chain-side chain hydrogen bonds, and side chain-backbone hydrogen bonds were considered, as the remaining contact types did not display conformational state specificity.
  • Contacts of the same type and between the same residues were collapsed into a single contact, and duplicated contacts annotated both as salt bridge and side chain-side chain hydrogen bond were collapsed as salt bridge.
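A minimal sketch of the collapsing rules follows. It assumes that atom-level getcontacts output has already been reduced to (interaction type, residue, residue) triples, and that interaction types use getcontacts-style abbreviations (sb, pc, hbss, hbsb). The parsing step is not shown, and the example annotations are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumed abbreviations: salt bridge, pi-cation, side chain-side chain and side chain-backbone H-bonds
KEPT_TYPES = {"sb", "pc", "hbss", "hbsb"}

def collapse_contacts(raw: Iterable[Tuple[str, str, str]]) -> Set[Tuple[str, str, str]]:
    """raw items: (interaction type, residue_a, residue_b), possibly repeated per atom pair."""
    collapsed = set()
    for kind, a, b in raw:
        if kind not in KEPT_TYPES:
            continue
        pair = tuple(sorted((a, b)))  # same residues in either order -> one contact
        collapsed.add((kind, *pair))
    # A residue pair annotated both as salt bridge and side chain-side chain H-bond keeps only 'sb'
    salt_bridges = {(a, b) for kind, a, b in collapsed if kind == "sb"}
    return {c for c in collapsed if not (c[0] == "hbss" and (c[1], c[2]) in salt_bridges)}

raw = [("sb", "D407", "K298"), ("hbss", "K298", "D407"), ("sb", "K298", "D407"),
       ("vdw", "A1", "B1"), ("hbsb", "E313", "R412")]
print(sorted(collapse_contacts(raw)))
```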
  • Linear models were fit using the base R lm() function, using as predictors the wild-type and mutant amino acids, the secondary structure type in which the mutation is located, the specific secondary structure element in which the mutation is located, log distance to D389 (catalytic site), log distance to the nucleotide (AMP-PNP in 2SRC), rSASA, contact type, and the residue type classification (active, inactive, swapping, both, or none) according to their contact patterns as described above. Models were evaluated on held-out data using a 10-fold cross-validation strategy.
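In the source, the models were fit with R's lm() on many predictors. As a reduced illustration only, the sketch below uses a single predictor (for example, log distance to D389) and ordinary least squares. It reports the fraction of variance explained on the held-out folds of a 10-fold split. Pooling the held-out predictions before computing R² is an assumption about how variance explained was summarised, and the data are synthetic.

```python
import math
import random
from typing import List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def cv_r2(xs: List[float], ys: List[float], k: int = 10, seed: int = 0) -> float:
    """Variance explained by held-out predictions pooled over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(xs)
    for f in range(k):
        held = set(idx[f::k])
        train = [i for i in idx if i not in held]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held:
            preds[i] = intercept + slope * xs[i]
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

rng = random.Random(1)
log_dist = [math.log(d) for d in range(2, 42)]                 # synthetic log distances
ddg_a = [2.0 - 0.8 * x + rng.gauss(0, 0.3) for x in log_dist]  # synthetic ΔΔGa
print(round(cv_r2(log_dist, ddg_a), 2))
```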
  • the Kinase Atlas was used to retrieve all possible Src surface druggable pockets (Yueh et al., J. Med. Chem. 2019, 62, 14, 6512-6524).
  • the docking analyses of all 15 available Src structures were used, and each potential Src surface pocket was defined as the set of residues located at a minimum distance of 5 Å from a cluster of docked molecules, resulting in a total of 384 pockets distributed across the 15 structures.
  • After filtering out pockets with a druggability score < 5, the remaining 254 pockets were collapsed into a final set of unique pockets, as many are present in multiple structures.
  • a pairwise distance matrix between all pockets was calculated, using as a distance metric 1 minus the Szymkiewicz-Simpson overlap coefficient.
  • a hierarchical clustering was applied to the distance matrix, resulting in 28 unique pockets.
  • the total number of structures in which each is found was summarised, and the average and maximum druggability across all structures was calculated.
  • the ΔΔGa per residue was averaged, and the mean residue-averaged ΔΔGa (mean ΔΔGa), maximum residue-averaged ΔΔGa (max ΔΔGa), and minimum residue-averaged ΔΔGa (min ΔΔGa) were calculated for each pocket.
  • the odds ratio of enrichment of each pocket in activating and inactivating mutations relative to the rest of the KD was also calculated, and its statistical significance tested using Fisher’s exact test.
  • Interdomain interfaces were defined as residues involved in direct contacts between the kinase domain and the regulatory domains (SH3, SH2, linker) using getcontacts, and excluding water bridges.
  • sites with at least 2 mutations with ΔΔΔGa ⁇ -1 and FDR ⁇ 0.1 were selected, and hierarchical clustering was applied to a matrix of their pairwise Cα-Cα distances.
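The pocket-merging distance described above (1 minus the Szymkiewicz-Simpson overlap coefficient) can be sketched in Python as follows. This is a minimal illustration assuming each pocket is represented as a set of residue numbers; the pocket names and residues are hypothetical, and the matrix is only the input to the hierarchical clustering step, which is not reproduced here.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def overlap_coefficient(a: Set[int], b: Set[int]) -> float:
    """Szymkiewicz-Simpson overlap: |A intersect B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def pocket_distance_matrix(pockets: Dict[str, Set[int]]) -> Tuple[List[str], List[List[float]]]:
    """Return pocket names and a symmetric matrix of 1 - overlap coefficient."""
    names = sorted(pockets)
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = 1.0 - overlap_coefficient(pockets[names[i]], pockets[names[j]])
        dist[i][j] = dist[j][i] = d
    return names, dist


# Hypothetical pockets from two structures (residue numbers are illustrative only).
pockets = {
    "p1_structA": {300, 301, 302, 305},
    "p1_structB": {300, 301, 302},
    "p2_structA": {420, 421, 425},
}
names, dist = pocket_distance_matrix(pockets)
```

Because the overlap coefficient divides by the smaller set, a pocket detected as a subset of a larger one in another structure gets distance 0, which suits collapsing the same pocket seen in several structures. A square matrix like this can be converted to condensed form with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` if SciPy is available; the linkage method and cut height needed to obtain the 28 unique pockets are not stated above and would have to be chosen.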

Abstract

A computer-implemented method of training a machine learning model is provided. The method comprises obtaining training data specifying, for a wild type variant of a target enzyme and each of a plurality of mutant variants of the target enzyme, each mutant variant having a different set of one or more mutations, an activity measure for the respective variant and a folding measure for the respective variant. The method also comprises, based on the training data, training model parameters of a machine learning model to output, from input data specifying the set of one or more mutations in a given variant of the target enzyme, a predicted activity measure and a predicted folding measure for the given variant.

Description

IDENTIFYING ALLOSTERIC SITES IN ENZYMES
FIELD OF THE INVENTION
[0001] The invention relates to the field of enzymes.
BACKGROUND
[0002] Enzymes catalyse a diverse array of reactions that are critical to life. Due to the vast number of biological processes in which enzymes are implicated, they represent important targets for therapeutic or prophylactic interventions.
[0003] Most drugs inhibit enzyme activity by binding to active sites within the protein that are responsible for catalytic activity. The active sites of enzymes are often structurally conserved amongst related enzymes, e.g., those of an enzyme family, which makes it difficult to target a specific enzyme without also inducing off-target effects as a result of interactions with enzymes other than the intended target. For example, the human genome encodes 538 protein kinases; orthosteric inhibitors of such kinases may inhibit tens or even hundreds of different kinases. This lack of target specificity often results in undesirable off-target effects and toxicity.
[0004] One means by which drug specificity can be increased and/or toxicity reduced is by targeting allosteric sites within an enzyme. Enzyme activity can be modulated, naturally or artificially, by perturbations of the enzyme at sites that are distant to the active site but which nonetheless influence the catalytic activity of the enzyme, i.e., allosteric sites. Since allosteric sites are typically less well conserved amongst related proteins, targeting these sites may lower cross-reactivity with other off-target enzymes and thereby improve the target specificity and/or reduce toxicity of a drug.
[0005] Further, the ability to target allosteric sites, or indeed multiple allosteric sites, on a protein may allow interventions that overcome drug resistance mutations that arise, e.g., in cancer.
[0006] Whilst the potential merits of targeting allosteric sites are known, suitable allosteric sites on target proteins are often not known or not well studied.
[0007] Protein kinases represent an important class of therapeutic drug targets that are implicated in numerous diseases, including cancer. Most known kinase inhibitors are orthosteric inhibitors that target conserved ATP binding pockets, which results in poor specificity and leaves them susceptible to resistance mutations.
[0008] Src is an oncogenic protein kinase that is a particularly interesting drug target due to its role in cell regulation, cell growth, cell migration, and angiogenesis.
[0009] As is the case for all proteins, Src possesses myriad pockets, or sites, that have the potential to be allosteric modulators of Src activity. However, there is a paucity of information as to which, if any, of these pockets/sites are capable of allosteric modulation of Src activity.
[0010] The present inventors therefore sought to determine a complete allosteric map of Src kinase in order to identify allosteric sites on the protein that may be targeted to modulate Src activity.
SUMMARY
[0011] Particular embodiments are set out in the independent claims. Various optional examples are set out in the dependent claims.
TRAINING
[0012] A first aspect of implementations described herein relates to a computer-implemented method of training a machine learning model. The method comprises obtaining training data specifying, for a wild type variant of a target enzyme and each of a plurality of mutant variants of the target enzyme, each mutant variant having a different set of one or more mutations, an activity measure for the respective variant and a folding measure for the respective variant; and based on the training data, training model parameters of a machine learning model to output, from input data specifying the set of one or more mutations in a given variant of the target enzyme, a predicted activity measure and a predicted folding measure for the given variant. An activity measure of a given variant is a representation of that variant’s catalytic activity level and may, for example, be derived from a frequency of being in a folded active state. A folding measure of a given variant is a representation of that variant’s solubility and may, for example, be derived from a frequency of folding. The folding measure may also be referred to as a solubility measure.
Input data
[0013] The input data may comprise a set of input elements corresponding to a given site of the given variant of the target enzyme, each input element specifying whether or not a specific mutation is present at the given site. A site is an amino acid position defined with respect to the wild type variant, and a mutation may comprise an amino acid substitution, an amino acid omission, or an amino acid insertion at that site. For example, for a given variant of the target enzyme, a site might be the xth amino acid position and there might be y possible mutations at that site: various amino acid substitutions, omission of the amino acid, or insertion of an amino acid immediately after the amino acid at that site. The input elements that correspond to the given site may be one-hot encoded for the specific mutations they specify. In a one-hot encoded vector, each element of the vector represents a specific mutation at a specific site (e.g. omission of the 213th amino acid) and takes a value of 1 if that mutation is present at that site and a value of 0 if it is not. The input data may comprise a non-zero bias term that is constant for the wild type variant and each mutant variant; otherwise, the one-hot encoded vector for the wild type would contain only zeros and the wild type would not influence the training of the machine learning model.
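A minimal Python sketch of this encoding is shown below. It rests on assumptions that are not in the text: mutations are written as hypothetical (site, residue) tuples with "-" marking an omission, and the bias element is placed first with a value of 1.0.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# A mutation is (site, mutant residue); "-" marks an omission. This convention is hypothetical.
Mutation = Tuple[int, str]


def build_feature_index(possible: Sequence[Mutation]) -> Dict[Mutation, int]:
    """Assign one input element to every (site, mutation) pair in the library."""
    return {m: i for i, m in enumerate(possible)}


def encode_variant(mutations: Iterable[Mutation], index: Dict[Mutation, int]) -> List[float]:
    """Return [bias, x_1, ..., x_n]: bias is always 1.0; x_i is 1.0 if mutation i is present."""
    x = [0.0] * len(index)
    for m in mutations:
        x[index[m]] = 1.0
    return [1.0] + x


possible = [(213, "A"), (213, "-"), (214, "G")]
index = build_feature_index(possible)
wild_type = encode_variant([], index)                            # [1.0, 0.0, 0.0, 0.0]
double_mutant = encode_variant([(213, "-"), (214, "G")], index)  # [1.0, 0.0, 1.0, 1.0]
```

Without the leading bias element, the wild type would be the all-zero vector and any weighted sum of it would be zero, which is the problem the bias term avoids.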
Thermodynamic model
[0014] Training the machine learning model may comprise fitting a thermodynamic model of the target enzyme to the training data. This enables the machine learning model to extract from the training data information characterising which mutations affect specific transitions of the target enzyme between thermodynamic states. The thermodynamic model may comprise a three-state model of the target enzyme with unfolded, folded inactive, and folded active states. In this case, the enzyme can transition from the unfolded state to the folded inactive state and vice versa, and from the folded inactive state to the folded active state and vice versa. The folding measure may be related to a probability of the variant being in either the folded inactive state or the folded active state. The folding measure may be dependent on a Gibbs free energy of folding which quantifies the partitioning of the enzyme molecules between the unfolded state and the inactive folded state. The activity measure may be related to a probability of being in the folded active state. The activity measure may be dependent on a Gibbs free energy of activity which quantifies the partitioning of the enzyme molecules between the inactive folded state and the active folded state. The Gibbs free energy of activity suitably comprises a pseudo free energy which quantifies all biophysical changes other than enzyme folding that alter enzyme activity. The folding measure is independent of the Gibbs free energy of activity. The activity measure is dependent on both the Gibbs free energy of activity and the Gibbs free energy of folding.
[0015] The model parameters may comprise a first set of weights and a second set of weights. In this case, the predicted activity measure may depend on both the first set of weights and the second set of weights, and the predicted folding measure may depend on the second set of weights but be independent of the first set of weights. This network architecture enables the first set of weights to represent the influence of a target site on enzyme activity for reasons other than enzyme folding and the second set of weights to represent the influence of a target site on enzyme activity for reasons of folding. The machine learning model may comprise a neural network. A first neuron of the neural network may generate a first neuron output value by processing the input data using a first set of weights, and a second neuron of the neural network may generate a second neuron output value by processing the input data using a second set of weights. The predicted activity measure may depend on both the first neuron output value and the second neuron output value. This reflects the three state thermodynamic model since the target enzyme must (1) transition from unfolded to folded inactive and (2) transition from folded inactive to folded active in order to reach the folded active state. The predicted activity measure may depend on at least one first activation function applied to the first neuron output value and the second neuron output value. The first activation function may be non-linear. The first activation function may be based on the Boltzmann distribution. The first activation function may have parameters that are trained during training of the neural network. The predicted folding measure may depend on the second neuron output value and be independent of the first neuron output value. 
This reflects the three state thermodynamic model since the target enzyme only needs to transition from the unfolded to the folded inactive state to arrive in a folded state, and it is irrelevant to the folding measure whether the enzyme additionally transitions from the folded inactive state to the folded active state. The predicted folding measure may depend on at least one second activation function applied to the second neuron output value. The second activation function may be non-linear. The second activation function may be based on the Boltzmann distribution. The second activation function may have parameters that are trained during training of the neural network.
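The two-neuron architecture can be sketched as a forward pass in plain Python. This is an illustration under stated assumptions, not the inventors' implementation: both activations use a Boltzmann form consistent with the three-state model, the assay temperature of 30 °C is assumed, and the trainable activation parameters are modelled as a hypothetical linear scale and offset per output. Training (for example by backpropagation) is omitted.

```python
import math
from typing import Sequence, Tuple

R = 1.987e-3  # gas constant in kcal/(mol*K)
T = 303.15    # assumed assay temperature in kelvin (30 C); an assumption, not from the source
RT = R * T


def linear(weights: Sequence[float], x: Sequence[float]) -> float:
    """One neuron: a weighted sum of the (bias + one-hot) input elements."""
    return sum(w * xi for w, xi in zip(weights, x))


def fraction_folded(dg_f: float) -> float:
    """Boltzmann-type activation for the folding measure; depends on dG_f only."""
    return 1.0 / (1.0 + math.exp(dg_f / RT))


def fraction_folded_active(dg_f: float, dg_a: float) -> float:
    """Boltzmann-type activation for the activity measure; depends on dG_f and dG_a."""
    return 1.0 / (1.0 + math.exp(dg_a / RT) * (1.0 + math.exp(dg_f / RT)))


def forward(
    x: Sequence[float],
    w_activity: Sequence[float],  # first set of weights
    w_folding: Sequence[float],   # second set of weights
    act_scale: float = 1.0, act_offset: float = 0.0,    # hypothetical trainable parameters
    fold_scale: float = 1.0, fold_offset: float = 0.0,  # hypothetical trainable parameters
) -> Tuple[float, float]:
    dg_a = linear(w_activity, x)  # first neuron output
    dg_f = linear(w_folding, x)   # second neuron output
    predicted_activity = act_scale * fraction_folded_active(dg_f, dg_a) + act_offset
    predicted_folding = fold_scale * fraction_folded(dg_f) + fold_offset
    return predicted_activity, predicted_folding


x = [1.0, 1.0, 0.0]  # bias + one mutation present (hypothetical input)
pred_activity, pred_folding = forward(x, [-1.0, 0.5, 0.0], [-2.0, 0.3, 0.0])
```

Because `predicted_folding` is computed only from `w_folding`, changing the first set of weights cannot alter the folding prediction, which is the property the paragraph relies on.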
IDENTIFYING TARGET SITES
[0016] A second aspect of implementations described herein relates to a computer-implemented method of identifying one or more target sites of a target enzyme. The method comprises obtaining model parameters from a machine learning model trained in accordance with the method of the first aspect; and based on the model parameters, identifying the one or more target sites of the target enzyme.
[0017] The machine learning model may disambiguate whether a given site’s influence on a probability of the target enzyme being active is due to the site influencing a probability of correct folding of the target enzyme or is due to other factors. This makes it possible to extract information relating to the target site’s influence on the activity of the already folded enzyme. If a target site is influential on the activity of the folded enzyme, it may be a promising candidate site to investigate as a potential allosteric site. Thus, the one or more target sites that are identified may be those predicted to influence the probability of the target enzyme being active due to reasons other than influencing a probability of correct folding. The one or more target sites may be selected based on a subset of the model parameters learnt in the training that express the contribution of each mutation towards the activity measure due to reasons other than influencing a probability of correct folding. This enables target sites influencing the activity of the already folded enzyme to be identified. The one or more target sites may be selected based on druggability. The selection of surface sites (i.e. locations on or near the outer surfaces of the folded target enzyme) may be favoured since they are more accessible to ligands. The model parameters may comprise a first set of weights and a second set of weights.
In this case, the predicted activity measure may depend on both the first set of weights and the second set of weights; and the predicted folding measure may depend on the second set of weights but be independent of the first set of weights. This network architecture enables the first set of weights to represent changes in activity not caused by folding and the second set of weights to represent changes in activity caused entirely by folding. Thus, the first set of weights can be used to identify potentially allosteric sites that influence activity of the already folded target enzyme. The computer-implemented method may comprise generating an aggregate measure of the first set of weights for each of a plurality of given target sites. For example, a given target site may be represented by a subset of the first set of weights. In this case, an aggregate measure such as an average or total of the subset of the first set of weights may be generated to represent that site’s overall influence on the activity of the already folded target enzyme. The computer-implemented method may comprise selecting the one or more target sites based on ranking their aggregate measures. This may help to identify the most influential sites on the activity of the folded enzyme. The computer-implemented method may comprise selecting the one or more target sites by comparing their aggregate measures to a predefined threshold. This may help to identify sites with at least a minimum level of influence on the activity of the folded enzyme.
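The aggregation, ranking and threshold steps might look like the following sketch. It assumes the learned first-set weights are available as a dictionary keyed by hypothetical (site, mutant residue) pairs, that the aggregate is the per-site mean, and that sites are ranked by the magnitude of that mean so that strongly activating and strongly inactivating sites both rank highly. The weight values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def per_site_mean(weights: Dict[Tuple[int, str], float]) -> Dict[int, float]:
    """Average the first-set weights (activity effects) of all mutations at each site."""
    by_site = defaultdict(list)
    for (site, _mutant), w in weights.items():
        by_site[site].append(w)
    return {site: mean(ws) for site, ws in by_site.items()}


def rank_sites(site_means: Dict[int, float]) -> List[int]:
    """Order sites by the magnitude of their aggregate measure, largest first."""
    return sorted(site_means, key=lambda s: abs(site_means[s]), reverse=True)


def sites_above_threshold(site_means: Dict[int, float], threshold: float) -> List[int]:
    """Keep sites whose aggregate magnitude reaches a predefined threshold."""
    return sorted(s for s, m in site_means.items() if abs(m) >= threshold)


# Hypothetical learned weights keyed by (site, mutant residue).
weights = {(300, "A"): 1.2, (300, "G"): 0.8, (301, "A"): -0.1, (301, "L"): 0.1, (302, "P"): -0.9}
site_means = per_site_mean(weights)
ranked = rank_sites(site_means)                    # [300, 302, 301]
selected = sites_above_threshold(site_means, 0.5)  # [300, 302]
```

A plain mean lets opposing effects at one site cancel (site 301 above); a mean of absolute values or a count of strong mutations are alternative aggregates if that cancellation is undesirable.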
INFERRING UNTESTED VARIANTS
[0018] A third aspect of implementations described herein relates to a computer-implemented method of identifying a mutated variant of interest of an enzyme. The method comprises, for each of a plurality of mutated variants of a wild type enzyme, providing an input specifying mutations in the respective mutated variant to a machine learning model trained to output a predicted activity measure and a predicted folding measure for the mutated variant; receiving from the machine learning model a predicted activity measure and a predicted folding measure for each of the plurality of mutated variants; and based on the predicted activity measures and the predicted folding measures, selecting from the plurality of mutated variants at least one mutated variant of interest. This approach may be useful for identifying one or more mutated variants that have not been tested in wet lab experiments but whose predicted folding measures and predicted activity measures take desirable values, such as high activity measures for example.
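A hedged sketch of the selection step follows. The predictions, variant labels, folding cut-off and top-k rule are all hypothetical choices; the criteria that make a variant "of interest" depend on the application.

```python
from typing import Dict, List, Tuple


def select_variants(
    predictions: Dict[str, Tuple[float, float]], min_folding: float, top_k: int
) -> List[str]:
    """Keep variants predicted to fold adequately, then return the top_k by predicted activity."""
    candidates = [(v, act, fold) for v, (act, fold) in predictions.items() if fold >= min_folding]
    candidates.sort(key=lambda c: c[1], reverse=True)
    return [v for v, _act, _fold in candidates[:top_k]]


# Hypothetical model outputs: variant label -> (predicted activity, predicted folding).
predictions = {
    "var_a": (0.90, 0.80),
    "var_b": (0.95, 0.30),
    "var_c": (0.70, 0.90),
    "var_d": (0.20, 0.95),
}
chosen = select_variants(predictions, min_folding=0.5, top_k=2)  # ["var_a", "var_c"]
```

Filtering on the folding measure first keeps variants whose high predicted activity is not undermined by poor predicted solubility, as with "var_b" here.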
GENERATING TRAINING DATA
[0019] A fourth aspect of implementations described herein relates to a method of generating training data for training a machine learning model. The method comprises performing wet lab experiments to obtain data for deriving an activity measure and a folding measure for a wild type variant of a target enzyme and each of a plurality of mutant variants of the target enzyme.
[0020] The method may comprise performing a solubility assay that provides a measure of the frequency of folding for each variant. The method may comprise deriving the folding measure from the frequency of folding. The method may comprise performing an activity assay that provides a measure of frequency of occurrence of mutated variants in an active thermodynamic state. The method may comprise deriving the activity measure from the frequency. The method may comprise performing an in vivo or in vitro activity assay that provides data for deriving the activity measure. Enzymatic activity and protein solubility can be quantified using any suitable assay. Thus, the approach can be used to quantify allosteric regulation in any enzyme, provided that both activity and solubility can be quantified at scale.
MISCELLANEOUS OPTIONAL FEATURES
[0021] In any of the first to fourth aspects, the target sites may be allosteric sites of the target enzyme. The active sites of enzymes are often structurally conserved amongst related enzymes. Thus, targeting the less well conserved allosteric sites within an enzyme may reduce off-target effects and/or toxicity. The target sites may be located within (or form all or part of) a structurally accessible surface pocket on the target enzyme. Identification of allosterically active surface pockets enables prioritization of said pockets for drug development.
[0022] In any of the first to fourth aspects, the target enzyme may be a protein kinase. In any of the first to fourth aspects, the protein kinase may be Src kinase. In some embodiments, the target enzyme is Src kinase having the amino acid sequence SEQ ID NO: 1.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] Examples of the present disclosure will be described with reference to the following drawings, in which:
Figure 1A is a flowchart showing a method of identifying allosteric sites in enzymes in accordance with techniques described herein.
Figure 1B is a flowchart showing a method of identifying a mutated variant of interest of a target enzyme in accordance with techniques described herein.
Figure 2A is an overview of the toxicity selection assay to measure the protein kinase activity of Src kinase domain variants at scale. Yes, yeast growth; no, yeast growth defect.
Figure 2B is an overview of the abundancePCA (aPCA) selection assay to measure in vivo abundance of Src kinase domain variants at scale. Yes, yeast growth; no, yeast growth defect. DHF, dihydrofolate; THF, tetrahydrofolate. Figures 2C and 2D show the correlation of activity fitness measurements to in vivo phosphotyrosine levels, and abundance fitness measurements to in vivo Src levels, respectively (Ahler et al., Mol. Cell 74, 393-408.e20 (2019)).
Figure 3 is a flow chart showing a method of training a machine learning model in accordance with techniques described herein.
Figure 4A shows the three-state equilibrium and corresponding thermodynamic model. ΔGf, Gibbs free energy of folding; ΔGa, Gibbs free energy of the active state; Kf, folding equilibrium constant; Ka, inactive-active state equilibrium constant; pf, fraction folded; pfa, fraction folded and active; ff, nonlinear function of ΔGf; ffa, nonlinear function of ΔGf and ΔGa; R, gas constant; T, temperature in Kelvin.
Figure 4B is a schematic diagram showing a neural network that has been trained in accordance with techniques described herein.
Figure 5 is a flow chart showing a method of identifying one or more target sites of a target enzyme in accordance with techniques described herein.
Figures 6A to 6D are heat maps showing inferred changes in activity free energies (ΔΔGa) (A and B) and folding free energies (ΔΔGf) (C and D) for all 5,111 possible single substitution variants in the Src KD. Figures 6E and 6F show the structure of the Src KD coloured by the per-site weighted mean ΔΔGf (PDB ID: 2SRC). The secondary structure elements most enriched in destabilizing mutations are annotated. Figure 6G shows the enrichment of destabilizing and stabilizing mutations in secondary structure elements of Src (Fisher’s exact test).
Figures 7A and 7B show the structure of the Src KD coloured by the per-site weighted mean ΔΔGa (PDB ID: 2SRC).
Figure 7C shows the enrichment of inactivating and activating mutations in secondary structure elements (Fisher’s exact test).
Figure 7D is a heatmap showing predicted activity measures for mutant variants of Src kinase having mutations at the active site.
Figure 7E is a heatmap showing predicted activity measures for mutant variants of Src kinase having mutations at known allosteric sites.
Figure 8A is a summary of the regulatory impact and druggability properties of Src surface pockets. Mean ΔΔGa = average of per-site averaged ΔΔGa of all residues in the pocket. Max ΔΔGa = maximum of per-site averaged ΔΔGa of all residues in the pocket. Min ΔΔGa = minimum of per-site averaged ΔΔGa of all residues in the pocket. Pockets significantly enriched or depleted in activating and inactivating mutations (Fisher’s exact test FDR<0.05) are labeled with stars.
Figures 8B to 8D are heatmaps showing ΔΔGa of mutations in previously targeted Src pockets (B), in Src pockets homologous to pockets known to be allosteric and/or targeted by drugs in other kinases (C), and in novel Src pockets (D).
Figure 9 illustrates a comparison of predicted activity of allosteric and non-allosteric sites.
Figure 10 is a flowchart showing a method of identifying a mutated variant of interest of a target enzyme in accordance with techniques described herein.
Figure 11 is a block diagram showing a computer suitable for implementing techniques described herein.
DETAILED DESCRIPTION
[0024] The following detailed description of example implementations refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements.
OVERVIEW
[0025] Herein there is described an approach for quantifying allosteric regulation in enzymes. The approach uses data from wet lab experiments relating to a target enzyme and uses these data to extract insights into the allosteric landscape of the target enzyme. A comprehensive allosteric map of an enzyme can be produced in this way that identifies regions of allosteric activity, whether these regions decrease or increase the enzyme’s catalytic activity, and the extent to which these regions affect the enzyme’s catalytic activity. The approach has been verified by accurately predicting previously known allosteric sites and can be used to identify previously unknown allosteric sites of a target enzyme.
[0026] Referring to Figure 1A, a method 100A of identifying allosteric sites of a target enzyme is shown. The target enzyme may suitably be a protein kinase such as Src kinase. The target enzyme may be one of the more than 500 protein kinases encoded in the human genome, including kinases having a role in cancer (such as Src/Abl family kinases, Raf kinases and PI3K kinases) or other diseases (such as DYRK family kinases). The enzyme may be from a species other than human. Wet lab experiments are performed 102 on a wild type of the target enzyme and on a set of variants of the enzyme. The variants have mutations at a range of sites of interest of the enzyme and the effects of the mutations on the enzyme’s catalytic activity and solubility are measured.
[0027] Preliminary data processing is performed 104 on the results from the wet lab experiments to produce an activity measure and a folding measure for the wild type and each mutated variant of the enzyme. The activity measure represents the variant’s catalytic activity, and the folding measure represents the variant’s ability to fold and is derived from measurements from solubility assays. A machine learning model is then trained 106 to predict the activity and folding measures in order to fit a model of the enzyme to the experimental data. This training step is significantly facilitated by using training data relating not only to single amino acid mutations but also to double or combinatorial mutants.
[0028] When the machine learning model has been trained with sufficient data, it is used to quantify the extent to which the mutations affect the enzyme’s catalytic activity independently of their effects on enzyme folding. As discussed further below, this enables target sites of the enzyme that may be allosteric to be identified 108.
[0029] Referring to Figure 1B, a method 100B of identifying a mutated variant of interest of a target enzyme is shown. The steps 102, 104 and 106 are in common with those of method 100A of Figure 1A. However, after the step of training 106 the machine learning model, the method 100B comprises identifying 110 mutated variants that were not tested in the wet lab experiments but which may have allosteric sites of interest. This provides another application of the techniques described below.
WET LAB EXPERIMENTS
[0030] Wet lab experiments are performed to quantify the effects on catalytic activity and solubility of mutations at all sites of interest in an enzyme. For example, the activity and solubility may be measured for a library of enzyme variants comprising substitutions to all possible amino acids at every site (i.e. a library size of 20 × number of sites of interest). The sites of interest may comprise the full-length protein or the catalytic domain. Mutagenesis may be achieved by site-directed mutagenesis or artificial gene synthesis.
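Enumerating such a site-saturation library is straightforward. The sketch below uses a hypothetical helper with the standard 20-letter amino acid alphabet and covers substitutions only (no insertions, omissions or stop codons), so it lists 19 variants per site; counting the wild-type residue as well gives the 20 per site mentioned above. For instance, the 5,111 single substitution variants of the Src KD referred to in Figures 6A to 6D correspond to 19 substitutions at each of 269 sites.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (one-letter codes)


def single_substitution_library(sequence: str, first_position: int = 1) -> List[str]:
    """List every single amino acid substitution as e.g. 'M1A' (wild type, position, mutant)."""
    library = []
    for offset, wt in enumerate(sequence):
        position = first_position + offset
        for aa in AMINO_ACIDS:
            if aa != wt:  # skip the synonymous wild-type residue
                library.append(f"{wt}{position}{aa}")
    return library


library = single_substitution_library("MK")  # 2 sites x 19 substitutions = 38 variants
```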
[0031] The approach may be repeated on different genetic backgrounds in order to achieve doubly or multiply mutated variants. The genetic backgrounds may be chosen to provide a range of different enzyme activities due to changes in stability or catalytic activity. This aims to resolve ambiguities where a number of causal biophysical changes could account for an observed mutational effect and allows the inference of the in vivo biophysical effects of mutations.
[0032] Enzymatic activity and protein solubility can be quantified using any suitable assay provided that both activity and solubility can be quantified at scale. Such assays are known in the art.
[0033] For example, enzymatic activity may be quantified using a cellular toxicity assay, where the inhibition of cellular growth is directly proportional to the amount of protein phosphorylation induced by the enzyme (as shown in Figure 2A).
[0034] Enzymatic activity may be determined by complementation of thermosensitive alleles. This strategy uses yeast/bacteria strains containing mutations that affect fitness only at high temperatures (e.g. 42°C). The function is then rescued using the endogenous (or orthologous) WT gene, which is then mutated to select for function.
[0035] An alternative strategy makes use of the numerous auxotrophic markers or antibiotic/chemical resistance genes present in yeast and bacteria. Here, mutagenesis of the enzymes necessary for survival and/or growth in the absence or presence of specific components in the growth medium allows for functional selection.
[0036] The solubility of the enzyme may be quantified in the same cells using a protein abundance selection assay that uses protein fragment complementation to quantify soluble protein concentration over at least three orders of magnitude (e.g. AbundancePCA, as shown in Figure 2B).
[0037] In vitro assays to quantify enzyme activity and/or solubility are also envisaged, utilizing enzymatic reactions producing a signal (e.g. fluorescence, colour precipitate). Suitable assays are described in Markin et al., 2021, Science, 373(6553); Scheele et al., 2022, Nat Commun, 13(844); and Vanella et al., 2024, Nat Commun, 15(1807).
PRELIMINARY DATA PROCESSING
[0038] Data from the wet lab experiments is subject to preliminary processing in order to produce activity and folding measures that can be used as training data for the machine learning model. Sequencing data from wet lab experiments may suitably be provided in the form of FASTQ files from paired-end sequencing of aPCA and toxicity experiments. The sequencing data is processed to generate fitness scores, which can be used as the activity and folding measures, and to calculate errors in the fitness scores. A suitable approach for processing the sequencing data is described in: DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies, Faure et al., Genome Biology (2020) 21:207 https://doi.org/10.1186/s13059-020-02091-3.
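As a simplified illustration only, a common way to turn sequencing counts into a fitness score is the log ratio of a variant's output and input counts, normalised to the wild type. The sketch below assumes this definition and includes only the Poisson counting term of the error; DiMSum's full error model adds further terms and is not reproduced here.

```python
import math


def fitness(in_count: int, out_count: int, wt_in: int, wt_out: int) -> float:
    """Log enrichment of a variant across selection, relative to the wild type."""
    return math.log(out_count / in_count) - math.log(wt_out / wt_in)


def count_variance(in_count: int, out_count: int, wt_in: int, wt_out: int) -> float:
    """Poisson-sampling variance of the fitness estimate (count-based term only)."""
    return 1 / in_count + 1 / out_count + 1 / wt_in + 1 / wt_out


# Hypothetical counts: the variant halves in frequency while the wild type is unchanged.
score = fitness(in_count=120, out_count=60, wt_in=100, wt_out=100)  # ln(0.5), about -0.693
```

With this convention the wild type scores 0, variants depleted by selection score below 0, and low counts inflate the variance, which is why error estimates accompany the fitness scores.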
TRAIN ML MODEL
[0039] When the preliminary data processing is complete, the resulting activity and folding measures are used to train a machine learning model. This process fits a model of the enzyme to the training data and enables the effects of the mutations on the enzyme’s catalytic activity to be determined independently of their effects on folding. A suitable approach for training the machine learning model is described in: MoCHI: neural networks to fit interpretable models and quantify energies, energetic couplings, epistasis and allostery from deep mutational scanning data, Faure and Lehner, bioRxiv (2024) https://doi.org/10.1101/2024.01.21.575681.
[0040] Referring to Figure 3, a method 400 of training a machine learning model comprises obtaining 402 training data for a wild type variant of a target enzyme and for each of a plurality of mutant variants of the target enzyme. The training data specifies, for each variant, an activity measure and a folding measure for that variant. Each mutated variant has a different set of one or more mutations.
[0041] Model parameters of the machine learning model are trained 404, based on the training data, to output a predicted activity measure and a predicted folding measure for a given variant from input data specifying the set of one or more mutations of the given variant.
[0042] The input data may suitably specify for a given site of a given variant of the target enzyme whether a specific mutation of an amino acid is present at that site. For example, there may be a mutation at that site that comprises an amino acid substitution, an amino acid omission or an amino acid insertion. Thus, the input data may comprise a set of input elements corresponding to a given site of the given variant of the target enzyme, each input element specifying whether or not a specific mutation is present at the given site. In this case, one-hot encoding may suitably be used to specify the mutated variants. For example, a mutated variant may be represented by a vector having a series of elements that take values of 0 or 1. Each element in the series represents whether a specific mutation (such as a specific amino acid substitution) is present at that site by taking a value of 1 to indicate the presence of that mutation and by taking a value of 0 to indicate the absence of that mutation. Thus, the input elements corresponding to the given site may be one-hot encoded for the specific mutations they specify. In the case of the wild type variant, there are no mutations and therefore every element in the one-hot encoded vector takes a value of 0. It is therefore suitable to provide a non-zero bias term as an additional element in the input data that is constant for the wild type variant and each mutant variant to ensure that the wild type variant is taken into account during training of the machine learning model.
[0043] Training the machine learning model fits a suitable model of the enzyme to the training data. For example, a thermodynamic model of the enzyme may be fitted to the training data. In this case, a suitable thermodynamic model may treat the enzyme as having distinct states depending on whether the enzyme is folded and whether, if folded, the enzyme is active. In such a model, the enzyme has three distinct states: unfolded, folded inactive and folded active.
[0044] The enzyme can transition from the unfolded state to the folded inactive state and vice versa, and can transition from the folded inactive state to the folded active state and vice versa:
$$\text{Unfolded} \rightleftharpoons \text{Folded inactive} \rightleftharpoons \text{Folded active}$$
[0045] The energy change for the transition from the unfolded state to the folded inactive state is a Gibbs free energy of folding, $\Delta G_f$, and similarly the energy change for the transition from the folded inactive state to the folded active state is a Gibbs free energy of activity, $\Delta G_a$. These free energies determine the respective probabilities that the enzyme is in a folded or active state. The probability, $p_f$, of the enzyme being in a folded state (i.e. folded inactive or folded active) depends on $\Delta G_f$ and is independent of $\Delta G_a$, while the probability, $p_a$, of the enzyme being in the folded active state depends on both $\Delta G_f$ and $\Delta G_a$. These probabilities and energies may be related as follows:
[0046] The probability of a variant of the enzyme being in a folded state is related to the predicted folding measure in the machine learning model, and the probability of the variant being in the active state is related to the predicted activity measure in the machine learning model.
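As a hedged illustration only, one standard Boltzmann formulation consistent with the dependencies described in [0045] is shown below. The choice of the folded inactive state as the energy reference, the sign convention (a positive $\Delta G$ disfavours the forward transition), and the gas constant $R$ and temperature $T$ are assumptions rather than details taken from the text.

```latex
% Assumed reference: folded inactive at G = 0; unfolded at G = -\Delta G_f; folded active at G = +\Delta G_a
p_a = \frac{e^{-\Delta G_a/RT}}{e^{\Delta G_f/RT} + 1 + e^{-\Delta G_a/RT}}
    = \frac{1}{1 + e^{\Delta G_a/RT}\left(1 + e^{\Delta G_f/RT}\right)},
\qquad
p_f = \frac{1}{1 + e^{\Delta G_f/RT}}
```

Here $p_a$ follows from the full three-state partition function, while $p_f$ is written as a two-state folding equilibrium, which is what makes it independent of $\Delta G_a$ as stated. Under this convention, increasing either free energy lowers $p_a$.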
[0047] Thermodynamic models are not typically used to model enzymes. Rather, enzymes are normally modelled as non-equilibrium systems, for example using Michaelis-Menten kinetics.
[0048] Referring to Figure 4B, an example machine learning model comprises a neural network 500 that is configured to fit a thermodynamic model to the training data. The neural network 500 is trained to output, based on input data 502 that specifies one or more mutations of a given mutant variant of a target enzyme, a predicted activity measure 504 and a predicted folding measure 506 for the given mutant variant. The neural network 500 may be trained by any suitable method, such as backpropagation.
[0049] In this example, the input data 502 encodes the one or more mutations of the mutant variant using one-hot encoded amino acid sequences 508. Each element, $x_i$, of the input data represents the $i$th specific mutation of the variant, i.e. a specific type of mutation at a specific position in the wild-type amino acid sequence. The element $x_i$ takes a value of 0 or 1 depending on whether the specific mutation is present at that position: a value of 0 indicates that the specific mutation is not present, while a value of 1 indicates that it is present. A given position in the amino acid sequence may correspond to one, two or more elements corresponding to different specific mutations that could be provided at that position (e.g. each specific mutation could be a replacement of the wild-type amino acid with an alternative amino acid at that position, omission of the wild-type amino acid at that position entirely, or insertion of an additional amino acid at that position). Thus, the one-hot encoded vector $\mathbf{x}_n$ represents the mutations that are present in the given mutant variant.
[0050] The input data also includes a non-zero bias term 510 that is constant for the wild type variant and each mutant variant, so that the wild type variant is included in the training data. For example, the non-zero bias term 510 may be 1. Without it, the neural network could not take the wild type into account, because all of its elements $x_i$ in the one-hot encoded vector take values of zero.
[0051] During training, the neural network 500 learns model parameters using the training data. The architecture of the neural network is such that the model parameters comprise a first set of weights 512 and a second set of weights 514, and the predicted activity measure 504 depends on both the first set of weights 512 and the second set of weights 514, whereas the predicted folding measure 506 depends only on the second set of weights 514 and is independent of the first set of weights 512. This reflects the fact that, in the thermodynamic model, folding is independent of whether the variant is active or inactive, whereas for the variant to be active it must transition from unfolded to folded and additionally transition from folded inactive to folded active. Thus, the architecture of the neural network 500 is suitable for fitting a thermodynamic model to the training data. In this arrangement, it is assumed that the effects of the mutations in double or combinatorial mutants combine additively in the latent space, which represents free energies.
[0052] The first set of weights 512 and the second set of weights 514 correspond respectively to a first neuron and a second neuron of the neural network 500. Thus, the first neuron processes the input data 502 using the first set of weights 512 to generate a first neuron output 516, and the second neuron processes the input data 502 using the second set of weights 514 to generate a second neuron output 518.
[0053] In this example, the first neuron output 516 is a sum of the products of the weights of the first set of weights 512 and the corresponding elements of the input data 502. In particular, the product of the non-zero bias term 510 (which takes a value of 1 in this example) and the weight $\Delta G_{b0}$ is added to the product of the element $x_1$ and the weight $\Delta G_{b1}$, and so on:
$$\text{output}_{516} = \Delta G_{b0}\cdot 1 + \Delta G_{b1}\,x_1 + \Delta G_{b2}\,x_2 + \dots = \Delta G_{b0} + \sum_{i} \Delta G_{bi}\,x_i$$
[0054] Similarly, the second neuron output 518 is a sum of the products of the weights of the second set of weights 514 and the corresponding elements of the input data 502. In particular, the product of the non-zero bias term 510 (which takes a value of 1 in this example) and the weight $\Delta G_{f0}$ is added to the product of the element $x_1$ and the weight $\Delta G_{f1}$, and so on:
$$\text{output}_{518} = \Delta G_{f0}\cdot 1 + \Delta G_{f1}\,x_1 + \Delta G_{f2}\,x_2 + \dots = \Delta G_{f0} + \sum_{i} \Delta G_{fi}\,x_i$$
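The two neurons in [0053] and [0054] are plain weighted sums, which is what makes mutation effects additive in the latent free-energy space described in [0051]. The sketch below assumes a toy input with one bias element and two possible mutations; the weight values and the function name are hypothetical.

```python
import numpy as np


def neuron_outputs(x: np.ndarray, w_first: np.ndarray, w_second: np.ndarray) -> tuple[float, float]:
    """Weighted sums computed by the first (516) and second (518) neurons.

    x[0] is the constant bias term (1); x[1:] are the one-hot mutation elements.
    w_first[0] and w_second[0] are the bias weights; w_first[i] and w_second[i]
    are the weights for the i-th specific mutation.
    """
    return float(w_first @ x), float(w_second @ x)


# Hypothetical weights: [bias weight, mutation 1, mutation 2]
w_first = np.array([0.5, 1.0, 2.0])     # first set of weights (activity-specific)
w_second = np.array([-1.0, 0.25, 0.75])  # second set of weights (folding)

# Wild type, two single mutants and the corresponding double mutant
wt, m1, m2, double = (np.array(v, dtype=float)
                      for v in ([1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]))
```

Because each output is linear in the one-hot elements, the change for the double mutant relative to the wild type is exactly the sum of the two single-mutant changes, for both neurons.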
[0055] As can be seen from the architecture of the neural network 500, the predicted activity measure 504 depends on both the first neuron output 516 and the second neuron output 518, whereas the predicted folding measure 506 depends only on the second neuron output 518 and is independent of the first neuron output 516.
[0056] In particular, the predicted activity measure 504 depends on at least one first activation function 520 applied to the first neuron output 516 and the second neuron output 518. Meanwhile, the predicted folding measure 506 depends on at least one second activation function 522 applied to the second neuron output 518.
[0057] In suitable examples, one or both of the at least one first activation function 520 and the at least one second activation function 522 may comprise a non-linear function. For example, this may be any arbitrary non-linear function inferred from the training data. Alternatively, one or both of the at least one first activation function 520 and the at least one second activation function 522 may suitably be based on the Boltzmann distribution. In either case, one or both of the at least one first activation function 520 and the at least one second activation function 522 may have parameters that are trained during training of the neural network 500. The non-linear activation functions 520, 522 could be defined by an equation or by a look-up table. Coefficients of the equation or the look-up table could be static (e.g. determined with reference to the Boltzmann distribution) or dynamically trained based on the training data. In the case where the first activation function 520 is based on the Boltzmann distribution, a first linear activation function 524 receives an output from the first activation function 520 and outputs the predicted activity measure 504. Similarly, in the case where the second activation function 522 is based on the Boltzmann distribution, a second linear activation function 526 receives an output from the second activation function 522 and outputs the predicted folding measure 506. By contrast, when the first activation function 520 and the second activation function 522 are arbitrary non-linear functions, the linear activation functions 524, 526 are not needed for generating the predicted activity measure 504 and the predicted folding measure 506.
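A minimal sketch of the Boltzmann-based option in [0057] follows. It treats the first neuron output as the free energy of activity and the second as the free energy of folding, applies Boltzmann-style activation functions, and then applies linear output layers that map probabilities onto an assay's measurement scale. The formulas, the `RT` value (kcal/mol near 25 °C) and all parameter names are assumptions, not details taken from the text.

```python
import math

RT = 0.593  # assumed: R*T in kcal/mol at roughly 25 degrees C


def boltzmann_activity(dg_a: float, dg_f: float, rt: float = RT) -> float:
    """First activation function (520): depends on both latent free energies."""
    return 1.0 / (1.0 + math.exp(dg_a / rt) * (1.0 + math.exp(dg_f / rt)))


def boltzmann_folding(dg_f: float, rt: float = RT) -> float:
    """Second activation function (522): depends only on the folding free energy."""
    return 1.0 / (1.0 + math.exp(dg_f / rt))


def predict(dg_a: float, dg_f: float,
            act_scale: float = 1.0, act_offset: float = 0.0,
            fold_scale: float = 1.0, fold_offset: float = 0.0) -> tuple[float, float]:
    """Linear output layers (524, 526) rescale each probability to a measured quantity."""
    activity = act_scale * boltzmann_activity(dg_a, dg_f) + act_offset
    folding = fold_scale * boltzmann_folding(dg_f) + fold_offset
    return activity, folding
```

In a trained model the scale and offset terms would be learnt together with the two weight sets; with arbitrary learnt non-linearities instead of Boltzmann functions, the separate linear layers would be unnecessary, as noted above.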
IDENTIFY TARGET SITE(S)
[0058] Referring to Figure 5, a method 600 of identifying one or more target sites of a target enzyme comprises obtaining 602 model parameters from a machine learning model trained in accordance with the methods of this disclosure and, based on the model parameters, identifying 604 one or more target sites, which may comprise allosteric sites, of the target enzyme.
[0059] A site may be defined as a specific residue, for example the ith amino acid, in an amino acid sequence of the target enzyme. As such, a target site may be defined as a site of interest, such as a specific residue in an amino acid sequence of the target enzyme that has an allosteric effect on the target enzyme’s catalytic activity.
[0060] The machine learning model may suitably disambiguate whether a given site’s influence on a probability of the target enzyme being active is due to the target site influencing a probability of correct folding of the target enzyme or is due to other factors. For example, this may be achieved by a machine learning model that comprises the neural network 500 of Figure 4B.
[0061] In this or other examples, the target sites may be those predicted to influence the probability of the target enzyme being active due to reasons other than influencing a probability of correct folding. By isolating target sites that influence the enzyme’s catalytic activity for reasons other than correct folding, the sites that affect only the enzyme’s transition from the folded inactive state to the folded active state in the thermodynamic model may be identified. The sites that affect this particular transition are more likely to be allosteric sites.
[0062] The target sites may be selected based on a subset of the model parameters learnt in the training that express the contribution of each mutation towards the activity measure due to reasons other than influencing a probability of correct folding. For example, in the neural network 500 it is the first set of weights 512 that expresses the contribution of each mutation towards the activity measure 504 independently of the mutation’s effect on folding.
[0063] As such, model parameters of the machine learning model may in various examples comprise a first set of weights and a second set of weights, and the predicted activity measure may depend on both the first set of weights and the second set of weights, while the predicted folding measure may depend on the second set of weights but be independent of the first set of weights.
[0064] In these various examples, identifying the one or more target sites may comprise generating an aggregate measure of the first set of weights for each of a plurality of target sites. For example, in the neural network 500, each target site is represented by a sub-set of the first set of weights 512. If there are y possible mutations of the amino acid at the target site, then there will be y weights that represent that target site. An aggregate measure of the y weights, such as a total or average weight, may be generated to provide a representation of the overall effect of that site on the target enzyme’s catalytic activity.
[0065] For example, for the first target site represented by the one-hot encoded vector in Figure 4B, the aggregate measure may comprise the following average, where $y$ is the number of possible mutations of that site:
$$\overline{\Delta G_{b}} = \frac{1}{y}\sum_{i=1}^{y} \Delta G_{bi}$$
[0066] In the case where an aggregate measure is generated for each of a plurality of target sites, the aggregate measures may be ranked to produce a list of target sites in order of their effect on catalytic activity. Additionally or alternatively, the aggregate measures may be compared to a predefined threshold to make a determination of whether the target sites are of interest.
[0067] Other factors may also be taken into account in the selection of target sites, for example druggability and/or whether the target sites are surface sites of the enzyme.
[0068] With reference to Figures 6 to 9, the machine learning model of the present disclosure generates reliable predictions for target enzymes.
[0069] Referring to Figure 7D, a machine learning model in accordance with the present disclosure predicts, in line with expectations, that active site mutations of the target enzyme Src kinase reduce catalytic activity.
[0070] In Figure 7D, each column of the heatmap represents a residue in the amino acid sequence of the enzyme. A subset of the enzyme is represented, in particular the ATP binding site, the catalytic loop, the Mg²⁺ positioning loop and the substrate positioning loop. The active site comprises amino acids that directly contact ATP, Mg²⁺ or the substrate peptide phosphosite, and these residues are marked with an asterisk at the top of their column.
[0071] Each row of the heatmap represents a substitution mutation of the residue at that site. Each column therefore has one element that does not represent a mutation because the substitute amino acid is the same as the wild type residue at that site. These elements are marked with a dot.
[0072] The heatmap represents predicted activity measures for mutations that are detrimental to catalytic activity. The darker the element in the heatmap (corresponding to a larger increase in the Gibbs free energy of activity, which in the thermodynamic model represents a greater barrier to the mutated variant becoming active), the more detrimental that specific mutation at that site is to catalytic activity. Mutations in the active site are overwhelmingly detrimental to kinase activity, with 225 of 247 decreasing catalytic activity. Mutations in the active site are nearly 40 times more likely to decrease enzymatic activity than mutations elsewhere in the kinase. In this heatmap, the predicted activity measure is the predicted change in the Gibbs free energy of activity, $\Delta\Delta G_a$, which is positive for these detrimental mutations.
[0073] The heatmap also represents predicted activity measures for mutations that increase catalytic activity. The lighter the element in the heatmap, the more that specific mutation at that site increases catalytic activity. Only a minority of mutations increase catalytic activity, and none of these is in the active site.
[0074] Referring to Figure 7E, a machine learning model in accordance with the present disclosure predicts, in line with previous predictions, that mutations in eleven non-active site residues of Src kinase reduce catalytic activity. These residues have previously been predicted to be part of an allosteric network that communicates between substrate and ATP binding sites.
[0075] Figure 7E shows predicted activity measures for mutations that are detrimental to catalytic activity. Almost all mutations at these sites are detrimental to catalytic activity, taking positive values of the Gibbs free energy of activity, $\Delta G_a$. Thus, the predictions of the machine learning model are consistent with previous predictions.
[0076] Figure 7E also shows predicted activity measures that increase catalytic activity. Only one mutation at the sites shown increases catalytic activity, taking a negative value of the Gibbs free energy of activity, $\Delta G_a$.
[0077] Figure 9 shows a series of boxplots that summarise predictions made by a machine learning model in accordance with the present disclosure. The boxplots represent average predicted activity measures for various sites on the enzyme and summarise the types of predictions shown in more detail in the heatmaps described above. In this example, the activity measures comprise predicted changes in the Gibbs free energy of activity, $\Delta G_a$. Sites that are part of the active site are represented by box 902, sites predicted to be allosteric by box 904, and sites predicted to be non-allosteric by box 906. For comparison, average predicted changes in $\Delta G_a$ for previously described allosteric sites are represented by box 908. The mean change in $\Delta G_a$ for the sites predicted to be allosteric is comparable with that of the previously described allosteric sites, which supports the effectiveness of the predictions of the machine learning model. It is also noteworthy that the model predicts the greatest average increases in $\Delta G_a$ for mutations at the active site, in line with expectations.
IDENTIFY MUTATED VARIANT(S)
[0078] Referring to Figure 10, the present disclosure extends to a method 1000 of identifying a mutated variant of interest of a target enzyme. This approach uses a machine learning model that has been trained to predict an activity measure and a folding measure for a variant of a target enzyme in accordance with techniques disclosed herein. However, instead of using the machine learning model to identify target sites of the target enzyme, the machine learning model is used to identify mutant variants of interest which may be untested in wet lab experiments.
[0079] Thus, the method 1000 comprises, for each of a plurality of mutated variants of a wild type enzyme, providing 1002 an input specifying mutations in the mutated variant to a machine learning model trained to output a predicted activity measure and a predicted folding measure for the mutated variant. The input may for example be one-hot encoded as described elsewhere herein.
[0080] The method comprises receiving 1004 from the machine learning model a predicted activity measure and a predicted folding measure for each of the plurality of mutated variants.
[0081] The method further comprises, based on the predicted activity measures and the predicted folding measures, selecting 1006 from the plurality of mutated variants at least one mutated variant of interest. For example, the at least one mutated variant of interest may be selected based on predefined thresholds for the predicted folding measure and the predicted activity measure. The selected variants may be untested in the lab, and thus the machine learning model may help identify mutated variants of interest that are suitable for wet lab experimentation.
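Selection by predefined thresholds, as in [0081], can be sketched as a filter over model outputs. This is a minimal illustration assuming that higher values of both measures mean "more folded" and "more active", with a hypothetical `Prediction` record and variant labels; to look for inhibitory variants instead (well folded but with low predicted activity), the activity criterion would be reversed.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    variant: str     # hypothetical label format, e.g. "E283A" or "E283A/K426R"
    activity: float  # predicted activity measure from the model
    folding: float   # predicted folding measure from the model


def select_variants(preds: list[Prediction], min_folding: float,
                    min_activity: float) -> list[Prediction]:
    """Keep variants above both thresholds, ranked by predicted activity (highest first)."""
    chosen = [p for p in preds
              if p.folding >= min_folding and p.activity >= min_activity]
    return sorted(chosen, key=lambda p: p.activity, reverse=True)


preds = [
    Prediction("V1", activity=0.9, folding=0.8),
    Prediction("V2", activity=0.95, folding=0.3),  # active but predicted to fold poorly
    Prediction("V3", activity=0.6, folding=0.9),
    Prediction("V4", activity=0.2, folding=0.95),
]
```

Applying the folding threshold first keeps apparently active but unstable variants, such as V2 here, out of the shortlist for wet lab testing.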
[0082] Figure 11 shows a computer apparatus 1100 suitable for implementing methods according to the present disclosure. The apparatus 1100 comprises a processor 1102, an input-output device 1104, a communications portal 1106 and computer memory 1108. The memory 1108 may store code that, when executed by the processor 1102, causes the apparatus 1100 to perform any of the computer-implemented methods disclosed herein.
SRC KINASE
[0083] Proto-oncogene tyrosine-protein kinase Src (also known as proto-oncogene c-Src, or c-Src), herein referred to as Src or Src kinase, is a non-receptor tyrosine kinase belonging to the Src family of kinases.
[0084] Wild-type Src kinase comprises an SH2 domain, an SH3 domain and a tyrosine kinase domain, the last of which is responsible for catalysing the phosphorylation of specific tyrosine residues in target proteins.
[0085] Wild-type human Src kinase comprises the amino acid sequence set forth in SEQ ID NO: 1 herein.
Wild-type Src kinase amino acid sequence (SEQ ID NO: 1)
MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLF GGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLS TGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDF DNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGLCHRLTTVCPTSKPQTQGL AKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKL RHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLKGETGKYLRLPQLVDMAAQIASGMAYVERM NYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSD VWSFGILLTELTTKGRVPYPGMVNREVLDQVERGYRMPCPPECPESLHDLMCQCWRKEPEERPT FEYLQAFLEDYFTSTEPQYQPGENL
[0086] It will be understood that all references to Src sequence positions herein refer to the wild-type Src kinase amino acid sequence as set out in SEQ ID NO: 1 herein.
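The numbering convention in [0086] can be checked mechanically. The sketch below joins the lines of SEQ ID NO: 1 given above into one string and tests whether a label such as E283 names the residue actually found at that 1-based position. The function name `check_label` is hypothetical, and the sketch handles only single-residue labels against SEQ ID NO: 1 itself, not equivalent positions in other sequences.

```python
import re

# SEQ ID NO: 1 (wild-type human Src), joined from the lines listed above
SRC_SEQ = (
    "MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLF"
    "GGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLS"
    "TGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDF"
    "DNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGLCHRLTTVCPTSKPQTQGL"
    "AKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKL"
    "RHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLKGETGKYLRLPQLVDMAAQIASGMAYVERM"
    "NYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSD"
    "VWSFGILLTELTTKGRVPYPGMVNREVLDQVERGYRMPCPPECPESLHDLMCQCWRKEPEERPT"
    "FEYLQAFLEDYFTSTEPQYQPGENL"
)

# One-letter residue code followed by a 1-based position, e.g. "E283"
LABEL = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)$")


def check_label(label: str, seq: str = SRC_SEQ) -> bool:
    """Return True if the label names the residue at that 1-based position of seq."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"not a residue label: {label!r}")
    aa, pos = match.group(1), int(match.group(2))
    return 1 <= pos <= len(seq) and seq[pos - 1] == aa
```

A check like this is a quick way to catch transcription errors in residue lists, for example a digit "1" written in place of the isoleucine code "I".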
[0087] Any and all references to positions within Src kinase are intended to encompass the equivalent positions/residues in an analogous sequence, even if that sequence is not identical to SEQ ID NO: 1. For example, where a sequence is truncated or elongated relative to SEQ ID NO: 1, E283 refers to the glutamic acid present at the equivalent position, regardless of whether it is residue number 283 when counted from the N-terminus.
[0088] The person skilled in the art is able to determine equivalent positions in analogous sequences, e.g., using widely available alignment tools such as BLAST or EMBOSS, or manually.
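Once an alignment has been produced (for example with one of the BLAST or EMBOSS tools mentioned above), translating a SEQ ID NO: 1 position into the equivalent position of an analogous sequence is a walk along the gapped alignment. A minimal sketch, assuming the alignment is supplied as two equal-length strings with "-" for gaps; `map_positions` is a hypothetical helper, not part of either tool.

```python
def map_positions(aligned_ref: str, aligned_query: str) -> dict[int, int]:
    """Map 1-based reference positions to 1-based query positions via a gapped alignment.

    Reference positions aligned to a gap in the query have no equivalent and are omitted.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    mapping: dict[int, int] = {}
    ref_pos = query_pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            ref_pos += 1      # advance through the reference (e.g. SEQ ID NO: 1)
        if q != "-":
            query_pos += 1    # advance through the analogous sequence
        if r != "-" and q != "-":
            mapping[ref_pos] = query_pos
    return mapping


# N-terminally truncated query: reference residue 3 becomes query residue 1
truncated = map_positions("ACDEFG", "--DEFG")  # {3: 1, 4: 2, 5: 3, 6: 4}
```

The same walk handles insertions in the query: positions after an inserted residue shift by one, while the residue identity (the "E" of E283, say) stays attached to the equivalent position.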
MODULATING THE ACTIVITY OF SRC KINASE
[0089] In accordance with the inventors' discovery of allosteric sites within Src kinase, provided herein is a method of modulating the activity of Src kinase comprising mutating one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514.
[0090] By modulating the activity of Src kinase is meant the alteration of kinase activity arising from a variant of Src as compared with the kinase activity of the unmodified, wild-type Src. It will be understood that encompassed within the present invention are variants of the wild-type and variant Src proteins that include, for example, tags for enabling purification or identification. The person skilled in the art would understand that a comparison of activity could be carried out using equivalent constructs that differ in the amino acids present within the Src sequence region defined by SEQ ID NO: 1. For example, a Src kinase comprising SEQ ID NO: 1 in addition to a protein tag (e.g., a GFP tag) may be compared with a variant Src kinase comprising a modified version of SEQ ID NO: 1 in addition to the same tag.
[0091] Any suitable assay may be used to reliably compare the activity of Src kinase.
[0092] In one embodiment of the present invention, the method of modulating the activity of Src kinase comprises mutating one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, forty seven, forty eight, forty nine, fifty, fifty one, fifty two, fifty three, fifty four, fifty five, fifty six, fifty seven, or fifty eight residues.
[0093] In another embodiment, the method of modulating the activity of Src kinase comprises mutating one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, or forty seven residues.
[0094] In another embodiment, the method of modulating the activity of Src kinase comprises mutating one, two, three, four, five, six, seven, eight, nine, ten, or eleven residues.
[0095] In another embodiment, the method of modulating the activity of Src kinase comprises mutating one residue.
[0096] The present inventors have generated an array of single point mutants of Src kinase, which enabled determination of the allosteric effect of each residue/site.
[0097] Herein, the term site is taken to mean a region of Src kinase; the site may comprise or consist of a single amino acid residue or more than one residue (e.g., a group of residues). Where the site comprises more than one amino acid, the amino acids may be continuous or discontinuous in the primary sequence or on the surface of the folded protein (i.e., comprising a continuous or discontinuous patch or site).
[0098] Individual residues within Src kinase were found to have no allosteric effect, an inhibitory effect, or an activating effect. Herein an inhibitory effect refers to a reduction in kinase activity; an activating effect to an increase in kinase activity; and no effect to no change in activity, all as compared with the wild-type Src kinase.
[0099] In one embodiment, the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509.
[0100] In one embodiment, the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509, and wherein mutation of said residue(s) results in the inhibition or reduction of Src kinase activity.
[0101] In one embodiment, the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
[0102] In one embodiment, the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514, and wherein mutation of said residue(s) results in an increase of Src kinase activity.
[0103] In some embodiments only one residue from the groups is mutated.
[0104] In some embodiments two residues from the groups are mutated.
[0105] Src kinase may be modified such that groups of residues are mutated or modified; these groups may form regions on the protein surface and may represent pockets or patches that have an allosteric effect on Src activity.
[0106] In one embodiment, the one or more residues is selected from one or more of the following groups of residues:
a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447;
b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411;
c) R388, P428, I429, K430, T432, A436, A437 & F442;
d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411;
e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408;
f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471;
g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410;
h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465;
i) M317, K318, K319, L328, V331 & I339;
j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412;
k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340;
l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442;
m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407;
n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476;
o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408;
p) V386, R388, R412, L413, I414, E415, N417, E418, Y419, Y439, G440, R441, F442 & T443;
q) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502;
r) W289, R321, Q327, L328, Y329, A330 & V331;
s) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338;
t) L270, W289, T293, R294, V295, Y329, A330 & V340;
u) L272, E273, V274, K275, L276, V284, R294, V295, A296, I297 & Y343;
v) R422, Q423, I429, V470, N471, R472, E473, V474, L475 & D476;
w) T293, R294, V295, Y329, T341, E342, Y343, M344 & S345;
x) F352, L353, K354, G358, K359, Y360, L361, R362, L366, E457, L458, T459, T460, K461, G462, P488 & E489;
y) E268, S269, L270, R271, L272 & Y338;
z) G481, R483, D496, M498, C499, Q500, C501, W502, R503, K504, E505 & E508;
aa) L361, R362, L363, L366, L458, T459, T460, K461, R463, P487, P488, E489, C490, P491 & L494; and/or
bb) P364, V367, D368, A371, F518, L519, E520, D521, Y522, F523 & T524.
[0107] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447.
[0108] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411.
[0109] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: R388, P428, I429, K430, T432, A436, A437 & F442.
[0110] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411.
[0111] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408.
[0112] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471.
[0113] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410.
[0114] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465.
[0115] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: M317, K318, K319, L328, V331 & I339.
[0116] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412.
[0117] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340.
[0118] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442.
[0119] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407.
[0120] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476.
[0121] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408.
[0122] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V386, R388, R412, L413, 1414, E415, N417, E418, Y419, Y439, G440, R441, F442 & T443.
[0123] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: 1453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502.
[0124] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: W289, R321, Q327, L328, Y329, A330 & V331.
[0125] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
[0126] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L270, W289, T293, R294, V295, Y329, A330 & V340.
[0127] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L272, E273, V274, K275, L276, V284, R294, V295, A296, I297 & Y343.
[0128] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: R422, Q423, I429, V470, N471, R472, E473, V474, L475 & D476.
[0129] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: T293, R294, V295, Y329, T341, E342, Y343, M344 & S345.
[0130] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: F352, L353, K354, G358, K359, Y360, L361, R362, L366, E457, L458, T459, T460, K461, G462, P488 & E489.
[0131] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: E268, S269, L270, R271, L272 & Y338.
[0132] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: G481, R483, D496, M498, C499, Q500, C501, W502, R503, K504, E505 & E508.
[0133] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L361, R362, L363, L366, L458, T459, T460, K461, R463, P487, P488, E489, C490, P491 & L494.
[0134] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: P364, V367, D368, A371, F518, L519, E520, D521, Y522, F523 & T524.
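Each residue identifier in the groups above combines a one-letter amino acid code with a residue position (e.g., M317 denotes methionine at position 317). Because many groups share residues, treating each group as a set makes overlaps easy to identify. The following Python sketch is illustrative only; the helper names `parse_residue`, `as_group` and `shared_residues` are hypothetical and do not form part of the disclosure.

```python
import re

# The 20 standard one-letter amino acid codes.
AMINO_ACIDS = frozenset("ARNDCEQGHILKMFPSTWYV")

# A residue label is one letter followed by a residue position, e.g. "M317".
RESIDUE_LABEL = re.compile(r"([A-Z])(\d+)")


def parse_residue(label: str) -> tuple[str, int]:
    """Split a label such as 'M317' into ('M', 317); reject malformed labels."""
    match = RESIDUE_LABEL.fullmatch(label.strip())
    if match is None or match.group(1) not in AMINO_ACIDS:
        raise ValueError(f"not a residue label: {label!r}")
    return match.group(1), int(match.group(2))


def as_group(text: str) -> frozenset[str]:
    """Convert a list written as 'C280, R391, ... & N471' into a set of labels."""
    labels = [part.strip() for part in re.split(r"[,&]", text) if part.strip()]
    for label in labels:
        parse_residue(label)  # raises ValueError on any malformed entry
    return frozenset(labels)


def shared_residues(*groups: frozenset[str]) -> list[str]:
    """Residues common to every given group, ordered by position."""
    common = frozenset.intersection(*groups)
    return sorted(common, key=lambda label: parse_residue(label)[1])


# Groups transcribed from [0112] and [0120].
group_0112 = as_group("C280, R391, Y419, A425, K426, F427, P428, I429, K430, "
                      "W431, V464, P465 & N471")
group_0120 = as_group("A421, R422, Q423, G424, A425, K426, F427, P428, I429, "
                      "V470, N471, R472, E473, V474, L475 & D476")

print(shared_residues(group_0112, group_0120))
# ['A425', 'K426', 'F427', 'P428', 'I429', 'N471']
```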
[0135] In one embodiment, only one residue within a group of residues is mutated.
[0136] In one embodiment, more than one residue within a group of residues is mutated.
[0137] In another embodiment, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen or nineteen residues within a group of residues are mutated.
[0138] In a further embodiment, all residues within a group of residues are mutated.
[0139] As such, any number of residues within the groups defined herein may be modified or mutated.
[0140] In one embodiment, the one or more residues is selected from one group of residues, as defined herein.
[0141] In another embodiment, the one or more residues is selected from more than one group of residues, as defined herein.
[0142] In another embodiment, the one or more residues is selected from two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven or twenty eight groups of residues.
[0143] As such, the modified or mutated residues may be selected from any number of groups defined herein.
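Because any number of residues within a group may be mutated ([0136] to [0138]) and any number of groups may be drawn on ([0141] to [0143]), the number of distinct residue selections grows quickly. For a single group of n residues there are C(n, k) ways to choose exactly k residues, and 2^n - 1 non-empty selections in total. A minimal Python sketch of this arithmetic (the function names are hypothetical):

```python
from math import comb


def selections_of_size(n: int, k: int) -> int:
    """Ways to choose exactly k residues from a group of n residues."""
    return comb(n, k)


def nonempty_selections(n: int) -> int:
    """All non-empty selections from a group of n residues (equals 2**n - 1)."""
    return sum(comb(n, k) for k in range(1, n + 1))


# A 13-residue group, such as the one listed in [0112].
print(selections_of_size(13, 2))  # 78 distinct pairs of residues
print(nonempty_selections(13))    # 8191 possible non-empty selections
```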
[0144] In one embodiment, the one or more residues is an allosteric site.
[0145] In one embodiment, the one or more residues is located within an allosteric site.
[0146] As set out herein, the activity of Src kinase may be modulated. Said modulation refers to increasing (activating) or decreasing (inactivating) Src kinase activity as compared with a wild-type control.
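Since modulation is defined relative to a wild-type control, a measured variant activity can be labelled by its ratio to the wild-type activity. The Python sketch below assumes both activities are single numbers measured under the same conditions, and it assumes an arbitrary tolerance band to separate genuine changes from noise; the 10% default is an illustrative choice, not a value taken from this disclosure.

```python
def classify_modulation(variant_activity: float,
                        wild_type_activity: float,
                        tolerance: float = 0.10) -> str:
    """Label a variant as activating, inactivating or unchanged.

    `tolerance` is a hypothetical fractional band around the wild-type level
    within which a change is treated as no modulation.
    """
    if wild_type_activity <= 0:
        raise ValueError("wild-type activity must be positive")
    ratio = variant_activity / wild_type_activity
    if ratio > 1 + tolerance:
        return "activating"
    if ratio < 1 - tolerance:
        return "inactivating"
    return "unchanged"


print(classify_modulation(0.4, 1.0))  # inactivating
print(classify_modulation(1.5, 1.0))  # activating
```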
[0147] In one embodiment, the modulating is activating or inactivating.
[0148] In one embodiment, the modulating is inactivating.
[0149] In one embodiment, the method of the invention decreases the kinase activity of Src kinase, relative to unmodified Src kinase.
[0150] In one embodiment, the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; and q) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
[0151] In one embodiment, there is provided a method of decreasing the activity of Src kinase comprising mutating one or more residues selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; and q) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
[0152] In one embodiment, the modulating is activating.
[0153] In one embodiment, the method of the invention increases the kinase activity of Src kinase, relative to unmodified Src kinase.
[0154] In one embodiment, the one or more residues is selected from one or more of the following groups of residues: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
[0155] In one embodiment, there is provided a method of increasing the activity of Src kinase comprising mutating one or more residues selected from one or more of the following groups of residues: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
[0156] Any of the residues herein may be mutated to any other naturally or non-naturally occurring amino acid to produce a variant.
[0157] In one embodiment, said mutation is a conservative mutation.
[0158] In one embodiment, said mutation is a non-conservative mutation.
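This disclosure does not define which substitutions count as conservative. One common convention groups amino acids by side-chain chemistry and treats a substitution within a group as conservative. The Python sketch below assumes that convention; the grouping shown is one illustrative choice, and other schemes (for example, those based on substitution-matrix scores) would classify some pairs differently.

```python
# Illustrative side-chain classes (an assumption; not defined in this document).
SIDE_CHAIN_CLASSES = {
    "aliphatic": set("AVLIM"),
    "aromatic": set("FWY"),
    "basic": set("KRH"),
    "acidic": set("DE"),
    "polar": set("STNQ"),
    "glycine": {"G"},
    "proline": {"P"},
    "cysteine": {"C"},
}

# Reverse lookup: amino acid -> class name.
CLASS_OF = {aa: name for name, members in SIDE_CHAIN_CLASSES.items() for aa in members}


def is_conservative(wild_type: str, substitute: str) -> bool:
    """True if both residues fall in the same illustrative side-chain class."""
    if wild_type == substitute:
        raise ValueError("not a substitution")
    return CLASS_OF[wild_type] == CLASS_OF[substitute]


print(is_conservative("L", "I"))  # True: both aliphatic
print(is_conservative("E", "K"))  # False: acidic to basic
```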
[0159] In one embodiment, there is provided a method of modulating the activity of Src kinase comprising mutating one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514, to one or more of the amino acids selected from the group consisting of: A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, and V.
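Because each listed residue may be changed to any of the 20 amino acids above, and changing a residue to itself is not a mutation, each site admits 19 single substitutions. A short Python sketch enumerating them in the common wild-type/position/substitute shorthand (e.g., M317A); the site list is transcribed from [0159], with isoleucine residues written as I337, I339, I429 and I453, and the function name is hypothetical.

```python
# Standard one-letter codes, in the order listed in [0159].
AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"

# Residues listed in [0159].
SITES = """
E283 K301 M305 F310 E313 A314 M317 L320 L328 A330 V332 E335 P336 I337 I339
T341 Y343 S348 V380 E381 V386 H387 R388 A392 A393 A406 F408 G409 L410 A411
R412 Y419 A421 G424 K426 F427 I429 K430 W431 T432 A433 E435 A436 D447 I453
E457 G462 P465 Y466 V470 V474 L475 D476 W502 R509 T511 Y514
""".split()


def single_substitutions(sites: list[str]) -> list[str]:
    """All single substitutions at the given sites, written like 'M317A'."""
    variants = []
    for site in sites:
        wild_type = site[0]
        for substitute in AMINO_ACIDS:
            if substitute != wild_type:  # replacing a residue with itself is no mutation
                variants.append(site + substitute)
    return variants


variants = single_substitutions(SITES)
print(len(variants))  # 1083, i.e. 19 substitutions at each of 57 sites
print(variants[:3])   # ['E283A', 'E283R', 'E283N']
```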
[0160] In one embodiment, the method is an in vitro method.
[0161] In one embodiment, the method is an in vivo method.
[0162] In one embodiment, the method is an ex vivo method.
[0163] It will be understood that mutating one or more residues may refer to mutational changes made directly at the amino acid level or mutational changes made at a nucleic acid level, e.g., by altering the nature of the codon encoding the corresponding Src kinase residue.
[0164] Said mutating may result in a stable, heritable change, e.g., as a result of mutation of genomic DNA, such as in a cell; or it may result in a transient change, e.g., as a result of the introduction of mRNA encoding a Src kinase comprising said mutated residues.
[0165] In one embodiment, the mutating is carried out by means of mutating the nucleic acid sequence encoding the one or more residues.
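Mutating at the nucleic acid level ([0163] to [0165]) amounts to changing the codon that encodes the target residue. Assuming an in-frame coding sequence whose first codon encodes residue 1, residue p occupies nucleotides 3p - 2 to 3p (1-based). The Python sketch below applies that arithmetic; the four-codon toy sequence and the single codon chosen for each amino acid are illustrative assumptions, and the Src coding sequence itself is not reproduced here.

```python
# One codon per amino acid from the standard genetic code (an arbitrary choice;
# most amino acids have several synonymous codons).
CODON_FOR = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "E": "GAA", "Q": "CAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}


def mutate_codon(cds: str, position: int, new_residue: str) -> str:
    """Replace the codon for 1-based residue `position` in an in-frame CDS."""
    start = 3 * (position - 1)  # 0-based index of the codon's first nucleotide
    if position < 1 or start + 3 > len(cds):
        raise IndexError("position lies outside the coding sequence")
    return cds[:start] + CODON_FOR[new_residue] + cds[start + 3:]


# Hypothetical four-codon sequence encoding M-L-K-E (not the Src coding sequence).
toy_cds = "ATGCTGAAAGAA"
print(mutate_codon(toy_cds, 2, "A"))  # ATGGCTAAAGAA (L2A)
```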
[0166] Thus, in one embodiment, there is provided a method of modulating the activity of Src kinase comprising mutating one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514, wherein said mutating comprises the step of altering the nucleic acid sequence encoding said residue(s).
[0167] Mutating the one or more residues according to the invention may be carried out by any suitable means.
[0168] Said mutation may be carried out at the nucleic acid level by PCR-based techniques, TALEN-based gene editing, CRISPR/Cas-based gene editing, etc.
[0169] In one embodiment, the mutating is carried out using CRISPR/Cas-based gene editing technologies, or variants thereof.
[0170] In one embodiment, there is provided a Src kinase produced according to the method of the invention.
[0171] It will be understood that, whilst modifying a single amino acid may have a specific effect, the cumulative effect of modifying more than one amino acid, or a group of amino acids including that single amino acid, may differ from the effect of the single amino acid alone. For example, a single point mutation that allosterically increases Src kinase activity may form part of a region or pocket that, when modified as a whole, allosterically decreases Src kinase activity, and vice versa.
[0172] Herein, where selections from a group may be made it is to be understood that, unless otherwise stated, said selections are independent of one another.
SRC KINASE VARIANTS
[0173] Src kinase may be modified or mutated to produce variants of Src kinase, as compared to a wild-type Src kinase. Said variants may have modified activities relative to the wild-type Src kinase.
[0174] In one aspect, there is provided a polypeptide encoding a Src kinase variant, wherein the polypeptide comprises a mutation, relative to a wild-type Src kinase, at one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514.
[0175] In one embodiment, the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509.
[0176] In one embodiment, the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509, and wherein the polypeptide has reduced activity relative to a wild-type Src kinase.
[0177] In one embodiment, the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
[0178] In one embodiment, the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514, and wherein the polypeptide has increased activity relative to a wild-type Src kinase.
[0179] In one embodiment, the polypeptide comprises a mutation, relative to a wild-type Src kinase, at one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, forty seven, forty eight, forty nine, fifty, fifty one, fifty two, fifty three, fifty four, fifty five, fifty six, fifty seven, or fifty eight residues.
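Paragraphs [0176] and [0178] associate two disjoint residue lists with opposite directions of activity change: mutation at a residue in the first list is associated with reduced activity, and at a residue in the second with increased activity. A minimal Python lookup sketch follows; the function name is hypothetical, and, as [0171] cautions, a per-residue label need not predict the effect of combining several mutations.

```python
from typing import Optional

# Residues listed in [0176] (reduced activity when mutated).
REDUCED = frozenset("""
K301 M305 F310 E313 A314 M317 L320 L328 A330 V332 P336 I337 I339 S348 V380
V386 H387 R388 A392 A393 A406 F408 G409 L410 A411 R412 Y419 A421 G424 F427
I429 K430 W431 T432 A433 E435 A436 D447 I453 E457 G462 P465 Y466 V474 L475
W502 R509
""".split())

# Residues listed in [0178] (increased activity when mutated).
INCREASED = frozenset("E283 E335 T341 Y343 E381 K426 V470 D476 E479 T511 Y514".split())


def listed_direction(residue: str) -> Optional[str]:
    """Direction of activity change listed for a single mutated residue, if any."""
    if residue in REDUCED:
        return "decrease"
    if residue in INCREASED:
        return "increase"
    return None  # residue appears in neither list


print(listed_direction("M317"))  # decrease
print(listed_direction("E283"))  # increase
print(listed_direction("C280"))  # None
```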
[0180] In one embodiment, the polypeptide comprises a mutation, relative to a wild-type Src kinase, at one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, or forty seven residues.
[0181] In one embodiment, the polypeptide comprises a mutation, relative to a wild-type Src kinase, at one, two, three, four, five, six, seven, eight, nine, ten, or eleven residues.
[0182] In one embodiment, the polypeptide comprises a mutation, relative to a wild-type Src kinase, at only one residue.
[0183] In one embodiment, the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) V386, R388, R412, L413, I414, E415, N417, E418, Y419, Y439, G440, R441, F442 & T443; q) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; r) W289, R321, Q327, L328, Y329, A330 & V331; s) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338; t) L270, W289, T293, R294, V295, Y329, A330 & V340; u) L272, E273, V274, K275, L276, V284, R294, V295, A296, I297 & Y343; v) R422, Q423, I429, V470, N471, R472, E473, V474, L475 & D476; w) T293, R294, V295, Y329, T341, E342, Y343, M344 & S345; x) F352, L353, K354, G358, K359, Y360, L361, R362, L366, E457, L458, T459, T460, K461, G462, P488 & E489; y) E268, S269, L270, R271, L272 & Y338; z) G481, R483, D496, M498, C499, Q500, C501, W502, R503, K504, E505 & E508; aa) L361, R362, L363, L366, L458, T459, T460, K461, R463, P487, P488, E489, C490, P491 & L494; and bb) P364, V367, D368, A371, F518, L519, E520, D521, Y522, F523 & T524.
[0184] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447.
[0185] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411.
[0186] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: R388, P428, I429, K430, T432, A436, A437 & F442.
[0187] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411.
[0188] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408.
[0189] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471.
[0190] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410.
[0191] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465.
[0192] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: M317, K318, K319, L328, V331 & I339.
[0193] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412.
[0194] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340.
[0195] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442.
[0196] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407.
[0197] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476.
[0198] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408.
[0199] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: V386, R388, R412, L413, I414, E415, N417, E418, Y419, Y439, G440, R441, F442 & T443.
[0200] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502.
[0201] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: W289, R321, Q327, L328, Y329, A330 & V331.
[0202] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
[0203] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L270, W289, T293, R294, V295, Y329, A330 & V340.
[0204] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L272, E273, V274, K275, L276, V284, R294, V295, A296, I297 & Y343.
[0205] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: R422, Q423, I429, V470, N471, R472, E473, V474, L475 & D476.
[0206] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: T293, R294, V295, Y329, T341, E342, Y343, M344 & S345.
[0207] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: F352, L353, K354, G358, K359, Y360, L361, R362, L366, E457, L458, T459, T460, K461, G462, P488 & E489.
[0208] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: E268, S269, L270, R271, L272 & Y338.
[0209] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: G481, R483, D496, M498, C499, Q500, C501, W502, R503, K504, E505 & E508.
[0210] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: L361, R362, L363, L366, L458, T459, T460, K461, R463, P487, P488, E489, C490, P491 & L494.
[0211] In one embodiment, the one or more residues is one, more than one, and/or all of the residues in the group consisting of: P364, V367, D368, A371, F518, L519, E520, D521, Y522, F523 & T524.
[0212] In one embodiment, the polypeptide comprises a mutation at one residue within a group of residues.
[0213] In one embodiment, the polypeptide comprises a mutation at more than one residue within a group of residues.
[0214] In one embodiment, the polypeptide comprises a mutation at two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen or nineteen residues within a group of residues.
[0215] In a further embodiment, the polypeptide comprises a mutation at all residues within a group of residues.
[0216] In one embodiment, the polypeptide comprises a mutation to one or more residues selected from one group of residues, as defined herein.
[0217] In another embodiment, the polypeptide comprises a mutation to one or more residues selected from more than one group of residues, as defined herein.
[0218] In another embodiment, the polypeptide comprises a mutation to one or more residues selected from two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven or twenty eight groups of residues.
[0219] In one embodiment, the one or more residues is located within an allosteric site.
[0220] In one embodiment, the kinase activity of the polypeptide is modulated relative to the kinase activity of wild-type Src kinase.
[0221] In one embodiment, the modulation is an increase or a decrease in kinase activity relative to the kinase activity of wild-type Src kinase.
[0222] In one embodiment, the modulation is a decrease in kinase activity relative to the kinase activity of wild-type Src kinase.
[0223] In one embodiment, the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; and q) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
[0224] In one embodiment, the modulation is an increase in kinase activity relative to the kinase activity of wild-type Src kinase.
[0225] In one embodiment, the one or more residues is selected from one or more of the following groups of residues: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
[0226] Any of the residues herein may be mutated to any other naturally or non-naturally occurring amino acid to produce a variant.
[0227] In one embodiment, the variant comprises a conservative mutation.
[0228] In one embodiment, the variant comprises a non-conservative mutation.
[0229] In one embodiment, the polypeptide encoding a Src kinase variant comprises a mutation, relative to a wild-type Src kinase, of any one or more of E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and/or Y514 to one or more of the amino acids selected from the group consisting of: A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, and V.
[0230] As the invention herein provides a method of modulating the activity of Src kinase comprising mutating one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514, there are also provided herein polypeptides encoding a Src kinase variant produced according to the method of the invention.
[0231] A variant may also be referred to as a modified or mutated Src kinase; each of these terms will be understood to mean a Src kinase that is a variant or modification of, or is mutated relative to, a wild-type Src kinase, e.g., one comprising or consisting of SEQ ID NO: 1.
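Variants are described relative to a wild-type reference sequence, e.g., SEQ ID NO: 1. In the shorthand used for substitutions, a label such as M317A means the residue at position 317 (1-based) is M in the reference and A in the variant. The Python sketch below applies such labels and checks each stated wild-type residue against the reference, which guards against numbering mismatches. Because SEQ ID NO: 1 is not reproduced here, the example runs on a short hypothetical sequence, and the function name is likewise hypothetical.

```python
import re

# Substitution label: wild-type letter, 1-based position, substitute letter.
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_mutations(reference: str, mutations: list[str]) -> str:
    """Return the variant sequence after applying labels such as 'M3A'."""
    residues = list(reference)
    for label in mutations:
        match = MUTATION.fullmatch(label)
        if match is None:
            raise ValueError(f"bad mutation label: {label!r}")
        wild_type, position, substitute = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(reference):
            raise IndexError(f"{label}: position outside the sequence")
        if reference[position - 1] != wild_type:
            # The label's wild-type letter must agree with the reference.
            raise ValueError(f"{label}: reference has {reference[position - 1]}")
        residues[position - 1] = substitute
    return "".join(residues)


toy_reference = "GSMKLE"  # hypothetical; not SEQ ID NO: 1
print(apply_mutations(toy_reference, ["M3A", "E6K"]))  # GSAKLK
```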
SRC KINASE BINDING MOLECULES
[0232] In accordance with the discovery of sites that allosterically regulate Src kinase activity, the present invention also provides binding molecules that target said allosteric sites. Said molecules may modulate the activity of Src kinase, e.g., wild-type Src kinase.
[0233] In one aspect of the invention there is provided a binding molecule which binds to one or more target sites on Src kinase, wherein the one or more target sites comprises one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514.
[0234] In one embodiment there is provided a binding molecule which binds to one or more target sites on Src kinase, wherein the one or more target sites comprises one or more residues selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509.
[0235] In one embodiment there is provided a binding molecule which binds to one or more target sites on Src kinase, wherein the one or more target sites comprises one or more residues selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
[0236] In one embodiment there is provided a binding molecule which binds to one or more target sites on Src kinase, wherein the one or more target sites comprises one or more residues selected from the group consisting of: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) V386, R388, R412, L413, I414, E415, N417, E418, Y419, Y439, G440, R441, F442 & T443; q) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; r) W289, R321, Q327, L328, Y329, A330 & V331; s) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338; t) L270, W289, T293, R294, V295, Y329, A330 & V340; u) L272, E273, V274, K275, L276, V284, R294, V295, A296, I297 & Y343; v) R422, Q423, I429, V470, N471, R472, E473, V474, L475 & D476; w) T293, R294, V295, Y329, T341, E342, Y343, M344 & S345; x) F352, L353, K354, G358, K359, Y360, L361, R362, L366, E457, L458, T459, T460, K461, G462, P488 & E489; y) E268, S269, L270, R271, L272 & Y338; z) G481, R483, D496, M498, C499, Q500, C501, W502, R503, K504, E505 & E508; aa) L361, R362, L363, L366, L458, T459, T460, K461, R463, P487, P488, E489, C490, P491 & L494; and bb) P364, V367, D368, A371, F518, L519, E520, D521, Y522, F523 & T524.
[0237] In one embodiment, the one or more target sites are one or more allosteric sites.
[0238] In one embodiment, the one target site is an allosteric site.
[0239] In one embodiment, the one or more target sites are surface exposed.
[0240] In one embodiment, the one or more target sites are solvent accessible.
[0241] In one embodiment, at least one of the one or more target sites is solvent accessible.
[0242] In one embodiment, the one or more target sites are partially solvent accessible.
[0243] In one embodiment, the one or more target sites form a pocket or patch on the surface of Src kinase.
[0244] In one embodiment, the one or more target sites are accessible for binding by one or more binding molecules.
[0245] In one embodiment, the one or more target sites are accessible for binding by one or more binding molecules by a lock-and-key or an induced-fit mechanism.
[0246] The target sites herein may be targeted by a binding molecule. Said binding molecules may have therapeutic benefit.
[0247] In one embodiment, the one or more target sites are druggable.
[0248] In one embodiment, the binding molecule is a small molecule or a biologic.
[0249] In one embodiment, the binding molecule is a polypeptide.
[0250] The terms "polypeptide" and "protein" are used interchangeably herein.
[0251] In one embodiment, the binding molecule is an antibody or a derivative thereof, optionally a nanobody, a Fab fragment, a scFv, or the like.
[0252] In one embodiment, the binding molecule is an antibody mimetic, such as an affibody.
[0253] In another embodiment, the binding molecule is a DARPin.
[0254] In one embodiment, the binding molecule is a nucleic acid.
[0255] In one embodiment, the binding molecule is an aptamer.
[0256] In one embodiment, the binding molecule is: a) a DNA; b) an RNA; c) a DNA/RNA hybrid; d) a modified DNA; or e) a modified RNA.
[0257] By modified DNA or RNA it is meant a DNA or RNA comprising non-naturally occurring modifications (e.g., chemical groups, such as phosphorothioate internucleoside linkages) as compared with DNA and RNA found in vivo.
[0258] In one embodiment, the binding molecule is a small molecule. In another embodiment, the binding molecule is a drug-like small molecule.
[0259] In one embodiment, the binding molecule modulates the activity of Src kinase.
[0260] In one embodiment, the modulating is activating or inactivating.
[0261] In one embodiment, the binding molecule increases the activity of Src kinase relative to the activity of the kinase in the absence of the binding molecule.
[0262] In one embodiment, the binding molecule decreases the activity of Src kinase relative to the activity of the kinase in the absence of the binding molecule.
[0263] In one embodiment, the modulating is inactivating.
[0264] In one embodiment, the one or more target sites comprise a group of residues selected from the group consisting of: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; and q) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
[0265] In one embodiment, the modulating is inactivating.
[0266] In one embodiment, where the binding molecule causes decreased Src kinase activity, the one or more target sites comprise a group of residues selected from the group consisting of: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; and q) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
[0267] In one embodiment, the modulating is activating.
[0268] In one embodiment, the one or more target sites comprise a group of residues selected from the group consisting of: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
[0269] In one embodiment, where the binding molecule causes increased Src kinase activity, the one or more target sites comprise a group of residues selected from the group consisting of: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
NUCLEIC ACIDS
[0270] In one aspect, provided herein is a nucleic acid encoding: a) the binding molecule of the invention; and/or b) the polypeptide of the invention.
[0271] In one embodiment, the nucleic acid encodes a polypeptide according to the invention.
[0272] In one embodiment, the nucleic acid encodes a binding molecule according to the invention, optionally wherein said binding molecule is a polypeptide or a nucleic acid.
[0273] It will be understood that, where the Src kinase binding molecule of the invention is a polypeptide or a nucleic acid based binding molecule (e.g., DNA or RNA), said binding molecule may be encoded by a nucleic acid molecule according to the foregoing.
[0274] In one embodiment, the nucleic acid is RNA.
[0275] In one embodiment, the nucleic acid is DNA.
[0276] In one embodiment, the nucleic acid is: a) a DNA; b) an RNA; c) a DNA/RNA hybrid; d) a modified DNA; or e) a modified RNA.
[0277] By modified DNA or RNA it is meant a DNA or RNA comprising non-naturally occurring modifications (e.g., chemical groups, such as phosphorothioate internucleoside linkages) as compared with DNA and RNA found in vivo.
[0278] It will be understood that both RNA and DNA can be considered to encode RNAs, DNAs, and polypeptides. For example, a DNA can be amplified to produce further DNA, or transcribed to produce an RNA, optionally wherein said RNA is then translated to produce a polypeptide; an RNA can be reverse transcribed to form a DNA, translated into protein, or amplified into further RNAs.
[0279] In one embodiment, the nucleic acid is modified, unmodified, naturally occurring or synthetic.
[0280] In another aspect, there is provided an expression cassette comprising the nucleic acid of the invention.
[0281] In a further aspect, there is provided a vector comprising the nucleic acid or the expression cassette of the invention.
[0282] Expression cassettes and/or vectors that enable the transcription and/or translation of nucleotide sequences of interest are known in the art and may be selected by the person skilled in the art depending on the application. For example, cell-type-specific promoters may be chosen to restrict expression of a payload to certain cell types.
CELLS
[0283] In one aspect, there is provided a cell comprising the binding molecule, the polypeptide, the nucleic acid, the expression cassette, or the vector of the invention.
[0284] In one embodiment, there is provided a cell comprising the binding molecule of the invention and/or a nucleic acid encoding the binding molecule of the invention.
[0285] In one embodiment, there is provided a cell comprising the polypeptide of the invention, and/or a nucleic acid encoding the polypeptide of the invention.
[0286] In one embodiment, there is provided a cell comprising the nucleic acid of the invention.
[0287] In one embodiment, there is provided a cell comprising the expression cassette of the invention.
[0288] In one embodiment, there is provided a cell comprising the vector of the invention.
[0289] Given that the presence of the aforementioned features in a cell is not mutually exclusive, also provided is a cell comprising any combination of the binding molecule, the polypeptide, the nucleic acid, the expression cassette, and/or the vector of the invention.
[0290] In one embodiment, there is provided a cell comprising the nucleic acid, the expression cassette, or the vector of the invention.
[0291] In one embodiment, the cell is a prokaryotic cell, optionally a bacterial cell.
[0292] In one embodiment, the cell is a eukaryotic cell.
[0293] In one embodiment, the cell is a yeast cell.
[0294] In some embodiments the cell is a mammalian cell, preferably a human cell.
[0295] In one embodiment, the cell is an in vitro cell.
[0296] The cell may be a cell derived from a human, optionally a human suffering from a disease or disorder, or susceptible to a disease or disorder, or a human with no known disease or susceptibility thereto.
[0297] In one embodiment, the cell is a human cell derived from a subject suffering from or susceptible to a disease. In some embodiments, said disease is a disease associated with Src kinase.
[0298] In one embodiment, the cell is an ex vivo cell.
[0299] The cell may be a cell that is not directly taken from a subject or multicellular organism.
[0300] In one embodiment, the cell is an in vivo cell.
THERAPEUTIC AND NON-THERAPEUTIC METHODS AND USES
[0301] The methods, polypeptides, binding molecules, nucleic acids, expression cassettes, vectors, and cells of the invention have therapeutic and non-therapeutic utility.
[0302] In one aspect, there is provided the binding molecule, the polypeptide, the nucleic acid, the expression cassette, the vector, or the cell of the invention for use in a method of treating a disease.
[0303] As used herein, treatment is intended to include therapeutic interventions that ameliorate a disease as well as curative treatments. Further, prophylactic use is also encompassed, such that said treatment may include treating a subject that is susceptible to a disease or is otherwise showing signs of progression towards a disease state without necessarily having symptoms of the disease or a clinical diagnosis of the disease.
[0304] In one embodiment, there is provided the binding molecule of the invention for use in a method of treating a disease.
[0305] In one embodiment, there is provided the polypeptide of the invention for use in a method of treating a disease.
[0306] In one embodiment, there is provided the nucleic acid of the invention for use in a method of treating a disease.
[0307] In one embodiment, there is provided the expression cassette of the invention for use in a method of treating a disease.
[0308] In one embodiment, there is provided the vector of the invention for use in a method of treating a disease.
[0309] In one embodiment, there is provided the binding molecule, the polypeptide, the nucleic acid, the expression cassette, the vector, or the cell of the invention for use in a method of treating a disease.
[0310] In one aspect, there is provided a method of treating a disease comprising administering the binding molecule, the polypeptide, the nucleic acid, the expression cassette, the vector, or the cell of the invention to a patient in need thereof.
[0311] In one embodiment, there is provided a method of treating a disease comprising administering the binding molecule, the polypeptide, the nucleic acid, the expression cassette, or the vector of the invention to a patient in need thereof.
[0312] In one embodiment, there is provided a method of treating a disease comprising administering the binding molecule of the invention to a patient in need thereof.
[0313] In one embodiment, there is provided a method of treating a disease comprising administering the polypeptide of the invention to a patient in need thereof.
[0314] In one embodiment, there is provided a method of treating a disease comprising administering the nucleic acid of the invention to a patient in need thereof.
[0315] In one embodiment, there is provided a method of treating a disease comprising administering the expression cassette of the invention to a patient in need thereof.
[0316] In one embodiment, there is provided a method of treating a disease comprising administering the vector of the invention to a patient in need thereof.
[0317] In one embodiment, said administering is administering to a cell of said patient.
[0318] In one embodiment, there is provided a method of treating a disease comprising administering the cell of the invention to a patient in need thereof.
[0319] In one aspect, there is provided use of the binding molecule, the polypeptide, the nucleic acid, the expression cassette, the vector, or the cell of the invention for the manufacture of a medicament for use in the treatment of a disease.
[0320] In one embodiment, there is provided use of the binding molecule of the invention for the manufacture of a medicament for use in the treatment of a disease.
[0321] In one embodiment, there is provided use of the polypeptide of the invention for the manufacture of a medicament for use in the treatment of a disease.
[0322] In one embodiment, there is provided use of the nucleic acid of the invention for the manufacture of a medicament for use in the treatment of a disease.
[0323] In one embodiment, there is provided use of the vector of the invention for the manufacture of a medicament for use in the treatment of a disease.
[0324] In one embodiment, there is provided use of the expression cassette of the invention for the manufacture of a medicament for use in the treatment of a disease.
[0325] In one embodiment, there is provided use of the cell of the invention for the manufacture of a medicament for use in the treatment of a disease.
[0326] In one embodiment, there is provided the binding molecule, polypeptide, nucleic acid, vector, expression cassette, or cell of the invention for use in a method of treating a disease, wherein the disease is selected from the group consisting of: cancer, rheumatoid arthritis, chronic kidney disease, central nervous system diseases, viral diseases, aging (including skin aging), pulmonary fibrosis, epilepsy, tuberculosis, cardiovascular disease, macrophage-mediated inflammatory disease and bone homeostasis disorders.
[0327] In one embodiment, there is provided a method of treating a disease comprising administering the binding molecule, polypeptide, nucleic acid, vector, expression cassette, or cell of the invention to a patient in need thereof, wherein said disease is selected from the group consisting of: cancer, rheumatoid arthritis, chronic kidney disease, central nervous system diseases, viral diseases, aging (including skin aging), pulmonary fibrosis, epilepsy, tuberculosis, cardiovascular disease, macrophage-mediated inflammatory disease and bone homeostasis disorders.
[0328] In one aspect, there is provided use of the binding molecule, polypeptide, nucleic acid, vector, expression cassette, or cell of the invention for the manufacture of a medicament for use in the treatment of a disease, wherein said disease is selected from the group consisting of: cancer, rheumatoid arthritis, chronic kidney disease, central nervous system diseases, viral diseases, aging (including skin aging), pulmonary fibrosis, epilepsy, tuberculosis, cardiovascular disease, macrophage-mediated inflammatory disease and bone homeostasis disorders.
[0329] In one embodiment of the methods and uses herein, the disease is selected from the group consisting of: cancer, rheumatoid arthritis, chronic kidney disease, central nervous system diseases, viral diseases, pulmonary fibrosis, epilepsy, tuberculosis, cardiovascular disease, macrophage-mediated inflammatory disease and bone homeostasis diseases/disorders.
[0330] In one embodiment of the methods and uses herein, the disease is kidney disease.
[0331] In one embodiment of the methods and uses herein, the disease is chronic kidney disease.
[0332] In one embodiment of the methods and uses herein, the chronic kidney disease is selected from the group consisting of: renal fibrosis, glomerulonephritis, diabetic nephropathy, HIV-associated nephropathy, polycystic kidney disease and obesity-induced kidney disease.
[0333] In one embodiment of the methods and uses herein, the disease is a central nervous system disease.
[0334] In one embodiment of the methods and uses herein, the central nervous system disease is migraine or neuropathic pain.
[0335] In one embodiment of the methods and uses herein, the disease is a cardiovascular disease.
[0336] In one embodiment of the methods and uses herein, the cardiovascular disease is selected from the group consisting of: hypertension, heart disease, myocardial ischemia reperfusion injury, and arrhythmia.
[0337] In one embodiment of the methods and uses herein, the disease is cancer.
[0338] In one embodiment of the methods and uses herein, the disease is a cancer associated with Src kinase activity.
[0339] In one embodiment of the methods and uses herein, the disease is a cancer in which Src kinase has developed resistance to one or more existing therapies.
[0340] In some embodiments, the disease is an infectious disease.
[0341] In one embodiment, the disease is a viral disease.
[0342] In one embodiment, the disease is a bacterial disease.
[0343] In one embodiment, the disease is tuberculosis (TB).
[0344] In some embodiments, the methods and uses of the invention may be considered therapeutic.
[0345] In some embodiments, the methods and uses of the invention may be considered non-therapeutic or cosmetic.
[0346] In instances where the methods and uses of the invention may be considered non-therapeutic, the patient may be considered to be a subject.
[0347] In some embodiments of the methods and uses of the invention, the subject has no known disease or disorder.
[0349] In some embodiments, the methods and uses of the invention are employed to combat ageing, optionally skin ageing.
STANDARD PARAGRAPHS
[0350] The foregoing disclosure provides illustration and description, but is not intended to be exhaustive or to limit the implementations to the precise forms disclosed. Modifications and variations may be made in light of the above disclosure or may be acquired from practice of the implementations.
[0351] As used herein, the term “component” is intended to be broadly construed as hardware, firmware, or a combination of hardware and software. It will be apparent that systems and/or methods described herein may be implemented in different forms of hardware, firmware, and/or a combination of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods is not limiting of the implementations. Thus, the operation and behavior of the systems and/or methods are described herein without reference to specific software code, it being understood that software and hardware can be used to implement the systems and/or methods based on the description herein.
[0352] Even though particular combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of various implementations. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one claim, the disclosure of various implementations includes each dependent claim in combination with every other claim in the claim set. As used herein, a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiple of the same item.
[0353] No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items, and may be used interchangeably with “one or more.” Further, as used herein, the article “the” is intended to include one or more items referenced in connection with the article “the” and may be used interchangeably with “the one or more.” Furthermore, as used herein, the term “set” is intended to include one or more items (e.g., related items, unrelated items, or a combination of related and unrelated items), and may be used interchangeably with “one or more.” Where only one item is intended, the phrase “only one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise. Also, as used herein, the term “or” is intended to be inclusive when used in a series and may be used interchangeably with “and/or,” unless explicitly stated otherwise (e.g., if used in combination with “either” or “only one of”).
[0354] A computer-readable medium may include non-transitory type media such as physical storage media including storage discs and solid state devices. A computer-readable medium may also or alternatively include transient media such as carrier signals and transmission media. A computer-readable storage medium is defined herein as a non-transitory memory device. A memory device includes memory space within a single physical storage device or memory space spread across multiple physical storage devices.
NUMBERED CLAUSES
[0355] Further examples of feature combinations taught by the present disclosure are set out in the following numbered clauses:
1. A computer-implemented method of training a machine learning model, the method comprising: obtaining training data specifying, for a wild type variant of a target enzyme and each of a plurality of mutant variants of the target enzyme, each mutant variant having a different set of one or more mutations, an activity measure for the respective variant and a folding measure for the respective variant; based on the training data, training model parameters of a machine learning model to output, from input data specifying the set of one or more mutations in a given variant of the target enzyme, a predicted activity measure and a predicted folding measure for the given variant.
2. The computer-implemented method of clause 1, wherein the input data comprises a set of input elements corresponding to a given site of the given variant of the target enzyme, each input element specifying whether or not a specific mutation is present at the given site.
3. The computer-implemented method of clause 2, wherein the input elements corresponding to the given site are one-hot encoded for the specific mutations they specify.
4. The computer-implemented method of clause 1, 2 or 3, wherein the input data comprises a nonzero bias term which is constant for the wild type variant and each mutant variant.
5. The computer-implemented method of any preceding clause, wherein training the machine learning model comprises fitting a thermodynamic model of the target enzyme to the training data.
6. The computer-implemented method of clause 5, wherein the thermodynamic model comprises a three-state model of the target enzyme with unfolded, folded inactive, and folded active states.
7. The computer-implemented method of clause 6, wherein the folding measure is related to a probability of the variant being in either the folded inactive state or the folded active state.
8. The computer-implemented method of clause 7, wherein the folding measure is dependent on a Gibbs free energy of folding.
9. The computer-implemented method of clause 6, 7 or 8, wherein the activity measure is related to a probability of being in the folded active state.
10. The computer-implemented method of clause 9, wherein the activity measure is dependent on a Gibbs free energy of activity.
11. The computer-implemented method of any preceding clause, wherein: the model parameters comprise a first set of weights and a second set of weights; the predicted activity measure depends on both the first set of weights and the second set of weights; and the predicted folding measure depends on the second set of weights but is independent of the first set of weights.
12. The computer-implemented method of any preceding clause, wherein the machine learning model comprises a neural network.
13. The computer-implemented method of clause 12, wherein: a first neuron of the neural network generates a first neuron output value by processing the input data using a first set of weights; and a second neuron of the neural network generates a second neuron output value by processing the input data using a second set of weights.
14. The computer-implemented method of clause 13, wherein the predicted activity measure depends on both the first neuron output value and the second neuron output value.
15. The computer-implemented method of clause 14, wherein the predicted activity measure depends on at least one first activation function applied to the first neuron output value and the second neuron output value.
16. The computer-implemented method of clause 15, wherein the first activation function is non-linear.
17. The computer-implemented method of clause 15 or 16, wherein the first activation function is based on the Boltzmann distribution.
18. The computer-implemented method of clause 15, 16 or 17, wherein the first activation function has parameters that are trained during training of the neural network.
19. The computer-implemented method of any of clauses 13 to 18, wherein the predicted folding measure depends on the second neuron output value and is independent of the first neuron output value.
20. The computer-implemented method of clause 19, wherein the predicted folding measure depends on at least one second activation function applied to the second neuron output value.
21. The computer-implemented method of clause 20, wherein the second activation function is non-linear.
22. The computer-implemented method of clause 20 or 21, wherein the second activation function is based on the Boltzmann distribution.
23. The computer-implemented method of clause 20, 21 or 22, wherein the second activation function has parameters that are trained during training of the neural network.
24. A computer-implemented method of identifying one or more target sites of a target enzyme, the method comprising: obtaining model parameters from a machine learning model trained in accordance with the method of any of clauses 1 to 23; based on the model parameters, identifying the one or more target sites of the target enzyme.
25. The computer-implemented method of clause 24, wherein the machine learning model disambiguates whether a given site’s influence on a probability of the target enzyme being active is due to the target site influencing a probability of correct folding of the target enzyme or is due to other factors.
26. The computer-implemented method of clause 25, wherein the one or more target sites are those predicted to influence the probability of the target enzyme being active due to reasons other than influencing a probability of correct folding.
27. The computer-implemented method of clause 24, 25 or 26, wherein the one or more target sites are selected based on a subset of the model parameters learnt in the training that express the contribution of each mutation towards the activity measure due to reasons other than influencing a probability of correct folding.
28. The computer-implemented method of any of clauses 24 to 27, wherein the one or more target sites are selected based on druggability.
29. The computer-implemented method of any of clauses 24 to 28, wherein the selection of surface sites is favoured.
30. The computer-implemented method of any of clauses 24 to 29, wherein: the model parameters comprise a first set of weights and a second set of weights; the predicted activity measure depends on both the first set of weights and the second set of weights; and the predicted folding measure depends on the second set of weights but is independent of the first set of weights.
31. The computer-implemented method of clause 30, comprising generating an aggregate measure of the first set of weights for each of a plurality of given target sites.
32. The computer-implemented method of clause 31, comprising selecting the one or more target sites based on ranking their aggregate measures.
33. The computer-implemented method of clause 31 or 32, comprising selecting the one or more target sites by comparing their aggregate measures to a predefined threshold.
34. The method of any of clauses 24 to 33, wherein the target sites are allosteric sites of the target enzyme.
35. A computer-implemented method of identifying a mutated variant of interest of an enzyme, the method comprising: for each of a plurality of mutated variants of a target enzyme, providing an input specifying mutations in the respective mutated variant to a machine learning model trained to output a predicted activity measure and a predicted folding measure for the mutated variant; receiving from the machine learning model a predicted activity measure and a predicted folding measure for each of the plurality of mutated variants; and based on the predicted activity measures and the predicted folding measures, selecting from the plurality of mutated variants at least one mutated variant of interest.
36. A method of generating training data for training a machine learning model, the method comprising: performing wet lab experiments to obtain data for deriving an activity measure and a folding measure for a wild type variant of a target enzyme and each of a plurality of mutant variants of the target enzyme.
37. The method of clause 36, comprising performing a solubility assay that provides a measure of the frequency of folding for each variant.
38. The method of clause 37, comprising deriving the folding measure from the frequency of folding.
39. The method of clause 36, 37 or 38, comprising performing an activity assay that provides a measure of frequency of occurrence of mutated variants in an active thermodynamic state.
40. The method of clause 39, comprising deriving the activity measure from the frequency.
41. The method of any of clauses 36 to 40, comprising performing an in vivo activity assay that provides data for deriving the activity measure.
42. The method of any preceding clause, wherein the target enzyme is a protein kinase.
43. The method of clause 42, wherein the protein kinase is Src kinase.
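Clauses 30 to 33 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the first set of weights is available as a mapping from (site, substituted amino acid) to a per-mutation activity weight, and that the aggregate measure of clause 31 is the mean absolute weight per site; neither the data structure nor the choice of aggregate is specified in the clauses, and the site labels and values below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_site(activity_weights):
    """Collapse per-mutation activity weights into one score per site.

    activity_weights maps (site, substituted_aa) -> weight. Using the mean
    absolute weight as the aggregate is an illustrative assumption.
    """
    per_site = defaultdict(list)
    for (site, _aa), weight in activity_weights.items():
        per_site[site].append(abs(weight))
    return {site: mean(values) for site, values in per_site.items()}


def rank_sites(aggregates):
    """Order sites from largest to smallest aggregate measure (clause 32)."""
    return sorted(aggregates, key=aggregates.get, reverse=True)


def sites_above_threshold(aggregates, threshold):
    """Keep ranked sites whose aggregate exceeds a predefined threshold (clause 33)."""
    return [site for site in rank_sites(aggregates) if aggregates[site] > threshold]


# Hypothetical first-set weights for three sites (not data from this document).
weights = {
    ("site_1", "A"): 0.9, ("site_1", "D"): 1.3,
    ("site_2", "A"): 0.1, ("site_2", "K"): -0.2,
    ("site_3", "G"): 0.6, ("site_3", "R"): 0.8,
}
aggregates = aggregate_by_site(weights)
print(rank_sites(aggregates))                   # ['site_1', 'site_3', 'site_2']
print(sites_above_threshold(aggregates, 0.5))   # ['site_1', 'site_3']
```

Because only the first set of weights enters the score, a site whose mutations act mainly through folding (the second set) would not be promoted by this ranking, which is in the spirit of clauses 26 and 27.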
NUMBERED EMBODIMENTS
[0356] The invention is also described by the following numbered embodiments:
Embodiment 1. A method of modulating the activity of Src kinase comprising mutating one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514.
Embodiment 2. The method of embodiment 1, wherein one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven, twenty eight, twenty nine, thirty, thirty one, thirty two, thirty three, thirty four, thirty five, thirty six, thirty seven, thirty eight, thirty nine, forty, forty one, forty two, forty three, forty four, forty five, forty six, forty seven, forty eight, forty nine, fifty, fifty one, fifty two, fifty three, fifty four, fifty five, fifty six, fifty seven, or fifty eight residues is mutated.
Embodiment 3. The method of embodiment 1 or embodiment 2, wherein the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509.
Embodiment 4. The method of embodiment 1 or embodiment 2, wherein the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
Embodiment 5. The method of any one of the preceding embodiments, wherein the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) V386, R388, R412, L413, I414, E415, N417, E418, Y419, Y439, G440, R441, F442 & T443; q) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; r) W289, R321, Q327, L328, Y329, A330 & V331; s) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338; t) L270, W289, T293, R294, V295, Y329, A330 & V340; u) L272, E273, V274, K275, L276, V284, R294, V295, A296, I297 & Y343; v) R422, Q423, I429, V470, N471, R472, E473, V474, L475 & D476; w) T293, R294, V295, Y329, T341, E342, Y343, M344 & S345; x) F352, L353, K354, G358, K359, Y360, L361, R362, L366, E457, L458, T459, T460, K461, G462, P488 & E489; y) E268, S269, L270, R271, L272 & Y338; z) G481, R483, D496, M498, C499, Q500, C501, W502, R503, K504, E505 & E508; aa) L361, R362, L363, L366, L458, T459, T460, K461, R463, P487, P488, E489, C490, P491 & L494; and bb) P364, V367, D368, A371, F518, L519, E520, D521, Y522, F523 & T524.
Embodiment 6. The method of embodiment 5, wherein one residue within a group of residues is mutated.
Embodiment 7. The method of embodiment 5, wherein two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen or nineteen residues within a group of residues are mutated.
Embodiment 8. The method of embodiment 5 or embodiment 7, wherein all residues within a group of residues are mutated.
Embodiment 9. The method of any one of embodiments 5 to 8, wherein the one or more residues is selected from one group of residues.
Embodiment 10. The method of any one of embodiments 5 to 8, wherein the one or more residues is selected from two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven or twenty eight groups of residues.
Embodiment 11. The method of any one of the preceding embodiments, wherein the one or more residues is located within an allosteric site.
Embodiment 12. The method of any one of the preceding embodiments, wherein the modulating is activating or inactivating.
Embodiment 13. The method of embodiment 12, wherein the modulating is inactivating.
Embodiment 14. The method of embodiment 13, wherein the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; and q) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
Embodiment 15. The method of embodiment 12, wherein the modulating is activating.
Embodiment 16. The method of embodiment 15, wherein the one or more residues is selected from one or more of the following groups of residues: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
Embodiment 17. A binding molecule which binds to one or more target sites on Src kinase, wherein the one or more target sites comprises one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514.
Embodiment 18. The binding molecule of embodiment 17, wherein the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509.
Embodiment 19. The binding molecule of embodiment 17, wherein the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
Embodiment 20. The binding molecule of any one of embodiments 17 to 19, wherein the one or more target sites comprises a group of residues selected from the group consisting of: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) V386, R388, R412, L413, I414, E415, N417, E418, Y419, Y439, G440, R441, F442 & T443; q) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; r) W289, R321, Q327, L328, Y329, A330 & V331; s) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338; t) L270, W289, T293, R294, V295, Y329, A330 & V340; u) L272, E273, V274, K275, L276, V284, R294, V295, A296, I297 & Y343; v) R422, Q423, I429, V470, N471, R472, E473, V474, L475 & D476; w) T293, R294, V295, Y329, T341, E342, Y343, M344 & S345; x) F352, L353, K354, G358, K359, Y360, L361, R362, L366, E457, L458, T459, T460, K461, G462, P488 & E489; y) E268, S269, L270, R271, L272 & Y338; z) G481, R483, D496, M498, C499, Q500, C501, W502, R503, K504, E505 & E508; aa) L361, R362, L363, L366, L458, T459, T460, K461, R463, P487, P488, E489, C490, P491 & L494; and bb) P364, V367, D368, A371, F518, L519, E520, D521, Y522, F523 & T524.
Embodiment 21. The binding molecule of any one of embodiments 17 to 20, wherein the one or more target sites is one or more allosteric sites.
Embodiment 22. The binding molecule of any one of embodiments 17 to 21, wherein the one or more target sites is surface exposed.
Embodiment 23. The binding molecule of any one of embodiments 17 to 22, wherein the one or more target sites is druggable.
Embodiment 24. The binding molecule of any one of embodiments 17 to 23, wherein the binding molecule is a polypeptide, a nucleic acid, an antibody or a small molecule.
Embodiment 25. The binding molecule of any one of embodiments 17 to 24, wherein the binding molecule modulates the activity of Src kinase.
Embodiment 26. The binding molecule of embodiment 25, wherein the modulating is activating or inactivating.
Embodiment 27. The binding molecule of embodiment 26, wherein the modulating is inactivating.
Embodiment 28. The binding molecule of embodiment 27, wherein the one or more target sites comprises a group of residues selected from the group consisting of: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; and q) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
Embodiment 29. The binding molecule of embodiment 26, wherein the modulating is activating.
Embodiment 30. The binding molecule of embodiment 29, wherein the one or more target sites comprises a group of residues selected from the group consisting of: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
Embodiment 31. A polypeptide encoding a Src kinase variant, wherein the polypeptide comprises a mutation, relative to a wild-type Src kinase, at one or more residues selected from the group consisting of: E283, K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, E335, P336, I337, I339, T341, Y343, S348, V380, E381, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, K426, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V470, V474, L475, D476, W502, R509, T511, and Y514.
Embodiment 32. The polypeptide of embodiment 31, wherein the one or more residues is selected from the group consisting of: K301, M305, F310, E313, A314, M317, L320, L328, A330, V332, P336, I337, I339, S348, V380, V386, H387, R388, A392, A393, A406, F408, G409, L410, A411, R412, Y419, A421, G424, F427, I429, K430, W431, T432, A433, E435, A436, D447, I453, E457, G462, P465, Y466, V474, L475, W502, and R509.
Embodiment 33. The polypeptide of embodiment 31, wherein the one or more residues is selected from the group consisting of: E283, E335, T341, Y343, E381, K426, V470, D476, E479, T511, and Y514.
Embodiment 34. The polypeptide of any one of embodiments 31 to 33, wherein the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) V386, R388, R412, L413, I414, E415, N417, E418, Y419, Y439, G440, R441, F442 & T443; q) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; r) W289, R321, Q327, L328, Y329, A330 & V331; s) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338; t) L270, W289, T293, R294, V295, Y329, A330 & V340; u) L272, E273, V274, K275, L276, V284, R294, V295, A296, I297 & Y343; v) R422, Q423, I429, V470, N471, R472, E473, V474, L475 & D476; w) T293, R294, V295, Y329, T341, E342, Y343, M344 & S345; x) F352, L353, K354, G358, K359, Y360, L361, R362, L366, E457, L458, T459, T460, K461, G462, P488 & E489; y) E268, S269, L270, R271, L272 & Y338; z) G481, R483, D496, M498, C499, Q500, C501, W502, R503, K504, E505 & E508; aa) L361, R362, L363, L366, L458, T459, T460, K461, R463, P487, P488, E489, C490, P491 & L494; and bb) P364, V367, D368, A371, F518, L519, E520, D521, Y522, F523 & T524.
Embodiment 35. The polypeptide of embodiment 34, wherein one residue within a group of residues is mutated.
Embodiment 36. The polypeptide of embodiment 34, wherein two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen or nineteen residues within a group of residues are mutated.
Embodiment 37. The polypeptide of embodiment 34 or 36, wherein all residues within a group of residues are mutated.
Embodiment 38. The polypeptide of any one of embodiments 34 to 37, wherein the one or more residues is selected from one group of residues.
Embodiment 39. The polypeptide of any one of embodiments 34 to 38, wherein the one or more residues is selected from two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, twenty three, twenty four, twenty five, twenty six, twenty seven or twenty eight groups of residues.
Embodiment 40. The polypeptide of any one of embodiments 31 to 39, wherein the one or more residues is located within an allosteric site.
Embodiment 41. The polypeptide of any one of the embodiments 31 to 40, wherein the kinase activity of the polypeptide is modulated relative to the kinase activity of wild-type Src kinase.
Embodiment 42. The polypeptide of embodiment 41, wherein the modulation is an increase or a decrease in kinase activity relative to the kinase activity of wild-type Src kinase.
Embodiment 43. The polypeptide of embodiment 41 or embodiment 42, wherein the modulation is a decrease in kinase activity relative to the kinase activity of wild-type Src kinase.
Embodiment 44. The polypeptide of embodiment 43, wherein the one or more residues is selected from one or more of the following groups of residues: a) V316, M317, L320, L325, V380, Y385, V386, H387, R388, V405, A406, D407 & D447; b) K298, M305, F310, E313, A314, M317, L328, V331, I339, T341, Y385, D407, F408, G409, L410 & A411; c) R388, P428, I429, K430, T432, A436, A437 & F442; d) V316, M317, L320, R321, L325, V326, Q327, L328, V380, H387, V405, A406, D407, F408 & A411; e) V284, W285, A296, I297, K298, V326, I339, V340, T341, E342, L396, A406, D407 & F408; f) C280, R391, Y419, A425, K426, F427, P428, I429, K430, W431, V464, P465 & N471; g) K298, V316, M317, K318, K319, L320, R321, H322, L325, V326, Q327, L328, I339, T341, V405, A406, D407, F408 & L410; h) L350, K354, R391, A392, A393, P428, W431, E457, G462, R463, V464 & P465; i) M317, K318, K319, L328, V331 & I339; j) V316, M317, K319, L320, R321, L325, Y379, V380, E381, M383, Y385, V386, H387, V405, F408, A411 & R412; k) L270, L272, I297, K298, T299, L300, F310, A330, V331, V332, P336, I337, I339 & V340; l) R388, L410, T420, A421, R422, Q423, G424, F427, A436, A437, L438, Y439, G440 & F442; m) L276, V284, A296, V326, T341, E342, Y343, M344, S345, G347, S348, L396, A406 & D407; n) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; o) M317, L320, R321, H322, E323, K324, L325, V326, S375, G376, M377, Y379, V380, C403, K404, V405, A406 & F408; p) I453, T456, E457, T460, G462, R463, V464, P465, Y466, P467, M484, P485, C486, P487, P488, H495, M498 & W502; and q) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
Embodiment 45. The polypeptide of embodiment 41 or 42, wherein the modulation is an increase in kinase activity relative to the kinase activity of wild-type Src kinase.
Embodiment 46. The polypeptide of embodiment 45, wherein the one or more residues is selected from one or more of the following groups of residues: a) A421, R422, Q423, G424, A425, K426, F427, P428, I429, V470, N471, R472, E473, V474, L475 & D476; and b) L272, F281, G282, E283, W285, T299, L300, K301, P302, P336 & Y338.
Embodiment 47. A nucleic acid encoding: a) the binding molecule of any one of embodiments 17 to 30; and/or b) the polypeptide of any one of embodiments 31 to 46.
Embodiment 48. The nucleic acid of embodiment 47, wherein the nucleic acid is RNA or DNA.
Embodiment 49. The nucleic acid of embodiment 47 or embodiment 48, wherein the nucleic acid is modified, unmodified, naturally occurring or synthetic.
Embodiment 50. An expression cassette comprising the nucleic acid of any one of embodiments 47 to 49.
Embodiment 51. A vector comprising the nucleic acid of any one of embodiments 47 to 49 or the expression cassette of embodiment 50.
Embodiment 52. A cell comprising the binding molecule of any one of embodiments 17 to 30, the polypeptide of any one of embodiments 31 to 46, the nucleic acid of any one of embodiments 47 to 49, the expression cassette of embodiment 50, or the vector of embodiment 51.
Embodiment 53. The binding molecule of any one of embodiments 17 to 30, the polypeptide of any one of embodiments 31 to 46, the nucleic acid of any one of embodiments 47 to 49, the expression cassette of embodiment 50, the vector of embodiment 51, or the cell of embodiment 52, for use in a method of treating a disease.
Embodiment 54. A method of treating a disease comprising administering the binding molecule of any one of embodiments 17 to 30, the polypeptide of any one of embodiments 31 to 46, the nucleic acid of any one of embodiments 47 to 49, the expression cassette of embodiment 50, the vector of embodiment 51 or the cell of embodiment 52, to a patient in need thereof.
Embodiment 55. Use of the binding molecule of any one of embodiments 17 to 30, the polypeptide of any one of embodiments 31 to 46, the nucleic acid of any one of embodiments 47 to 49, the expression cassette of embodiment 50, the vector of embodiment 51 or the cell of embodiment 52, for the manufacture of a medicament for use in the treatment of a disease.
Embodiment 56. The binding molecule, polypeptide, nucleic acid, expression cassette, vector, or cell for use of embodiment 53, the method of embodiment 54 or the use of embodiment 55, wherein the disease is selected from the group consisting of: cancer, rheumatoid arthritis, chronic kidney disease, central nervous system diseases, viral diseases, aging including skin aging, pulmonary fibrosis, epilepsy, tuberculosis, cardiovascular disease, macrophage-mediated inflammatory disease and bone homeostasis.
Embodiment 57. The binding molecule, polypeptide, nucleic acid, expression cassette, vector, or cell for use, the method or the use of embodiment 56, wherein the chronic kidney disease is selected from renal fibrosis, glomerulonephritis, diabetic nephropathy, HIV-associated nephropathy, polycystic kidney disease and obesity-induced kidney disease.
Embodiment 58. The binding molecule, polypeptide, nucleic acid, expression cassette, vector, or cell for use, the method or the use of embodiment 56, wherein the central nervous system disease is selected from migraine and neuropathic pain.
Embodiment 59. The binding molecule, polypeptide, nucleic acid, expression cassette, vector, or cell for use, the method or the use of embodiment 56, wherein the cardiovascular disease is selected from hypertension, heart disease, myocardial ischemia reperfusion injury, and arrhythmia.
EXAMPLES
[0357] The invention is also described by the following examples.
RESULTS
Measuring activity and solubility of Src protein kinase variants at scale
[0358] A general strategy was conceived that uses genetics to comprehensively quantify allosteric regulation of enzyme activity. The approach has three steps: first, the effects of mutations at all sites of interest in an enzyme on its catalytic activity are quantified; second, the effects of the same mutations on the solubility of the enzyme are measured; and third, changes in activity that cannot be accounted for by changes in concentration are quantified by fitting a model to the data. In practice, this last step of model fitting is greatly facilitated by quantifying the effects not only of single amino acid (aa) mutations but also of double or combinatorial mutants. Because the relationships between the energetic effects of mutations and molecular phenotypes are non-linear, quantifying the effects of mutations in multiple protein variants with different stabilities and/or activities provides sufficient data to constrain model fitting, allowing the underlying causal effects of mutations to be determined.
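The non-linearity referred to above can be illustrated with a two-state folding model, sketched below in Python. The sketch assumes that the folded fraction follows a Boltzmann distribution over the free energy of folding (negative values favouring the folded state), that mutation effects on that free energy add, and that RT is evaluated at an assumed 30 degrees C in kcal/mol; the background and mutation energies are hypothetical. The same destabilizing mutation barely changes the folded fraction of a very stable background but substantially reduces that of a marginally stable one, which is why measuring each mutation in several backgrounds helps to pin down its energetic effect.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)
T = 303.15    # assumed assay temperature in K (30 degrees C); illustrative


def fraction_folded(dg_fold):
    """Two-state Boltzmann folded fraction; negative dg_fold means stable."""
    return 1.0 / (1.0 + math.exp(dg_fold / (R * T)))


def phenotypic_effect(background_dg, ddg):
    """Change in folded fraction when a mutation adds ddg to a background."""
    return fraction_folded(background_dg + ddg) - fraction_folded(background_dg)


ddg = 1.0  # the same hypothetical destabilizing mutation, in kcal/mol
for background in (-4.0, -0.5):  # very stable vs marginally stable (hypothetical)
    print(background, round(phenotypic_effect(background, ddg), 3))
```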
[0359] The approach was applied to human oncoprotein Src. The enzymatic activity of variants of Src is easy to quantify using a highly-validated cellular toxicity assay where the inhibition of cellular growth is directly proportional to the amount of Src-induced protein phosphorylation. The solubility of Src can also be quantified in the same cells using a highly-validated protein abundance selection assay, abundancePCA (aPCA), that uses protein fragment complementation to quantify soluble protein concentration over at least three orders of magnitude.
[0360] To generate sufficient data for model fitting, five overlapping libraries of Src variants were generated that together cover all possible single mutants in the kinase domain (KD), with each variant present in at least 10 different genetic backgrounds. The genetic backgrounds were selected to provide a range of different Src activities due to changes in either stability or catalytic activity. Together, the five libraries contain 54,455 genotypes. The kinase activity and solubility of each genotype were quantified in 30 separate pooled selection assays. First, kinase activity was quantified in triplicate using kinase-dependent impairment of cell growth (Figure 2A). Activity scores were highly reproducible for all five library tiles (median Pearson’s r = 0.90) and strongly correlated with Src-dependent tyrosine phosphorylation (r = -0.89, Figure 2C; Ahler et al., Mol. Cell 74, 393-408.e20 (2019)). Second, the cellular abundance of soluble kinase was quantified using aPCA (Figure 2B). Abundance measurements were also reproducible (median r = 0.75) and correlated with the in vivo abundance of Src variants measured by western blotting (r = 0.66, Figure 2D).
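The reproducibility figures quoted above are median Pearson correlations between replicate measurements. A minimal Python sketch of one way to compute such a statistic follows; it assumes, as an illustration, that each library tile has replicate score vectors over the same genotypes, that a tile's reproducibility is the median over all replicate pairs, and that the reported figure is the median across tiles. The exact aggregation behind r = 0.90 is not stated here, and the example scores are invented.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def tile_reproducibility(replicates):
    """Median Pearson r over all pairs of replicates for one library tile."""
    return median(pearson_r(a, b) for a, b in combinations(replicates, 2))


def median_reproducibility(tiles):
    """Median of per-tile reproducibility across library tiles."""
    return median(tile_reproducibility(reps) for reps in tiles.values())


# Invented triplicate activity scores for two tiles of four genotypes each.
tiles = {
    "tile_1": [[0.1, 0.5, 0.9, 1.2], [0.2, 0.4, 1.0, 1.1], [0.0, 0.6, 0.8, 1.3]],
    "tile_2": [[1.0, 0.2, 0.7, 0.4], [0.9, 0.3, 0.8, 0.5], [1.1, 0.1, 0.6, 0.3]],
}
print(round(median_reproducibility(tiles), 3))
```

The same `pearson_r` helper would apply to the validation correlations (for example, activity scores against phosphorylation levels), where a negative r is expected because higher Src activity means stronger growth impairment.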
Quantifying changes in activity not caused by changes in abundance
[0361] To quantify the changes in Src activity that are not accounted for by changes in the cellular abundance of soluble Src, a simple phenomenological ‘enzyme folding and activation’ model was fit to the data.
[0362] In this model, the folding of Src is explicitly modeled using a two-state thermodynamic model in which the protein can exist in two states, unfolded (U) and folded (F), with the Gibbs free energy of folding, ΔGf, determining the partitioning of Src molecules between the two states according to the Boltzmann distribution. All other biophysical changes that alter Src kinase activity are quantified using a second pseudo free energy, referred to as the activity energy, ΔGa. The model is formally equivalent to a 3-state model with unfolded (U), folded inactive (F), and folded active (A) states (Figure 4A). However, the active state modeled here is phenomenological and designed to quantify all changes in activity not accounted for by changes in total soluble protein abundance. Although shifts in the equilibrium between inactive and active kinase conformations will be captured as changes in ΔGa, so too will other molecular mechanisms that affect catalytic activity (kcat) and substrate affinity (Km) independently of the conformational state of Src.
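The relationship between the two energies and the three states can be sketched in Python. The sketch assumes energies relative to the unfolded state (G_U = 0, G_F = ΔGf, G_A = ΔGf + ΔGa), negative values favouring the folded or active state, RT of about 0.6 kcal/mol, and additivity of single-mutant energy changes in double mutants; the phenomenological mapping from these fractions to measured assay scores is omitted, and all numerical values are hypothetical. Under these assumptions the closed-form active fraction agrees with the explicit three-state Boltzmann populations, while abundance is taken from the two-state folding model as described above.

```python
import math

RT = 0.6  # assumed thermal energy in kcal/mol (roughly 30 degrees C)


def state_populations(dg_fold, dg_act, rt=RT):
    """Boltzmann populations of unfolded (U), folded inactive (F) and active (A).

    Free energies are relative to U: G_U = 0, G_F = dg_fold and
    G_A = dg_fold + dg_act (assumed sign convention; negative is favourable).
    """
    weights = {
        "U": 1.0,
        "F": math.exp(-dg_fold / rt),
        "A": math.exp(-(dg_fold + dg_act) / rt),
    }
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def fraction_active(dg_fold, dg_act, rt=RT):
    """Closed form of P(A) = 1 / (1 + exp(dGa/RT) * (1 + exp(dGf/RT)))."""
    return 1.0 / (1.0 + math.exp(dg_act / rt) * (1.0 + math.exp(dg_fold / rt)))


def fraction_folded(dg_fold, rt=RT):
    """Two-state folded fraction used here for the abundance readout."""
    return 1.0 / (1.0 + math.exp(dg_fold / rt))


# Hypothetical wild-type energies and single-mutant effects (kcal/mol).
wt_fold, wt_act = -2.0, -0.5
ddg_fold = {"m1": 0.8, "m2": 0.1}
ddg_act = {"m1": 0.0, "m2": 0.9}

# A double mutant is modelled by summing the single-mutant energy changes.
dbl_fold = wt_fold + ddg_fold["m1"] + ddg_fold["m2"]
dbl_act = wt_act + ddg_act["m1"] + ddg_act["m2"]
print(round(fraction_folded(dbl_fold), 3), round(fraction_active(dbl_fold, dbl_act), 3))
```

Fitting would then adjust the per-mutation energy changes so that these predicted fractions, after a calibration to assay units, match the measured abundance and activity of single and double mutants.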
[0363] Fitting the enzyme folding and activation model to the data by training a machine learning model as described above (Figure 4B) provides excellent prediction of the abundance and activity of double mutants (median percent variance explained 82.6% for activity and 68.8% for abundance, evaluated by 10-fold cross validation) and a marked improvement over a 2-state (unfolded, folded active) model. In contrast, increasing the complexity of the model to four states did not improve performance. In total, the dataset quantifies folding and activity energies for all of the 5,111 possible single substitution variants in the Src KD (Figure 6A-D).
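The cross-validated performance figures above can be computed along the lines of the Python sketch below. It assumes that percent variance explained is defined as 100 * (1 - SS_res / SS_tot) on held-out genotypes (the squared Pearson correlation is another common definition, and the text does not say which was used) and that genotypes are randomly partitioned into 10 folds. The data and the straight-line `fit_line` model are hypothetical stand-ins for the fitted folding and activation model.

```python
import random
from statistics import median


def percent_variance_explained(observed, predicted):
    """100 * (1 - SS_res / SS_tot) for held-out observations."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 100.0 * (1.0 - ss_res / ss_tot)


def kfold_indices(n, k=10, seed=0):
    """Randomly partition indices 0..n-1 into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, k=10, seed=0):
    """Fit on k-1 folds, score the held-out fold; return one score per fold."""
    scores = []
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        predicted = [model(xs[i]) for i in fold]
        scores.append(percent_variance_explained([ys[i] for i in fold], predicted))
    return scores


def fit_line(xs, ys):
    """Hypothetical stand-in model: an ordinary least-squares straight line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


# Invented data: a linear signal plus noise for 200 genotypes.
rng = random.Random(1)
xs = [rng.uniform(-2, 2) for _ in range(200)]
ys = [0.8 * x + rng.gauss(0, 0.3) for x in xs]
scores = cross_validate(xs, ys, fit_line)
print(round(median(scores), 1))
```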
Stability of kinase fold
[0364] The data provides the first comprehensive measurement of how mutations affect the stability of the protein kinase fold in vivo and one of the largest sets of solubility measurements for any protein.
[0365] The Src KD is composed of two structurally and functionally distinct subdomains, the N- and C-lobes, with the active site located in the cleft between the two. The N-lobe is mostly composed of beta strands and contains the ATP binding site, whereas the C-lobe is mostly alpha helical and ends in a disordered C-terminal tail that regulates the conformational state of the kinase. Mutations have a wide range of effects, with many destabilizing (703 strongly destabilizing mutations with ΔΔGf > 0.5, p < 0.05, z-test) and a large number of moderately stabilizing variants (468 with ΔΔGf < 0, p < 0.05, z-test).
[0366] Destabilizing mutations across the KD show strong structural biases. As expected, mutations in the core of the KD (relative solvent accessible surface area, rSASA < 0.25) are much more likely to be destabilizing than mutations on the surface (OR = 10.10, p = 8.02e-116, Fisher’s exact test (FET), Figure 6E-F). However, this enrichment is much stronger for the C-lobe (OR = 25.81, p = 3.10e-110, FET) than for the N-lobe (OR = 2.30, p = 3.78e-6, FET). Indeed, destabilizing mutations are enriched overall in the C-lobe (OR = 1.61, p = 8.25e-7, FET, Figure 6E-F). Mutations in the C-terminal tail have only mild effects on solubility. In addition, individual secondary structure elements are particularly sensitive to mutation, with the hydrophobic αF helix buried in the C-lobe core being the most critical for solubility (Figure 6G). The αE helix, which contacts αF extensively, αEF, and β7 and β8, which are packed against αE, are also highly enriched in destabilizing mutations (Figure 6G). Together these elements form the main stabilizing core of the kinase fold (Figure 6E-G). Mutations to proline are the most likely to be destabilizing, especially in helices and beta strands (Figure 6A-D), consistent with the role of proline as a disruptor of secondary structure. Finally, of all long-range side-chain to side-chain structural contacts, the salt bridge connecting R509 in helix αI with E435 in αEF is the most sensitive to mutation (maximum mean ΔΔGf of residue pairs), with no substitutions tolerated in either residue. Mutations in the E435-R509 pair are 1.91-fold more destabilizing than mutations in the next most sensitive contact (the K298-E313 salt bridge), 3.33-fold more than the average of all salt bridges, and 5.45-fold more than the average of all side-chain to side-chain contacts.
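The odds ratios and p-values in this section come from Fisher’s exact tests on 2x2 tables of mutation counts (for example, core vs surface residues against destabilizing vs other mutations). The sketch below is a pure-Python two-sided test so the arithmetic can be checked without external libraries. It follows a common convention (used, for example, by SciPy) of summing the probabilities of all tables with the observed margins that are no more likely than the observed one. The function name and the example counts are hypothetical, not the study’s data.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: group of interest (e.g. core residues) vs background.
    Columns: mutation class of interest (e.g. destabilizing) vs other.
    Returns (sample odds ratio, two-sided p-value).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table with the observed margins.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / total for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance so tables tied with the observed one are counted.
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    if b * c == 0:
        odds_ratio = float("inf") if a * d > 0 else float("nan")
    else:
        odds_ratio = (a * d) / (b * c)
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts: rows = core / surface mutations,
# columns = destabilizing / not destabilizing.
or_core, p_core = fisher_exact_2x2(120, 380, 40, 1460)
print(round(or_core, 2), p_core)
```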
[0367] The two structurally distinct lobes of the Src kinase domain thus also contribute differentially to the in vivo solubility of the domain, with the more dynamic ATP-binding N-lobe displaying a higher tolerance to mutagenesis than the larger and more compact C-lobe.
Stabilizing mutations and molecular frustration
[0368] Quantifying mutational effects across a range of destabilized genetic backgrounds allowed the identification of a total of 468 mutations that moderately increase the solubility of the Src KD. These variants are distributed across 118 positions in the Src KD (ΔΔGf < 0, p < 0.05, z-test, Figure 6G). The spatial distribution of solubility-increasing mutations in the structure reveals a striking continuous surface of sites spanning the ATP binding site, the activation loop and the αG helix (Figure 6E-G). Stabilizing variants occur in residues essential for catalysis, including the ATP binding site (especially the flexible G-loop that positions ATP for catalysis, OR = 1.45, p = 0.15, FET) and the proton acceptor D389 in the catalytic loop (OR = 2.66, p = 0.09, FET), although neither of these two enrichments is statistically significant, and they are strongly enriched in the activation segment (OR = 9.34, p = 5.72e-84, FET). The activation segment contains over half of all mutations increasing solubility in the kinase, in particular in the Mg2+ positioning loop (OR = 14.62, p = 2.01e-32, FET), with L410 and A411 individually enriched (FDR < 0.1, FET), and in the substrate positioning loop (OR = 5.04, p = 3.21e-17, FET), with many stabilizing mutations in F427, P428 and I429 (FDR < 0.1, FET). In addition, V464 in the αF-αG loop, and N471 and R472 in αG, are particularly enriched in stabilizing mutations (FDR < 0.1, FET).
[0369] Many residues in the kinase domain of Src are thus highly frustrated, with side chains important for catalytic activity compromising the solubility of the protein. This may reflect both the need for catalytically important side chains that are not optimal for fold stability and solubility and the inherent need for flexibility in some sites for substrate binding and product release during catalysis.
The Src active site
[0370] The data provides the first complete map of the effects of mutations on protein kinase activity independently of their effects on protein abundance. In total, 1273 mutations that modulate kinase activity much more than can be accounted for by changes in Src abundance (|ΔΔGa| > 0.5, FDR < 0.1, z-test) were identified: 1227 of these mutations are inhibitory and 46 are activating. Mutations in the active site (residues that directly contact ATP, Mg2+ or the substrate peptide phosphosite) are overwhelmingly detrimental to kinase activity, with 225/247 decreasing activity (ΔΔGa > 0.5, OR = 39.39, p = 1.63e-118, FET, Figure 7D). Inactivating variants are strongly enriched in the ATP binding site (OR = 3.78, p = 1.41e-24, FET), the catalytic loop (HRD motif, OR = 5.85, p = 1.32e-39, FET), the Mg2+ positioning loop (DFG motif, OR = 32.43, p = 5.92e-44, FET) and the substrate positioning loop (OR = 9.37, p = 1.00e-42, FET) (Figure 7D). Mutations in the beta sheet that forms the top surface of the ATP binding pocket show a striking alternating pattern: substitutions of side chains pointing towards the nucleotide are detrimental for activity, whereas substitutions of side chains facing away from the active site do not reduce activity or, in the case of 3 residues in the beta strands flanking the G-loop, actually increase it.
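Calls such as |ΔΔGa| > 0.5 at FDR < 0.1 combine an effect-size threshold with a multiple-testing correction of the z-test p-values. The source does not name the FDR procedure, so the sketch below assumes the Benjamini-Hochberg method, and it uses the sign convention stated above (positive ΔΔGa is inhibitory). Function names and example values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_activity_mutations(ddg_a, pvalues, min_effect=0.5, fdr=0.1):
    """Label each mutation 'inhibitory', 'activating' or 'neutral'.

    Positive ddG_a shifts Src away from the active state (inhibitory).
    """
    q = benjamini_hochberg(pvalues)
    labels = []
    for effect, qi in zip(ddg_a, q):
        if qi < fdr and effect > min_effect:
            labels.append("inhibitory")
        elif qi < fdr and effect < -min_effect:
            labels.append("activating")
        else:
            labels.append("neutral")
    return labels


print(call_activity_mutations([1.2, -0.8, 0.3, 2.0], [0.01, 0.04, 0.03, 0.5]))
```

In the example the third mutation is significant but below the effect-size threshold, and the fourth has a large effect but fails the FDR cut-off, so both are labelled neutral.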
Major allosteric sites
[0371] A total of 1048 mutations in 174 sites located outside of the active site that modulate kinase activity (|ΔΔGa| > 0.5, FDR < 0.1, z-test) were identified. 1002 of these allosteric mutations are inhibitory and 46 activate the kinase. Major allosteric sites were defined as residues outside the active site that are enriched for these mutations (Figure 7E). By this definition, Src has 47 major allosteric sites (OR > 2, FDR < 0.1, FET): 13 in the N-lobe and 34 in the C-lobe. 23/47 major allosteric sites are second shell sites contacting residues in the active site. The major allosteric sites include all 11 non-active site residues previously predicted to be part of an allosteric network that communicates between the substrate and ATP binding sites of Src (Figure 7E; Foda et al., Nat. Commun. 6, 5939 (2015)). This network was predicted by analyzing changes in electrostatic and hydrophobic contacts between active and inactive conformations in molecular dynamics simulations. Of these 11 previously predicted allosteric positions, 8 are second shell residues. The predicted allosteric network is very strongly enriched for mutations with large ΔΔGa (OR = 20.75, p = 3.77e-80, FET, excluding active site residues, and OR = 10.53, p = 1.13e-16, FET, excluding active site and second shell residues), with every individual residue enriched at least 6-fold for allosteric mutations (Figure 7E).
Inhibitory allosteric mutations
[0372] Defining major inhibitory allosteric sites as sites enriched for inhibitory mutations (OR > 2, FDR < 0.1, FET) identifies 47 positions, all of which are also major allosteric sites. In contrast to mutations altering fold stability, inhibitory allosteric mutations are not significantly enriched in the C-lobe (OR = 1.10, p = 0.21, FET). Inhibitory allosteric mutations are, however, enriched in several structural elements (Figure 7C). One of the most enriched is helix αC, consistent with its conformational change upon Src activation. Inhibitory mutations are concentrated in residues on the inner surface of αC, including E313, which engages in a salt bridge with K298 in the active state (Figure 7E), the R-spine residue M317, and the hydrophobic residues F310, A314 and L320. The β4 and β5 strands located between αC and the active site are also enriched for allosteric mutations, but to a lesser extent (Figure 7C). Inhibitory allosteric mutations are also abundant in the activation loop (Figure 7C), including in Y419, which locks Src in the active state when phosphorylated. Interestingly, most other residues in the middle region of the activation loop are tolerant to mutation, with the exceptions of I414 and A421, which act as anchor points of the activation loop on the surface of the C-lobe in the active state. Allosteric mutations are also enriched in αEF, which functions in substrate positioning, and in hydrophobic residues in the αG helix that contact αEF.
[0373] Finally, inhibitory allosteric mutations are enriched in the αF helix, which acts as an anchor for the catalytic (C) and regulatory (R) ‘spines’. The C- and R-spines are two groups of residues that are not contiguous in the primary sequence of kinases but form a bipartite hydrophobic core in catalytically active kinases. Mutations in all sites of the R-spine have strong inactivating effects. In contrast, only the C-spine sites in direct contact with ATP (A296, V284, L396) are enriched in inactivating mutations. Mutations in the remaining C-spine sites have small effects on ΔGa, and their strong effects on kinase activity at the fitness level are almost fully explained by a loss of fold stability.
Activating allosteric mutations
[0374] In total, 11 residues outside of the active site are enriched for activating mutations; these were defined as major activating allosteric sites (OR > 1, p < 0.05, FET). In the N-lobe, major activating allosteric sites include the gatekeeper residue T341 and its neighboring residue Y343. E283, which flanks the G-loop and forms a salt bridge with K275 to constrain the conformation of the G-loop, and E335 in the β4-β5 loop are also major activating allosteric sites. In the C-lobe, major activating allosteric sites are located in the αF pocket (E381, T511 and Y514) and in surface-exposed residues of helix αG (V470, D476, E479). Finally, activating mutations are also enriched in K426, which is adjacent to residues that position the substrate peptide tyrosine for phosphorylation. Interestingly, all substitutions of K426 to hydrophobic residues increase activity, with the exception of K426Y, which is inhibitory.
The distance dependence of allosteric regulation
[0375] The average effect of mutations on Src activity is much stronger closer to the Src active site. Indeed, considering all 252 residues in the Src KD, there is an exponential decay of mutational effects on activity with distance from the active site, with a decay rate k = -0.093, corresponding to a 50% reduction of allosteric effects over a distance d1/2 = 7.45 Å. Interestingly, in contrast to inactivating mutations (ΔΔGa > 0), whose effects scale with distance (k = -0.083), the distance dependence of activating mutations is extremely weak (k = -0.017). This is not driven by the smaller effect size of activating mutations, as inactivating mutations with matching effect sizes had a decay almost twice as fast (median k = -0.030, simulation p = 1.3e-3, n = 10,000 subsamples).
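The decay rate and the half-distance quoted above are linked by d1/2 = ln 2 / |k|; for k = -0.093 this gives 0.6931 / 0.093 ≈ 7.45 Å. The sketch below shows that relation together with a simple way of estimating k. The fit is a swapped-in simplification (ordinary least squares on log |ΔΔGa|), not the nls() fit described in the methods, and the function names are hypothetical.

```python
import math


def half_distance(k):
    """Distance (Å) over which |ddG_a| halves for y = a * exp(k * x), k < 0."""
    return math.log(2) / abs(k)


def fit_exponential_loglinear(x, y):
    """Fit y = a * exp(k * x) by least squares on log(y); requires y > 0.

    Simplification: nls() minimises squared error on the original scale,
    so with noisy data the two approaches can give different estimates.
    """
    log_y = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, log_y))
    k = sxy / sxx
    a = math.exp(mean_ly - k * mean_x)
    return a, k


for label, k in [("all residues", -0.093), ("x+", -0.308)]:
    print(label, round(half_distance(k), 2))
```

The loop reproduces the 7.45 Å and 2.25 Å half-distances reported for the whole domain and for the x+ direction.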
[0376] However, allosteric communication in the kinase domain is not isotropic, as mutations in particular residues and in particular directions are much more likely to be allosteric at a given distance. To illustrate this, the distance dependence when only considering the major allosteric sites is much weaker (k = -0.053, d1/2 = 12.38 Å). Major allosteric sites are not arranged uniformly throughout the kinase domain. Instead, they are spatially clustered and have higher connectivity than expected by chance. Indeed, quantification of the allosteric decay rate from the active site along 3 orthogonal spatial axes (x, y, z, axes as defined in PDB ID: 2SRC) in the positive and negative directions (+, -) reveals more effective transmission in the direction towards helix αC (y+, k = -0.053, d1/2 = 13.08 Å) and along the vertical axis of the KD (z+, k = -0.073, d1/2 = 9.50 Å and z-, k = -0.068, d1/2 = 10.19 Å), and less effective transmission towards the regulatory domain interaction surfaces (x+, k = -0.308, d1/2 = 2.25 Å). Allosteric transmission also differs across secondary structure types, with faster decay for mutations occurring in beta strands, where mutations have smaller effects than expected given their distance to the active site.
[0377] Inhibitory allosteric communication in the Src KD is thus strongly distance dependent, but also anisotropic: transmission efficiency depends on the direction of propagation, with an almost 6-fold difference in decay rates between the most and least efficient directions.
Allosteric communication via dynamic non-covalent contacts
[0378] The structure of the Src KD differs between its active and inactive states, with changes in the positioning of helix αC and the activation loop and multiple residue contact rearrangements in the active site and throughout the kinase domain. Based on their contact patterns in active (PDB ID: 1Y57) and inactive (PDB ID: 2SRC) state structures of Src, 4 types of residues were defined: active-only (engaging in contacts only in the active state), inactive-only (only in the inactive state), swapping (residues that have different contacts in the two states), and static (residues with the same contacts in both states).
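The four residue classes can be expressed as a simple rule on each residue’s set of contact partners in the two structures. In this sketch, contacts are assumed to be given as residue pairs already extracted from the active (1Y57) and inactive (2SRC) structures; any change in a residue’s partner set is treated as swapping, and residues with no contacts in either state get a hypothetical ‘no-contact’ label. The example uses only the electrostatic switch pairs named in [0379] plus placeholder residues (res_a to res_d), so D407 and D389, which have no other contacts listed here, come out as inactive-only; the full contact maps may classify them differently.

```python
from collections import defaultdict


def contacts_by_residue(pairs):
    """Map each residue to the set of residues it contacts (undirected)."""
    partners = defaultdict(set)
    for r1, r2 in pairs:
        partners[r1].add(r2)
        partners[r2].add(r1)
    return partners


def classify_residue(active, inactive):
    """Classify one residue from its partner sets in the two states."""
    if not active and not inactive:
        return "no-contact"  # hypothetical label, not one of the four classes
    if active and not inactive:
        return "active-only"
    if inactive and not active:
        return "inactive-only"
    if active == inactive:
        return "static"
    return "swapping"


def classify_all(active_pairs, inactive_pairs):
    act = contacts_by_residue(active_pairs)
    inact = contacts_by_residue(inactive_pairs)
    residues = sorted(set(act) | set(inact))
    return {r: classify_residue(act.get(r, set()), inact.get(r, set())) for r in residues}


# Electrostatic switch contacts from the text, plus placeholder residues.
active_pairs = [("E313", "K298"), ("R412", "Y419"), ("res_a", "res_b"), ("res_c", "res_d")]
inactive_pairs = [("D407", "K298"), ("E313", "R412"), ("D389", "Y419"), ("res_a", "res_b")]
print(classify_all(active_pairs, inactive_pairs))
```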
[0379] Overall, mutations in active-only and swapping residues are more likely to affect the activity of the Src KD than mutations in inactive-only residues. The differences in |ΔΔGa| are strongest when considering contacts between side chains, including salt bridges (swapping vs inactive-only difference in mean |ΔΔGa| (effect size, ES) = 1.05 kcal/mol, adj.p = 1.00e-11, Wilcoxon rank sum test), pi-cation interactions (active-only vs inactive-only ES = 1.62 kcal/mol, adj.p = 6.36e-21, and swapping vs inactive-only ES = 0.90 kcal/mol, adj.p = 9.48e-7), and side-chain to side-chain hydrogen bonds (active-only vs inactive-only ES = 0.68 kcal/mol, adj.p = 6.84e-17, and swapping vs inactive-only ES = 0.46 kcal/mol, adj.p = 3.19e-8). These differences are weaker for side-chain to backbone hydrogen bonds, where only the active-only vs inactive-only comparison is significant (active-only vs inactive-only ES = 0.29 kcal/mol, adj.p = 3.19e-8, and swapping vs inactive-only ES = 0.08 kcal/mol, adj.p = 0.64). Swapping residues include those forming Src’s ‘electrostatic switch network’ of contacts that change during activation: D407-K298, E313-R412 and D389-Y419 in the inactive state, which break and rearrange into E313-K298 and R412-Y419 in the active state. Mutations in these residues are extremely detrimental for Src activity.
[0380] The allosteric map thus shows that residues with contacts that change upon activation are particularly important for Src activation and enables the prioritization of which of these dynamic contacts are most important for activation.
Amino acid changes and allostery
[0381] Mutations at histidine, glycine, and several hydrophobic residues (alanine, isoleucine, valine, phenylalanine, and tryptophan) are more likely to have inhibitory effects (FDR < 0.1, FET), and substitutions to proline are the most likely to be inhibitory (FDR < 0.1, FET). In Src, substitutions to charged residues (glutamate, lysine and arginine) and to tryptophan, the largest amino acid, are also more likely to be allosteric. Substitutions to cysteine, alanine and methionine are the least likely to be allosteric (FDR < 0.1, FET). The smaller number of activating allosteric mutations makes analyses of their properties less powerful. However, mutations at glutamate residues (FDR < 0.1, FET) and substitutions to hydrophobic amino acids are more likely to be activating allosteric mutations (p = 7.6e-3, FET, for hydrophobics as a group).
Predicting allosteric mutations
[0382] The extent to which the variance in allostery (ΔΔGa) for mutations outside of the active site can be accounted for by sequence and structural features was investigated. Linear modeling was used to predict ΔΔGa from simple features: the minimum heavy atom distance of the mutated residue to the nucleotide (AMP-PNP in PDB structure 2SRC) and to the catalytic D389, the identity of the wild-type and mutant amino acids, solvent accessibility, contact type and dynamics (active-only, inactive-only, swapping, and static), and secondary structure element type. The distances to the catalytic site and to the nucleotide are the most predictive features when tested individually. A linear model combining all predictors explains 46% of the variance in ΔΔGa (tested on held-out data, 10-fold cross-validation), which increases to 51% when specific secondary structure elements are incorporated as a feature (tested on held-out data, 10-fold cross-validation). Mutation effects on activity are thus reasonably well predicted from simple structural features alone.
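A minimal sketch of how held-out variance explained can be estimated for such a linear model with 10-fold cross-validation. It assumes numeric features (such as distances) are used directly, categorical features (such as secondary structure type) are one-hot encoded with one reference level dropped, and variance explained is the pooled out-of-fold R². The feature set, function names and synthetic data are illustrative, not the study’s model or data.

```python
import numpy as np


def one_hot(values, categories):
    """Indicator columns for `categories`; leave one level out as reference."""
    return np.array([[1.0 if v == c else 0.0 for c in categories] for v in values])


def cv_variance_explained(X, y, n_folds=10, seed=0):
    """Pooled out-of-fold R^2 of an ordinary least-squares model with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    design = np.column_stack([np.ones(n), X])
    preds = np.empty(n)
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        preds[test] = design[test] @ coef  # predict only the held-out fold
    ss_res = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic example: effects decay with distance and depend on element type.
rng = np.random.default_rng(1)
dist = rng.uniform(0, 30, 300)
ss = rng.choice(["helix", "strand", "loop"], 300)
X = np.column_stack([dist, one_hot(ss, ["strand", "loop"])])
y = 3.0 - 0.1 * dist + 0.5 * (ss == "strand") + rng.normal(0, 0.1, 300)
print(round(cv_variance_explained(X, y), 3))
```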
Genetic prioritization of allosteric surface pockets
[0383] Structural analysis of Src identifies 28 unique potentially druggable surface pockets present in at least one of 15 different Src structures. To prioritize these pockets for drug development, the comprehensive atlas of mutational effects was used to annotate each of these surface pockets by testing for the enrichment of inhibitory and activating allosteric mutations (Figure 8A). In total, 17 Src pockets are enriched for inhibitory mutations (Fisher’s exact test, FDR<0.05). These inhibitory pockets include the orthosteric ATP binding site targeted by competitive inhibitors. Beyond the orthosteric site, two other surface pockets of Src have been targeted by small molecule inhibitors: the DFG pocket, and P7.
Consistent with this, both of these pockets are enriched for allosteric inhibitory mutations in the allosteric map, genetically validating their regulatory potential (Figure 8B).
[0384] Across 538 protein kinases, 12 different pockets have been reported as allosteric. Of these, 7 have a structurally homologous pocket in Src. It was investigated whether these structurally homologous pockets are also allosterically active in Src. In total, 4 of these 7 pockets are enriched for inhibitory allosteric mutations (FDR < 0.05, FET, Figure 8A, C). Pockets that are allosteric in other kinases and also strongly allosteric in Src include P11, homologous to the MT3 pocket in MEK1/2 targeted by type III allosteric inhibitors; P22, homologous to the AAS site in Aurora A, where one KD activates another through binding of its activation segment to this site; and P6, homologous to the PDIG pocket in CHK1 close to the substrate binding site, which is bound by small molecule inhibitors (Figure 8A, C).
[0385] In contrast, 3 other pockets homologous to allosteric pockets in other kinases show little evidence of allostery in Src. These include pocket P3 (Figure 8A), homologous to the EDI site that is part of the EGFR dimerization interface, and P8 (Figure 8A, C), homologous to the Bcr-Abl myristoyl pocket (CMP) that has been targeted by small molecule inhibitors. Interestingly, three of the residues that form the myristoyl pocket in Abl are substituted by bulky residues in Src, likely rendering the pocket unable to bind myristate, consistent with the lack of evidence for allosteric activity of this pocket in the dataset.
[0386] Seventeen surface pockets of Src have not been previously reported as allosteric in any kinase. A total of 10 of these 17 pockets are enriched for allosteric mutations in Src (Figure 8A, D). All 10 of these novel pockets are enriched for inhibitory allosteric mutations, and 2 of them (P15 and P18) are also enriched for activating allosteric mutations. These novel allosteric pockets include P4, P16 and P25, located between the allosteric αC helix and the active site; P1, formed by residues in the αC-β4 loop, αE and β8; P18 and P21, on the surface of the N-lobe beta sheet; and P2, P15 and P5, located on both sides of αEF and the substrate positioning loop (see Figure 8D for examples). A summary of all Src surface pockets with their druggability scores and ΔΔGa is provided in Table 1.
Table 1: Surface pocket residues
[0387] The comprehensive mutational effect data therefore serves to genetically prioritize which of the many potentially druggable surface pockets of Src should be the focus for inhibitory and activatory drug discovery. The allosterically active pockets in Src include highly druggable novel pockets not previously demonstrated as allosteric in any kinases (Figure 8D).
Modulation of the allosteric landscape by the Src regulatory domains
[0388] Src, like most kinases and eukaryotic proteins, is a multi-domain protein. In addition to the catalytic KD, Src contains two additional globular domains, SH2 and SH3, as well as disordered linkers and the dynamic SH4 region. The non-catalytic domains of Src physically interact with the KD in its inactive conformation and inhibit its activity. To investigate how the regulatory domains of Src affect allosteric communication in the catalytic domain, the abundance and activity selections for the same 54,455 Src variants were repeated in the context of the full-length protein.
[0389] The selections were highly reproducible (median r = 0.93 for both assays) and the fitness scores correlated well with independent in vivo activity (r = 0.89) and abundance (r = 0.80) measurements.
Fitting the same 3-state folding and activation model to the full-length Src data allowed quantitative comparison of the effects of all 5,111 single amino acid substitutions on stability and activation in the presence and absence of the Src regulatory domains. Overall, the mutational effects on folding energies and active state energies correlate very well between the kinase domain and full-length Src constructs (r = 0.85 and r = 0.86, respectively). However, activating mutations are more frequent in full-length Src than in the kinase domain alone, consistent with the regulatory domains having an overall inhibitory function. In particular, mutations in the inter-domain surfaces with the SH2 domain and the SH2-KD linker have stronger activating effects in the full-length kinase, consistent with these intramolecular interactions inhibiting kinase activity. Mutations in the αF helix pocket, proposed to bind the SH4 region for additional inhibition of Src activity, also more strongly activate full-length Src. These differences are not driven by changes in the effects of mutations on abundance between full-length Src and the KD alone, as fitting the 3-state folding and activation models with the underlying abundance data exchanged does not affect the conclusions.
[0390] Mutations in the dynamic C-terminal tail of Src also differ in their effects between full-length Src and the KD alone. Mutations in Y530, the inhibitory phosphosite directly involved in the interaction with the SH2 domain, have stronger activating effects in full-length Src (ΔΔΔGa < -1, FDR < 0.1), consistent with a release of the inhibitory interaction. The adjacent E527, P528, and Q531 are similarly enriched for mutations with stronger activating effects in full-length Src. In contrast, Q529, P532, G533, and N535 are enriched for mutations with stronger inhibitory effects in full-length Src (ΔΔΔGa > 1, FDR < 0.1). Inhibitory mutations in the C-terminal tail’s interface with the SH2 domain include many changes to hydrophobic and aromatic residues, which may act by increasing the affinity of the tail for the Src SH2 domain.
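Comparisons between full-length Src and the KD are expressed as ΔΔΔGa. A minimal sketch, assuming ΔΔΔGa = ΔΔGa(full length) - ΔΔGa(KD), so that negative values mean a mutation is more activating in full-length Src, and assuming the two estimates are independent so that their standard deviations add in quadrature. The significance test used in the source is not specified here; the two-sided z-test, the function name and the example numbers are hypothetical.

```python
import math


def delta_ddg_a(ddg_fl, sd_fl, ddg_kd, sd_kd):
    """Return (dddG_a, two-sided p) for full-length vs KD-only ddG_a estimates."""
    diff = ddg_fl - ddg_kd
    sd = math.sqrt(sd_fl ** 2 + sd_kd ** 2)  # independence assumed
    z = diff / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return diff, p


# Hypothetical mutation: activating in full-length Src, near-neutral in the KD.
diff, p = delta_ddg_a(-1.5, 0.3, 0.2, 0.4)
print(round(diff, 2), p)
```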
Finally, spatial clustering of mutations with stronger activating effects in full-length Src (ΔΔΔGa < -1, FDR < 0.1) reveals an additional cluster of residues in the C-lobe. This cluster includes the SH2-KD interface on the outer surface of αE, along with the internal surface of αE, a second internal layer of residues in αF pointing towards αE, and the C-spine residues I395 and V397. Mutations in these sites, at a distance from the SH2 domain interface, are thus more activating in full-length Src, potentially allosterically relieving inhibition by the SH2 domain.
MATERIALS & METHODS
Plasmid construction
[0391] All the Src constructs for expression in yeast and plasmid sequences were verified by whole plasmid sequencing (Plasmidsaurus). Full length and KD Src sequences are provided in Table 2.
Table 2: Full length and KD sequences
[0392] A gene block of Src, codon-optimized for yeast expression and flanked by NheI-HindIII restriction sites, was ordered from IDT. To assay in vivo soluble expression of Src, a modified version of the pGJJ045 abundancePCA plasmid (Addgene) comprising stop codons downstream of the NheI-HindIII cloning sites was used (pGJJ133). The KD fragment was cloned into pGJJ133, resulting in pTB109. To assay in vivo soluble expression of full-length Src, pTB043 was generated via Gibson assembly. pTB043 is based on the same backbone as the aPCA plasmids and contains a construct in which full-length Src is fused to the DHFR3 fragment at its N-terminus and to the DHFR1,2 fragment at its C-terminus. To assay activity-dependent toxicity of Src, pTB022 was used, a plasmid based on the same backbone as the aPCA plasmids but containing a yeast GAL promoter to drive the expression of NheI-HindIII inserts not fused to any DHFR fragment or linker. The KD fragment and Src gene block were cloned into pTB022, resulting in pTB112 and pTB023, respectively.
Variant library design and cloning
[0393] To fully cover the Src kinase domain from E268 to L536, the library was divided into 5 overlapping blocks (tiles) of ~60 aa that were cloned and selected separately. Within each tile, 10 genetic backgrounds with a wide range of effects on Src kinase activity, based on a previous deep mutagenesis dataset, were chosen (Table 3). The genetic background sequences are provided in Table 4.
Table 3: Library design
Table 4: Genetic background sequences
[0394] The library was ordered as two IDT oPools (Pool 1 with block 1 and Pool 2 with blocks 2-5), containing all NNK single mutants in each of the 10 backgrounds of each block. 2.5 µL of 0.1 µM oPool material was used as a template in a 100 µL Q5 high-fidelity 10-cycle PCR reaction, using primers specific to the constant regions of each block. The library was assembled into pTB112 (KD) and pTB023 (full-length). To do so, the plasmids were linearized with primers pointing outwards from the constant regions of each block, so that each linearized vector had at least 20 nt of homology to the amplified oligo pool containing the variants. 120 ng of linearized plasmid were assembled with the amplified oligo pools in a 20 µL Gibson reaction. The reaction products were dialyzed, concentrated and transformed into NEB 10-beta High-efficiency Electrocompetent E. coli cells according to the manufacturer’s protocol. Cells were left to recover in SOC medium (NEB 10-beta/Stable Outgrowth Medium) for 30 minutes, a 2 µL aliquot was plated to quantify the total number of clones, and the rest of the cell volume was transferred to 100 mL of LB medium with ampicillin and grown overnight. 100 mL of each saturated E. coli culture were harvested the next morning to extract the plasmid library using the Plasmid Plus Midi Kit (QIAGEN). The assemblies were verified using Sanger sequencing (Eurofins). The assembled mutated Src library constructs were transferred into the pGJJ133 and pTB198 plasmids.
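NNK codons (N = A/C/G/T, K = G/T) give 32 codons per position that together encode all 20 amino acids and a single stop codon (TAG), which is why they are a common choice for saturation libraries. The check below translates them with the standard genetic code and also reproduces the 5,111 single amino acid substitutions quoted earlier: E268 to L536 spans 269 positions, each with 19 possible substitutions. Function and variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): AA_TABLE[i] for i, c in enumerate(product(BASES, repeat=3))}


def nnk_codons():
    """All 32 NNK codons: N = any base, K = G or T."""
    return [n1 + n2 + k for n1 in "ACGT" for n2 in "ACGT" for k in "GT"]


nnk = nnk_codons()
encoded = {CODON_TO_AA[c] for c in nnk}
stops = [c for c in nnk if CODON_TO_AA[c] == "*"]

# Single amino acid substitutions across the mutagenized region E268-L536.
first, last = 268, 536
n_positions = last - first + 1                 # 269 residues
n_substitutions = n_positions * 19             # 19 non-wild-type amino acids each
print(len(nnk), sorted(encoded), stops, n_substitutions)
```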
Large-scale transformation and competition assays
[0395] Each library corresponding to each of the 5 blocks was transformed in triplicate, with a coverage of ~100x or greater. Three 500 mL YPDA cultures of late log phase S. cerevisiae BY4741 cells (OD ~0.8-1) were harvested in 50 mL Falcon tubes, each resuspended in 22 mL SORB medium and incubated for 30 min on a shaker at room temperature. 437.5 µL of 10 mg/mL previously boiled (5 min, 100 °C) ssDNA was added to the cells, and the mix was separated into 5 aliquots of 4.3 mL in 50 mL Falcon tubes, one for each library block. 3 µg of library plasmid DNA was added to each aliquot, followed by 17.5 mL Plate Mixture, and the mix was incubated for 30 min on a shaker at room temperature. 1.75 mL DMSO was then added, and the cells were incubated at 42 °C for 20 min. Following incubation, cells were pelleted, resuspended in Recovery Media and incubated at 30 °C for 1 h. Cells were then transferred into 100 mL SC -URA. 10 µL of this culture were immediately plated onto SC -URA selective plates to monitor transformation efficiency. The rest of the culture was incubated overnight at 30 °C.
[0396] For the activity-dependent toxicity assays, the overnight SC -URA cultures were used the next day to inoculate a 100 mL culture of SC -URA with 2% raffinose and 0.1% glucose at an OD=0.2-0.4, which was grown overnight. Cells from this culture were inoculated the next day in 100 mL SC -URA with 2% galactose and 0.1% glucose at an OD=0.05, to induce overexpression of the Src variant library. The remaining input cells grown in 2% raffinose and 0.1% glucose were harvested and frozen for DNA extraction (inputs). The galactose cultures were left to grow overnight to an OD=1.6-2.5, corresponding approximately to 5 generations, harvested, and frozen for DNA extraction (outputs).
[0397] For the aPCA and sandwichPCA assays, the overnight SC -URA cultures were used the next day to inoculate a 100 mL culture of SC -URA/ADE at an OD=0.2-0.4, which was grown overnight (input culture). Cells from this culture were inoculated in 100 mL SC -URA/ADE + 200 µg/mL MTX to select stably expressed Src variants. The remaining input cells grown in SC -URA/ADE were harvested and frozen for DNA extraction. The MTX cultures were left to grow overnight to an OD=1.6-2.5, corresponding approximately to 5 generations, harvested, and frozen for DNA extraction (outputs).
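The "approximately 5 generations" follows from the optical densities: growing from OD 0.05 to OD 1.6-2.5 corresponds to log2(1.6 / 0.05) = 5 to log2(2.5 / 0.05) ≈ 5.6 doublings. A one-line sketch, assuming OD is proportional to cell number over this range and the culture is not diluted in between; the function name is illustrative.

```python
import math


def generations(od_start, od_end):
    """Population doublings between two OD readings (OD assumed proportional
    to cell number, no dilution in between)."""
    return math.log2(od_end / od_start)


print(round(generations(0.05, 1.6), 2), round(generations(0.05, 2.5), 2))
```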
DNA extraction, plasmid quantification and sequencing library preparation
[0398] Total DNA was extracted from yeast pellets equivalent to 50 mL of cells at OD=1.6. Plasmid concentrations in the resulting samples were quantified by qPCR against a standard curve of known concentrations, using primers that amplify in the origin of replication of both the toxicity and aPCA assay plasmids.
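Absolute quantification against a standard curve typically fits Ct against log10 of the known copy number and inverts that line for unknown samples; this is what makes it possible to use a defined number of plasmid molecules (such as the 125 million used as PCR1 template in the next paragraph). The sketch assumes a simple linear standard curve and copies per µL as the concentration unit; function names and example values are hypothetical.

```python
import math


def fit_standard_curve(copies, ct):
    """Least-squares line Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ct) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ct)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def quantify(ct_sample, slope, intercept):
    """Invert the standard curve to estimate copies for a sample Ct."""
    return 10 ** ((ct_sample - intercept) / slope)


def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; a slope of about -3.32 means ~100% efficiency."""
    return 10 ** (-1 / slope) - 1


def volume_for_molecules(n_molecules, copies_per_ul):
    """Volume (µL) of sample containing the requested number of molecules."""
    return n_molecules / copies_per_ul


# Hypothetical dilution series of a plasmid standard.
standards = [1e3, 1e4, 1e5, 1e6, 1e7]
cts = [35.0 - 3.32 * math.log10(c) for c in standards]
slope, intercept = fit_standard_curve(standards, cts)
sample_copies_per_ul = quantify(17.0, slope, intercept)
print(round(volume_for_molecules(1.25e8, sample_copies_per_ul), 3))
```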
[0399] To generate the sequencing libraries, two rounds of PCR amplification were performed. In the first round, primers flanking the mutated regions in each sample were used (5 pairs of PCR1 primers, one for each of the Src blocks). This PCR1 reaction allows increasing the nucleotide complexity of the first sequenced bases by introducing frame-shift bases between the Illumina adapters and the sequencing region of interest. For block 5, a different reverse frameshifting pool was used for the sandwichPCA libraries as they differ in the region downstream of the STOP codon of Src. 125 million plasmid molecules were used as templates and were amplified for 8 cycles. The reactions were column-purified (QIAquick PCR purification kit, QIAGEN), and 40 ng DNA were used as template for a PCR2 reaction with the standard i5 and i7 primers to add the remainder of the Illumina adapter sequences and the demultiplexing indices (dual indexing) unique to each sample. This PCR2 was run for 8 cycles, and the resulting amplicons were run on a 2% agarose gel to quantify and pool the samples for joint purification, and to ensure the specificity of the amplification and check for any potential excess amplification problems. The final libraries were size selected by gel electrophoresis. The amplicons were subjected to Illumina paired end 2x150 sequencing on a NextSeq2000 instrument at the CRG Genomics facility.
Sequencing data processing and thermodynamic modelling
[0400] FastQ files from paired-end sequencing of all aPCA and toxicity experiments were processed with DiMSum v1.2.11, as described above. In further detail, a minimum input read count threshold was set for 1-nucleotide substitutions using the “fitnessMinInputCountAny” option, in order to minimize the fraction of reads per variant related to sequencing error-induced “variant flow” from the wild type. The option “barcodeIdentityPath” was used to specify a variants file in order to recover only the variants corresponding to the designed library (NNK mutations in one of the predefined genetic backgrounds).
[0401] MoCHI was used to fit two global mechanistic models, one for the Src KD and one for full-length Src, using the corresponding 10 aPCA and toxicity assay datasets (2 molecular phenotypes x 5 blocks) simultaneously, as described above.
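The core quantity produced by this kind of processing is a fitness score for each variant: the log ratio of its output to input read counts, normalized to the same ratio for the wild type. The sketch below shows only that simplified form, together with a minimum input count filter analogous in spirit to the threshold described above. DiMSum additionally models count-based errors and replicate effects, which this sketch ignores, and the threshold value, variant names and counts are hypothetical.

```python
import math


def log_ratio_fitness(counts, wt_key, min_input=10):
    """Fitness = ln(out/in) of each variant minus ln(out/in) of the wild type.

    counts: dict mapping variant -> (input_count, output_count).
    Variants below min_input input reads, or with zero output reads, are
    dropped here rather than handled with pseudocounts.
    """
    wt_in, wt_out = counts[wt_key]
    wt_log_ratio = math.log(wt_out / wt_in)
    fitness = {}
    for variant, (n_in, n_out) in counts.items():
        if n_in < min_input or n_out == 0:
            continue
        fitness[variant] = math.log(n_out / n_in) - wt_log_ratio
    return fitness


counts = {"WT": (1000, 2000), "var_1": (500, 250), "rare": (5, 20), "dead": (300, 0)}
print(log_ratio_fitness(counts, "WT"))
```

Because the wild-type ratio is subtracted, differences in total sequencing depth between input and output samples cancel out of the scores.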
Identification of activating, inactivating, stabilizing and destabilizing mutations, and enrichments in secondary structure elements or functional regions
[0402] The mean weights (‘mean kcal/mol’) and standard deviations (‘sd kcal/mol’) from the MoCHI fits were used to identify mutations that change stability or activity, using z-tests with $Z = (\text{ref. value} - \text{mean}) / \text{sd}$. P-values were calculated from a normal distribution. Enrichments of particular mutation classes at individual sites, or in subgroups of residues defined by structural or functional annotations, were tested using Fisher’s exact test, with the rest of the KD as the background unless specified otherwise.
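A minimal standard-library sketch of the two tests follows: a z-test of each inferred ΔΔG against a reference value (assumed here to be 0, meaning no effect) with a two-sided normal p-value, and a two-sided Fisher’s exact test on a 2x2 table of region (a site or annotated subgroup versus the rest of the KD) by mutation class. The two-sided convention and the table layout are assumptions; the text does not state them.

```python
import math
from typing import Tuple


def z_test(mean: float, sd: float, ref_value: float = 0.0) -> Tuple[float, float]:
    """Z = (ref_value - mean) / sd with a two-sided normal p-value."""
    z = (ref_value - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


def fisher_exact(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: in region / rest of KD. Columns: mutation class present / absent.
    Returns (odds ratio, p-value).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum all tables at least as extreme (no more probable) than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    if b * c == 0:
        odds = math.inf if a * d > 0 else math.nan
    else:
        odds = (a * d) / (b * c)
    return odds, min(1.0, p)
```

For example, `fisher_exact(1, 9, 11, 3)` returns an odds ratio of 3/99 and a two-sided p-value of about 0.0028.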
Quantification of the distance dependence of mutation effects
[0403] To quantify the dependence of mutation effects on distance from the active site, the minimum distance between any atom of each Src residue and the active site was computed using the 2SRC structure. Distances to (1) the nucleotide (the non-hydrolyzable ATP analog AMP-PNP in 2SRC), (2) the catalytic D388, and (3) the proposed phosphosite substrate positioning residue P428 were computed, and the minimum distance from each residue to these 3 reference points was taken. Exponential curves were fit using the R stats package: the optim() function was first used to select reasonable starting values, and the curve $y = a e^{bx}$ was then fit using the nls() function, where $a$ is the estimate of |ΔΔGa| at the active site ($|\Delta\Delta G_a|_0$), $b$ is the decay rate ($k$), and $x$ is the minimum heavy-atom distance from the residue to the active site. To estimate distance-corrected mutation effects, the msir package was used to fit a loess smoothing curve, and the residuals to this fit were quantified across the different secondary structure element types. To quantify variation in the allosteric decay rate across different spatial directions, the x, y and z directions as defined in the 2SRC PDB entry were used. To quantify decay along each direction, only residues within 10 Å of the active site in the two remaining orthogonal directions were considered.
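The two computational steps above can be sketched in Python. `min_distance_to_sites` takes the minimum over all atom pairs between a residue and the atoms of each reference site (nucleotide, catalytic aspartate, P428); parsing coordinates from the 2SRC structure is not shown. The exponential fit here is ordinary least squares on log|ΔΔGa|, a simpler substitute for the optim()/nls() nonlinear fit described above: it weights errors differently and requires |ΔΔGa| > 0, although its estimates could serve as starting values for a nonlinear fit.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def min_distance_to_sites(residue_atoms: Sequence[Point],
                          reference_sites: Iterable[Sequence[Point]]) -> float:
    """Minimum atom-atom distance from a residue to any atom of any reference site."""
    return min(
        math.dist(p, q)
        for site in reference_sites
        for p in residue_atoms
        for q in site
    )


def fit_exponential_loglinear(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Fit y = a * exp(b * x) by least squares on log(y); all y must be > 0.

    Returns (a, b): a estimates |ddGa| at the active site, b the decay rate.
    """
    ly = [math.log(v) for v in y]
    n = len(x)
    mean_x, mean_ly = sum(x) / n, sum(ly) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, ly))
    b = sxy / sxx
    a = math.exp(mean_ly - b * mean_x)
    return a, b


# Usage with noise-free synthetic data: recovers a = 2.0 and b = -0.1 up to rounding.
distances = [0.0, 5.0, 10.0, 15.0, 20.0]
effects = [2.0 * math.exp(-0.1 * d) for d in distances]
a_hat, b_hat = fit_exponential_loglinear(distances, effects)
```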
[0404] To compare the distance dependence of activating and inactivating mutations independently of their effect sizes, mutations with positive ΔΔGa were subsampled (n = 10,000 subsamples, with replacement) to match the number and effect-size distribution of those with negative ΔΔGa, and exponential decays were fit to each subsample. A p-value was calculated as the fraction of subsamples in which the decay is lower than or equal to that of activating mutations.
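One way to implement the effect-size-matched subsampling is sketched below. For each activating mutation (negative ΔΔGa), an inactivating mutation (positive ΔΔGa) with a similar |ΔΔGa| is drawn with replacement. The matching tolerance, the closest-effect fallback, the log-linear decay fit and the definition of the decay as k = -b are all assumptions made for illustration, not details given in the text.

```python
import math
import random
from typing import List, Tuple

Mutation = Tuple[float, float]  # (distance to active site, ddGa)


def decay_rate(muts: List[Mutation]) -> float:
    """k = -b from a log-linear fit of |ddGa| = a * exp(b * d); nan if undefined."""
    xs = [d for d, _ in muts]
    ys = [math.log(abs(g)) for _, g in muts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return math.nan  # all sampled distances identical: slope undefined
    return -sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def matching_pools(activating: List[Mutation], inactivating: List[Mutation],
                   tol: float = 0.1) -> List[List[Mutation]]:
    """For each activating mutation, the inactivating mutations with |ddGa| within tol."""
    pools = []
    for _, g in activating:
        pool = [m for m in inactivating if abs(abs(m[1]) - abs(g)) <= tol]
        if not pool:  # fall back to the single closest effect size
            pool = [min(inactivating, key=lambda m: abs(abs(m[1]) - abs(g)))]
        pools.append(pool)
    return pools


def subsampling_p_value(activating: List[Mutation], inactivating: List[Mutation],
                        n_sub: int = 10_000, tol: float = 0.1, seed: int = 0) -> float:
    """Fraction of matched subsamples whose decay is <= that of activating mutations."""
    rng = random.Random(seed)
    k_act = decay_rate(activating)
    pools = matching_pools(activating, inactivating, tol)
    hits = 0
    for _ in range(n_sub):
        sample = [rng.choice(pool) for pool in pools]  # sampling with replacement
        if decay_rate(sample) <= k_act:  # comparisons with nan are False
            hits += 1
    return hits / n_sub
```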
[0405] To test for clustering of allosteric sites, the distribution of pairwise Cα-Cα distances between all allosteric sites in the Src KD was computed and compared to a null distribution obtained from 1,000 random subsets of residues of equal size. A p-value was calculated as the fraction of subsets in which the median distance is lower than or equal to the observed median distance. To test for connectivity, a similar approach was taken, but using the distribution of minimum distances from each allosteric site to any other allosteric site.
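The permutation test can be sketched as follows, assuming a dictionary mapping each KD residue to its Cα coordinates (how those are loaded is not shown). The same function serves both tests: median pairwise distance for clustering, and median nearest-neighbour distance for connectivity. The function names and the fixed random seed are illustrative choices.

```python
import math
import random
import statistics
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def median_pairwise_distance(coords: Sequence[Point]) -> float:
    """Median of all pairwise Calpha-Calpha distances (clustering statistic)."""
    return statistics.median([math.dist(p, q) for p, q in combinations(coords, 2)])


def median_nearest_neighbour_distance(coords: Sequence[Point]) -> float:
    """Median distance from each site to its closest other site (connectivity statistic)."""
    return statistics.median([
        min(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        for i, p in enumerate(coords)
    ])


def permutation_test(sites: List[Hashable], ca_coords: Dict[Hashable, Point],
                     statistic: Callable[[Sequence[Point]], float] = median_pairwise_distance,
                     n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Observed statistic and fraction of random equal-size subsets with a value <= it."""
    rng = random.Random(seed)
    observed = statistic([ca_coords[r] for r in sites])
    residues = sorted(ca_coords)
    hits = 0
    for _ in range(n_perm):
        subset = rng.sample(residues, len(sites))
        if statistic([ca_coords[r] for r in subset]) <= observed:
            hits += 1
    return observed, hits / n_perm
```

Passing `statistic=median_nearest_neighbour_distance` gives the connectivity variant of the test.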
Quantification of structural features and contacts
[0406] The locations of individual secondary structure elements and functional annotations were obtained from Roskoski, Biochem. Biophys. Res. Commun. 331, 1-14 (2005) and Gonfloni et al., EMBO J. 16, 7261-7271 (1997). Solvent accessible surface areas were calculated using FreeSASA (v2.0.3; ref. 66) with the parameters -n 20 --format=rsa --radii=naccess and 2SRC as the reference structure, both for the full-length Src structure and for the KD residues alone. Secondary structure type annotations (helix, beta sheet, turn, loop) were taken from UniProt. To define active- and inactive-state contacts, getcontacts (https://getcontacts.github.io/) was run on representative structures of the inactive (2SRC) and active (1Y57) states. Before defining contacts, hydrogen atoms were added to the structures using the PyMOL h_add command, and get_static_contacts.py was then run with the parameter --itypes all. For the following analyses, only salt bridges, pi-cation interactions, side chain-side chain hydrogen bonds and side chain-backbone hydrogen bonds were considered, as the remaining contact types did not display conformational state specificity. Contacts of the same type between the same residues were collapsed into a single contact, and duplicated contacts annotated both as a salt bridge and as a side chain-side chain hydrogen bond were collapsed into a salt bridge. Four types of residues were defined based on their contact patterns: active residues, engaged in contacts only in the active state; inactive residues, engaged in contacts only in the inactive state; non-specific residues, engaged in the same contacts in both structures; and swapping residues, engaged in different contacts in the active and inactive conformations. To display the contacts on Src structures, a pseudobond representation in ChimeraX was used, with the thickness of each contact proportional to the averaged mutation effect of the two contacting residues.
Contacts between swapping residues were represented as dashed pseudobonds.
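The contact filtering, collapsing and residue classification described above can be sketched with Python sets. The input is assumed to be (residue, residue, interaction type) triples using getcontacts-style codes (sb, pc, hbss, hbsb); the residue labels in any example are hypothetical. The text does not say how to classify a residue that shares some contacts between states but also has state-specific ones; this sketch assumes any difference counts as swapping.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Assumed getcontacts-style codes: sb = salt bridge, pc = pi-cation,
# hbss = side chain-side chain H-bond, hbsb = side chain-backbone H-bond.
KEPT_TYPES = {"sb", "pc", "hbss", "hbsb"}

Contact = Tuple[FrozenSet[str], str]


def collapse_contacts(raw: Iterable[Tuple[str, str, str]]) -> Set[Contact]:
    """Keep selected types, merge duplicates, and let a salt bridge absorb an hbss contact."""
    by_pair = {}
    for res1, res2, itype in raw:
        if itype in KEPT_TYPES and res1 != res2:
            by_pair.setdefault(frozenset((res1, res2)), set()).add(itype)
    collapsed = set()
    for pair, types in by_pair.items():
        if "sb" in types:
            types = types - {"hbss"}
        collapsed.update((pair, t) for t in types)
    return collapsed


def partners(contacts: Set[Contact], residue: str) -> Set[Tuple[str, str]]:
    """(partner residue, interaction type) pairs involving one residue."""
    return {(next(iter(pair - {residue})), t) for pair, t in contacts if residue in pair}


def classify_residue(residue: str, active: Set[Contact], inactive: Set[Contact]) -> str:
    a, i = partners(active, residue), partners(inactive, residue)
    if not a and not i:
        return "none"
    if a and not i:
        return "active"
    if i and not a:
        return "inactive"
    if a == i:
        return "non-specific"
    return "swapping"  # assumption: any difference in contacts counts as swapping
```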
Mutational effect prediction
[0407] Linear models were fit using the base R lm() function, with the following predictors: the wild-type and mutant amino acids, the secondary structure type in which the mutation is located, the specific secondary structure element in which the mutation is located, the log distance to D389 (catalytic site), the log distance to the nucleotide (AMP-PNP in 2SRC), rSASA, the contact type, and the residue type classification (active, inactive, swapping, both, or none) based on the contact patterns described above. Models were evaluated on held-out data using 10-fold cross-validation.
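The cross-validation scheme can be sketched without R. This Python sketch re-expresses an lm()-style model as ordinary least squares on one-hot encoded categorical predictors (for example amino acid identities, secondary structure or residue contact class) plus numeric predictors (for example log distances or rSASA), and reports the out-of-fold R². The use of R² as the evaluation metric, the fold assignment and the tiny ridge term are assumptions for illustration.

```python
import random
from typing import Dict, List, Sequence

Record = Dict[str, object]


def design_row(rec: Record, levels: Dict[str, List[object]], numeric: Sequence[str]) -> List[float]:
    """Intercept + one-hot categorical predictors (first level dropped) + numeric predictors."""
    row = [1.0]
    for key, values in levels.items():
        row.extend(1.0 if rec[key] == v else 0.0 for v in values[1:])
    row.extend(float(rec[k]) for k in numeric)
    return row


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(x: List[List[float]], y: List[float], ridge: float = 1e-8) -> List[float]:
    """Least squares via normal equations; a tiny ridge keeps empty dummy columns solvable."""
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) + (ridge if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def cross_validated_r2(records: List[Record], target: str, categorical: Sequence[str],
                       numeric: Sequence[str], k: int = 10, seed: int = 0) -> float:
    """Out-of-fold R^2 of the linear model under k-fold cross-validation."""
    levels = {c: sorted({rec[c] for rec in records}, key=str) for c in categorical}
    order = list(range(len(records)))
    random.Random(seed).shuffle(order)
    preds = [0.0] * len(records)
    for f in range(k):
        held_out = set(order[f::k])
        train = [i for i in order if i not in held_out]
        beta = fit_ols([design_row(records[i], levels, numeric) for i in train],
                       [float(records[i][target]) for i in train])
        for i in held_out:
            preds[i] = sum(b * v for b, v in zip(beta, design_row(records[i], levels, numeric)))
    y = [float(rec[target]) for rec in records]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```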
Analysis of Src surface pockets
[0408] The Kinase Atlas was used to retrieve all possible druggable Src surface pockets (Yueh et al., J. Med. Chem. 2019, 62, 14, 6512-6524). The docking analyses of all 15 available Src structures were used, and each potential Src surface pocket was defined as the set of residues located within 5 Å of a cluster of docked molecules, resulting in a total of 384 pockets distributed across the 15 structures. After filtering out pockets with a druggability score < 5, the remaining 254 pockets were collapsed into a final set of unique pockets, since many are present in multiple structures. To do so, a pairwise distance matrix between all pockets was calculated, using 1 minus the Szymkiewicz-Simpson overlap coefficient as the distance metric. Hierarchical clustering was applied to this distance matrix, resulting in 28 unique pockets. As each of these 28 unique pockets is a set of pockets present across several structures, the total number of structures in which each is found was counted, and the average and maximum druggability across all structures were calculated. For each pocket in each structure, ΔΔGa was averaged per residue, and the mean (mean ΔΔGa), maximum (max ΔΔGa) and minimum (min ΔΔGa) of these residue-averaged values were calculated. The odds ratio for enrichment of each pocket in activating and inactivating mutations relative to the rest of the KD was also calculated, and its statistical significance was tested using Fisher’s exact test.
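The pocket deduplication and scoring steps can be sketched as follows. The overlap coefficient is |A ∩ B| / min(|A|, |B|), so the distance is 0 when one pocket’s residue set is contained in another’s. The text does not specify the linkage method or cut height; this sketch assumes single linkage cut at a hypothetical distance of 0.5, which is equivalent to joining pockets connected by a chain of pairwise distances at or below the cutoff (implemented here with union-find). `pocket_scores` returns the mean, maximum and minimum of the residue-averaged ΔΔGa values for one pocket.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple


def overlap_distance(a: Set[int], b: Set[int]) -> float:
    """1 - Szymkiewicz-Simpson overlap coefficient between two residue sets."""
    return 1.0 - len(a & b) / min(len(a), len(b))


def cluster_pockets(pockets: List[Set[int]], cutoff: float = 0.5) -> List[List[int]]:
    """Single-linkage clusters (indices into `pockets`) cut at `cutoff`."""
    parent = list(range(len(pockets)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pockets)):
        for j in range(i + 1, len(pockets)):
            if overlap_distance(pockets[i], pockets[j]) <= cutoff:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(pockets)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def pocket_scores(pocket: Set[int], residue_ddg: Dict[int, List[float]]) -> Tuple[float, float, float]:
    """(mean, max, min) of per-residue averaged ddGa over the residues of one pocket."""
    per_residue = [mean(residue_ddg[r]) for r in pocket if r in residue_ddg]
    return mean(per_residue), max(per_residue), min(per_residue)
```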
Quantification of changes in mutational effects by regulatory domains
[0409] To quantify the changes in ΔΔGa that result from the presence of the Src regulatory domains, a linear model of full-length Src ΔΔGa against kinase-domain-alone ΔΔGa was fit, and the residuals of this fit were used as an estimator of ΔΔΔGa (full length minus kinase domain). To identify mutations with significant ΔΔΔGa, z-scores were calculated incorporating the errors of both the full-length (sdFL) and kinase-domain-alone (sdKD) ΔΔGa, as $Z = \text{residual} / (sd_{FL} + sd_{KD} / 2)$. Interdomain interfaces were defined as residues involved in direct contacts between the kinase domain and the regulatory domains (SH3, SH2, linker), identified using getcontacts and excluding water bridges. To identify spatial clusters, sites with at least 2 mutations with ΔΔΔGa < -1 and FDR < 0.1 were selected, and hierarchical clustering was applied to a matrix of their pairwise Cα-Cα distances.
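The ΔΔΔGa estimation and site selection can be sketched as below. Two points are assumptions, not stated in the text: the denominator of the z-score is read as the mean of the two standard deviations, (sdFL + sdKD)/2 (the expression above could also be read literally as sdFL + sdKD/2), and the FDR is taken to be Benjamini-Hochberg adjusted two-sided normal p-values.

```python
import math
from typing import List, Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def dddg_scores(ddg_kd, ddg_fl, sd_kd, sd_fl):
    """Residuals of full-length ~ kinase-domain ddGa (dddGa estimates), z-scores, p-values."""
    intercept, slope = linear_fit(ddg_kd, ddg_fl)
    resid = [fl - (intercept + slope * kd) for kd, fl in zip(ddg_kd, ddg_fl)]
    # Assumption: denominator read as the mean of the two standard deviations.
    z = [r / ((sf + sk) / 2.0) for r, sf, sk in zip(resid, sd_fl, sd_kd)]
    p = [math.erfc(abs(v) / math.sqrt(2.0)) for v in z]  # two-sided normal p-values
    return resid, z, p


def benjamini_hochberg(p: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    n = len(p)
    order = sorted(range(n), key=lambda i: p[i])
    q = [0.0] * n
    running = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running = min(running, p[i] * n / rank)
        q[i] = running
    return q


def candidate_sites(sites, resid, q, threshold=-1.0, fdr=0.1, min_mutations=2):
    """Sites with at least `min_mutations` mutations having dddGa < threshold and FDR < fdr."""
    counts = {}
    for site, r, qi in zip(sites, resid, q):
        if r < threshold and qi < fdr:
            counts[site] = counts.get(site, 0) + 1
    return sorted(s for s, c in counts.items() if c >= min_mutations)
```

The selected sites would then be clustered on their pairwise Cα-Cα distances, as described above.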

Claims

WHAT IS CLAIMED IS:
1. A computer-implemented method of training a machine learning model, the method comprising: obtaining training data specifying, for a wild type variant of a target enzyme and each of a plurality of mutant variants of the target enzyme, each mutant variant having a different set of one or more mutations, an activity measure for the respective variant and a folding measure for the respective variant; based on the training data, training model parameters of a machine learning model to output, from input data specifying the set of one or more mutations in a given variant of the target enzyme, a predicted activity measure and a predicted folding measure for the given variant.
2. The computer-implemented method of claim 1, wherein the input data comprises a set of input elements corresponding to a given site of the given variant of the target enzyme, each input element specifying whether or not a specific mutation is present at the given site.
3. The computer-implemented method of any preceding claim, wherein training the machine learning model comprises fitting a thermodynamic model of the target enzyme to the training data.
4. The computer-implemented method of claim 3, wherein the thermodynamic model comprises a three-state model of the target enzyme with unfolded, folded inactive, and folded active states.
5. The computer-implemented method of any preceding claim, wherein: the model parameters comprise a first set of weights and a second set of weights; the predicted activity measure depends on both the first set of weights and the second set of weights; and the predicted folding measure depends on the second set of weights but is independent of the first set of weights.
6. The computer-implemented method of any preceding claim, wherein the machine learning model comprises a neural network.
7. The computer-implemented method of claim 6, wherein: a first neuron of the neural network generates a first neuron output value by processing the input data using a first set of weights; and a second neuron of the neural network generates a second neuron output value by processing the input data using a second set of weights.
8. The computer-implemented method of claim 7, wherein the predicted activity measure depends on both the first neuron output value and the second neuron output value.
9. The computer-implemented method of any of claims 7 and 8, wherein the predicted folding measure depends on the second neuron output value and is independent of the first neuron output value.
10. A computer-implemented method of identifying one or more target sites of a target enzyme, the method comprising: obtaining model parameters from a machine learning model trained in accordance with the method of any of claims 1 to 9; based on the model parameters, identifying the one or more target sites of the target enzyme.
11. The method of claim 10, wherein the target sites are allosteric sites of the target enzyme.
12. A computer-implemented method of identifying a mutated variant of interest of an enzyme, the method comprising: for each of a plurality of mutated variants of a target enzyme, providing an input specifying mutations in the respective mutated variant to a machine learning model trained to output a predicted activity measure and a predicted folding measure for the mutated variant; receiving from the machine learning model a predicted activity measure and a predicted folding measure for each of the plurality of mutated variants; and based on the predicted activity measures and the predicted folding measures, selecting from the plurality of mutated variants at least one mutated variant of interest.
13. A method of generating training data for training a machine learning model, the method comprising: performing wet lab experiments to obtain data for deriving an activity measure and a folding measure for a wild type variant of a target enzyme and each of a plurality of mutant variants of the target enzyme.
14. The method of any preceding claim, wherein the target enzyme is a protein kinase.
15. The method of claim 14, wherein the protein kinase is Src kinase.

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP24382422 2024-04-19
EP24382422.4 2024-04-19
GB2407633.3 2024-05-29
GBGB2407633.3A GB202407633D0 (en) 2024-04-19 2024-05-29 Identifying allosteric sites in enzymes

Publications (1)

Publication Number Publication Date
WO2025219709A1 (en) 2025-10-23

Family

ID=95517235

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2025/050815 Pending WO2025219709A1 (en) 2024-04-19 2025-04-16 Identifying allosteric sites in enzymes

Country Status (1)

Country Link
WO (1) WO2025219709A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023025263A1 (en) * 2021-08-25 2023-03-02 Ensem Therapeutics Holding (Singapore) Pte. Ltd. Systems and methods for post-translational modification-inspired drug design and screening

Non-Patent Citations (13)

Title
AHLER ET AL., MOL. CELL, vol. 74, 2019, pages 393-408
AHLER ETHAN ET AL: "A Combined Approach Reveals a Regulatory Mechanism Coupling Src's Kinase Activity, Localization, and Phosphotransferase-Independent Functions", MOLECULAR CELL, vol. 74, no. 2, 18 April 2019 (2019-04-18), pages 393, XP085663771, ISSN: 1097-2765, [retrieved on 20250708], DOI: 10.1016/J.MOLCEL.2019.02.003 *
FAURE ANDRE J ET AL: "Mapping the energetic and allosteric landscapes of protein binding domains", NATURE, vol. 604, no. 7904, 6 April 2022 (2022-04-06), pages 175-183, XP037798491, [retrieved on 20220406], DOI: 10.1038/S41586-022-04586-4 *
FAURE ANDRE J. ET AL: "Suppl. Methods: Mapping the energetic and allosteric landscapes of protein binding domains", NATURE, 6 April 2022 (2022-04-06), pages 1-15, XP093294022, Retrieved from the Internet <URL:https://www.nature.com/articles/s41586-022-04586-4#MOESM1> [retrieved on 20250708], DOI: 10.1038/s41586-022-04586-4 *
FAURE ET AL., GENOME BIOLOGY, vol. 21, 2020, pages 207, Retrieved from the Internet <URL:https://doi.org/10.1186/s13059-020-02091-3>
FAURE, LEHNER, BIORXIV, 2024, Retrieved from the Internet <URL:https://doi.org/10.1101/2024.01.21.575681>
FODA ET AL., NAT. COMMUN., vol. 6, 2015, pages 5939
GONFLONI ET AL., EMBO J., vol. 16, 1997, pages 7261-7271
MARKIN ET AL., SCIENCE, vol. 373, no. 6553, 2021
ROSKOSKI, BIOCHEM. BIOPHYS. RES. COMMUN., vol. 331, 2005, pages 1-14
SCHEELE ET AL., NAT. COMMUN., vol. 13, no. 844, 2022
VANELLA ET AL., NAT. COMMUN., vol. 15, no. 1807, 2024
YUEH ET AL., J. MED. CHEM., vol. 62, no. 14, 2019, pages 6512-6524

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 25721038

Country of ref document: EP

Kind code of ref document: A1