HK1061911B - Method of operating a computer system to perform a discrete substructural analysis - Google Patents

Method of operating a computer system to perform a discrete substructural analysis

Publication number
HK1061911B
Authority
HK
Hong Kong
Application number
HK04104959.7A
Other languages
Chinese (zh)
Other versions
HK1061911A1 (en
Inventor
D.彻齐
J.科林格
Original Assignee
Laboratoires Serono S.A.
Application filed by Laboratoires Serono S.A.
Priority claimed from PCT/EP2001/011955 (WO2002033596A2)
Publication of HK1061911A1
Publication of HK1061911B

Description

Method of operating a computer system for performing discrete substructure analysis
Technical Field
The present invention relates to a computer system capable of performing discrete substructure analysis and a method of operating the same. Such analysis can be performed in silico to identify molecules having certain properties, such as biological and/or chemical activity. Computer-based discrete substructure analysis can be used in drug discovery or in other fields where identification of biologically, pharmacologically, toxicologically, pesticidally, herbicidally, catalytically or otherwise active compounds is desired.
Background
Developments in fields such as medicinal chemistry depend on the identification of biologically active molecules. Research projects are often directed to the synthesis of small organic molecules that interact with known enzymes or target receptors to produce a desired pharmacological effect. These compounds may at least partially mimic or inhibit the activity of known naturally occurring substances, but they are intended to provide a more potent and/or more selective effect. The compounds resulting from such studies may contain certain structural features of the naturally occurring substances of interest.
Research projects may also be based on naturally occurring substances discovered by screening natural sources, such as soil samples or plant extracts. The active compounds found in this way can be used as leads for synthetic chemistry projects.
In recent years, the identification of useful novel bioactive molecules has become more and more urgent, and methods for preparing lead compounds have been developed. In this regard, two developments are of particular importance, namely combinatorial chemistry and High Throughput Screening (HTS).
Combinatorial chemistry employs robotic or manual techniques to perform many small-scale chemical reactions "simultaneously" or "in parallel", each using a different combination of reagents, resulting in a large number of diverse chemicals for screening. The collection of compounds produced by this method is called a "library". Libraries for generating new chemical leads are generally as diverse as possible. In some cases, however, a library may be biased or targeted toward a particular pharmacological target, or focused on a particular chemical domain, by selecting reagents that introduce specific structural features into the final compounds.
High throughput screening involves rapidly testing the activity of a large number of compounds on one or more biological targets in vitro using biochemical assays. This method is suitable for screening large libraries of compounds produced by combinatorial chemistry.
Despite the undoubted advantages of combinatorial chemistry and high throughput screening in generating new lead structures, these methods still suffer from several drawbacks. In unbiased combinatorial libraries, most compounds have no useful activity, so the discovery of useful leads depends on chance and/or on the number of compounds tested. Active compounds may be present in larger numbers in a targeted library, but depending on the selection criteria even such a library may not provide the optimal compound. In addition, both techniques require considerable resources and extensive experimentation.
The likelihood or probability of finding an active molecule within a given set of compounds may increase with the total number of test compounds (i.e., the size of the set) or with the proportion of active compounds in the same set. It can be seen that increasing the proportion of active compounds in the compound set is more effective in increasing the probability of finding an active molecule than increasing the total number of test compounds alone. The former method reduces the number of compounds that need to be prepared and tested, and therefore advantageously saves resources required for example for the discovery of biologically active molecules.
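The trade-off described above can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that compounds are tested independently and that each has the same probability of being active. Neither this model nor the numbers used come from the specification.

```python
def prob_at_least_one_active(n_tested: int, active_fraction: float) -> float:
    """Probability that at least one of n_tested compounds is active.

    Assumes independent tests with a fixed active fraction (an
    illustrative assumption, not part of the specification).
    """
    return 1.0 - (1.0 - active_fraction) ** n_tested


# Hypothetical numbers chosen only to illustrate the trade-off.
baseline = prob_at_least_one_active(1000, 0.001)         # large set, few actives
bigger = prob_at_least_one_active(2000, 0.001)           # twice as many compounds
smaller_enriched = prob_at_least_one_active(100, 0.01)   # 10x fewer, 10x enriched

print(f"baseline:         {baseline:.3f}")
print(f"bigger set:       {bigger:.3f}")
print(f"smaller enriched: {smaller_enriched:.3f}")
```

Under these assumptions the enriched set of 100 compounds reaches roughly the same probability as the unenriched set of 1000, which is the resource saving the paragraph refers to.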
Richard D. Cramer III et al. (J. Med. Chem., 17 (1974), pages 533 to 535) disclose a method of substructure analysis as an approach to the problem of drug design. The article states that the biological activity of a molecule, or any of its properties, must be explained by the combined contributions of its structural components (substructures) and their intra- and intermolecular interactions. The possible contribution of a given substructure to activity can be obtained from data on previously tested compounds containing that substructure. The first step is to prepare a substructure "empirical table" that gathers the available data. The "substructure activity frequency" (SAF) of each substructure is defined as the ratio of the number of active compounds containing that substructure to the number of tested compounds containing that substructure. The SAF is taken to indicate the possible contribution of the substructure to the probability of a compound being active. Then, for each compound, the arithmetic mean of the SAF values of the substructures it contains is calculated.
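The SAF definition and the per-compound mean can be sketched as follows. This is a minimal illustration of the ratio as defined in the paragraph above, not Cramer's implementation. The data layout (a list of pairs of a substructure set and an activity flag) and the example compounds are hypothetical.

```python
from statistics import mean


def substructure_activity_frequencies(compounds):
    """Return {substructure: SAF} for a list of (substructures, is_active) pairs.

    SAF = active compounds containing the substructure
          / tested compounds containing the substructure.
    """
    tested, active = {}, {}
    for substructures, is_active in compounds:
        for s in substructures:
            tested[s] = tested.get(s, 0) + 1
            if is_active:
                active[s] = active.get(s, 0) + 1
    return {s: active.get(s, 0) / tested[s] for s in tested}


def mean_saf(substructures, saf):
    """Arithmetic mean of the SAF values of a compound's substructures."""
    return mean(saf[s] for s in substructures)


# Hypothetical test results: (substructures present, found active?)
tested_compounds = [
    ({"COOH", "phenyl"}, True),
    ({"COOH", "NO2"}, True),
    ({"phenyl", "NO2"}, False),
    ({"phenyl"}, False),
]

saf = substructure_activity_frequencies(tested_compounds)
print(saf["COOH"], saf["NO2"])           # SAF of two substructures
print(mean_saf({"COOH", "NO2"}, saf))    # mean SAF of one compound
```

Note that every substructure of every tested compound has to be visited before a single mean can be formed, which is the computational burden discussed next.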
Although this prior-art technique allows compounds to be ranked by their mean SAF value, obtaining such a value requires calculating the arithmetic mean of the SAF values of every substructure present in the compound. Moreover, the SAF values required for this calculation are themselves the result of the calculations described above, which involve an evaluation of each substructure of each tested molecule. This approach therefore incurs a significant computational expense, which makes the technique unsuitable for exploiting the large existing data sets that are available as a source of information for molecular structure analysis. Furthermore, the Cramer method does not actually estimate the true contribution of a substructure to activity.
Many other techniques also exist in the field of chemical structure analysis.
EP 938055 A describes a method of deriving quantitative structure-activity relationships from high throughput screening data by identifying structural features that confer "activity" on a compound. The method builds statistical models of biologically active compounds by first associating various chemical descriptors with a given set of compounds and then using a small set of known biologically active compounds to predict, on the basis of the model, whether a new compound is biologically active.
Sheridan and Kearsley (J. Chem. Inf. Comput. Sci., 35 (1995), pp. 310-320) describe the use of genetic algorithms to select subsets of fragments for use in constructing a combinatorial library. The method includes generating a population of molecules from a subset of molecular fragments and calculating a score for each molecule based on a particular descriptor (e.g., atom pairs or topological torsions) using similarity probes or trend vector methods. Further populations of molecules may then be generated by the genetic algorithm and scored in the same way. The result is a list of fragments that appear in the highest-scoring molecules and that can be used as a basis for constructing combinatorial libraries.
WO 99/26901 A1 discloses a method for designing chemical substances such as molecules. A compound consists of a backbone and a plurality of sites. The method first selects candidate elements for the sites and builds a predicted design array (PAD). One example of a PAD consists of a number of virtual compounds that satisfy certain combination conditions. These compounds are then synthesized and tested for biological activity. Next, an operation is performed to predict the overall biological activity of those compounds that have not been synthesized. To this end, property contribution values of the candidate elements are calculated, which represent the respective contributions of the various elements to the activity. In addition, the average contribution of each substituent at a particular site to biological activity is calculated. The document gives an example of how to calculate such a contribution.
The application of QSAR (quantitative structure-activity relationship) techniques to drug discovery problems is described in a paper by H. Gao et al. (J. Chem. Inf. Comput. Sci., 39 (1999), 164-168). After compounds having biological activity have been selected, their biological activity is optimized. Since QSAR is based on an assumed relationship between biological activity and molecular structure, the technique is concerned with identifying the structural features that make compounds active and with predicting whether analogs are active or inactive.
WO 00/41060 A1 discloses a method of relating the activity of a substance to the structural features of the substance. The term "feature" refers to the atoms and bonds of a structure that match a pattern. The first step is to identify the substances in a set that satisfy given structural feature and property constraints. The substances are then assigned to activity classes. After the set of substances has been classified by activity class, the predicted activity of any subset is calculated and, for each structural feature of the set, a set of activity-property-feature bit vectors is constructed that specifies which substances contain the feature and belong to the activity class. This document relates to biological activity and is also relevant to drug discovery.
US 6,185,506 B1 discloses a method of selecting an optimally diverse pool of small molecules based on efficient molecular structure descriptors. Multiple literature data sets containing a wide variety of chemical structures and associated activities are used. The activity may be biological or chemical. The technique is described in the context of pharmacological drugs. In addition, the patent discloses a method of selecting a subset of product molecules from all the possible product molecules that may be formed from particular reactant molecules and a common core molecule in a combinatorial synthesis. Biospecific libraries are mentioned in the background section; their design is based on knowledge of the geometric arrangement of structural fragments taken from molecular structures known to be active. The patent states that it is essential to use a rationally designed, smaller screening library that still retains the diversity of the combinatorially accessible compounds.
WO 00/49539 A1 discloses a subset screening method for identifying sets of molecular features that may be associated with a specific activity. The term "feature" refers to a chemical substructure. The molecules are grouped according to molecular structure, using a set of descriptors as features. Groups with high activity levels are then identified, and the most common substructures in each group of molecules, which may be correlated with the observed activity levels, are sought. A data set is created that represents those molecules from the original data set that include the common feature subset. The technique takes the form of a computer-based system that automatically analyzes a data set.
US 5,463,564 discloses a computer-based method for the automated generation of compounds by robotic synthesis and analysis of a variety of compounds. The process is repeated in order to produce a chemical substance with defined activity properties. A diverse chemical library comprising a variety of compounds is synthesized, and structure-activity data are obtained by robotic analysis of the synthesized compounds. The patent discloses a number of databases, each including a field that represents a rating factor assigned to the respective compound. The rating factor is assigned to each compound according to how closely the activity of the compound matches the desired activity.
The above-described methods are either "predictive" models or still do not sufficiently improve the generation of active leads and do not increase the probability of finding an active compound within a given set of compounds. In addition, these conventional techniques do not satisfy the need for an increased number, and an improved quality, of molecular candidates (hits and leads) entering the development pipeline.
Disclosure of Invention
It is therefore an object of the present invention to provide a method of operating a computer system, and a corresponding computer system, that increase the likelihood of finding new molecules with biological and/or chemical activity.
This object is achieved by the independent claims of the invention.
Preferred embodiments are defined in the dependent claims.
It is an advantage of the present invention to provide a computer system and method of operation that can increase the proportion of active compounds in a given chemical set, without the compounds in the set being known to have the desired activity. This is accomplished by using knowledge-based techniques for identifying novel hit and lead series, in particular by establishing a computer-based processing system for molecular discovery.
Another advantage of the present invention is that expensive experiments are avoided by analyzing databases that are searchable by molecular structure and by biological and/or chemical properties. The discovery method of the invention is therefore rational and can lower the cost of drug discovery.
Yet another advantage of the present invention is that the discovery method is faster, allowing molecules with certain desired properties to be identified in a shorter time than with prior art methods.
In addition, the present invention is particularly advantageous in the field of biochemistry. DNA sequencing, and in particular genome sequencing, has provided comprehensive databases of amino acid sequences that can be used as a starting point in practicing the present invention. The invention can then be used to identify known and/or orphan ligands and/or orphan ligand-receptor pairs by predicting peptide sequences from the results of analyzing structure tables for chemical determinants of biological activity. Once identified in the database and expressed, the peptide sequences can be tested by biochemical assays. It is therefore an advantage of the present invention that, by comparison with a table of chemical molecules that have been determined to be active against certain targets, biological structures can be inferred, thereby providing a technique for identification (reverse ordering).
The present invention will be described in more detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a block diagram of a computer system in accordance with a preferred embodiment of the present invention.
FIG. 2 is a flow chart illustrating the main process of performing discrete substructure analysis according to a preferred embodiment of the present invention.
FIG. 3 is a schematic diagram of the iterative process of the present invention.
FIG. 4 is a flow chart illustrating a process of forming a fragment library according to a preferred embodiment of the present invention.
FIG. 5 is a diagram illustrating how fragments are selected based on calculated scores.
FIG. 6 is a flowchart of a process for calculating a fragment score according to a preferred embodiment of the present invention.
FIG. 7 is a flow diagram illustrating a process for iteratively analyzing a fragment library.
FIG. 8 is a flow chart showing the process of selecting new compounds based on generic substructures.
FIG. 9 is a flow chart illustrating the process of generating a sub-structure for virtual screening.
FIG. 10 is a flow chart illustrating a process of analyzing a library of fragments using an annealing technique according to a preferred embodiment of the present invention as iterations are performed.
FIG. 11 shows an example of a relative contribution map that represents the annealing technique used in the process of FIG. 10.
FIG. 12 is a graph showing the effect of a compound on receptor-mediated production of inositol triphosphate.
FIG. 13 is a graph showing the effect of a compound on kinase-dependent protein phosphorylation.
FIG. 14 is a graph showing the effect of a compound on phosphatase-dependent protein dephosphorylation.
FIG. 15 is a graph showing information of relative contributions by plotting relationships of determinants to their respective scores.
Figures 16A-H show other relative contribution plots showing equivalence of scoring functions.
Now, the present invention will be described in more detail. Preferred embodiments of the present invention will also be discussed in conjunction with the attached figures. Furthermore, a number of examples are given to illustrate how the invention can be applied to various areas of compound discovery.
Detailed Description
According to the present invention, a computer system is operated to perform discrete substructure analysis. A database of molecular structures is accessed. The database is searchable by molecular structure information and by biological and/or chemical properties. Molecular structure information is any information suitable for determining the structure of a molecule. The biological and/or chemical properties include biochemical, pharmacological, toxicological, pesticidal, herbicidal, and catalytic properties.
The techniques of the present invention utilize the database to identify a subset of molecules having a given biological and/or chemical property. The molecular fragments within the subset are then determined. The term "fragment" refers to any structural subunit of a molecule, including simple functional groups, two-dimensional substructures and families thereof, single atoms or bonds, and any combination of structural descriptors in two- or three-dimensional molecular space. It will be appreciated by those skilled in the art that fragments may be molecular substructures that have no known meaning in conventional chemistry.
After the molecular structures within the subset are broken into fragments, a score is calculated for each fragment, which represents the contribution of the individual fragment to a given biological and/or chemical property. That is, the present invention can determine a score for a fragment based on prior knowledge of the biological and/or chemical properties of the molecule. A molecule, structure or substructure is said to be "active" hereinafter if it has the given properties. An inactive molecule, structure or substructure is said to be "inactive". The present invention therefore provides a substructure analysis based on discrete biological and/or chemical property information. The main process of the present invention is hereinafter referred to as Discrete Substructure Analysis (DSA).
According to the present invention, a fragment may be considered a chemical determinant of a given biological and/or chemical outcome, since the fragment is associated with a score representing its contribution to a given biological and/or chemical property. The identification of fragments follows a set of logical rules (an algorithm) that is inherent in the DSA process itself. In this context, the score is a function of the following parameters:
(a) the proportion (prevalence) of the chemical determinant in the subset of active molecules, and
(b) the proportion (prevalence) of the same determinant in all compounds under consideration.
Based on this definition, the method then identifies one or more local extrema of the scoring function, whose corresponding chemical determinants represent a global or local chemical solution to the desired biological outcome. Finding the maximum value that the scoring function can attain in any given data set corresponds to identifying the chemical determinant that is contained in the subset of molecules with the strongest biological activity and that has the lowest probability of occurring in that subset by chance.
The invention will now be described with reference to the accompanying drawings, and in particular with reference to figure 1. FIG. 1 shows a preferred embodiment of a computer system of the present invention. The computer system includes a central processor 100 that is controllable by a user interface device 105. The devices 100 and 105 may be any computer system, such as a workstation or personal computer. The computer system is preferably a multiprocessor system running a multitasking operating system.
The central processor 100 is connected to a program memory 130, which stores executable program code comprising instructions for implementing the DSA process of the present invention. These instructions include a splitting function 135 to split molecular structures into fragments, a scoring function 140 to calculate scores, a generalizing function 145 (e.g., a search for isomers) to find general items within the fragment structures and replace them with generic expressions to produce generic substructures, a virtual screening function 150 to perform virtual screening, and an annealing function 155 to perform the fragment annealing process of the present invention. These functions, and the processes by which the central processor 100 executes them, are described in detail below.
The central processor 100 is also connected to a structural activity database, or compound activity table 115, to receive molecular structural information and biological and/or chemical property information. Such information may be received from a data input unit 110 that may access an external data source as well.
The subset of molecular structures may be retrieved via units 110 and/or 115 from any available source, e.g., a private or public database that is searchable by substructure and/or biological property. Public databases include, but are not limited to, MDDR, Pharmaprojects, Merck Index, SciFinder and Derwent. Subsets of molecules can also be obtained by synthesizing and testing compounds. The molecules are typically complete compounds, but they may themselves be fragments of molecules. For any given biological or chemical property, the subset includes compounds that do not have that property, e.g., that are not active (or are below a given threshold of activity), as well as compounds that have that property, e.g., that have a desired activity (or are above a given threshold). Inactive compounds are also relevant and are therefore included in the analysis.
After accessing internal or external data and performing a DSA process using functions stored in the program memory 130, the central processor 100 stores a fragment library 120 containing the determined molecular fragments and associated scores.
In a preferred embodiment of the present invention, the fragment library 120 is the result of the primary method of the present invention. The library of fragments 120 can then be used as a source of valuable information by, for example, a chemical or biological scientist or engineer in any subsequent discovery process.
In another preferred embodiment, the fragment library 120 is an intermediate result of the main method of the present invention, and can thus be stored in volatile or non-volatile memory. The fragment library 120 of this embodiment can be read by the central processor 100 when executing other functions stored in the program memory 130 to generate the compound set 125.
The set of compounds 125 is a subset discovered by the methods of the present invention, and may or may not have desired biological and/or chemical properties. The molecules of the set of compounds 125 may be of known structure or may be of hypothetical structure that has never been synthesized before. In either case, the molecules of compound set 125 are the result of assessing the score given to the fragment based on a discrete substructure analysis.
As can be seen in FIG. 1, the central processor 100 is also connected to a data store 160, which stores a compound set 165, a fragment set 170, and scores 175. The data store 160 is provided to store data that are passed as input parameters when the functions 135 to 155 are called, or that are returned by these functions.
Referring now to FIG. 2, which illustrates a preferred embodiment of the DSA main process, an operator of the computer system shown in FIG. 1 first selects an activity in step 210. As stated above, activity refers to any biological and/or chemical property, including biochemical, pharmacological, toxicological, pesticidal, herbicidal, and catalytic properties. Furthermore, when the present invention is used to identify an orphan ligand, the activity may be a given effect (usually binding) on the protein of interest.
Unless otherwise indicated, the present specification refers to one particular property, namely biological activity, but the description may be extended to other types of biological and/or chemical properties. In addition, to avoid confusion, the terms "compound", "molecule" and "molecular structure" as used herein are intended to include both molecular substructures and complete compounds.
After selecting activity at step 210, a set of compounds 125 is selected at step 220. The selected set of compounds is the set of molecules to be examined to determine which fragments contribute to the selected activity. The set of compounds selected in step 220 includes molecules known to be active and molecules known to be inactive, as described in more detail below.
Once the activity and the set of compounds have been selected, a fragment library 120 may be formed at step 230. The process of creating a fragment library can be viewed as a process of weighing the contribution of molecular fragments within a subset of known structures to a chemical and/or biological outcome. This process comprises the following steps:
I. identifying one or more subsets of molecules having a given property associated with a relevant chemical and/or biological outcome;
II. forming a primary library comprising the molecular fragments within said one or more subsets of molecules;
III. estimating the contribution of said fragments to the relevant chemical and/or biological outcome using an algorithm; and
IV. calculating a score for each of said fragments using said algorithm and ranking the fragments by score, so that those fragments that are most likely to contribute to the relevant chemical and/or biological outcome are associated with, for example, the highest-ranked scores.
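The four steps can be sketched as a small pipeline. This is an illustrative sketch only: the function names, the data layout and the placeholder score (prevalence among actives minus prevalence overall) are assumptions made for the example, and the placeholder is not the scoring function of the invention.

```python
def build_fragment_library(compounds, score_fn):
    """compounds: list of (fragments, is_active) pairs.

    Returns [(fragment, score)] sorted by descending score.
    """
    # Step I: identify the subset of molecules having the property.
    actives = [frags for frags, is_active in compounds if is_active]
    # Step II: primary library = all fragments found in that subset.
    primary_library = set().union(*actives) if actives else set()
    # Step III: estimate each fragment's contribution with the algorithm.
    scored = [(frag, score_fn(frag, compounds)) for frag in primary_library]
    # Step IV: rank fragments by score, highest first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def prevalence_difference(fragment, compounds):
    """Placeholder score (an assumption, not the patented function)."""
    actives = [frags for frags, is_active in compounds if is_active]
    in_actives = sum(fragment in frags for frags in actives) / len(actives)
    in_all = sum(fragment in frags for frags, _ in compounds) / len(compounds)
    return in_actives - in_all


# Hypothetical data: fragments are written as short strings.
compounds = [
    ({"C=O", "N-H"}, True),
    ({"C=O", "O-H"}, True),
    ({"O-H"}, False),
    ({"N-H", "O-H"}, False),
]

library = build_fragment_library(compounds, prevalence_difference)
for fragment, score in library:
    print(fragment, score)
```

Passing the scoring function as a parameter keeps the ranking logic independent of whichever score is eventually chosen.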
As described above, the fragment library 120 contains the fragments and their calculated scores. Once the fragment library 120 has been formed at step 230, it is decided at step 240 whether or not to perform an iterative process.
Implementing the DSA process iteratively makes efficient use of computing resources. For example, the process preferably starts with small fragments. Since the number of possible fragments in a molecular structure increases approximately exponentially with the maximum size of the fragments to be examined, this maximum size is initially set relatively low so that as many molecular structures as possible can be processed.
Steps 210 to 230 serve to find fragments that contribute significantly to the desired activity. The fragments found are then used in the next round (or cycle) to find fragments of larger size, i.e., of higher molecular weight. FIG. 3 shows an example of the iterative process. In the first round, fragment C=O is found to contribute significantly to the desired activity. The next round then searches for fragments that are larger than the fragment obtained in the first round and that contain it. In the example shown in FIG. 3, the second round shows that fragment N-C=O is the optimal fragment of this size with respect to the desired activity. This iterative process is then continued with increasing fragment size, which results in a compound that may have the desired biological and/or chemical properties and that is suitable for the desired application.
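The round-by-round growth of fragments can be sketched as a loop over increasing fragment sizes. The sketch below is a toy illustration: fragments are plain strings, "contains" is approximated by a substring test, and the scores are invented. A real implementation would need proper substructure matching, and all names here are hypothetical.

```python
def iterative_search(candidates_by_size, score):
    """Return the best fragment found in each round.

    candidates_by_size maps a fragment size to the candidate fragments
    of that size. From the second round on, only candidates that
    contain the previous round's best fragment are considered.
    """
    best = None
    history = []
    for size in sorted(candidates_by_size):
        pool = [c for c in candidates_by_size[size] if best is None or best in c]
        if not pool:
            break  # nothing larger extends the current best fragment
        best = max(pool, key=score)
        history.append(best)
    return history


# Invented scores, for illustration only.
toy_scores = {"C=O": 0.9, "C-N": 0.4, "N-C=O": 0.95, "O-C=O": 0.6, "N-C-N": 0.99}
candidates = {2: ["C=O", "C-N"], 3: ["N-C=O", "O-C=O", "N-C-N"]}

history = iterative_search(candidates, toy_scores.get)
print(history)
```

In this toy data, "N-C-N" has the highest invented score among the larger fragments but is never examined, because it does not contain the first-round result. Restricting each round in this way is what keeps the number of fragments to be scored manageable.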
Referring again to FIG. 2, if it is decided at step 240 to proceed to the next round or cycle, the fragment library 120 formed at step 230 is analyzed at step 250 and the process returns to step 220. An example of how the fragment library 120 may be analyzed at step 250 is described in detail below. It should be noted that the iterative process may apply higher-level functions, such as the generalizing function 145 and the annealing function 155, to further improve the discovery method based on discrete substructure analysis.
Finally, when it is decided at step 240 not to iterate, or when the iterative process is complete, the compound set 125 is formed at step 260.
Turning now to step 230 of forming the fragment library 120, a preferred embodiment of the substeps of the forming process is described in connection with FIGS. 4 to 6. First, once the internal database 115 and/or external data sources have been accessed and the subset of molecules has been identified, structure-activity data associated with the identified molecules are received at step 410. Then, the molecular fragments within the subset are determined at step 420.
There are many conventional techniques for splitting molecules into fragments. For example, an algorithm may find every arrangement of atoms that are bonded to one another. The splitting function 135 may be given a minimum and a maximum fragment size. As another example, the splitting algorithm may be instructed to skip fragments whose atoms are linearly arranged. The algorithm may further be configured to include or exclude certain types of bonds. Many different types of splitting functions are available to those skilled in the art.
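One simple way to realize such a splitting function is to enumerate the connected sets of atoms of a molecular graph within given size limits. The sketch below is a brute-force illustration for small molecules and is not the splitting function 135 itself. The graph representation (a dictionary mapping atom indices to sets of bonded neighbors), the function names and the definition of "linear" (an unbranched, ring-free chain) are assumptions. Element types and bond types are ignored.

```python
from itertools import combinations


def is_connected(atoms, bonds):
    """True if the atoms form a single connected piece of the graph."""
    atoms = set(atoms)
    start = next(iter(atoms))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for neighbor in bonds[current]:
            if neighbor in atoms and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == atoms


def is_linear(atoms, bonds):
    """True if the fragment is an unbranched chain without rings."""
    atoms = set(atoms)
    degrees = [len(bonds[a] & atoms) for a in atoms]
    n_edges = sum(degrees) // 2
    return max(degrees) <= 2 and n_edges == len(atoms) - 1


def split_into_fragments(bonds, min_size, max_size, skip_linear=False):
    """Enumerate connected atom sets with min_size..max_size atoms."""
    fragments = []
    for size in range(min_size, max_size + 1):
        for atoms in combinations(sorted(bonds), size):
            if not is_connected(atoms, bonds):
                continue
            if skip_linear and is_linear(atoms, bonds):
                continue
            fragments.append(frozenset(atoms))
    return fragments


# Hypothetical branched skeleton: atom 0 bonded to atoms 1, 2 and 3.
skeleton = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}

all_fragments = split_into_fragments(skeleton, 2, 4)
branched_only = split_into_fragments(skeleton, 2, 4, skip_linear=True)
print(len(all_fragments), branched_only)
```

Testing every combination of atoms is exponential in the molecule size, so this form is only suitable for illustration. It also makes concrete why the maximum fragment size is kept low in early rounds.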
That is, each molecular structure can conceptually be broken down into a series of discrete substructures or fragments (step 420). These fragments may be simple functional groups, e.g., NO2, COOH, CHO, CONH2; precisely defined two-dimensional substructures, e.g., o-nitrophenol; loosely defined substructural families, such as R-OH; single atoms or bonds; or any combination of structural descriptors in two- or three-dimensional chemical space.
After the molecules are split into fragments at step 420, a score is calculated for each fragment at step 430 and associated with that fragment. The highest-scoring fragments are then determined at step 440 and stored at step 450.
FIG. 5 shows an example of how the highest scoring segments may be determined. In this example, the number of compounds comprising each fragment is plotted against the determined score. A dot on the graph represents a segment. More information is obtained with this curve at step 440 than simply selecting the highest scoring segment by comparing scores, because the curve also uses information about the number of compounds comprising each segment.
The process of finding the maximum possible score can be considered to correspond to the formation of a hierarchically related evolutionary network (phylogenic mesh) of molecular fragments corresponding to a given biological and/or chemical activity. In this setting, the mesh nodes are provided by the fragments themselves, and the distance of a node from the mesh origin reflects the likelihood that the corresponding fragment is the basis for the biological activity. The greater the score of any given fragment, the further the corresponding node is from the mesh origin, and the more likely it is that the fragment represents a chemical solution to, for example, a pharmacophore that is recognized by the relevant target.
The step 430 of calculating the fragment score is now described in detail in connection with FIG. 6. The scoring function 140 should be applied in accordance with the set of logic rules or calculation steps described above. In a preferred embodiment, the DSA method of the invention comprises the step of entering variables relating to the proportion (prevalence) of each fragment into one or more mathematical functions that estimate the score of each given fragment.
The algorithm is a function of the following variables:
(a) the number of molecules x within the subset, which meet a given threshold associated with the desired result and which contain a given segment;
(b) the number of molecules y within said subset that contain said segment, regardless of whether they meet said threshold;
(c) the number of molecules z within said subset, which meet said threshold, irrespective of whether they contain said fragments or not;
(d) the total number of molecules N within the subset.
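The four variables can be obtained by simple counting. The following is a minimal sketch, assuming that each molecule of the subset is represented as a pair consisting of the set of fragments it contains and a flag stating whether it meets the threshold. The function name `count_variables` and the six-molecule data set are hypothetical.

```python
def count_variables(compounds, fragment):
    """Return (x, y, z, N) for one fragment.

    compounds: list of (set_of_fragments, meets_threshold) pairs.
    """
    N = len(compounds)                                          # (d)
    z = sum(1 for _, active in compounds if active)             # (c)
    y = sum(1 for frags, _ in compounds if fragment in frags)   # (b)
    x = sum(1 for frags, active in compounds
            if active and fragment in frags)                    # (a)
    return x, y, z, N

# Hypothetical subset: three molecules meet the threshold, three do not.
subset = [
    ({"NO2", "OH"}, True),
    ({"NO2"}, True),
    ({"COOH"}, True),
    ({"NO2", "COOH"}, False),
    ({"CHO"}, False),
    ({"OH"}, False),
]

x, y, z, N = count_variables(subset, "NO2")
```

For the fragment NO2 in this toy subset the counts are x = 2, y = 3, z = 3 and N = 6.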
The desired result referred to in (a) may be any desired parameter related to the activity of the compound, including but not limited to biological, biochemical, pharmacological and/or toxicological activity. Each compound or molecule within the data set is classified according to whether it has the desired parameter, such as a particular activity level, in relation to a given threshold. The threshold may be set to any desired level. As used herein, an "active" compound is a compound that meets the desired threshold, while an "inactive" compound is a compound that does not meet the desired threshold. These terms do not represent any absolute properties of the compound.
The variables x, y, z and N are substituted into either a relevance evaluation (measure of association) or a scoring function 140 to determine the contribution of a given fragment. As is well known to those skilled in the art, there are many possible relevance evaluations, which fall mainly into three categories:
Subtractive evaluation: for example, Nx - yz
Proportional evaluation: for example, x(N - y - z + x) / ((z - x)(y - x))
Mixed evaluation: for example, (x/z) - (z - x)/(N - z)
It is noted that any one of the relevance assessments may be selected, and those skilled in the art will be readily able to make the appropriate selection.
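The three categories can be sketched as small functions of x, y, z and N. In this sketch the proportional evaluation is written as the odds ratio of the underlying two-by-two table, whose fourth cell (molecules that neither meet the threshold nor contain the fragment) is N - y - z + x. The function names are assumptions for illustration.

```python
def subtractive(x, y, z, N):
    """Subtractive evaluation: Nx - yz."""
    return N * x - y * z

def proportional(x, y, z, N):
    """Proportional evaluation, written as an odds ratio.

    The four cells of the two-by-two table are:
      a = active and containing the fragment       = x
      b = inactive and containing the fragment     = y - x
      c = active and lacking the fragment          = z - x
      d = inactive and lacking the fragment        = N - y - z + x
    The ratio is undefined (division by zero) when b or c is zero.
    """
    a, b, c, d = x, y - x, z - x, N - y - z + x
    return (a * d) / (b * c)

def mixed(x, y, z, N):
    """Mixed evaluation: (x/z) - (z - x)/(N - z)."""
    return x / z - (z - x) / (N - z)

# Hypothetical counts for one fragment.
x, y, z, N = 2, 3, 3, 6
values = (subtractive(x, y, z, N), proportional(x, y, z, N), mixed(x, y, z, N))
```

A subtractive value of zero corresponds to a fragment that occurs among the active molecules exactly as often as its overall frequency would predict.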
Therefore, the algorithm used in step 430 includes (see FIG. 6):
(i) estimating the number x of compounds within the subset that meet a given threshold associated with the relevant chemical or biological outcome and that contain a given chemical determinant (step 610);
(ii) estimating the number of compounds y within said subset of compounds that contain a given chemical determinant, regardless of whether they meet said threshold (step 620);
(iii) estimating the number of compounds z within said subset of compounds that meet said threshold, regardless of whether they contain a given chemical determinant (step 630);
(iv) estimating the total number of compounds N within the subset of compounds (step 640);
(v) substituting two or more of the variables x, y, z and N into the relevance evaluation (step 650), preferably three or four of the variables, and most preferably all four.
The relevance evaluation can be used directly to determine the score of the contribution corresponding to a given fragment. However, it is preferable to convert the relevance evaluation into a scoring function that assesses the likelihood that the substructure contributes to the result. This helps to determine more clearly the ranking of the scores obtained from all the fragments to be analyzed. The relevance evaluation can be converted into a scoring function using methods well known in the art. For example, these methods may be selected from statistical methods such as the critical ratio method (z), Fisher's exact test, Pearson's chi-squared test, the Mantel-Haenszel chi-squared test, and methods based on, but not limited to, inferences about slopes and the like. Methods other than statistical tests may also be used. These methods include, but are not limited to, the calculation and comparison of exact and approximate confidence intervals, correlation coefficients, or any function containing a relevance evaluation comprising various combinations of one, two, three, or four of the variables x, y, z, or N described above.
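Two of the statistical methods named above can be sketched directly in terms of x, y, z and N. This is a minimal illustration using the textbook forms of Pearson's chi-squared statistic and the one-sided Fisher's exact test for a two-by-two table; it is not claimed to be the specific scoring function of the invention, and the function names are assumptions. Note that the subtractive evaluation Nx - yz appears as the numerator term of the chi-squared statistic.

```python
from math import comb

def pearson_chi2(x, y, z, N):
    """Pearson's chi-squared statistic for the two-by-two table
    (fragment present/absent versus active/inactive), without
    continuity correction. Requires 0 < y < N and 0 < z < N."""
    return N * (N * x - y * z) ** 2 / (y * (N - y) * z * (N - z))

def fisher_one_sided(x, y, z, N):
    """Probability of seeing x or more active molecules among the y
    molecules containing the fragment if the fragment were unrelated
    to activity (hypergeometric upper tail)."""
    total = comb(N, y)
    tail = sum(comb(z, k) * comb(N - z, y - k)
               for k in range(x, min(y, z) + 1))
    return tail / total

# Hypothetical counts for one fragment.
chi2 = pearson_chi2(2, 3, 3, 6)
p_value = fisher_one_sided(2, 3, 3, 6)
```

A smaller Fisher probability, or a larger chi-squared value, indicates a fragment whose enrichment among the active molecules is less likely to have arisen by chance.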
Examples of mathematical formulas used by the present invention to represent relevance evaluation or scoring functions include:
(I) x/z
(II) x/N
(III) Nx-yz
(IV) (x/z)-(y/N)
(V) (x/z)-(z-x)/(N-z)
(VIII) e^[(x/z)-(z-x)/(N-z)]
One skilled in the art will recognize the scoring function (VII) as a product-moment correlation coefficient that reflects the degree of variance shared between two dichotomous variables not shown in the formula.
Those skilled in the art will recognize that the scoring function (VIII) is related to an estimate of the risk odds ratio made using the slope of a regression line representing the degree of shared variance present between two dichotomous variables.
One skilled in the art will recognize the scoring function (IX) as a chi-squared statistic of association that is corrected for various confounding factors. For example, the N/2 term in the numerator of the second quotient of the logarithmically scaled product is a conservative adjustment for the normal approximation to the binomial distribution, which can be used as a correction for small values of x, y, z or N. Those skilled in the art will recognize that other relevance evaluations and/or scoring functions may be used for the same purpose, instead of those represented by formulas (I) and (II), where various combinations of one, two, three or four of the variables x, y, z and N are most appropriate, in the sense of the present invention.
One skilled in the art will recognize the scoring function (X) as a way to estimate the lower limit of the 95% confidence interval of measure (III), by using a logarithmic transformation to make the distribution of the proportion more nearly normal, and by estimating the variance of the logarithm of the same proportion using a first-order Taylor series approximation.
One skilled in the art will recognize the scoring function (XI) as a method of comparison that allows one to identify chemical determinants that are most likely to be selective for one target over other targets.
One skilled in the art will recognize the scoring function (XII) as a method of combining multiple tests of association, allowing one to identify chemical determinants that are most likely to affect two or more given properties simultaneously.
One skilled in the art will also recognize that the scoring function may be modified to include other variables related to the material, biological, chemical and/or physicochemical properties of the molecules. For example, such modifications include, but are not in any way limited to, adjustments for the following variables: compound efficacy, selectivity, toxicity, bioavailability, stability (metabolic or chemical), synthetic feasibility, purity, commercial availability, availability of suitable reagents for synthesis, cost, molecular weight, molar refractivity, molar volume, logP (calculated or determined), number of hydrogen-bond acceptor groups, number of hydrogen-bond donor groups, charge (partial or formal), protonation constant, number of molecules containing other chemical keys or descriptors, number of rotatable bonds, flexibility index, molecular shape index, alignment similarity, and/or overlap volume.
Thus, for example, the scoring function (VIII) may be further modified to take into account the molecular weight (MW) of each chemical determinant under study, as follows:
MW · e^[(x/z)-(z-x)/(N-z)]
Similarly, scoring function (IX) may also be modified to include the variables MW and [S], which respectively denote the molecular weight (MW) of the relevant chemical determinant and the number of occurrences ([S]) of that same chemical determinant within the subset of active compounds x, expressed as follows:
so that the most likely singleton bioactive chemical determinant is more easily identified during analysis.
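The molecular-weight modification of scoring function (VIII) can be sketched as follows. The function names and the example counts and molecular weights are hypothetical, and only formula (VIII) and its MW-weighted form are shown because they are the ones written out in full in the text.

```python
from math import exp

def score_viii(x, y, z, N):
    """Scoring function (VIII): e^[(x/z) - (z - x)/(N - z)].
    y is accepted for a uniform signature but is not used by this formula."""
    return exp(x / z - (z - x) / (N - z))

def score_viii_mw(x, y, z, N, mw):
    """Scoring function (VIII) multiplied by the molecular weight (MW)
    of the chemical determinant under study."""
    return mw * score_viii(x, y, z, N)

# Two hypothetical determinants with identical counts but different sizes:
small = score_viii_mw(2, 3, 3, 6, mw=46.0)
large = score_viii_mw(2, 3, 3, 6, mw=139.0)
```

With identical counts, the heavier determinant receives the higher score, so that a larger and more specific determinant is preferred over a smaller substructure with the same counts.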
The result of algorithm step 650 provides a score for the fragment under study. Algorithm steps 610 to 650 may be repeated for each selected fragment within the data. When the scores of all selected fragments have been calculated, the results provide a score corresponding to the potential efficacy of each fragment that has been analyzed. The scores may be ranked by magnitude, so that those fragments that are most likely to contribute to the relevant chemical and/or biological outcome are associated with, for example, a high ranking score. At step 440, this may identify one or more local extrema of the scoring function values, whose corresponding chemical determinants represent full or partial solutions to the desired chemical or biological outcome. Finding the maximum score achievable within any given data set corresponds to identifying the chemical determinants that are included in the subset of molecules having the desired property and that have the lowest probability of occurring within that subset by chance. When the desired property is a given biological activity, the highest scoring fragment or chemical determinant represents a pharmacophore with biological activity.
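Repeating the counting and scoring for every fragment and then ranking the results can be sketched as below. The data representation, the function names and the choice of the subtractive evaluation Nx - yz as the score are assumptions made for this illustration. Ties are broken alphabetically only to make the output deterministic.

```python
def count_variables(compounds, fragment):
    """Return (x, y, z, N) for one fragment (steps 610 to 640)."""
    N = len(compounds)
    z = sum(1 for _, active in compounds if active)
    y = sum(1 for frags, _ in compounds if fragment in frags)
    x = sum(1 for frags, active in compounds if active and fragment in frags)
    return x, y, z, N

def subtractive(x, y, z, N):
    """Relevance evaluation Nx - yz, used here as the score (step 650)."""
    return N * x - y * z

def rank_fragments(compounds, score):
    """Score every fragment found in the data set and rank by score,
    highest first."""
    fragments = set()
    for frags, _ in compounds:
        fragments |= frags
    scored = [(f, score(*count_variables(compounds, f))) for f in fragments]
    return sorted(scored, key=lambda item: (-item[1], item[0]))

# Hypothetical subset of (fragments, meets_threshold) pairs.
subset = [
    ({"NO2", "OH"}, True),
    ({"NO2"}, True),
    ({"COOH"}, True),
    ({"NO2", "COOH"}, False),
    ({"CHO"}, False),
    ({"OH"}, False),
]

ranking = rank_fragments(subset, subtractive)
best_fragment, best_score = ranking[0]
```

In this toy data set the fragment NO2 ranks first, and the fragment CHO, which occurs only in an inactive molecule, ranks last.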
Returning now to FIG. 2, a preferred embodiment of the step 250 of analyzing the fragment library 120 is discussed below.
FIG. 7 illustrates one way of analyzing the fragment library 120. The process begins at step 710 by selecting fragments based on the scores determined in the previous round. Compounds containing the selected fragments are then extracted from the previous compound set at step 720. Since step 710 selects fragments that contribute significantly to the desired activity, the compounds extracted in step 720 may be considered active compounds. Next, a set of inactive compounds is selected from the previous set, from a database or from any other source at step 730. The active and inactive compounds are then combined in step 740 to form a new set of compounds. The new compound set is then used as the compound set for the next iteration, the process returns to step 220, and the next iteration is performed.
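Steps 720 to 740 can be sketched as a single function that builds the compound set for the next round. The dictionary layout of a compound, the function name `next_compound_set` and the parameter `n_inactive` (the text does not state how many inactive compounds are chosen) are assumptions for this sketch.

```python
def next_compound_set(previous_set, selected_fragments, inactive_source, n_inactive):
    """Build the compound set for the next iteration.

    previous_set, inactive_source: lists of dicts with the keys
    "id", "fragments" (a set) and "active" (a bool).
    """
    # Step 720: compounds containing a selected fragment are treated as active.
    actives = [c for c in previous_set if c["fragments"] & selected_fragments]
    active_ids = {c["id"] for c in actives}
    # Step 730: choose inactive compounds that were not already extracted.
    inactives = [c for c in inactive_source
                 if not c["active"] and c["id"] not in active_ids][:n_inactive]
    # Step 740: combine both groups into the new compound set.
    return actives + inactives

# Hypothetical previous round.
previous = [
    {"id": 1, "fragments": {"NO2", "OH"}, "active": True},
    {"id": 2, "fragments": {"COOH"}, "active": True},
    {"id": 3, "fragments": {"NO2"}, "active": False},
    {"id": 4, "fragments": {"CHO"}, "active": False},
    {"id": 5, "fragments": {"OH"}, "active": False},
]

new_set = next_compound_set(previous, {"NO2"}, previous, n_inactive=1)
new_ids = [c["id"] for c in new_set]
```

Here the inactive compounds are drawn from the previous set itself, which is one of the sources the text allows.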
A preferred embodiment for performing step 730 is now described in conjunction with FIG. 8. This example utilizes a generic substructure to select a new set of compounds for the next round.
The process shown in FIG. 8 begins at step 810 by analyzing the structure of the fragment selected in step 710. When the generic substructure of the present invention is used, the fragment selected in step 710 may be selected by evaluating the score calculated in the previous round. Furthermore, fragments may also be selected based on other factors that affect whether a fragment is suitable to become a generic starting point. Such suitability may be a function of the number of atoms or bonds, the manner in which the atoms are bonded, the three-dimensional structure of the individual fragments, and the like.
After the structure of the selected fragment has been analyzed at step 810, items within the fragment structure that can be generalized are sought at step 820. Each such item is then replaced with a generic expression at step 830 to obtain a generic substructure (e.g., to find bioisosteres). An example is:
In the given selected fragment, two generalizable items are found and replaced with the generic expressions [Ar] and A, where [Ar] represents an aromatic center and A represents carbon or sulfur.
The generic substructures formed in step 830 are then used for virtual screening to find new compounds that match the generic substructures. The term "virtual screening" refers to any screening process that is performed using only data, such that no compound is synthesized. Next, a new compound set for the next iteration is constructed at step 850 using the new compounds found by the virtual screen.
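The idea of screening a database against a generic substructure can be sketched in a deliberately simplified form. In this sketch compounds and fragments are linear sequences of named building blocks rather than molecular graphs, and the set of groups that count as an aromatic center is invented for the example. The meaning of A (carbon or sulfur) follows the text. A real implementation would use graph-based substructure matching.

```python
# Generic expressions and the concrete tokens each one may stand for.
# The members of "[Ar]" are hypothetical examples of aromatic centers.
GENERIC = {
    "[Ar]": {"phenyl", "pyridyl", "thienyl"},
    "A": {"C", "S"},
}

def token_matches(pattern_token, token):
    """A generic token matches any member of its set; any other token
    matches only itself."""
    return token in GENERIC.get(pattern_token, {pattern_token})

def contains(compound, pattern):
    """True if the pattern matches some contiguous stretch of the compound."""
    n = len(pattern)
    return any(
        all(token_matches(p, t) for p, t in zip(pattern, compound[i:i + n]))
        for i in range(len(compound) - n + 1)
    )

def virtual_screen(database, pattern):
    """Return the names of the database compounds matching the pattern."""
    return [name for name, tokens in database.items() if contains(tokens, pattern)]

# Hypothetical generic substructure and compound database.
generic_substructure = ["[Ar]", "A", "N"]
database = {
    "cpd1": ["phenyl", "S", "N", "C"],
    "cpd2": ["pyridyl", "C", "N"],
    "cpd3": ["phenyl", "O", "N"],
    "cpd4": ["C", "C", "N"],
}

hits = virtual_screen(database, generic_substructure)
```

The hits found in this way would form the new compound set for the next iteration. No compound is synthesized at this stage, since the matching is performed on data alone.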
As shown in FIG. 9, the virtual screening process can be divided into out-of-domain and intra-domain modifications of the fragments derived using a generic substructure. The intra-domain modifications made at step 910 include substitutions, insertions, deletions and inversions of atoms of the fragment. Starting from the exact fragment described above, and making this fragment a generic substructure, three different substitutions are obtained in the following examples:
The out-of-domain modification performed at step 920 includes altering the substituents of the fragment. These substituents may be random, focused, and the like:
A focused compound set is a set of molecules modified on the basis of one or more generic substructures:
Although the intra-domain and out-of-domain modification steps are shown in FIG. 9 as being performed in series, it will be appreciated by those skilled in the art that the present invention may be performed with only one of these kinds of modification, or with the two modifications performed in a different order or even in parallel. It is to be understood that the diverse compound sets obtained from the virtual screening are more likely to be active because they are enriched in activity-related substructures.
At step 710, a fragment is selected that becomes the basis for obtaining a generic substructure using the inductive function 145. In another preferred embodiment of the present invention, several high scoring fragments are selected to generate a generic substructure. For example, the fragments shown below, which contribute significantly to the desired activity, are selected at step 710:
These selected fragments are then reduced to high scoring generic substructures, such as:
These generic substructures are then used to virtually screen a commercial database or a corporate compound collection.
The iterative process described above facilitates computation because it starts with small fragments and increases the fragment size from round to round. The use of generic substructures during the iterations has also been shown to increase the discovery capability. A further method of the invention improves the discrete substructure analysis process still more. This method is based on an annealing technique, which will be described below in connection with FIG. 10.
In the preferred embodiment shown in FIG. 10, the step 250 of analyzing the library of fragments generated in the previous round begins with steps 1010 and 1020 of selecting a first and second fragment. These two segments are selected based on the calculated scores and are considered to contribute significantly.
In a next step 1030, the first and second segments are connected using an annealing function. Joining the fragments means defining a molecular structure or substructure comprising the two fragments. To this end, a variety of different annealing functions 155 may be employed. These annealing functions vary in how certain annealing parameters are evaluated and used in particular applications. The annealing parameters are, for example, the (given) distance of the first and second segments, the three-dimensional orientation of the first and second segments, the number of atoms placed between the segments, the number of bonds used to glue the segments together, the type of bonds and atoms, etc.
Furthermore, the annealing process is preferably used in combination with the generic substructures described above. For example, if steps 1010 and 1020 select fragments F1 and F2 that are known to have high scores, then the annealing function selected and applied at step 1030 may link the fragments using a generic expression of the form F1-[G]-F2. The generic expression [G] is a placeholder for molecular substructures of a given nature and for the annealing parameters, depending on the annealing function used.
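One possible reading of the linking step can be sketched by enumerating concrete linkers for the placeholder [G]. In this sketch the fragments are simple tuples of atom symbols, and the only annealing parameters considered are the number and type of atoms placed between the fragments. The function name `anneal` and its parameters are hypothetical, and distances and three-dimensional orientation are not modeled.

```python
from itertools import product

def anneal(f1, f2, linker_atoms=("C", "N"), max_linker_len=2):
    """Enumerate candidate structures of the form F1-[G]-F2, where [G] is
    a chain of 0 to max_linker_len atoms taken from linker_atoms."""
    candidates = []
    for length in range(max_linker_len + 1):
        for linker in product(linker_atoms, repeat=length):
            candidates.append(f1 + linker + f2)
    return candidates

# Hypothetical high scoring fragments.
F1 = ("N", "C", "O")
F2 = ("S", "C")

candidates = anneal(F1, F2)
```

Each candidate could then be scored in the same way as a single fragment, so that the best linked structure is carried into the next round.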
Once these fragments have been joined by an exact or generic expression, a new set of compounds including the two fragments can be formed at step 1040. FIG. 11 shows an example of a molecule in the new set of compounds as a two-dimensional map of relative contributions plotted against local coordinates. As can be seen in FIG. 11, there are two local maxima, with approximate scores of 1.2 and 1.7, corresponding to fragments F1 and F2.
The annealing process has two advantages. The first advantage is that joining two fragments that each contribute greatly to the desired activity yields a larger molecule that is predicted to include more than one high-scoring fragment. The score of the resulting structure is likely to be higher than the higher of the two fragment scores.
For example, in the structure shown in FIG. 11, the resulting compound includes fragments with scores of 1.2 and 1.7, but the overall score for the entire structure may be, for example, 2.1. Annealing techniques can therefore find even more active compounds.
A second advantage is that the annealing technique can avoid deadlock in the computation process. As shown in FIG. 11, the relative contribution values exhibit two local maxima. When the iterative process shown in FIG. 3 is performed, starting with small fragments and increasing the fragment size in each round, a deadlock occurs when the fragment selected in one of the intermediate steps is at a local maximum.
For example, when the fragment N-C=O is selected at the end of the second round, and the fragment is at a local maximum, the next round cannot be performed. As mentioned above, the fragments of the next round are preferably built from the fragments of the previous round by incrementally increasing their size. The next round therefore moves the fragment away from the local maximum regardless of which atom is added to the selected fragment. That is, in this case, the score of any fragment obtained is lower than that of the fragment selected in the previous round.
To avoid deadlock, an annealing technique may be applied to select two good fragments from the previous round, join the two fragments, calculate a score, and then continue the process. This may be done periodically, round by round, or when a deadlock is detected.
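The deadlock test and the fallback to annealing can be sketched as one round of a simple search. This is a minimal sketch under stated assumptions: the function `next_round`, the string notation for fragments and most of the score table are hypothetical. The values 1.7 and 1.2 echo the two local maxima mentioned for FIG. 11.

```python
def next_round(current, extensions, previous_round, score, join):
    """Perform one round of fragment growth.

    If some extension scores higher than the current fragment, return it.
    Otherwise the current fragment sits at a local maximum (deadlock), so
    the two best fragments of the previous round are joined instead.
    Returns (fragment, deadlock_detected).
    """
    best = max(extensions, key=score, default=None)
    if best is not None and score(best) > score(current):
        return best, False
    first, second = sorted(previous_round, key=score, reverse=True)[:2]
    return join(first, second), True

# Hypothetical scores for a handful of fragments.
scores = {
    "N-C": 0.8,
    "N-C=O": 1.7,
    "N-C(=O)-C": 1.1,
    "N-C(=O)-N": 0.9,
    "Ar-S": 1.2,
    "Ar-O": 0.4,
}

def score(fragment):
    return scores[fragment]

def join(f1, f2):
    """Stand-in for an annealing function: link with a generic expression."""
    return f1 + "-[G]-" + f2

# Every one-atom extension of N-C=O scores lower, so a deadlock is detected.
result, deadlocked = next_round(
    "N-C=O",
    ["N-C(=O)-C", "N-C(=O)-N"],
    ["N-C=O", "Ar-S", "Ar-O"],
    score,
    join,
)
```

The joined structure would then be scored and the iteration continued from it, as described above.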
The invention has been described above with reference to a number of preferred embodiments; those skilled in the art will note that the invention is in no way limited to them. For example, the order of the method steps shown in the flow charts may be changed, or the steps shown in the figures as being performed consecutively may even be performed in parallel, see, for example, steps 1010 and 1020 in the process shown in FIG. 10.
In addition, it will be apparent to those skilled in the art that not all of the method steps given are necessarily used. For example, in the scoring process of FIG. 6, it is not necessary to calculate parameters that are not used by the scoring function. The parameters may also be calculated in parallel using a multitasking or multithreading operating system.
Other embodiments of the invention will now be illustrated.
For example, the library of fragments formed in step 230 may theoretically contain all possible fragments and combinations thereof. This is achievable in practice if the fragment library is formed by a computer. However, if the fragment library is created manually, it may contain only a selection of the possible fragments. The method can therefore be repeated with combinations of fragments, in particular with high scoring fragments obtained in previous analyses.
Thus, after a preliminary analysis of the fragments, those fragments that are most likely to contribute to the associated chemical and/or biological outcome can be combined, and the contribution of the combined fragments to the associated chemical and/or biological outcome estimated using one of the algorithms described above. The resulting scores are compared to the scores of the individual fragments to determine whether the combination contributes more to the relevant chemical and/or biological outcome.
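The comparison between a combination of fragments and its individual members can be sketched as follows. A compound is counted for the combination only if it contains both fragments. The data set, the function names and the use of the subtractive evaluation Nx - yz as the score are assumptions for this illustration.

```python
def count_variables(compounds, required):
    """Return (x, y, z, N), counting compounds that contain every
    fragment in `required`."""
    required = set(required)
    N = len(compounds)
    z = sum(1 for _, active in compounds if active)
    y = sum(1 for frags, _ in compounds if required <= frags)
    x = sum(1 for frags, active in compounds if active and required <= frags)
    return x, y, z, N

def subtractive(x, y, z, N):
    return N * x - y * z

def combination_improves(compounds, f1, f2, score=subtractive):
    """Compare the score of the pair (f1, f2) with the individual scores."""
    combined = score(*count_variables(compounds, {f1, f2}))
    singles = (score(*count_variables(compounds, {f1})),
               score(*count_variables(compounds, {f2})))
    return combined, singles, combined > max(singles)

# Hypothetical data: only compounds carrying both A and B are active.
subset = [
    ({"A", "B"}, True),
    ({"A", "B"}, True),
    ({"A", "B"}, True),
    ({"A"}, False),
    ({"A"}, False),
    ({"B"}, False),
    ({"B"}, False),
    ({"C"}, False),
]

combined, singles, improved = combination_improves(subset, "A", "B")
```

In this toy data set each fragment alone also occurs in inactive compounds, so the combination scores higher than either fragment by itself.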
In another preferred embodiment of the invention, a common moiety may be selected from the fragments that contribute most to the relevant chemical and/or biological outcome to determine whether the contribution of the common moiety is equal to or greater than the original fragment.
The segment with the highest score represents the chemical determinant (chemical determinant) or molecular fingerprint (fingerprint) that contributes most to a given chemical or biological outcome.
Once the fingerprint is determined, a library of compounds containing the chemical determinants can be formed. These compounds can be made by synthetic procedures designed around the structural features. In addition, compounds containing the chemical determinants can also be identified in commercial catalogues and purchased from the relevant sources. The compounds need not have been prepared for pharmaceutical use, and they may be obtained from a wide variety of sources.
Once the desired libraries have been assembled, they can be screened against the relevant targets. The results of the screening can identify compounds that have sufficient activity for further study or that provide a lead for synthetic procedures. The DSA methods of the invention can form diverse but highly focused libraries for a particular biological or pharmacological target, thus greatly increasing the likelihood of successfully screening for active compounds and/or useful leads.
In yet another embodiment, the present invention provides a method of identifying a molecule, such as a biologically active molecule, having certain desired properties, the method comprising:
weighting the contribution of the molecular fragments within the molecular subset to the given chemical or biological result,
identifying the one or more segments with the highest weighting, and
assembling a set of compounds comprising said one or more fragments, and
optionally testing the compound for the desired activity.
It is noted that this method can also be used to identify fragments that produce undesirable properties, such as adverse biological side effects, so that compounds having such fragments can be excluded from further consideration.
It can thus be seen that the method of the invention generates structural hypotheses in the form of fragments, and that the likelihood of each fragment being an explanation for a given biological, biochemical, pharmacological or toxicological outcome is estimated by calculating a quantitative score. Taking into account the score of a given fragment allows the drug developer to identify the most promising route to the desired goal and to make decisions such as identifying more potent compounds, finding a new series of active compounds, identifying more selective or bioavailable compounds or eliminating toxic effects, etc.
The method of the invention is directed to fragments that occur within a subset of related compounds, thus eliminating the need for lengthy calculations over a large but probably irrelevant portion of chemical space. The result is a reduction in the number of computational steps required to process a given biological result, while retaining the level of molecular detail necessary to hypothesize the presence of biologically active chemical determinants.
As discussed above, the method of the present invention includes searching for local extrema of one or more functions selected to correspond to the probabilities given by standard statistical tables. This approach provides a refined way of assessing the potential contribution of a given fragment to a chemical or biological outcome. However, the present invention need not be based on analyses derived from statistical theory.
The DSA method of the present invention can be widely used in drug discovery applications. As described above, the method allows the identification of drug moieties that have a high probability of contributing to a given biological activity, such as 7-TM receptor antagonists, kinase inhibitors, phosphatase inhibitors, ion channel blockers and protease inhibitors, as well as the active groups of naturally occurring peptidergic ligands.
This approach can also identify endogenous modulators of drug targets, thus helping to identify new points of drug intervention, and to introduce new pharmacological properties rationally into molecules that originally lacked those properties.
This method can also be used to identify false positive and false negative results within a data set, such as those resulting from high throughput screening. DSA can also be used to predict the selectivity of a compound by, for example, identifying a potentially undesirable secondary effect.
In the same way, this approach can predict the toxic effects of a compound by identifying "toxic" chemical determinants in the compound. In combination with the above, this allows the construction of a wide variety of sub-databases of chemical determinants for the selection of chemical families. In this context, the method also makes it possible to introduce new pharmacological properties rationally into molecules which originally lack said properties. Finally, DSA methods can identify the most appropriate level of molecular diversity to be tested during a screening campaign, thereby allowing efficient, massively parallel, automated, high-throughput screening campaigns, which is a significant improvement over current HTP discovery strategies.
It is noted that at least one of the steps of the above method is performed by a computer control system. Therefore, variables such as x, y, z and N taken from the database may be input and processed by a suitably programmed computer. The invention extends to such computer-controlled or computer-implemented methods.
It is apparent from the above that the present invention provides a novel method for rapidly identifying molecules, such as biologically active molecules, having certain desirable properties. In particular, the invention relates to a method for weighting the contribution of molecular substructures in order to identify the biologically active groups of a molecular structure and to use these groups to design focused chemical libraries for faster and more cost-effective drug discovery.
The present invention provides a method of increasing the proportion of biologically active compounds within a given chemical class, where the class is not known to have the desired biological activity. The method involves the use of various mathematical methods to determine Quantitative Structure Activity Relationships (QSAR). This new approach, which may be referred to as Discrete Substructure Analysis (DSA), addresses problems such as pharmacological pattern recognition, i.e., the identification of chemical determinants responsible for any one given chemical or biological outcome of a given compound, which may be, for example, biological, biochemical, pharmacological, chemical, and/or toxicological activity.
The method of the present invention has wide application and is not limited to the field of pharmacology. In the case of biologically active compounds, the method can be used, for example, in the context of pesticides or herbicides, where the desired biological activity is pesticidal and herbicidal activity, respectively. The method can also be used in reaction modeling applications where the desired property is a chemical property rather than a biological property, such as catalyst preparation.
It is noted that the technique of the present invention is to combine the fragments within a subset or between different subsets that are most likely to contribute to the associated chemical and/or biological outcome, and to evaluate the contribution of the combined fragments to the associated chemical and/or biological outcome using an algorithm to compare the resulting scores to the scores of the individual fragments to determine whether the contribution of the combination to the associated chemical and/or biological outcome is increased.
The invention can also select a common structural moiety from the fragments that contribute most to the relevant chemical and/or biological outcome, identifying whether the contribution of the common moiety is equal to or greater than the original fragment.
Furthermore, the relevance assessment used is preferably selected from a subtractive assessment, a proportional assessment or a mixed assessment. Preferably, the relevance assessment is incorporated into the scoring function, or converted into the scoring function. The conversion into a scoring function may use statistical methods such as the critical ratio method, Fisher's exact test, Pearson's chi-squared test, the Mantel-Haenszel chi-squared test, methods based on inferences about slopes, etc. In another preferred embodiment, the conversion into a scoring function uses a method selected from the group consisting of the calculation and comparison of exact and approximate confidence intervals, correlation coefficients, or any function containing a relevance assessment comprising any combination of one, two, three, or four of the variables x, y, z, or N.
The present invention preferably performs the steps of selecting molecules containing the highest ranked fragments as potential ligands and then testing them selectively as drug target modulators. The method of the invention is preferably used to identify false positive and/or false negative test results. Other suitable applications are similarity searches, diversity analysis and/or conformational analysis.
Examples of various applications of the DSA method of the present invention are given below. These are preferred embodiments of the present invention, serving to illustrate the invention, but are not to be construed as limiting the scope of the invention.
Example 1: Rational identification of novel selective receptor ligands
A competitive binding assay for a cell surface receptor was developed using recombinant membrane preparations and radiolabeled peptides. The test compounds used in the assay were assembled according to the methods of the invention and tested, and new receptor ligands were identified. The first step was to consult a table, compiled from the existing scientific literature, listing the structures of 208 antagonists of the same receptor. The second step was to identify the bioactive chemical determinants contained in the 208 receptor ligands. To this end, a further table containing 101,130 structures having no effect on the same receptor was prepared and added to the first table. The resulting table containing 101,338 structures was then evaluated for the presence of bioactive chemical determinants by analysis with the subtractive relevance evaluation (I), where x represents the number of active chemical structures containing the relevant chemical determinant, y represents the total number of chemical structures containing the same chemical determinant, z represents the total number of active chemical structures within the set containing N molecules (i.e., z = 208), and N represents the total number of chemical structures to be analyzed (i.e., N = 101,338).
(I) Nx-yz
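As a worked illustration of formula (I) with the figures of this example, the sketch below uses N = 101,338 and z = 208 from the text. The values of x and y are hypothetical, since the counts for the individual chemical determinants are not stated.

```python
N_ACTIVE = 208        # antagonist structures from the literature
N_INACTIVE = 101_130  # structures with no effect on the receptor

N = N_ACTIVE + N_INACTIVE  # total number of structures analyzed
z = N_ACTIVE               # number of active structures

def relevance_I(x, y, z, N):
    """Relevance evaluation (I): Nx - yz."""
    return N * x - y * z

# Hypothetical determinant found in 50 structures, 40 of which are active.
x, y = 40, 50
value = relevance_I(x, y, z, N)

# Number of active structures expected to contain the determinant if it
# were unrelated to activity (the value of x at which Nx - yz is zero).
expected_x = y * z / N
```

With these assumed counts the determinant would be expected in only about 0.1 active structures by chance, against the 40 supposed here, which is why the value of (I) is large and positive.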
The relevance evaluation (I) is then converted into a scoring function (II), which will be recognized by those skilled in the art as an indirect estimate of the probability of chance occurrence, corrected for various confounding factors. For example, the N/2 term in the numerator of the second quotient of the logarithmically scaled product is a conservative adjustment for the normal approximation to the binomial distribution, which can be used as a correction for small values of x, y, z or N. The variables MW and [S], which respectively represent the molecular weight (MW) of the relevant chemical determinant and the number of times the same chemical determinant occurs within the subset of active compounds x, are included within the scoring function, which makes it easier to identify the most likely single bioactive chemical determinant during the analysis. Those skilled in the art will recognize that other relevance evaluations and/or scoring functions may be used for the same purpose, instead of those represented by formulas (I) and (II), where various combinations of two, three or four of the variables x, y, z and N are most appropriate, in the sense of the present invention.
Those skilled in the art will also recognize that scoring function (II) may be modified to include other variables relating to the material, biological, chemical and/or physicochemical properties of the molecules. Such modifications include, but are in no way limited to, adjustments for the following variables: compound efficacy, selectivity, toxicity, bioavailability, stability (metabolic or chemical), synthetic feasibility, purity, commercial availability, availability of reagents for synthesis, cost, molecular weight, molar refractivity, molar volume, logP (calculated or measured), number of hydrogen-bond acceptor groups, number of hydrogen-bond donor groups, charge (partial or formal), protonation constants, number of molecules containing other chemical keys or descriptors, number of rotatable bonds, flexibility indices, molecular shape indices, alignment similarity, and/or overlap volume.
Analysis of the 101,338 structures led to the identification of 8 distinct chemical determinants, with molecular weights from 150 to 230 Da, whose probability of being contained in the subset of active chemical structures by chance alone was less than 1 in 10,000 (p < 0.0001). These 8 chemical determinants were therefore considered to represent one or more bioactive groups of the 208 receptor ligands taken from the literature, and were compiled in a fourth table. The calculations were then repeated using formula (II) to determine whether a larger chemical determinant, arising from the combination or any other extension of the 8 fragments, could be identified. The chemical determinant of greatest statistical significance found in these additional calculations had a molecular weight of 335 Da and was selected as the representative scaffold, or pharmacologically active "fingerprint", for the subsequent selection and synthesis of compounds.
The third step of the method involves virtual screening and selection of compounds using the above representative scaffold as a template. To this end, substructure searches were performed with the calculated fingerprint and fragments thereof in a database containing over 600,000 commercially available compounds. A total of 1360 compounds were acquired on the basis of these searches, and a further 1280 compounds were randomly selected and acquired from the same suppliers for comparison.
The fourth and fifth steps, which constitute the final stage of the method, are performed in parallel. The fourth step involves testing both sets of compounds in the radioligand binding assay. Among the compounds selected on the basis of the representative scaffold, 205 molecules were competitive at concentrations between 1 and 10 μM, 21 compounds were active at concentrations between 0.1 and 1 μM, and 1 compound, designated compound A, had an affinity (Ki) for the receptor of 8.1 ± 1.05 nM (n = 12). Each of the 1280 randomly selected compounds was tested at a concentration of 10 μM, and none showed receptor binding. The set of compounds compiled on the basis of the representative fingerprint therefore delivered active molecules at least 21 times more efficiently than the set of randomly selected compounds (p < 0.0001).
The present inventors found that compound A represents a novel inhibitor of the receptor in question that has not previously been reported. FIG. 12 shows the effect of compound A on receptor-mediated inositol triphosphate production. Cells expressing the receptor were preloaded with radiolabeled inositol and exposed to a receptor agonist in the presence of increasing concentrations of compound A. Inositol triphosphate (IP3) production was determined after elution of radiolabeled cellular inositol phosphates from affinity chromatography columns. Compound A inhibited agonist-induced IP3 production with an IC50 of 20 nM, which corresponds to the affinity of the compound for the receptor.
As shown in FIG. 12, compound A produced a marked reduction in receptor-mediated inositol triphosphate production when tested in a cell-based functional assay (IC50 = 22 nM), consistent with the affinity of the compound for the receptor and with the use of receptor antagonists in the calculations described above. Finally, compound A was found to be highly selective for the receptor in question, showing no significant inhibitory activity in more than 20 other radioligand receptor binding assays.
The fifth step is to guide the conceptual design and synthesis of new compounds containing the above representative scaffold, with a view to identifying molecules with receptor binding activity that are new in the composition-of-matter sense. To this end, a table of chemical reactants and reaction products was compiled in which the chemical structures of the reactants or of the resulting reaction products contain the above bioactive representative scaffold. Over 2000 combinations were selected, and the corresponding reaction products were synthesized and tested. Testing these compounds in the receptor binding assay identified a class of compounds that is new in the composition-of-matter sense, many representatives of which have IC50s ranging from 50 to 500 nM.
Example 2 rational identification of novel selective kinase inhibitors
The present inventors developed an enzymatic assay for a human kinase involved in inflammation, for which no inhibitors had been described in the literature. Test compounds for the assay were assembled according to the method of the invention and tested, and new kinase inhibitors were identified. The first step is to compile from the scientific literature a table listing the chemical structures of 2367 inhibitors of purine nucleotide binding proteins, including compounds able to inhibit other kinases, phosphodiesterases, purine nucleotide binding receptors and purine nucleotide mediated ion channels, collectively referred to as "surrogate targets". The second step is to identify the bioactive chemical determinants contained in these 2367 chemical structures. To this end, another table was constructed containing 98,971 structures having no effect on the same surrogate targets, and added to the first table. The resulting table of 101,338 structures was then evaluated for the presence of bioactive chemical determinants using an association ratio (III), where x represents the number of active chemical structures containing the chemical determinant in question, y represents the total number of chemical structures containing the same chemical determinant, z represents the total number of active chemical structures within the set of N molecules (i.e., z = 2367), and N represents the total number of chemical structures analyzed (i.e., N = 101,338).
The association measure (III) is then converted into a scoring function (IV). Those skilled in the art will recognize that, because a logarithmic transformation brings the distribution of a ratio closer to normal and the variance of the logarithm of that ratio can be estimated by a first-order Taylor series approximation, scoring function (IV) can be used to estimate the lower limit of the 95% confidence interval of measure (III). In this case, the scoring function uses no variables other than x, y, z and N, although it will be apparent to those skilled in the art that formula (IV) may be modified to include other variables relating to the material, biological, chemical and/or physicochemical properties of the molecules, as described above, including but not limited to those cited in Example 1. Those skilled in the art will also recognize that other association measures and/or scoring functions may be used for the same purpose in place of those represented by formulas (III) and (IV); within the meaning of the present invention, the most suitable include various combinations of two, three or four of the variables x, y, z and N.
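Formulas (III) and (IV) are not reproduced in this text, so the following Python sketch is only an illustration of the general technique described: a lower 95% confidence limit obtained on the log scale, with the variance of the log ratio estimated by a first-order Taylor (delta-method) approximation. It assumes, purely for illustration, that the ratio is the proportion of actives among structures containing the determinant, x/y, divided by the proportion of actives among structures lacking it, (z - x)/(N - y). The function name, this choice of ratio and the counts in the usage line are all assumptions.

```python
import math


def ratio_lower_bound(x: int, y: int, z: int, n: int, z_crit: float = 1.96):
    """Return (ratio, lower 95% confidence limit) for an assumed risk ratio.

    Assumed ratio: (x / y) / ((z - x) / (n - y)).
    """
    actives_without = z - x   # actives that lack the determinant
    total_without = n - y     # all structures that lack the determinant
    if min(x, y, actives_without, total_without) <= 0:
        raise ValueError("all four counts must be positive")
    ratio = (x / y) / (actives_without / total_without)
    # First-order Taylor (delta-method) variance of the log ratio.
    var_log = 1 / x - 1 / y + 1 / actives_without - 1 / total_without
    lower = math.exp(math.log(ratio) - z_crit * math.sqrt(var_log))
    return ratio, lower


# Hypothetical determinant: present in 100 structures, 30 of them active,
# evaluated against z = 2367 actives among N = 101,338 structures.
ratio, lower = ratio_lower_bound(30, 100, 2367, 101_338)
print(ratio, lower)
```

Scoring on the lower confidence limit rather than on the ratio itself penalizes determinants supported by only a handful of structures, since their confidence intervals are wide.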
The 101,338 structures, annotated for their various biological activities, were analyzed by calculating scores for a series of chemical determinants using formula (IV), until one or more sets of chemical determinants had been identified whose members had scores corresponding to a probability of less than 1 in 20 (p < 0.05) of being contained in the subset of biologically active structures by chance alone. These chemical determinants were therefore considered to represent one or more pharmacologically active groups of the surrogate target inhibitors described in the literature, and were compiled in a fourth table. Instead of searching for the highest-scoring combination of these determinants as described in Example 1, these structures were used directly as representative scaffolds, or pharmacologically active "fingerprints", for the subsequent selection and synthesis of compounds.
The third step involves virtual screening and selection of compounds using the above representative scaffolds as templates. To this end, substructure searches were performed with the calculated fingerprints, fragments and combinations thereof in databases containing over 250,000 commercially available compounds. A total of 2846 compounds were acquired on the basis of these searches, and the same collection of 1280 randomly selected compounds described in Example 1 served as controls.
The fourth and fifth steps, which constitute the final stage of the method, are performed in parallel. The fourth step involves testing the acquired compounds in the enzymatic assay. Of the 2846 molecules selected on the basis of the representative scaffolds, 88 showed inhibitory activity when tested at a concentration of 5 μM. Of these, 6 molecules had IC50s in the range of 0.2 to 2 μM, and one compound, designated compound B, had an IC50 of 164 nM (FIG. 13).
FIG. 13 shows the effect of compound B on kinase-dependent protein phosphorylation. The kinase in question was incubated with radiolabeled ATP and a peptide substrate in the presence of increasing concentrations of compound B. Protein phosphorylation was determined using standard radiometric techniques. Compound B markedly inhibited kinase-dependent phosphorylation of the substrate, with an IC50 of 164 nM.
Of the 1280 randomly selected control compounds, only 3 showed inhibitory activity in the screening assay, and the most potent of these had an IC50 of only 7.8 μM. The set of compounds compiled on the basis of the representative fingerprints therefore delivered active molecules at least 13.2 times more efficiently (p < 0.0001) than the set of randomly selected compounds. The present inventors found that compound B represents a novel class of ATP-competitive kinase inhibitors that has not previously been reported and that, when tested in selectivity assays against structurally and functionally related kinases, is 250-fold more selective for the kinase in question.
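The 13.2-fold figure can be reproduced as a simple ratio of hit rates, as in the following Python sketch. The function names are illustrative; the counts (88 actives among 2846 selected compounds, 3 actives among 1280 random controls) are those reported above.

```python
def hit_rate(actives: int, tested: int) -> float:
    """Fraction of tested compounds that were active."""
    return actives / tested


def enrichment(actives_sel: int, tested_sel: int,
               actives_ctrl: int, tested_ctrl: int) -> float:
    """Hit rate of the selected set divided by the hit rate of the control set."""
    return hit_rate(actives_sel, tested_sel) / hit_rate(actives_ctrl, tested_ctrl)


# Example 2: fingerprint-based selection versus random controls.
fold = enrichment(88, 2846, 3, 1280)
print(round(fold, 1))  # 13.2
```

The ratio is undefined when the control set contains no actives, as in Example 1, in which case only a lower bound on the enrichment can be stated.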
The fifth step is to guide the conceptual design and synthesis of new compounds using the above representative scaffolds, with a view to identifying molecules with kinase inhibitory activity that are new in the composition-of-matter sense. To this end, a table of chemical reactants and reaction products was compiled in which the chemical structures of the reactants or of the resulting reaction products contain the above bioactive representative scaffolds, or fragments thereof. More than 4000 combinations were selected, and the corresponding reaction products were synthesized and tested. Testing these compounds in the screening assay identified two classes of compounds that are new in the composition-of-matter sense, many representatives of which have IC50s ranging from 100 to 500 nM.
Example 3 rational identification of novel selective ion channel blockers
The present inventors developed an assay for an ion channel believed to play a role in neurodegeneration, for which no inhibitors had been described in the prior art. Test compounds for the assay were assembled according to the method of the invention and tested, and new inhibitors were identified. The first step is to generate the structural data needed to identify the chemical determinants of inhibitors of the channel. This was achieved by testing the first 3680 compounds of the company collection at a concentration of 5 μM in the screening assay and annotating each structure in the table with its inhibitory activity. Using 40% inhibition as the classification threshold, 36 structures were identified as active and the remaining 3644 compounds as inactive.
The second step is to identify the bioactive chemical determinants contained in the chemical structures of the 36 inhibitors. To this end, the 3680 annotated structures were analyzed using the aforementioned association measure (I), where x represents the number of active chemical structures containing the chemical determinant in question, y represents the total number of chemical structures containing the same chemical determinant, z represents the total number of active chemical structures within the set of N molecules (i.e., z = 36), and N represents the total number of chemical structures analyzed (i.e., N = 3680). The association measure (I) is then converted into a scoring function (V), which those skilled in the art will recognize as a product-moment correlation coefficient reflecting the degree of variance shared between two dichotomous variables.
In this case, the scoring function uses no variables other than x, y, z and N, although it will be apparent to those skilled in the art that scoring function (V) may be modified to include other variables relating to the material, biological, chemical and/or physicochemical properties of the molecules, as described above, including but not limited to those cited in Example 1. Those skilled in the art will also recognize that other association measures and/or scoring functions may be used for the same purpose in place of those represented by formulas (I) and (V), particularly since scoring function (V) is not invariant to changes in study design and/or in the distributions of y, (N - y), z and (N - z). Within the meaning of the present invention, the most suitable alternatives include various combinations of two, three or four of the variables x, y, z and N.
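Formula (V) is not reproduced in this text. The following Python sketch assumes that it is the standard product-moment correlation between two dichotomous variables (the phi coefficient), which for the quantities defined above equals (Nx - yz) divided by the square root of y(N - y)z(N - z); this form is consistent with the use of association measure (I) as the numerator and with the dependence on y, (N - y), z and (N - z) noted above. The function name and the determinant counts in the usage line are hypothetical.

```python
import math


def phi_coefficient(x: int, y: int, z: int, n: int) -> float:
    """Product-moment correlation between 'contains determinant' and 'is active'.

    Assumed form of the score: (N*x - y*z) / sqrt(y * (N - y) * z * (N - z)).
    """
    denominator = math.sqrt(y * (n - y) * z * (n - z))
    if denominator == 0:
        raise ValueError("score undefined when a margin of the 2x2 table is empty")
    return (n * x - y * z) / denominator


# Hypothetical determinant: present in 40 of the 3680 structures, 12 of them active.
print(phi_coefficient(12, 40, 36, 3680))
```

The coefficient ranges from -1 to +1 and is zero when the determinant is distributed independently of activity; because its attainable maximum depends on the marginal totals, scores from screens with very different hit rates are not directly comparable.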
The following figure shows examples of the chemical determinants used for the subsequent analysis and selection. All 3680 structures annotated for channel inhibitory activity were tested for the presence of biologically active substructures using a subset of chemical determinants that included the 5 determinants shown in panel A. Of these 5 structures, determinant 4 had the highest score, indicating that it is the most likely basis for inhibition of channel activity. The calculations were therefore repeated on structures containing determinant 4, and the chemical structure shown in panel B was found to be one of the most statistically significant determinants contained in the set of 36 inhibitors and was retained for the subsequent selection. Symbols: A represents carbon, nitrogen, oxygen or sulfur; B represents hydrogen or OH.
The 3680 annotated structures were analyzed by calculating scores for a series of chemical determinants using formula (V) and retaining the structures that gave the largest non-zero positive values. Panel A shows some examples of the chemical determinants used in the method, together with their calculated scores. Of these, determinant 4 had the highest score, and its probability of being contained in the subset of channel-blocking structures by chance alone was estimated at less than 1 in 100 (p < 0.01). Determinant 4 was therefore considered to be represented in the largest proportion of the 36 inhibitors, and the calculations with formula (V) were repeated to determine whether a larger chemical determinant could be identified. Panel B shows the most statistically significant chemical determinant found in these additional calculations. This structure was selected as the representative scaffold, or pharmacologically active "fingerprint", for the subsequent selection and synthesis of compounds.
The third step involves virtual screening and selection of compounds using the representative scaffold shown in panel B as a template. To this end, substructure searches were performed with the calculated fingerprint and fragments thereof in a database containing over 400,000 commercially available compounds. A total of 1760 compounds were acquired on the basis of these searches, and the same collection of 1280 randomly selected compounds described in Example 1 served as controls.
The fourth and fifth steps, which constitute the final stage of the method, are performed in parallel. The fourth step involves testing the acquired compounds in the screening assay. Of the 1760 molecules selected on the basis of the representative scaffold, 84 showed at least 40% inhibitory activity when tested at a concentration of 5 μM. Of these, 8 molecules had IC50s in the sub-micromolar range, and one compound, designated compound C, had an IC50 of 400 nM. Two examples of these channel-inhibiting compounds are shown below, both containing the pharmacologically active "fingerprint" shown in panel B:
These two channel-inhibiting compounds were selected for testing using the method of the invention. Both molecules markedly inhibited the channel in question. The chemical structures of both compounds contain the pharmacologically active chemical determinant identified by the method of the invention, with the substructure drawn in bold black lines (see panel B, above).
Of the 1280 randomly selected compounds used as controls, a total of 33 molecules had at least 40% inhibitory activity in the screening assay. Thus, the set of compounds compiled on the basis of the representative fingerprints shown in group B deliver active molecules at least 1.8 times more efficiently (p < 0.005) than the randomly chosen set of compounds. The set of compounds compiled based on the representative fingerprints shown in panel B delivered active molecules at least 4.9 times more efficiently (p < 0.0001) than the first 3680 compounds of the corporate collection.
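The text does not state which statistical test was used to obtain the quoted p-values. As one plausible check, the following Python sketch applies a one-sided Fisher exact test (an assumption, not necessarily the inventors' method) to the hit counts reported above: 84 actives among 1760 selected compounds versus 33 actives among 1280 random controls. The function name is illustrative.

```python
from math import comb


def fisher_one_sided(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """P(set A contains at least hits_a actives) under the hypergeometric null.

    The null hypothesis is that the hits_a + hits_b actives are distributed
    at random among the n_a + n_b compounds tested.
    """
    total_hits = hits_a + hits_b
    total = n_a + n_b
    k_max = min(total_hits, n_a)
    favourable = sum(
        comb(n_a, k) * comb(n_b, total_hits - k)
        for k in range(hits_a, k_max + 1)
    )
    return favourable / comb(total, total_hits)


# Example 3: fingerprint-based selection versus random controls.
p_value = fisher_one_sided(84, 1760, 33, 1280)
print(p_value)
```

Exact integer arithmetic is used throughout, so the only rounding occurs in the final division.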
The fifth step is to guide the conceptual design and synthesis of new compounds containing the representative scaffold shown in panel B, with a view to identifying molecules with channel inhibitory activity that are new in the composition-of-matter sense. To this end, one of the 120 pharmacologically active inhibitors described above was selected for follow-up and chemically modified, using the previously combined positive and negative screening results as a source of structure-activity information. This approach led to the synthesis and subsequent identification of a class of ion channel blockers that has not previously been described and is new in the composition-of-matter sense, many representatives of which have IC50s ranging from 100 to 500 nM. Selectivity testing against 30 other drug targets indicated that the compounds were selective for the channel in question, and they inhibited cell death in a model of apoptosis induced by nerve growth factor withdrawal.
Example 4 rational identification of novel selective protease inhibitors
The present inventors developed an assay for a protease believed to play a role in ischemic injury. The protease is a member of a family of closely related enzymes, of which it alone is a relevant target for therapeutic intervention. Test compounds for the assay were assembled according to the method of the invention and tested, and new enzyme inhibitors were identified. The first step is to generate the structural data needed to identify the chemical determinants of inhibitors of the enzyme. These data were generated by testing a collection of 1680 compounds at a concentration of 3 μM in the screening assay and annotating each structure with its inhibitory activity. Using 40% inhibition as the classification threshold, 17 structures were identified as active and the remaining 1663 compounds as inactive.
The second step is to identify the bioactive chemical determinants contained in the 17 inhibitor structures. To this end, the 1680 annotated structures were analyzed using the hybrid association measure represented by formula (VI), where x represents the number of active chemical structures containing the chemical determinant in question, y represents the total number of chemical structures containing the same chemical determinant, z represents the total number of active chemical structures within the set of N molecules (i.e., z = 17), and N represents the total number of chemical structures analyzed (i.e., N = 1680). In this case, the association measure (VI) was used directly as the scoring function to identify the bioactive chemical determinants contained in the 17 inhibitors.
In this case, the scoring function uses no variables other than x, y, z and N, although it will be apparent to those skilled in the art that scoring function (VI) may also be modified to include other variables relating to the material, biological, chemical and/or physicochemical properties of the molecules, as described above, including but not limited to those cited in Example 1.
Those skilled in the art will also recognize that other association measures and/or scoring functions may be used for the same purpose in place of that represented by formula (VI), particularly since using an association measure directly as the score gives only a relative estimate of the likelihood that a given chemical determinant underlies biological activity. Within the meaning of the present invention, the most suitable alternatives include various combinations of two, three or four of the variables x, y, z and N.
The 1680 annotated structures were analyzed by calculating scores for a series of chemical determinants using formula (VI) and retaining the structures that gave the largest positive values. Panel A below shows examples of some of the chemical determinants used in the method, together with their calculated scores. Of these, determinants 7 and 8 had the highest scores and were considered to represent one or more bioactive groups contained in the majority of the 17 inhibitors. Calculations with formula (VI) were repeated to determine whether a larger chemical determinant could be identified; this was not the case for this set of 17 structures, so determinants 7 and 8 were combined to form the representative scaffold, or pharmacologically active "fingerprint", shown in panel B, for the subsequent selection and synthesis of compounds.
The two panels show examples of chemical determinants that can be used for the subsequent analysis and selection. All 1680 structures annotated for protease inhibitory activity were tested for the presence of biologically active substructures using a subset of chemical determinants that included the 4 determinants shown in panel A. Of these 4 structures, determinants 7 and 8 had the highest scores, indicating that they are the most likely basis for protease inhibitory activity. By comparison, a determinant consisting of a simple benzene ring had a score of 0.02. Since repeated calculations using determinants 7 and 8 did not identify any higher-scoring structure, the two structures were combined into the chemical motif shown in panel B, which was used as the pharmacologically active "fingerprint" for the subsequent virtual screening and selection of compounds. Symbols: A represents carbon or sulfur; B represents hydrogen, carbon, nitrogen, oxygen or any halogen atom.
The third step involves virtual screening and selection of compounds using the representative frameworks described in group B as templates. For this purpose, a substructure search is carried out on a database containing over 150,000 commercially available compounds using fingerprints and fragments thereof calculated for this purpose. A total of 589 compounds were obtained based on these searches.
The fourth and last step of the method involves testing the acquired compounds in the enzymatic assay. Of the 589 compounds selected on the basis of the representative scaffold, 52 showed at least 40% inhibitory activity when tested at a concentration of 3 μM. Of these, 12 compounds had IC50s in the sub-micromolar range, and one compound, designated compound D, had an IC50 of 65 nM. Six examples of these protease-inhibiting molecules are shown below, each containing at least one occurrence of the pharmacologically active "fingerprint" shown in panel B:
The six protease-inhibiting compounds were selected for testing using the method of the invention. Each molecule markedly inhibited the protease in question, with IC50s in the range of 0.15 to 15 μM. Each of the six compounds contains the pharmacologically active chemical determinant identified by the method of the invention, with the substructure drawn in bold black lines (see panel B, above). Some of these compounds contain more than one variant of the fingerprint, such as the tetracyclic structure shown in the lower right-hand corner of the figure above.
The set of compounds compiled on the basis of the representative fingerprint shown in panel B therefore delivered active molecules at least 8.7 times more efficiently (p < 0.0001) than the set of 1680 compounds initially tested. In addition, the 52 rationally identified compounds were found to be selective for the protease in question: most of them (> 90%) showed no inhibitory activity at a test concentration of 5 μM against related proteases belonging to the same enzyme family, or against 12 other drug targets tested under the same conditions.
Example 5 rational identification of novel selective phosphatase inhibitors
The present inventors developed an assay for a phosphatase believed to play a role in receptor sensitization and regulation. Test compounds for the assay were assembled according to the method of the invention and tested, and new enzyme inhibitors were identified. The first step is to generate the structural data needed to identify the chemical determinants of inhibitors of the enzyme. These data were generated by testing a collection of 12160 compounds at a concentration of 3 μM in the screening assay and annotating each structure with its inhibitory activity. Using 50% inhibition as the classification threshold, 15 structures were identified as active and the remaining 12145 compounds as inactive.
The second step is to identify the bioactive chemical determinants contained in the 15 inhibitor structures. To this end, the 12160 annotated structures were analyzed using the hybrid association measure (VII), where x represents the number of active chemical structures containing the chemical determinant in question, y represents the total number of chemical structures containing the same chemical determinant, z represents the total number of active chemical structures within the set of N molecules (i.e., z = 15), and N represents the total number of chemical structures analyzed (i.e., N = 12160).
(VII) (x/z)-(z-x)/(N-z)
The association measure (VII) is then converted into a scoring function (VIII), which those skilled in the art will recognize as related to an estimate of the risk odds ratio obtained from the slope of the regression line representing the degree of variance shared between two dichotomous variables, and which is further adjusted to include the molecular weight (MW) of each chemical determinant under consideration.
(VIII) Score = MW · e^[(x/z)-(z-x)/(N-z)]
In this case, apart from molecular weight, the scoring function uses no variables other than x, y, z and N, although it will be apparent to those skilled in the art that formula (VIII) may also be modified to include other variables relating to the material, biological, chemical and/or physicochemical properties of the molecules, as described above, including but not limited to those cited in Example 1. Those skilled in the art will also recognize that other association measures and/or scoring functions may be used for the same purpose in place of that represented by formula (VIII), particularly since a ratio of slopes may not be sufficient in some circumstances to distinguish between two closely related chemical determinants. Within the meaning of the present invention, the most suitable scoring functions include various combinations of two, three or four of the variables x, y, z and N.
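As an illustration, scoring function (VIII) can be evaluated as in the following minimal Python sketch, which follows the formula as printed: the molecular weight multiplied by e raised to the power of association measure (VII). The function names and the determinant values in the usage lines (a 200 Da determinant found in 10 of the 15 actives) are hypothetical; only z = 15 and N = 12160 come from the text.

```python
import math


def association_vii(x: int, z: int, n: int) -> float:
    """Association measure (VII) as printed: (x/z) - (z - x)/(N - z)."""
    return x / z - (z - x) / (n - z)


def score_viii(mw: float, x: int, z: int, n: int) -> float:
    """Scoring function (VIII): MW * exp(association measure (VII))."""
    return mw * math.exp(association_vii(x, z, n))


# Hypothetical 200 Da determinant present in 10 of the 15 active structures.
z, n = 15, 12_160
print(score_viii(200.0, 10, z, n))
```

Because the exponent is bounded above by 1, molecular weight dominates the score between determinants of similar coverage, which favors larger fragments; this is consistent with the subsequent search for a larger determinant.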
The 12160 annotated structures were analyzed by calculating scores for a series of chemical determinants using formula (VIII) and retaining the structures that gave the largest positive values. This analysis led to the identification of 3 distinct chemical determinants, with molecular weights from 120 to 220 Da, whose probability of being contained in the subset of active chemical structures by chance alone was less than 1 in 10 (p < 0.1). These 3 chemical determinants were therefore considered to represent one or more bioactive groups of the 15 enzyme inhibitors identified in the screen, and were compiled in a fourth table. The calculations were then repeated using formula (VIII) to determine whether a larger chemical determinant, arising from the combination or further extension of any of the 3 fragments, could be identified. The most statistically significant chemical determinant found in these additional calculations had a molecular weight of 255 Da and was selected as the representative scaffold, or pharmacologically active "fingerprint", for the subsequent selection of compounds.
The third step involves virtual screening and selection of compounds using the above representative scaffold as a template. To this end, substructure searches were performed with the calculated fingerprint and fragments thereof in a database containing over 800,000 commercially available and proprietary compounds. A total of 1242 compounds were selected on the basis of these searches, and the same collection of 1280 randomly selected compounds described in Example 1 served as controls.
The fourth and last step of the method involves testing the acquired compounds in the enzymatic assay. Of the 1242 compounds selected on the basis of the representative scaffold, 34 showed at least 50% inhibitory activity when tested at a concentration of 3 μM. Of these, 8 compounds had IC50s in the sub-micromolar range, and one compound, designated compound E, had an IC50 of 87 nM (FIG. 14).
FIG. 14 shows the effect of compound E on phosphatase-dependent protein dephosphorylation. The phosphatase in question was incubated with a phosphorylated peptide substrate in the presence of increasing concentrations of compound E. Substrate dephosphorylation was determined by malachite green measurement of the free phosphate released into the reaction medium. Compound E markedly inhibited phosphatase-dependent dephosphorylation, with an IC50 of 87 nM.
Of the 1280 randomly selected control compounds, only 2 showed inhibitory activity in the screening assay, and the most potent of these had an IC50 of only 1.8 μM. The set of compounds compiled on the basis of the representative fingerprint therefore delivered active molecules at least 17.5 times more efficiently (p < 0.0005) than the set of randomly selected compounds, and 22.3 times more efficiently (p < 0.00001) than the first 12160 compounds of the company collection.
Finally, the present inventors found that compound E represents a novel class of phosphatase inhibitors that has not previously been reported and that, when tested in selectivity assays against structurally and functionally related phosphatases, is 20-fold more selective for the target in question.
Example 6 increasing the efficacy of the chemical series
The present invention can also be used to increase the efficacy of a chemical series. To illustrate, a collection of 1251 compounds was tested at a concentration of 3 μM in a protease assay, and 25 of them had at least 40% inhibitory activity. Analysis of these structures as described in example 1 resulted in the identification of a number of chemical determinants, one of which was present in 7 of the 25 protease inhibitors, with a probability of occurring by chance alone of less than 1/10,000 (p < 0.0001). Unfortunately, the 7 compounds containing this determinant had only moderate inhibitory activity (mean IC50 = 3.4 μM ± 1.34 μM, n = 7), making them unattractive for chemical follow-up. The determinant was therefore considered to represent the biologically active group of the relevant inhibitors, for use either directly as a representative backbone or as a pharmacologically active "fingerprint" for the selection of other compounds.
To do this, a database containing over 100,000 commercially available compounds was screened for the relevant determinant, and 142 molecules were selected for additional testing. Of these 142 compounds, 11 had inhibitory activity in the sub-micromolar range, with a mean IC50 of 0.48 μM ± 0.09 μM (n = 11; significantly lower than the previous value, p < 0.05). Therefore, the method of the present invention can significantly increase the pharmacological efficacy of a chemical series.
Example 7 increasing the Selectivity of the chemical series
The invention can also be used to increase the selectivity of a chemical series. To illustrate, a collection of 3360 compounds was tested at a concentration of 3 μM in an assay for a kinase called kinase 1, and 22 of them had at least 40% inhibitory activity. Analysis of these structures as described in example 2 resulted in the identification of a number of chemical determinants, one of which, designated "determinant 10", was present in 3 of the 22 kinase inhibitors, with a probability of occurring by chance alone of less than 1/20 (p < 0.05). Unfortunately, selectivity assays performed on 4 other kinases showed that determinant 10 was also an important component of inhibitors of another kinase, termed kinase 2, indicating that selective inhibitors of kinase 1 could not be developed on the basis of determinant 10 alone. In fact, the 3 compounds containing determinant 10 were nearly equipotent on both kinases, with mean IC50 values for kinases 1 and 2 of 7.24 μM ± 3.81 μM (n = 3) and 21.5 μM ± 9.29 μM (n = 3), respectively, which corresponds to a selectivity ratio for kinase 1 of only 2.98.
In this regard, the 3360 compounds that had been tested on kinase 1 were also tested on kinase 2 at a concentration of 3 μM, and 92 compounds had at least 40% inhibitory activity. The table containing the 3360 structures, annotated for activity on kinases 1 and 2, was subsequently analyzed according to the method of the invention using relevance evaluation (III), converted into a scoring function (IX), in which x1 denotes the number of chemical structures active on kinase 1 that contain the relevant chemical determinant, x2 denotes the number of chemical structures active on kinase 2 that contain the relevant chemical determinant, y denotes the total number of chemical structures containing the chemical determinant, z1 denotes the total number of chemical structures active on kinase 1 within the set of N molecules (i.e., z1 = 22), z2 denotes the total number of chemical structures active on kinase 2 within the set of N molecules (i.e., z2 = 92), and N denotes the total number of chemical structures to be analyzed (i.e., N = 3360).
One skilled in the art will recognize scoring function (IX) as a way of comparing relative risks, allowing one to identify the chemical determinants most likely to be selective for one kinase over the other. In this context, it is obvious to the person skilled in the art that formula (IX) can be modified to include other variables related to molecular, biological, chemical and/or physicochemical properties as described above, such as, but not limited to, the variables cited in example 1. Finally, those skilled in the art will also recognize that other relevance evaluations and/or scoring functions may be used for the same purpose in place of those represented by formulas (III) and (IX). For example, relevance evaluation (I) can be used in scoring function (II), and the score obtained for kinase 1 activity can be subtracted from the score obtained for kinase 2 activity or, alternatively, the score obtained for kinase 1 activity can be divided by the score obtained for kinase 2 activity. Many other methods are possible within the meaning of the invention, most suitably using functions that include various combinations of two, three or four of the variables x, y, z and N.
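Formula (IX) itself is not reproduced in this text. Purely to illustrate the relative-risk idea, the sketch below compares the fraction of kinase 1 actives that contain a determinant (x1/z1) with the fraction of kinase 2 actives that contain it (x2/z2). The function name and the example values of x2 are assumptions made for illustration and should not be read as formula (IX).

```python
import math


def relative_risk(x1: int, z1: int, x2: int, z2: int) -> float:
    """Ratio (x1 / z1) / (x2 / z2).

    x1, x2: actives on kinase 1 / kinase 2 that contain the determinant
    z1, z2: total actives on kinase 1 / kinase 2
    Values above 1 suggest the determinant is over-represented among
    kinase 1 actives relative to kinase 2 actives.
    """
    if z1 <= 0 or z2 <= 0:
        raise ValueError("z1 and z2 must be positive")
    if not (0 <= x1 <= z1 and 0 <= x2 <= z2):
        raise ValueError("x1 and x2 must lie between 0 and z1, z2")
    if x2 == 0:
        # No kinase 2 active contains the determinant.
        return math.inf if x1 > 0 else math.nan
    return (x1 / z1) / (x2 / z2)


# z1 = 22 and z2 = 92 are taken from the text; the x values are hypothetical.
print(relative_risk(x1=3, z1=22, x2=12, z2=92))  # close to 1: not selective
print(relative_risk(x1=3, z1=22, x2=0, z2=92))   # inf: found only in kinase 1 actives
```

A practical scoring function would also need to account for sampling error when the counts are small, which this ratio does not do.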
Scores for a series of chemical determinants were calculated using formula (IX), which identified a number of chemical determinants selective for kinase 1. One of these, designated "determinant 11", consists of determinant 10 substituted with another chemical motif. Determinant 11 was therefore considered to represent a pharmacologically active group of selective kinase 1 inhibitors, for use either as a representative backbone or as a pharmacologically active "fingerprint" for future compound selection. To this end, a substructure search was performed with determinant 11 and fragments thereof in a database containing over 400,000 commercially available compounds. These searches yielded a total of 498 compounds which, after testing in both assays, gave 3 inhibitors containing determinant 10 whose mean IC50 values in the kinase 1 and kinase 2 assays were 0.94 μM ± 0.52 μM (n = 3) and 31.6 μM ± 4.41 μM (n = 3), respectively. This result indicates that the series has an 11-fold higher selectivity for kinase 1 over kinase 2 than the original series, demonstrating that the method of the invention can increase the pharmacological selectivity of the related chemical series.
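One reading of the 11-fold figure that is consistent with the reported mean IC50 values is that it compares the selectivity ratio of the new series with that of the original series. The sketch below works through that arithmetic; the function name is illustrative, and the interpretation is an assumption rather than something stated explicitly in the text.

```python
def selectivity_ratio(ic50_off_target: float, ic50_target: float) -> float:
    """Ratio of mean IC50 on the off-target to mean IC50 on the target.

    Larger values mean the series is more selective for the target.
    """
    if ic50_off_target <= 0 or ic50_target <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_off_target / ic50_target


# Mean IC50 values in micromolar, as quoted in the text.
original = selectivity_ratio(ic50_off_target=21.5, ic50_target=7.24)
improved = selectivity_ratio(ic50_off_target=31.6, ic50_target=0.94)
gain = improved / original

print(f"original series: {original:.2f}-fold selective for kinase 1")
print(f"new series:      {improved:.1f}-fold selective for kinase 1")
print(f"improvement:     {gain:.1f}-fold")
```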
Example 8 rational identification of a series with multiple pharmacological effects
A functional assay was developed for a ligand-gated ion channel believed to play a role in immune responses. A set of test compounds for this assay was assembled according to the method of the invention in order to identify new ion channel blockers. The channel under investigation is generally considered to belong to a family of targets that are permeable to sodium ions, activated by purine nucleotides, and inhibited by some sodium channel blockers. It was therefore decided to identify pharmacological fingerprints having the dual ability to mimic purine nucleotides and to inhibit sodium channels, since this would increase the likelihood of rapidly identifying inhibitors of the ligand-gated ion channel of interest.
The first step of the process involves compiling two tables of chemical structures from the existing literature. The first table contains 79 structures described as sodium channel inhibitors. The second table contains the structures of 2367 inhibitors of purine-nucleotide binding proteins (see in particular example 2). The second step of the method is to identify the bioactive chemical determinants common to both tables. To this end, each table was supplemented with more than 100,000 molecules having no effect on the relevant surrogate target and was analyzed by selecting the subtractive relevance evaluation (I) as described in example 1 and converting it into a scoring function (X), where x1 denotes the number of chemical structures that are active on sodium channels and contain the relevant chemical determinant, x2 denotes the number of chemical structures that are active on purine-nucleotide binding proteins and contain the same chemical determinant, y1 denotes the total number of structures containing the chemical determinant in the table annotated for sodium channel blockade, y2 denotes the total number of structures containing the chemical determinant in the table annotated for inhibition of purine-nucleotide binding proteins, z1 denotes the total number of structures that inhibit sodium channels within the set of N1 molecules (i.e., z1 = 79), z2 denotes the total number of chemical structures that act on purine-nucleotide binding proteins within the set of N2 molecules (i.e., z2 = 2367), and N1 and N2 each denote the total number of chemical structures to be analyzed within the corresponding annotated table.
One skilled in the art will recognize scoring function (X) as a means of combining two different association tests, allowing one to identify the chemical determinants most likely to act on both sodium channels and purine-nucleotide binding proteins. In this context, it will be apparent to those skilled in the art that scoring function (X) may also be modified to include other variables related to molecular, biological, chemical and/or physicochemical properties as described above, such as, but not limited to, the variables cited in example 1. Those skilled in the art will also recognize that other relevance evaluations and/or scoring functions may be used for the same purpose in place of those represented by formulas (I) and (X), particularly since scoring function (X) does not take into account differences in scale between the proportions of the two datasets but always requires these proportions to be comparable and, moreover, requires N1 and N2 to be comparatively large, with values greater than 20. For example, for datasets with markedly different sample sizes, one might weight the results using a scoring function based on a weighted average of the differences in proportions (see example 21). In addition, one may also wish to include in the calculation a third, fourth or i-th pharmacological property, in which case it will be apparent that formula (X) can be extended to its more general form (XI), in which d represents the number of compound tables to be analyzed, and the resulting scores can be referred directly to a standard normal distribution table to determine the likelihood of finding one or more chemical determinants underlying all the pharmacological properties under consideration.
Many other methods are also possible within the meaning of the invention, most suitably using scoring functions that include various combinations of two, three or four of the variables x, y, z and N.
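Formulas (X) and (XI) are not reproduced in this text. One standard way of pooling d independent standard scores so that the result can again be read against a standard normal table is Stouffer's method, sketched below. Treating the per-table scores as independent z-scores is an assumption made here for illustration, the function names are hypothetical, and the example scores are invented.

```python
import math
from statistics import NormalDist


def combine_scores(z_scores: list[float]) -> float:
    """Stouffer combination: sum of d z-scores divided by sqrt(d).

    If each input is standard normal under the null hypothesis and the
    inputs are independent, the result is again standard normal.
    """
    d = len(z_scores)
    if d == 0:
        raise ValueError("at least one score is required")
    return sum(z_scores) / math.sqrt(d)


def upper_tail_probability(z: float) -> float:
    """One-sided probability of a standard normal value of at least z."""
    return 1.0 - NormalDist().cdf(z)


# Hypothetical per-table z-scores for one determinant in d = 2 tables.
combined = combine_scores([1.8, 2.1])
print(f"combined score: {combined:.2f}")
print(f"one-sided p:    {upper_tail_probability(combined):.4f}")
```

Adding a third or fourth property only lengthens the input list, which mirrors the extension from two tables to d tables described above.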
To analyze the two annotated structure tables, scores were calculated for a series of chemical determinants using formula (X), retaining the structures that yielded the highest values above 2. This analysis resulted in the identification of a chemical determinant with a probability of occurring within the subsets of active chemical structures by chance alone of less than 1/20 (p < 0.05). This chemical determinant, designated "determinant 12", was therefore considered to represent one or more biologically active groups of sodium channel and purine-nucleotide binding protein inhibitors, for use either directly as a representative backbone or as a pharmacologically active "fingerprint" for future compound selection.
The third step of the method involves virtual screening using the representative skeleton as a template. To this end, a search for substructures was performed using determinant 12 and fragments thereof in a database containing more than 250,000 commercially available compounds. A total of 800 compounds were obtained based on these searches, using the same collection of 1280 randomly selected compounds as controls as described in example 1.
The fourth and final step of the method involves testing the selected compounds in the ion channel assay. Of the 800 molecules selected on the basis of determinant 12, 23 had an inhibitory activity of at least 40% when tested at a concentration of 3 μM. Of these, 3 had IC50 values in the sub-micromolar range; one of them, designated compound F, had an IC50 of 145 nM ± 56 nM (n = 4). In the control set of 1280 randomly selected compounds, only one molecule had appreciable inhibitory activity, in the low micromolar range, and its chemical structure in fact contained most of determinant 12. Interestingly, when the same collection of 800 compounds was tested on a kinase believed to play a role in immune responses, 8 compounds were found to have at least 40% inhibitory activity at the tested concentration of 5 μM; compound F had an IC50 of 1.2 μM, and another compound, designated compound G, had an IC50 of 137 nM ± 48 nM (n = 4). It was also found that compounds F and G and many closely related molecules whose structures contain determinant 12 also inhibited sodium channels, typically by 50-100% at 1 μM. Taken together, these results demonstrate that the methods of the present invention allow the selection and/or design of compounds with multiple pharmacological properties, which are relevant for the development of drugs for the treatment of multifactorial pathologies such as, but not limited to, inflammation. It is clear that the method of the invention is equally useful for introducing new pharmacological properties into chemical series that originally lacked them.
Example 9 summary of bioactive chemical determinants
In a preferred embodiment of the invention, the method is used to compile a table of bioactive chemical determinants, which in turn can be used as a reference database for rational drug design, for example in computer-controlled decision-making programs for medicinal chemistry. To illustrate, 25 tables of pharmacologically active molecules were compiled from the scientific literature, each table comprising the chemical structures of compounds having a given pharmacological property, e.g., sigma receptor binding, dopamine D2 receptor agonism or estrogen receptor antagonism. Each table was then analyzed according to the method of the invention by selecting relevance assessment (III) as described in example 2 and converting it into function (IV), which was used to calculate scores for the various chemical determinants contained in the table or tables to be analyzed. These calculations ultimately identified a large number of pharmacologically active chemical determinants, 3 of which are listed in the table below, which is part of the resulting matrix:
This table provides a reference table of pharmacologically active chemical determinants. Twenty-five structure tables, each containing molecules with one of 25 different pharmacological properties, were compiled and analyzed according to the method of the invention using association assessment (III) and scoring function (IV). These 25 properties include the ability to bind sigma receptors (sigma ligands), dopamine D2 receptor agonism (D2 agonists) and estrogen receptor antagonism (estrogen antagonists). The table above shows a small portion of the resulting 26-column matrix. A value greater than 1 indicates that the probability of a given chemical determinant occurring by chance within a collection of molecules having the same pharmacological property is less than 1/20, indicating that this determinant is most likely to be the molecular basis of that property. The above table constitutes a repository of bioactive determinants or "fingerprints" that can be used as a reference table for decision making in drug discovery and development.
The table obtained is interpreted as follows. Compounds whose chemical structure contains determinant 13 are more likely to have dopamine D2 receptor agonist properties than sigma receptor binding or estrogen receptor antagonist properties, i.e. 8.12 > 1.85 > 0.05. Conversely, determinant 13 is the preferred determinant for assembling a set of potential dopamine D2 receptor agonists, i.e. 8.12 > 2.93 > 0.00. Similarly, compounds whose chemical structure contains determinant 14 are more likely to be sigma receptor ligands than dopamine receptor agonists or estrogen receptor antagonists, i.e. 2.40 > 0.00 = 0.00, and determinant 14 is the preferred determinant for assembling a set of sigma receptor ligands, i.e. 2.40 > 1.85 > 0.91. Finally, compounds whose chemical structure contains determinant 15 are most likely to have estrogen receptor antagonist properties, i.e. 28.17 > 2.93 > 0.91, and determinant 15 is the preferred fingerprint for compiling a set of potential estrogen receptor antagonists, i.e. 28.17 > 0.05 > 0.00.
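The two ways of reading the matrix (along a row to rank the properties of a determinant, down a column to rank the determinants for a property) can be sketched as follows. The nine scores are those quoted in the interpretation above; the dictionary layout and function names are illustrative assumptions.

```python
# Scores quoted in the text for determinants 13 to 15 and three properties.
SCORES = {
    "determinant 13": {"sigma ligand": 1.85, "D2 agonist": 8.12, "estrogen antagonist": 0.05},
    "determinant 14": {"sigma ligand": 2.40, "D2 agonist": 0.00, "estrogen antagonist": 0.00},
    "determinant 15": {"sigma ligand": 0.91, "D2 agonist": 2.93, "estrogen antagonist": 28.17},
}


def most_likely_property(determinant: str) -> str:
    """Row-wise reading: the property with the highest score for a determinant."""
    row = SCORES[determinant]
    return max(row, key=row.get)


def preferred_determinant(prop: str) -> str:
    """Column-wise reading: the determinant with the highest score for a property."""
    return max(SCORES, key=lambda det: SCORES[det][prop])


def significant_properties(determinant: str, threshold: float = 1.0) -> list[str]:
    """Properties whose score exceeds the threshold (score > 1 means p < 1/20)."""
    return [p for p, s in SCORES[determinant].items() if s > threshold]


for det in SCORES:
    print(det, "->", most_likely_property(det), significant_properties(det))
for prop in ("sigma ligand", "D2 agonist", "estrogen antagonist"):
    print(prop, "->", preferred_determinant(prop))
```

Note that determinant 13 also scores above 1 for sigma receptor binding, so a row can flag more than one likely property.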
It will be apparent to those skilled in the art that other relevance assessments and/or scoring functions may be used to construct such tables in place of those represented by formulas (III) and (IV). One skilled in the art will also recognize that the scoring function used may include other variables related to molecular, biological, chemical and/or physicochemical properties as described above, such as, but not limited to, the variables cited in example 1. Those skilled in the art will also recognize that the scoring function or process may be modified to include a weighting or normalization step to make the scores easier to compare with one another; this was not necessary for the table above, which was constructed from 3 samples of similar size, but may be needed for other datasets. Finally, it is clear that reference tables can be compiled in the same way for calculating, during the discovery process, scores for other relevant properties, such as, but not limited to, general therapeutic use, toxicity, absorption, distribution, metabolism and/or excretion.
Example 10 prediction of the second pharmacological Effect of the molecule
The invention can also be used to predict the secondary effects of a molecule. To illustrate, a new class of ion channel blockers was identified as shown in example 3. As previously described for other inhibitors of this same channel, the basic chemical structure of the novel chemical family of inhibitors comprises the chemical determinant shown in group B of example 3, in particular the form designated determinant 5 shown in group A of example 3. Comparison of determinant 5 with the determinants contained in the above table showed that the relevant inhibitors were very likely to bind sigma receptors, since the chemical structure of determinant 5 is identical to that of determinant 14. Channel blockers containing determinant 5 were therefore tested in σ1 and σ2 receptor binding assays and were found to have sub-micromolar affinity for both sites. These results demonstrate that the scores calculated by the method of the invention can predict the secondary effects of a chemical series, which is extremely useful in the progression of medicinal chemistry series.
Example 11 identification and prediction of toxic effects of molecules
It is clear from the above examples that the method of the invention can also be used to identify toxic chemodeterminants contained in pesticides, herbicides, insecticides etc., by merely analyzing the structure table already annotated, except that the pharmacological properties are replaced by toxicological properties. In this context, the invention can be used directly to identify toxic chemical families with higher efficacy, selectivity and/or broader action, which families are used for example in agrochemical projects to protect crops.
In addition, a reference table or database of toxic chemical determinants can be compiled using the present invention in the same manner as described in example 9. These tables are then used to estimate the likelihood that a chemical series will have a given toxic effect, for example in screening food additives and environmental pollutants.
To illustrate the possibility of predicting toxic effects in a drug research setting, 4480 compounds were tested on a cellular phosphatase of interest for the treatment of inflammation. A total of 25 compounds had at least 40% inhibitory activity in the assay at the tested concentration of 10 μM, and the IC50 values of all these compounds were in the low micromolar range. Analysis of the results according to the method of the present invention identified 2 molecularly distinct chemical determinants most likely to underlie the pharmacological activity, designated determinants 16 and 17. Since both determinants were present in molecules of comparable activity and both could give rise to chemical families equally suitable for chemical follow-up, it was decided to choose between them on the basis of predicted toxic side effects.
To this end, determinants 16 and 17 were compared with the structures contained in a toxicological database, and it was found that molecules whose structure contained determinant 16 had a much higher probability of being cytotoxic than compounds containing determinant 17 alone. This suggested that developing phosphatase inhibitors containing determinant 16 would be of little interest, because of the inherent cytotoxicity of the pharmacological fingerprint. This hypothesis was confirmed experimentally: when cultured cells were exposed to both classes of inhibitors, even at a concentration of 1 μM, and cell viability was measured using the standard MTT assay, all compounds containing determinant 16 induced cell death within 24 hours of application, while most compounds containing determinant 17 did not. These results therefore demonstrate that the method of the invention can identify or predict the chemical families most likely to have toxic properties in a given setting. It is clear that the same calculations can be performed using, for example, mutagenicity data (Ames test), P450 isoenzyme inhibition data, or any other data generated in toxicity-related tests.
Example 12 identification of groups with biological Activity in receptor ligands
A cell surface receptor was selected as a relevant target for the control of certain endocrine disorders. This receptor is activated endogenously by a nonapeptide hormone produced by the pituitary. A table of chemical structures of ligands of this receptor was compiled from the scientific literature. This structure table was then analyzed according to the method of the invention using relevance evaluation (III), scoring function (IV) and a series of chemical determinants consisting of fragments of the 20 amino acids (glycine, alanine, valine, leucine, isoleucine, proline, serine, threonine, tyrosine, phenylalanine, tryptophan, lysine, arginine, histidine, aspartic acid, glutamic acid, asparagine, glutamine, cysteine and methionine), supplemented with fragments of the peptide backbone structure (NH-CH-CO-)3. The following are examples of some of these determinants:
[Structure drawings not reproduced. Rows 1 and 2: tryptophan and determinants No.18 to No.26 derived from it. Rows 3 and 4: the peptide backbone and determinants No.27 to No.33 derived from it.]
These are examples of the amino acid and peptide backbone derived chemical determinants used for the analysis. The receptor ligand table compiled from the scientific literature was analyzed according to the method of the invention using relevance evaluation (III), scoring function (IV) and a series of chemical determinants consisting of various fragments of the 20 amino acids, supplemented with fragments of the peptide backbone structure (NH-CH-CO-)3. The top two rows show examples of determinants derived from tryptophan. These determinants are either exact fragments (e.g., determinants 18, 19, 20, 21 and 26), combinations of exact fragments (e.g., determinant 22), inexact fragments (e.g., determinants 23, 24 and 25), or combinations of exact and inexact fragments (not shown). The bottom two rows show examples of determinants derived from the peptide backbone structure (NH-CH-CO-)3, both exact fragments (determinants 29, 31 and 32) and inexact fragments (determinants 27, 28, 30 and 33). Symbols: A represents carbon or sulfur; B represents carbon or nitrogen; E represents carbon, nitrogen, oxygen or sulfur.
The scores for these fragments were calculated using formula (IV), which identified a number of chemical determinants with scores greater than 1, indicating that the probability of the corresponding structure occurring within the subset of active chemical structures by chance alone is less than 1/20 (p < 0.05). Examples of these determinants and their respective scores are shown below:
No.34 No.35 No.36 No.37
3.09-1.17-1.06-3.78-3.09-points
No.38 (score 2.12) No.39 (score 1.18) No.40 (score 1.92) No.41 (score 2.83)
These are examples of high scoring chemodeterminants identified in the first round of analysis. The receptor ligand set is analyzed according to the method of the invention, i.e. the scoring function (IV) is used to calculate the scores for the chemical determinants previously shown and for a number of other chemical determinants. A value greater than 1 indicates that the determinant has a probability of occurring within the subset of receptor ligands based on likelihood alone of less than 1/20. The upper panel shows some of the higher scoring chemical determinants identified in the present method.
These determinants were therefore considered to represent one or more amino acids contained within the primary sequence of the peptide hormone, and were combined into a second table. The calculations were repeated using formula (IV) to identify the highest-scoring combinations of these new determinants, many of which scored greater than 10. The highest-ranking chemical determinant, designated determinant 42, was compared with 800 dipeptide structures consisting of various combinations of the 20 amino acids, which showed that only one dipeptide sequence, designated A1-A2, contains the entire structure of determinant 42. This result indicates that the relevant hormone most likely includes the A1-A2 sequence at some position within its primary structure, and that at least one of the two amino acids plays an important role in the binding of the hormone ligand to its receptor. Examination of the hormone sequence confirmed that it does indeed contain the A1-A2 sequence, an occurrence whose probability by chance alone was calculated to be only 0.019. Interestingly, other experiments have shown that peptides carrying a mutation at the A2 position of the A1-A2 sequence (e.g. A1-A3 or A1-A4 in place of A1-A2, where A1, A2, A3 and A4 are different amino acids) have very low affinity for the receptor, indicating that at least one of the two predicted residues does constitute an important group supporting the biological function of the relevant hormone. Taken together, these results demonstrate that the methods of the invention can identify bioactive groups of peptide ligands, which are useful in medicinal chemistry programs for the rational design of, for example, peptidomimetic enzyme inhibitors and/or receptor ligands.
Example 13 prediction of protein-protein interactions
The present invention can also predict the existence of protein-protein interactions in a manner similar to the previous example. To illustrate, the ion channel screening described in example 3 resulted in the identification of more than 24 molecules with at least 40% inhibitory activity at the tested concentration of 5 μM. The chemical structures of these inhibitors were organized into a table, which was analyzed as in example 12. This analysis led to the identification of a series of high-scoring amino acid and peptide backbone derived chemical determinants which, upon further analysis, indicated that the channel most likely interacts with an inhibitory peptide or protein containing a particular dipeptide sequence, termed A5-A6. Interestingly, such inhibitory proteins have been described in the literature, and all of them contain a "channel inhibition" region of 20 amino acids that contains exactly the predicted A5-A6 dipeptide sequence. The probability that any sequence of 20 amino acids contains two given residues in a given order by chance alone was determined to be only 0.046, and, taking this example and the previous one together, the probability of correctly predicting by chance the presence of two different dipeptide sequences in two unrelated proteins was estimated to be less than 1/1097. Both predictions were nevertheless correct, confirming that the present invention can identify and/or predict the existence of certain classes of protein-protein interactions. The approach is simple to implement: the amino acid sequence contained in the most likely chemical determinant is identified within the subset of pharmacologically active structures, and a sequence database is then searched for proteins containing that amino acid sequence. This process is described in example 14 below.
In this context, it is obvious to the person skilled in the art that this method is not limited to the identification of dipeptide sequences, but that tripeptide and even tetrapeptide sequences can also be detected, depending on the structure of the pharmacologically active compound to be analyzed. It is clear that non-peptide ligands can also be used in a similar way, i.e. this method is suitable for detecting e.g. carbohydrate sequences (i.e. sugars), nucleotides etc.
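The chance probabilities quoted in examples 12 and 13 can be approximated with a simple model. The sketch below assumes that all 20 amino acids are equally likely at every position and that the adjacent-pair positions are independent; both are simplifications introduced here, not statements from the text, and the function name is illustrative.

```python
def p_dipeptide_by_chance(length: int, alphabet: int = 20) -> float:
    """Approximate probability that a given ordered dipeptide occurs at
    least once in a random sequence of the given length.

    A sequence of n residues has n - 1 adjacent pairs, and each pair matches
    a specific ordered dipeptide with probability 1 / alphabet**2.
    """
    if length < 2:
        return 0.0
    p_pair = 1.0 / alphabet**2
    return 1.0 - (1.0 - p_pair) ** (length - 1)


p_hormone = p_dipeptide_by_chance(9)   # nonapeptide hormone (example 12)
p_region = p_dipeptide_by_chance(20)   # 20-residue inhibitory region (example 13)
p_both = p_hormone * p_region          # both predictions correct by chance

print(f"nonapeptide:       {p_hormone:.4f}")
print(f"20-residue region: {p_region:.4f}")
print(f"both by chance:    about 1 in {1 / p_both:.0f}")
```

Under these assumptions the values come out close to, though not identical with, the 0.019, 0.046 and 1/1097 quoted in the text; the small differences presumably reflect rounding or a slightly different calculation.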
Example 14 identification of orphan ligand-receptor pairs
The present invention is also useful for identifying orphan ligands and/or orphan ligand-receptor pairs. The approach begins by compiling a table of chemical structures that have a given effect (usually binding) on a protein of interest whose ligand is not known at the time of the study. This information can be generated in a number of ways, such as, but not limited to, nuclear magnetic resonance studies, measurement of conformational changes by circular dichroism, measurement of protein-ligand interactions by surface plasmon resonance or, in the case of orphan ligands, assays with mutants in which the relevant receptor is constitutively activated.
To illustrate this concept, we assume that experiments of the kind described above were performed on an orphan target, yielding the structures shown below:
This is a hypothetical table of structures to be analyzed for bioactive chemical determinants. The 9 structures shown above were analyzed according to the method of the invention as described in example 2, using the table of amino acid and peptide backbone derived chemical determinants described above.
Analysis of these structures as described in example 12 led to the identification of a number of amino acid and peptide backbone derived chemical determinants with scores greater than 1. Examples of these determinants and their corresponding scores are shown below:
No.43 (score 4.43) No.44 (score 4.90)
These are examples of high scoring chemodeterminants identified in the first round of analysis. The set of putative receptor ligands is analyzed according to the method of the present invention, using a scoring function (IV) to calculate the scores for the first set of chemical determinants shown in example 12 and for a number of other chemical determinants. Scores greater than 1 indicate that the determinant is less likely to occur within the ligand subset than 1/20 based on likelihood alone. The two higher scoring chemical determinants identified in the present method are shown above.
From these examples it is clear that determinants 43 and 44 are contained only in the chemical structures of the amino acids phenylalanine and tyrosine. It can be concluded that peptides that interact with the orphan receptor may contain tyrosine or phenylalanine residues in their sequence, and that these residues may play a central role in ligand binding and/or receptor activation by these peptides. Next, if the high-scoring determinants 43 and 44 are analyzed further to determine whether combining them with other amino acid fragments yields higher-scoring structures, fragments such as determinant 45, shown in panel A below, can also be identified.
The two groups of figures show the high-scoring chemical determinants identified in the second round of analysis. The chemical determinants described previously were further analyzed according to the method of the present invention to determine whether combining them with other amino acid fragments would yield higher-scoring structures. One of the resulting structures, designated determinant 45 (group A), has a score greater than 40. Interestingly, the entire structure of determinant 45 is contained within the structure of the dipeptide sequence Tyr-Gly (group B), and it can thus be concluded that the endogenous ligand of the orphan target in question contains a Tyr-Gly dipeptide sequence within its primary structure.
Since the entire structure of determinant 45 is contained within the structure of the dipeptide sequence tyrosine-glycine (Tyr-Gly), it can be concluded that the orphan ligand being sought most likely contains the Tyr-Gly sequence somewhere in its primary structure. Based on this information, amino acid sequence databases were screened to identify known and/or orphan ligands containing the predicted Tyr-Gly sequence, which were tested in an initial biochemical screening assay after selection and expression. Alternatively, a collection of potential Tyr-Gly analogs can be compiled directly using chemical determinant 45.
Finally, it should be noted that the chemical structures used in this example are in fact opioid receptor agonists taken from the literature, and that the naturally occurring opioid receptor agonists dynorphin A, β-endorphin, enkephalin and dynorphin all contain the predicted Tyr-Gly sequence within their primary structure. Since tyrosine has been found to be absolutely required for opioid agonist activity, this example also demonstrates that the invention can identify the bioactive group of a receptor ligand. The estimate can be made more accurate by using another algorithm based on the variables x, y, z and N, such as Fisher's exact test. Indeed, the 9 structures were analyzed with a method that is poorly corrected for small sample sizes, so the score of 41.96 obtained for determinant 45 may be slightly overestimated.
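The Fisher's exact test mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the formula used in the example: x is taken as the number of active molecules containing the determinant, y as the total number of molecules containing it, z as the total number of active molecules and N as the total number of molecules, and the test is taken as one-sided (probability of observing at least x by chance alone). The function name and the counts in the usage lines are hypothetical.

```python
from math import comb


def fisher_exact_enrichment(x: int, y: int, z: int, N: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment of a determinant.

    x: active molecules containing the determinant
    y: all molecules containing the determinant
    z: all active molecules
    N: all molecules in the table
    """
    if not (0 <= x <= min(y, z) and max(y, z) <= N and N - y - z + x >= 0):
        raise ValueError("inconsistent counts")
    # Hypergeometric tail: probability that k >= x of the z active molecules
    # contain the determinant when y of the N molecules contain it.
    tail = sum(comb(y, k) * comb(N - y, z - k) for k in range(x, min(y, z) + 1))
    return tail / comb(N, z)


# Hypothetical counts, chosen only to show the call.
p = fisher_exact_enrichment(x=9, y=12, z=9, N=1000)
print(p)
```

Because the sum uses exact integer arithmetic before the final division, the estimate stays reliable for small samples, which is the situation the passage describes.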
Example 15 identification of drug target endogenous modulators
It will be apparent to those skilled in the art that the present invention may also be used to identify endogenous modulators of drug targets. To illustrate, the inventors developed a functional assay for an ion channel relevant to the treatment of neurodegeneration. A collection of compounds was screened as described in example 2, and the resulting table of inhibitors was analyzed for the presence of bioactive chemical determinants. This analysis identified high-scoring chemical determinants that were found to be contained within a subset of molecules produced endogenously by eukaryotic cells. After the corresponding compounds were purchased and tested in the assay, the channel was found to be selectively inhibited by sub-micromolar concentrations of a particular subset of cellular phospholipids. Most interestingly, other groups had previously linked this channel to neuronal apoptosis through an unknown mechanism. Taken together, these results demonstrate that the present invention allows the identification of endogenous modulators of drug targets.
Example 16 identification of false Positive test results
The present inventors have developed an enzymatic assay for a protein kinase believed to play a role in the immune response. A collection of compounds to be screened against the target was assembled according to the method of the invention, in particular as described in example 2. The compounds in the collection were then tested in the assay at a concentration of 5 μM, resulting in the identification of 35 molecules with at least 40% inhibitory activity. By slightly modifying formula (II), analyzing the structures of these compounds with the modified formula as the scoring function, and directly comparing the resulting scores with the values of a statistical table, it is possible to estimate the probability that a given chemical determinant occurs within the subset of 35 pharmacologically active compounds by chance alone.
Using a probability of chance occurrence of p < 0.05 as the threshold, it was determined that 14 of the 35 inhibitors most likely represented false positive results. These 14 compounds were then retested in the assay, which confirmed this hypothesis and showed that the present invention allows the identification of false positive experimental results.
Example 17 identification of false negative test results
By performing calculations similar to those described in example 16, the present invention can also identify false negative test results. To illustrate, a series of phosphatase inhibitors was analyzed for the presence of pharmacologically active chemical determinants in their chemical structures, as described in example 16. Using the highest-scoring chemical determinants obtained as a "fingerprint" of pharmacological activity, a substructure search was performed on the table of chemical structures corresponding to the compounds initially tested in the assay. The search found many molecules that contained one or more of the above chemical determinants but had nevertheless been scored as negative in the screening assay. The corresponding compounds were then retested in the assay, and more than 15% of them were found to be false negatives, with one compound even showing sub-micromolar inhibitory activity. These results clearly show that the method of the invention can identify false negative experimental results.
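The selection of retest candidates described in this example can be sketched as follows. This is a simplified illustration: a real substructure search would require a cheminformatics toolkit, so each molecule is represented here by a precomputed set of determinant labels, and set intersection stands in for the search. The function name, compound identifiers and determinant labels are all hypothetical.

```python
def retest_candidates(molecules, fingerprint):
    """Return ids of molecules recorded as inactive that contain
    at least one determinant of the activity fingerprint."""
    return sorted(
        mol_id
        for mol_id, (determinants, active) in molecules.items()
        if not active and determinants & fingerprint
    )


# Hypothetical screening table: id -> (determinants found in the structure, active?)
molecules = {
    "cmpd-1": ({"D1", "D7"}, True),
    "cmpd-2": ({"D1"}, False),
    "cmpd-3": ({"D4"}, False),
    "cmpd-4": ({"D7", "D9"}, False),
}
fingerprint = {"D1", "D7"}  # highest-scoring determinants (hypothetical labels)

print(retest_candidates(molecules, fingerprint))  # ['cmpd-2', 'cmpd-4']
```

Compounds returned by such a filter are only candidates; as in the example, whether they are true false negatives is decided by retesting them in the assay.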
Example 18 quantitative configuration and conformation analysis
In a modified embodiment of the invention, configurations and/or conformations can be analyzed quantitatively using algorithms that include various combinations of the variables x, y, z and N. To illustrate this possibility, note that the results of example 4 show that the "fingerprint" of the pharmacologically active protease inhibitors shown in group B of example 4 carries neither configurational nor conformational constraints. Indeed, for both the carbonyl and the sulfonyl groups, the structural formula does not reveal whether the single-bond form of the pharmacologically active fingerprint is in the inverted or the forward conformation, nor, where the same structure is in the double-bond form, whether the active fingerprint has the (E) or the (Z) configuration. The reason is that the calculations performed in example 4 identify the chemical determinants most likely to underlie protease inhibitory activity without taking into account the conformation and/or configuration that such a determinant may adopt. Given that many pharmacologically active structures contain double bonds and/or ring systems, which conformationally constrain chemical determinants by reducing their total number of rotatable bonds, the present invention can be used to determine which conformation and/or configuration of a given chemical determinant is most likely to be pharmacologically active.
To illustrate, the 6 protease-inhibiting structures shown in example 4 were analyzed by using scoring function (IV) to calculate the scores of a series of conformationally and configurationally defined chemical determinants derived from the structures shown in group B of example 4.
No. 46: score 36.90    No. 47: score 14.10
This set of figures represents a quantitative conformation/configuration analysis of the chemical determinants that inhibit proteases. The 6 structures shown in example 4 were analyzed according to the method of the present invention using a chemical determinant table defined by conformation and configuration.
Chemical determinant 46 shown in the figure is one of the highest-scoring determinants, followed by the lower-scoring chemical determinant 47; it can therefore be concluded that the (Z) configuration of the double-bond form of the fingerprint is more likely to be the preferred arrangement within the chemical structures of the protease inhibitors in question. This hypothesis was later confirmed by a further high-throughput screening of a focused collection, which delivered a large number of protease inhibitors; in these inhibitors the pharmacological activity fingerprint was indeed restricted to the (Z) or "cis" configuration, with rare exceptions.
Taken together, these results demonstrate that the methods of the invention can identify the biologically active conformation and/or configuration of a chemical determinant. Finally, it should be noted that many different algorithms containing various combinations of the variables x, y, z and N are available to perform such calculations, and that the scoring function may make the estimates described above more accurate if it includes other variables, such as, but not limited to, variables that take into account the pharmacological efficacy of the chemical structures.
Example 19 similarity search
As can be seen from the above examples, the concept of molecular similarity used by the method of the invention differs clearly from the generally accepted definition of the term. For example, the compounds in the hypothetical table of example 14 are structurally very different, so much so that there is no obvious way to place these 9 compounds into a single chemical class using conventional clustering techniques. However, example 14 shows that these compounds are in fact very similar, in that each contains at least one occurrence of a chemical determinant that is a representative fragment of the amino acid tyrosine; see the following figures:
These are the fragments of the amino acid tyrosine contained in the 9 opioid receptor agonist structures. The structures shown above are not identical, so it is difficult to group them into a single chemical class using conventional clustering techniques. They are nevertheless very similar in the sense of the present invention, since they all contain at least one chemical determinant fragment defined by the amino acid tyrosine; these fragments are indicated by thick black lines.
Thus, the present invention makes it easy to measure molecular similarity and/or to compare the similarity that may exist between different compound collections. Briefly, one or more reference molecules are selected from a list of chemical structures and analyzed for chemical determinants; once identified, these determinants are used to perform one or more substructure searches on one or more new molecules to determine whether the new molecules are similar to the first population of molecules. Each test molecule can be assigned a value reflecting its degree of similarity to the original set of reference compounds by calculating the scores of the corresponding chemical determinants with the scoring functions described in the previous examples and then scoring the new chemical structure based on, for example, the number of different determinants it contains. This approach is useful for designing focused compound collections for drug discovery, as it allows researchers to quickly identify compounds that are highly similar, in the sense of the present invention, to a pharmacologically active reference compound.
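One way to assign such a similarity value can be sketched as follows. The sketch assumes that the substructure searches have already been run, so that each new molecule is described by the set of determinant labels found in it; similarity is then reported as the number of distinct reference determinants it contains, with the sum of their scores as a tie-breaker. The function names, labels and score values are hypothetical, and the choice of tie-breaker is an assumption rather than something prescribed by the example.

```python
def similarity(molecule_determinants, reference_scores):
    """Return (number of reference determinants found, sum of their scores)."""
    found = reference_scores.keys() & molecule_determinants
    return len(found), sum(reference_scores[d] for d in found)


def rank_by_similarity(new_molecules, reference_scores):
    """Order new molecules from most to least similar to the reference set."""
    return sorted(
        new_molecules,
        key=lambda name: similarity(new_molecules[name], reference_scores),
        reverse=True,
    )


# Hypothetical determinant scores obtained from the reference compounds.
reference_scores = {"A": 2.0, "B": 1.5, "C": 0.5}

# Hypothetical new molecules, described by the determinants found in them.
new_molecules = {
    "new-1": {"A", "B", "X"},
    "new-2": {"C"},
    "new-3": {"Y"},
}

print(rank_by_similarity(new_molecules, reference_scores))
```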
Example 20 analysis of the diversity of a Compound pool
The invention can also be used to analyze the diversity of a compound collection in a manner similar to the previous example. It will be apparent to those skilled in the art that the chemical determinant concept can readily be used to compare a given set of compounds with other sets of compounds. For example, a collection of compounds for high-throughput screening can be selected by analyzing the corresponding table of chemical structures according to the method of the invention, with the chemical structures contained in a reference collection, such as the Merck Index, Derwent, MDDR or Pharmaprojects databases, serving as the reference set of "drug-like" molecules. In this case, molecules whose structures consist essentially of low-scoring chemical determinants are considered "drug-like", because the proportion of the same chemical determinants in the reference structures is higher. Conversely, molecules whose structures consist essentially of high-scoring chemical determinants are considered "non-drug-like", because the proportion of the same chemical determinants in the reference structures is low. This information is useful for designing discovery experiments, because it helps researchers identify chemical structures that should or should not be included in a screening collection. Many algorithms that include various combinations of the variables x, y, z and N can be used to achieve this goal.
Example 21 Special Algorithm
The previous examples obviously do not provide an exhaustive list of the algorithms that use various combinations of the variables x, y, z and N for discrete substructure analysis. It will be apparent to those skilled in the art that scoring functions (XII), (XIII) and (XIV) can be used to address many of the problems of the previous examples. Indeed, in some cases it may be more appropriate, in the statistical sense of the term, to replace the formula explicitly set forth in an example with one of these formulas. However, since the present invention is primarily intended to identify the chemical determinants in a table of chemical structures that are most likely to underlie a given biological effect, the main concern is the relative scores and resulting ranking of the chemical determinants. Formulas (XII), (XIII) and (XIV) may nevertheless be preferred in the following cases: a) a small sample set requires an accurate estimate of the probability of chance occurrence (see XII, where s corresponds to the smallest value among x, (y − x), (z − x) and (N − y − z + x)); b) a proportionally weighted estimate of the simultaneous contribution of two determinants is considered more appropriate for a case such as example 8 (see XIII, where d corresponds to the number of individual chemical determinants); or c) it is crucial to estimate order effects when assessing the simultaneous contribution of two chemical determinants linked to each other (see XIV). The variables x, y, z and N in these formulas are defined exactly as before.
Finally, it will be apparent to those skilled in the art that using variables that are not explicitly recited in the preceding examples, in scoring functions and/or algorithms designed to identify bioactive chemical determinants, is mathematically equivalent to using various combinations of the variables x, y, z and N. To illustrate, a scoring function using the variable q is equivalent to one using x and y, since q = y − x, where q is defined as the number of inactive molecules whose chemical structure contains a given chemical determinant. Similarly, a scoring function using the variable r is algebraically equivalent to one using the variables x and z, since r = z − x, where r is defined as the total number of active compounds that do not contain the given chemical determinant. In addition, a scoring function using the variable s is equivalent to one using the variables x, y, z and N, since s is defined as the total number of inactive compounds that do not contain the given chemical determinant. Finally, an algorithm using the variables t and u is equivalent to one using the variables N, y and/or z, since t = N − y and u = N − z, where t and u are defined as the total number of molecules whose structures do not contain the given determinant (t) and the total number of inactive molecules (u), respectively.
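The equivalences above can be checked with a few lines of arithmetic. The sketch below derives q, r, s, t and u from x, y, z and N; the expression used for s, N − y − z + x, follows by subtraction from the definitions given in the text. The function name and the counts in the usage lines are hypothetical.

```python
def derived_counts(x: int, y: int, z: int, N: int) -> dict:
    """Derive q, r, s, t and u from x, y, z and N."""
    return {
        "q": y - x,          # inactive molecules containing the determinant
        "r": z - x,          # active molecules not containing the determinant
        "s": N - y - z + x,  # inactive molecules not containing the determinant
        "t": N - y,          # all molecules not containing the determinant
        "u": N - z,          # all inactive molecules
    }


# Hypothetical counts, chosen only to show the bookkeeping.
x, y, z, N = 3, 10, 5, 100
c = derived_counts(x, y, z, N)
print(c)  # {'q': 7, 'r': 2, 's': 88, 't': 90, 'u': 95}

# The four cells of the 2 x 2 table add up to N, and the margins agree.
assert x + c["q"] + c["r"] + c["s"] == N
assert c["r"] + c["s"] == c["t"]
assert c["q"] + c["s"] == c["u"]
```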
Example 22 plotting the relative contribution
The present invention can also be used to construct relative contribution plots. These plots represent chemical structures as curves, in which the relative contribution of the various atoms, bonds, fragments and/or substructures to a given biological outcome is represented by scores calculated as described in the previous examples. In a preferred embodiment of the method, the scores used are probability scores, for example scores calculated using formula (XII), where P(A) represents the probability that a given chemical determinant is contained in the subset of biologically active structures by chance alone, calculated with a formula using various combinations of the variables x, y, z and N as described above.
(XII) score = [1 − P(A)] · 100%
It is clear that many association measures and/or scoring functions can be used to estimate P(A). Two examples of relative contribution plots are discussed in detail below. The following figure shows the molecule of interest and a series of chemical determinants comprising fragments of the same molecule, whose scores were calculated using formula (XII), with a modified association measure (I) used to determine P(A).
No. 46 (molecule of interest): score 12%
No. 47: score 10.4%; No. 48: score 14.7%; No. 49: score 12.3%
No. 50: score 23.8%; No. 51: score 56.2%; No. 52: score 63.0%; No. 53: score 92.9%
No. 54: score 98.1%; No. 55: score 12.0%; No. 56: score 0.3%; No. 57: score 0.0%
Score=90.17% Score=12.0% Score=0.3% Score=0.0%
FIG. 15 shows the same information in the form of a curve plotted with determinants for their respective scores.
In this context, the same information can obviously be represented by a probability contour, as shown in the following figure:
In summary, these plots are very useful for designing compound collections, as they help researchers select compounds based on a mathematical estimate of the probability of success in a given assay, reducing the need to rely on the notion of molecular diversity to identify novel bioactive chemical families. They are also relevant to medicinal chemistry, since the above figures clearly show which groups of the molecule can be rationally modified with minimal risk of losing pharmacological activity. Similarly, these plots show the toxicologist which groups of a toxic compound need to be modified to eliminate undesirable effects.
To construct the relative contribution plots shown in FIG. 15 and in the figure above, the probability of chance occurrence within the active subset, P(A), was estimated directly according to the method of the present invention by calculating the scores of the chemical determinants corresponding to fragments of the bioactive molecule, using a scoring function of the variables x, y, z and N. The corresponding P(A) values were then transformed with scoring function (XII) to assign each determinant a probability score reflecting the relative likelihood that the corresponding chemical structure underlies the biological activity of interest. These scores can be represented as in FIG. 15, which plots the individual chemical determinants against their scores; chemical determinant 54 corresponds to the relative maximum of the series. Alternatively, the scores can be represented as in the figure above, a probability contour plot showing which fragment or region of the chemical structure is most likely to confer biological activity (determinant 54 is contained within the region bounded by the 95% contour). FIG. 11 shows another way of representing the scores.
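The transformation of P(A) into a probability score with formula (XII), and the selection of the relative maximum of a series, can be sketched as follows. The P(A) values and determinant labels are hypothetical and are not the values behind FIG. 15; the function names are likewise illustrative, and how P(A) itself is estimated is left outside the sketch.

```python
def probability_score(p_a: float) -> float:
    """Formula (XII): score = [1 - P(A)] * 100%."""
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("P(A) must lie between 0 and 1")
    return (1.0 - p_a) * 100.0


def relative_maximum(p_values: dict):
    """Score each determinant and return the top-scoring one with all scores."""
    scores = {name: probability_score(p) for name, p in p_values.items()}
    return max(scores, key=scores.get), scores


# Hypothetical chance probabilities for three fragments of one molecule.
p_values = {"frag-a": 0.5, "frag-b": 0.25, "frag-c": 1.0}
top, scores = relative_maximum(p_values)
print(top, scores)
```

Plotting the determinants against these scores gives a curve of the kind described for FIG. 15; thresholding the scores at a fixed level (for example 95%) gives the contour representation.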
Example 23 score function equivalence
The scoring functions used in the previous examples identify the chemical determinants most likely to underlie a given biological, pharmacological and/or toxicological effect. It will be apparent to those skilled in the art that, although some association measures and/or scoring functions are best suited to certain types of problems, each formula, when used in accordance with the methods described herein, can identify the same top-ranked chemical determinant as the one most likely to underlie a given biological effect. The formulas presented in the previous examples are therefore functionally equivalent in the sense of discrete substructure analysis.
To demonstrate this, the chemical structures of 131 dopamine D2 receptor agonists were analyzed 8 times in parallel, using 8 association measures and scoring functions that contain various combinations of the variables x, y, z and N, as shown below. The study was carried out as described above: the chemical structures of 101207 molecules with no effect on the dopamine D2 receptor were added to the first table of 131 structures, and the scores of the series of 19 chemical determinants shown below were calculated using scoring functions (XV) to (XXIII), which the reader will recognize as identical to, or closely related variants of, those used in many of the previous examples.
[Figure: chemical determinants No. 58 to No. 76]
These are the chemical determinants whose scores were calculated using 8 different scoring functions. The scores of the 19 chemical determinants given above were calculated using functions (XV) to (XXIII) and the table of chemical structures annotated for dopamine D2 receptor agonist activity. The functions used are:
(XV) score = MW · (x/z)
(XVI) score = (x/z) − (y/N)
(XVII) score = Nx − yz
(XXII) score = e^[(x/z) − (z − x)/(N − z)]
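The four functions listed above can be applied to the same counts as in the following sketch. Several points are assumptions: MW is read as the molecular weight of the determinant and the juxtaposition in (XV) as a product, (XXII) is transcribed as printed, and the three determinants with their counts and molecular weights are invented for illustration. Only N and z come from the text (131 agonists plus 101207 inactive molecules). Whether different functions agree on the top-ranked determinant depends on the data; with these made-up counts they do.

```python
from math import exp

Z = 131           # active molecules (agonists), from the text
N = 131 + 101207  # all molecules in the table, from the text


def score_xv(x, y, z, n, mw):
    return mw * (x / z)


def score_xvi(x, y, z, n, mw):
    return (x / z) - (y / n)


def score_xvii(x, y, z, n, mw):
    return n * x - y * z


def score_xxii(x, y, z, n, mw):
    # Transcribed as printed in the list of functions.
    return exp((x / z) - (z - x) / (n - z))


SCORERS = {"XV": score_xv, "XVI": score_xvi, "XVII": score_xvii, "XXII": score_xxii}

# Hypothetical determinants: label -> (x, y, molecular weight)
determinants = {
    "A": (120, 40000, 50.0),
    "B": (90, 5000, 120.0),
    "C": (125, 150, 250.0),
}


def top_ranked(scorer):
    """Label of the determinant with the highest score under one function."""
    def score(label):
        x, y, mw = determinants[label]
        return scorer(x, y, Z, N, mw)
    return max(determinants, key=score)


tops = {name: top_ranked(f) for name, f in SCORERS.items()}
print(tops)
```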
Fig. 16A to 16H show the corresponding relative contribution plots. The scores of the chemical determinants shown in the figures are calculated according to the method described above and the determinants are plotted against their corresponding scores. Fig. 16A shows the score calculated by function (XV), fig. 16B shows the score calculated by function (XVI), fig. 16C shows the score calculated by function (XVII), fig. 16D shows the score calculated by function (XVIII), fig. 16E shows the score calculated by function (XIX), fig. 16F shows the score calculated by function (XX), fig. 16G shows the score calculated by function (XXI), and fig. 16H shows the score calculated by function (XXII). Each scoring function selects the same chemodeterminant (73), which is most likely the basis for biological activity.
As can be seen from the relative contribution plots in FIGS. 16A to 16H, each of the 8 scoring functions correctly identified chemical determinant 73 as the corresponding local maximum, indicating that, among the 19 determinants tested, it is the most likely basis of dopamine D2 agonist activity. Interestingly, the scoring functions differ in how they rank the lower-scoring chemical determinants: with scoring functions (XV), (XVI) and (XVII), determinant 62 ranks third in importance for biological activity; with scoring function (XXII), determinant 63 ranks third; with scoring functions (XIX) and (XXI), determinant 65 ranks third; and with scoring functions (XVIII) and (XXII), determinant 66 ranks third.
These minor differences have little impact on the successful outcome of the method, since in each case the lower-ranked determinants are actually fragments of the larger, top-ranked determinant 73 (see above). Using chemical determinant 73 and its fragments directly is therefore sufficient to design a compound collection for high-throughput screening, since structures containing determinant 73 invariably contain every lower-ranked determinant. A sample of the kind of compounds included in such a collection is shown below.
These sample structures are examples of compounds that may be included in a compound collection for identifying dopamine D2 receptor agonists. Each of the structures given above contains chemical determinant 73 or most of its structure.
It can be concluded that, although the mathematical rationale behind each of the 8 scoring functions differs, they all identified the same chemical determinant as the most likely basis of biological activity. Algorithms containing various combinations of the variables x, y, z and N, or of q, r, s, t and u as described above, are therefore functionally equivalent in the sense of the present invention.
Example 24 informatics-based drug discovery tools
As can be seen from the foregoing examples, the present invention can be combined into one or more series of steps, such as, but not limited to, computer programs designed to increase the efficiency of high throughput screening, compound discovery, hits-to-leads chemistry, compound progression, and/or lead optimization. These steps or procedures are preferably designed to guide a machine and/or robotic system for drug screening, compound selection, set generation, and/or chemical synthesis in a controlled, semi-automated or fully automated manner. Such steps include, but are not limited to, the following examples which form preferred embodiments of the invention.
The method of analyzing the chemical structures annotated with the corresponding experimental results and identifying bioactive chemical determinants according to the invention.
Methods for using the bioactive chemodeterminants identified by the present invention to search chemical databases, virtual or other databases to identify compounds, biomaterials, reagents, reaction products, intermediates or other substances most likely to have a given pharmacological, biochemical, toxicological and/or biological property.
Methods for storing in electronic or other form, in registers, with or without periodic updates, bioactive chemical determinants identified by the present invention, and experimental data and/or scores relating to any given pharmacological, biochemical, toxicological and/or biological property, as a repository of structural information for use in automatically or non-automatically selecting compounds, families and/or scaffolds in high-throughput screening, medicinal-chemical and/or lead optimization decisions.
A method of identifying pharmacological modulators of drug targets such as, but not limited to, receptor ligands, kinase inhibitors, ion channel modulators, protease inhibitors, phosphatase inhibitors and steroid receptor ligands using the invention as described in any of the preceding examples.
A method of increasing the efficacy of a chemical series, increasing the selectivity of a chemical series, designing a compound with a pharmacological effect, predicting a potential second pharmacological effect of a molecule, predicting a potential toxicological effect of a molecule, identifying a bioactive group for a receptor ligand, predicting a potential protein-protein interaction, identifying an orphan ligand-receptor pair, and/or identifying an endogenous modulator of a drug target, using the invention directly as described in any one of the preceding examples or in a computer program designed to analyze a chemical structure. The latter uses refer in particular to the field of functional genomics and proteomics, in which the nucleotide and/or amino acid sequences used for the study can be selected, for example on the basis of the molecular chemical structure identified in biochemical screening assays and treated according to the invention, in order to identify, for example, orphan ligands.
Use of the method of the invention directly or in a program designed to identify false positive and/or false negative experimental results.
Use of the method of the invention directly or in a program designed to predict the potential hazardous effect of a molecule on humans, livestock and/or the environment, for example to screen chemical substances for use in or as food additives, in plastics, textiles and the like.
Use of the method of the invention directly or in a program designed to perform configuration, conformation, stereochemistry, similarity and/or diversity analysis.
Use of the method of the invention directly or in a program designed to plot relative contributions and/or to map bioactive groups or chemical structures.
Methods of running informatics, computer programs, and/or expert systems intended for use in conducting drug, herbicide, and/or pesticide discovery using any of the methods outlined above, alone or in serial and/or parallel combination.
Methods for directing the automatic or non-automatic, spontaneous or non-spontaneous operation of machines and/or meters, using any of the methods outlined above, alone or in combination in series and/or in parallel, and using up-to-date chemodeterminant registers annotated with or without scores, in the field of pharmaceutical and/or agricultural discovery, in order to generate chemical structures rationally, to retrieve compounds, to generate experimental protocols and/or screening data rationally, and/or to select results and/or chemical structures rationally.
Other steps comprising the present invention may be readily obtained with the general knowledge of a person skilled in the art.

Claims (19)

1. A method of operating a computer system for performing discrete substructure analysis, the method comprising the following steps:
evaluating (210, 220, 410) a database (110, 115) of molecular structures, the database being searched with molecular structure information and biological and/or chemical properties;
identifying (220) a subset of molecules within the database having a given biological and/or chemical property;
determining (230, 420) molecular fragments within the subset;
calculating (230, 430, 610-650) a score for each fragment, which represents the contribution of the respective fragment to the given biological and/or chemical property; and
performing (240, 250) an iterative process by analyzing (250) the determined fragments and the calculated scores, first selecting at least one fragment whose score indicates a high contribution to the biological and/or chemical property, and then repeating the evaluating, identifying, determining and calculating steps.
2. The method of claim 1, wherein: the step of calculating a score comprises the steps of:
calculating (610) the number of molecules (x) within the subset of molecules containing a given fragment.
3. The method according to one of claims 1 or 2, characterized by: the method further comprises the steps of:
identifying a second subset of molecules within the database that do not have the biological and/or chemical property;
wherein the step of calculating a score comprises the steps of:
calculating (620) a number of molecules (y) within said subset and said second subset of molecules containing a given fragment.
4. The method of claim 1, wherein: the step of calculating a score comprises the steps of:
calculating (630) the number of molecules (z) within the subset of molecules.
5. The method of claim 1, wherein: the method further comprises the steps of:
identifying a second subset of molecules within the database that do not have the given biological and/or chemical property;
wherein the step of calculating a score comprises the steps of:
calculating (640) a total number of molecules (N) within the subset and the second subset of molecules.
6. The method of claim 1, wherein: the iterative process is performed by selecting fragments in the next round with a higher molecular weight than fragments in the previous round.
7. The method of claim 1, wherein: the method further comprises the steps of:
selecting (710) a segment based on the calculated score;
analyzing (810) the structure of the selected fragments;
finding (820) a generic item within the fragment structure; and
the generic item is replaced (830) with a generic expression, resulting in a generic substructure.
8. The method of claim 7, wherein: the method further comprises the steps of:
virtual screening is performed 840 using the generic substructures.
9. The method of claim 1, wherein: the step of analyzing the determined fragments and the calculated scores comprises the steps of:
selecting (1010) a first fragment based on the calculated score;
selecting (1020) a second fragment based on the calculated score; and
forming (1030) a molecular substructure, including the first fragment and the second fragment, using an annealing function.
10. The method of claim 1, wherein: the step of analyzing the determined fragments and the calculated scores comprises the steps of:
selecting (710) at least one fragment based on the calculated score;
extracting (720) compounds from the previous subset of molecules, the extracted compounds containing the selected fragment;
selecting (730) compounds from the previous subset of molecules that do not contain the selected fragment, or compounds that are not included in the previous subset of molecules; and
forming (740) a new subset of molecules, the new subset including the extracted and selected compounds.
11. The method of claim 1, wherein: the method further comprises the steps of:
forming (230) a fragment library (120) comprising the determined fragments and the calculated scores.
12. The method of claim 1, wherein: the database is a dedicated database.
13. The method of claim 1, wherein: the database is a common database.
14. The method of claim 1, wherein: the database is an amino acid and/or nucleotide sequence database and the biological and/or chemical property has a given effect on the protein of interest.
15. The method of claim 1, wherein: the biological and/or chemical properties are pharmacological and the method is useful for drug discovery.
16. The method of claim 1, wherein: the method further comprises the steps of:
compiling (260) a set of compounds containing at least one of the determined fragments.
17. The method of claim 16, wherein: the method further comprises the steps of:
testing said compiled set of compounds for said given biological and/or chemical property.
18. A computer system for performing discrete substructure analysis, the system comprising:
evaluation means (100, 110, 115) for evaluating a molecular structure database that can be searched by molecular structure information and by biological and/or chemical properties;
identification means (100, 130) for identifying a subset of molecules within the database having a given biological and/or chemical property;
determination means (100, 130, 135) for determining molecular fragments within said subset;
computing means (100, 130, 140) for calculating a score for each fragment, said score representing the contribution of the respective fragment to said given biological and/or chemical property; and
means (100, 130) for determining whether to perform an iterative process and, if so, for analyzing the determined fragments and the calculated scores and performing the iterative process.
19. The computer system of claim 18, wherein: the system further comprises means for implementing the method of one of claims 2 to 17.
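For illustration only, the counting steps of claims 2 to 5 and the subset-forming steps of claim 10 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Molecule` record, the fragment labels, the toy `DATABASE` and all function names are hypothetical, and the ratio in `enrichment_score` is an assumed stand-in, since the claims state only that the score represents the contribution of a fragment to the given property.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """Hypothetical record: a name, its fragment labels, and an activity flag."""
    name: str
    fragments: frozenset
    active: bool  # True if the molecule has the given property


def fragment_counts(molecules, fragment):
    """Return (x, y, z, N) in the sense of claims 2 to 5.

    x: molecules in the active subset containing the fragment (610)
    y: molecules in both subsets containing the fragment (620)
    z: molecules in the active subset (630)
    N: molecules in both subsets together (640)
    """
    active = [m for m in molecules if m.active]
    x = sum(fragment in m.fragments for m in active)
    y = sum(fragment in m.fragments for m in molecules)
    return x, y, len(active), len(molecules)


def enrichment_score(x, y, z, n):
    """ASSUMPTION: an illustrative ratio, not the score defined by the patent.

    Compares the active fraction among fragment-containing molecules (x / y)
    with the active fraction in the whole set (z / n).
    """
    if y == 0 or z == 0:
        return 0.0
    return (x / y) / (z / n)


def form_new_subset(previous_subset, fragment, selected):
    """Sketch of claim 10: extract (720), add selected compounds (730), form (740).

    `selected` is supplied by the caller: compounds of the previous subset
    lacking the fragment, or compounds from outside the previous subset.
    """
    extracted = [m for m in previous_subset if fragment in m.fragments]
    new_subset, seen = [], set()
    for m in extracted + list(selected):
        if m.name not in seen:  # keep each compound once, in order
            seen.add(m.name)
            new_subset.append(m)
    return new_subset


# Toy data, invented for this example only.
DATABASE = [
    Molecule("A", frozenset({"f1", "f2"}), True),
    Molecule("B", frozenset({"f1"}), True),
    Molecule("C", frozenset({"f2"}), False),
    Molecule("D", frozenset({"f1", "f3"}), False),
    Molecule("E", frozenset({"f3"}), False),
    Molecule("F", frozenset({"f2", "f3"}), True),
]

if __name__ == "__main__":
    x, y, z, n = fragment_counts(DATABASE, "f1")
    print(x, y, z, n, enrichment_score(x, y, z, n))
    actives = [m for m in DATABASE if m.active]
    print([m.name for m in form_new_subset(actives, "f1", [DATABASE[3]])])
```

With the toy data, fragment "f1" gives x = 2, y = 3, z = 3 and N = 6. Keeping the choice of `selected` compounds outside `form_new_subset` mirrors the wording of step (730), which leaves open whether those compounds come from the previous subset or from elsewhere in the database.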
HK04104959.7A 2000-10-17 2001-10-16 Method of operating a computer system to perform a discrete substructural analysis HK1061911B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP00309114 2000-10-17
EP00309114.7 2000-10-17
PCT/EP2001/011955 WO2002033596A2 (en) 2000-10-17 2001-10-16 Method of operating a computer system to perform a discrete substructural analysis

Publications (2)

Publication Number Publication Date
HK1061911A1 (en) 2004-10-08
HK1061911B (en) 2006-10-13
