WO2003019183A1 - Method for the informative and iterative design of a gene-family combinatorial library - Google Patents
- Publication number
- WO2003019183A1 (application PCT/US2002/026842)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- molecules
- gene
- family
- library
- molecule
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/60—In silico combinatorial chemistry
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
- G16B35/10—Design of libraries
- G16B35/20—Screening of libraries
Definitions
- The present invention relates, in general, to methods for the design of molecule libraries and, in particular, to methods for the design of gene-family screening libraries.
- A variety of conventional techniques are known for selecting libraries of molecules for subsequent biological activity screening. These conventional techniques select molecules using a broad range of criteria: some are driven solely by chemical accessibility and reagent availability for combinatorial synthesis, others by molecular "diversity" (sampling of broad chemistry space) with no consideration of medicinal chemistry knowledge, and still others by similarity to molecule substructures (e.g., privileged substructures with an affinity for a receptor or enzyme), fragments or ligands that are known to be associated with a desired medicinal property.
- A drawback of conventional molecule library design techniques is that the broad chemistry spaces they use are often not unique to a specific target or family of targets; therefore, the molecules selected for inclusion in a molecule library are not specific to any target or family of targets.
- Alternatively, libraries may be designed to focus on a single target, to the exclusion of related targets within a particular gene family.
- In addition, the use of structurally definite substructures and/or fragments limits the structural breadth, and thus the novelty, of the molecules selected for inclusion in the molecule library.
- Still needed in the art is a method for selecting library molecules that are unique to one or more specific targets, such as all target substances within a given gene family.
- The method should not be limited to the selection of molecules that include structurally definite substructures or fragments.
- Embodiments in accordance with the present invention provide a method for selecting molecules for inclusion in a gene-family screening library that are unique to multiple target substances (e.g., enzymes and/or receptors) of a specific gene family.
- The process may utilize structurally-abstract molecule descriptors, rather than structurally definite substructures or fragments, and is therefore expected to provide for the design and/or selection of library molecules with increased structural novelty.
- Target substances (e.g., enzymes and receptors) belonging to a predetermined gene family are likely to have similarities in amino acid sequence and, therefore, in structure and mechanism of action. Because of this, molecules that interact with (i.e., have activity towards) target substances of a predetermined gene family are expected to share certain characteristics (e.g., structurally-abstract molecule descriptors). A library of molecules that share such characteristics would then be expected to exhibit activity toward a multitude of target substances of the predetermined gene family.
- An embodiment of a method in accordance with the present invention includes defining a gene-family source set that includes a plurality of source molecules and/or a plurality of target structures (e.g., enzyme and/or receptor structures).
- The plurality of source molecules that can be included in the gene-family source set are selected using selection criteria that include activity towards a predetermined gene family.
- The target structures that can be included in the gene-family source set are target structures of a predetermined gene family.
- At least one class of structurally-abstract molecule descriptor (e.g., a class of pharmacophore descriptors, a class of shape-feature descriptors, or a class of subshape-feature descriptors) is then chosen.
- All members of the at least one class of structurally-abstract molecule descriptor are then generated.
- An active molecule descriptor space is then established. Such an active molecule descriptor space includes each of the members of the at least one class of structurally-abstract molecule descriptor that is present in a predetermined number of the plurality of source molecules of the gene-family source set and/or correlated with a predetermined number of the plurality of target structures.
- The active molecule descriptor space may be established by first identifying a predetermined number of the plurality of source molecules of the gene-family source set and/or a predetermined number of the plurality of target structures. Members of the chosen class of structurally-abstract molecule descriptor that correlate with the predetermined number of gene-family source molecules or target structures are then identified.
- An embodiment of a method for selecting molecules for inclusion in a gene-family screening library comprises defining a gene-family source set that includes a plurality of source molecules selected using selection criteria that includes the criterion of activity towards a predetermined gene-family.
- At least one class of structurally-abstract molecule descriptor is chosen, and members of the at least one class of structurally-abstract molecule descriptor are generated.
- An active molecule descriptor space is established, the active molecule descriptor space including each of the members of the at least one class of structurally-abstract molecule descriptor that is present in a predetermined number of the plurality of source molecules of the gene-family source set.
- A group of candidate molecules is identified, and library molecules are selected for inclusion in a gene-family screening library from the group of candidate molecules, thereby designing a gene-family screening library.
- An alternative embodiment of a method for selecting molecules for inclusion in a gene-family screening library comprises defining a gene-family source set that includes a plurality of source molecules, the plurality of source molecules selected from a drug-like molecule database using selection criteria that includes the criterion of in vivo activity towards a predetermined gene-family. At least one class of structurally-abstract molecule descriptor is chosen. Members of the at least one class of structurally-abstract molecule descriptor are generated.
- An active molecule descriptor space is established utilizing a technique wherein the presence/absence of each of the members of the at least one class of structurally-abstract molecule descriptor in the source molecules is encoded in a matrix of source molecule bit strings, the active molecule descriptor space including each of the members of the at least one class of structurally-abstract molecule descriptor that is present in a predetermined number of the plurality of source molecules of the gene-family source set.
- A group of candidate molecules is identified. Library molecules are selected for inclusion in a gene-family screening library from the group of candidate molecules using an informative library design technique.
- This technique may include encoding in an active space bit string members of the at least one class of structurally-abstract molecule descriptor that are included in the active molecule descriptor space; encoding in a candidate bit string the presence/absence of each of the members of the at least one class of structurally-abstract molecule descriptor in each of the group of candidate library molecules, and then ascertaining the overlap of the candidate bit string and the active space bit string.
- Another embodiment of a method for selecting molecules for inclusion in a gene-family screening library comprises defining a gene-family source set that includes a plurality of target structures of a predetermined gene family, choosing at least one class of structurally-abstract molecule descriptor, and generating members of the at least one class of structurally-abstract molecule descriptor.
- An active molecule descriptor space is established, the active molecule descriptor space including each of the members of the at least one class of structurally-abstract molecule descriptor that is correlated with a predetermined number of the plurality of target structures of the gene-family source set.
- An embodiment of a method for examining suitability of a candidate molecule as a drug lead comprises defining a gene-family source set that includes a plurality of molecules selected using selection criteria relating to a predetermined gene-family, choosing at least one class of structurally-abstract molecule descriptor, and generating members of the at least one class of structurally-abstract molecule descriptor.
- An active molecule descriptor space is established, the active molecule descriptor space including members of the at least one class of structurally-abstract molecule descriptor that correlate with a predetermined number of the plurality of molecules of the gene-family source set.
- A group of the candidate molecules is identified, and library molecules are selected for inclusion in a gene-family screening library from the group of candidate molecules, thereby designing a gene-family screening library.
- FIG. 1 is a flow chart illustrating steps in a process according to one exemplary embodiment of the present invention;
- FIG. 2 illustrates a portion of a matrix of encoded source molecule bit strings, wherein each row is an encoded bit string associated with a source molecule of the gene-family source set and each column is assigned to a member of a class of structurally-abstract molecule descriptors; light circles represent a "1" bit (the structurally-abstract descriptor is present in the source molecule) and darker circles represent a "0" bit (the structurally-abstract descriptor is absent from the source molecule);
- FIG. 3 is a diagram depicting the "space" defined by all members of a class of structurally-abstract molecule descriptors (depicted as a rectangle) and an active molecule descriptor space established therefrom (depicted as a circle disposed within the rectangle);
- FIG. 4 is a bar chart illustrating the results of a step of defining a gene-family source set in a process according to one exemplary embodiment of the present invention;
- FIG. 5A is a simplified diagram of a computing device for processing information according to an embodiment of the present invention;
- FIG. 5B is an illustration of basic subsystems in the computer system of FIG. 5A;
- FIG. 6 is a simplified flowchart showing the steps for applying a gene-family screening library created in accordance with the present invention to identify drug candidates; and
- FIG. 7 shows a simplified schematic diagram contrasting the selection of two screening libraries using Shannon entropy.
- The term "gene-family" refers to target substances (e.g., receptors or enzymes) encoded by a family of genes (i.e., a group of genes with similar sequences); and
- The term "structurally-abstract molecule descriptors" refers to molecule descriptors (e.g., pharmacophore descriptors, shape-feature descriptors, and subshape-feature descriptors) that are structurally indefinite. The term, therefore, does not include definite molecule substructures (i.e., completely defined portions of a molecule's structure) or definite molecule fragments.
- Process 10 involves first defining a gene-family source set (as shown at step 12).
- The gene-family source set includes a plurality of source molecules, a plurality of target structures, or a combination of source molecules and target structures.
- The plurality of target structures that can be included in the gene-family source set includes enzyme and/or receptor structures from a predetermined gene family.
- The target structures can, for example, be derived experimentally or from homology models.
- Examples of experimental techniques useful for deriving target structures include, but are not limited to, X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and circular dichroism (CD) spectroscopy.
- The plurality of source molecules that can be included in the gene-family source set are selected using selection criteria that include, at least, the criterion of activity toward a predetermined gene family.
- Such source molecules can be selected, for example, from any suitable corporate, commercial or academic compound collection.
- The molecules can also be derived from natural sources, or be synthesized as peptides or other molecular forms.
- Embodiments of gene-family screening libraries in accordance with the present invention are typically composed of small molecules having molecular weights of less than 1000, and preferably less than 700, although the present invention is not to be interpreted as limited to any particular maximum size and may incorporate larger molecules.
- The molecules can also be selected from published literature or other public documents.
- The use of databases containing drug-like molecules (e.g., the MDL Drug Data Report database ("MDDR"), the World Drug Index ("WDI") database and the Comprehensive Medicinal Chemistry database) can beneficially serve to minimize the time required to select the source molecules and to provide descriptions of molecule activity.
- The MDDR is produced by MDL Information Systems, Inc. of San Leandro, California.
- The predetermined gene family can be any gene family known to one skilled in the art including, but not limited to: the G-Protein Coupled Receptor (GPCR) gene family, including the chemokine receptor sub-family; the ion channel gene family, including the potassium channel and sodium channel sub-families; the serine protease gene family; the phosphodiesterase gene family; the nuclear receptor gene family; and the kinase gene family. Since an objective of process 10 is the creation of a gene-family screening library, the definition of the gene-family source set distinctively involves either source molecule selection criteria that include activity towards a predetermined gene family or target structures of a predetermined gene family.
- By defining the gene-family source set in this manner, the gene-family source set, and the active molecule descriptor space that will subsequently be established from it, capture properties that are unique and specific to the predetermined gene family.
- This uniqueness and specificity enable processes according to the present invention to create a gene-family screening library that includes a reasonable number of library molecules and contains library molecules with a high likelihood of having activity toward multiple target substances of the predetermined gene family of interest, while excluding molecules that are less likely to have such activity.
- Compiling a gene-family screening library in accordance with the present invention may also capture other, non-specific properties not currently known to be associated with a specific gene family, but an objective of embodiments of the invention is to enrich the screening library in properties that are unique and specific to the gene family, relative to the full variety of possible molecular properties.
- Source molecule selection criteria in addition to activity toward a predetermined gene family can, if desired, be used to select a plurality of source molecules for the gene-family source set.
- For example, the selection criteria can include a specific level of, or a general requirement for, in vivo or in vitro activity toward the predetermined gene family, and/or a molecular weight of less than 1000 or 700. Still another possible selection criterion is that the molecules have passed Phase 2 clinical trials.
- At least one class of structurally-abstract molecule descriptor is selected, as shown at step 14.
- The selection of the class of structurally-abstract molecule descriptors can be based on any suitable structurally-abstract characteristic, feature or property of a molecule.
- Examples of structurally-abstract molecule descriptor classes include pharmacophore descriptors, shape-feature descriptors, subshape-feature descriptors, atom path-length descriptors, BCUT descriptors, and other biophysical descriptors (e.g., calculated solubility or logP) known to one skilled in the art.
- A pharmacophore comprises a set of relative positions in space that should be occupied by atoms of a specific type.
- Shape matching describes comparison of representations of the overall three dimensional shapes of molecules. A detailed description of approaches to shape matching is presented by Srinivasan et al.
- Subshape features are three-dimensional representations of specific subshapes that fit within a larger molecular shape.
- One approach to subshape feature matching is described in a co-pending application titled "Method for Molecular Subshape Similarity Matching", application no. / (Atty. Docket No. 018590-005710US), which is hereby fully incorporated by reference.
- Subshape features represent a category of structurally-abstract molecule descriptors that can be employed in processes according to the present invention.
- Embodiments in accordance with the present invention may also utilize known ligand connectivities or fragments, such as privileged substructures derived from known ligands, to create the gene-family screening library.
- A description of the concept of privileged substructures is presented by Patchett et al. in "Chapter 26: Privileged Structures - An Update", Ann. Rep. Med. Chem. 35, 289 (2000), hereby incorporated by reference in its entirety for all purposes.
- All members of the at least one class of structurally-abstract molecule descriptor are generated, as shown at step 16.
- For example, if the selected class of structurally-abstract descriptor is the class of 3-point pharmacophores in which each of the 3 points is an aromatic ring feature, a hydrophobic feature or a positive charge feature, with inter-feature distances of 1-15 angstroms (Å), then all possible members of that class are generated.
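Step 16 can be illustrated with a short Python sketch of the 3-point pharmacophore example. It is an assumption-laden illustration, not the method as implemented in the patent: distances are binned to whole angstroms (a hypothetical choice), a member is identified by its three feature types plus its three inter-feature distances, different orderings of the same three points count as one member, and distance triples that violate the triangle inequality are skipped. The names `canonical` and `enumerate_3point` are hypothetical.

```python
from itertools import permutations, product
from typing import Iterable, List, Tuple

# Feature types named in the text; the 1-angstrom distance bins are a hypothetical choice.
FEATURES = ("aromatic", "hydrophobic", "positive")
DISTANCE_BINS = range(1, 16)  # 1..15 angstroms inclusive

# edges are stored as (d01, d02, d12): distances between points 0-1, 0-2 and 1-2
EDGE_INDEX = {(0, 1): 0, (0, 2): 1, (1, 2): 2}
Key = Tuple[str, str, str, int, int, int]


def canonical(features: Tuple[str, str, str], edges: Tuple[int, int, int]) -> Key:
    """Order-independent key: the smallest (features + edge lengths) tuple
    over all six orderings of the three points."""
    def dist(i: int, j: int) -> int:
        return edges[EDGE_INDEX[(min(i, j), max(i, j))]]

    keys = []
    for p in permutations(range(3)):
        feats = tuple(features[i] for i in p)
        keys.append(feats + (dist(p[0], p[1]), dist(p[0], p[2]), dist(p[1], p[2])))
    return min(keys)


def enumerate_3point(features: Iterable[str] = FEATURES,
                     bins: Iterable[int] = DISTANCE_BINS) -> List[Key]:
    """All distinct 3-point pharmacophores whose edge lengths form a valid triangle."""
    features, bins = tuple(features), tuple(bins)
    members = set()
    for feats in product(features, repeat=3):
        for a, b, c in product(bins, repeat=3):
            if a > b + c or b > a + c or c > a + b:
                continue  # no molecule can place three points at these distances
            members.add(canonical(feats, (a, b, c)))
    return sorted(members)


# One feature type and two distance bins leave 4 members: edge sets {1,1,1}, {1,1,2}, {1,2,2}, {2,2,2}.
print(len(enumerate_3point(("aromatic",), range(1, 3))))  # prints 4
```

Calling `enumerate_3point()` with its defaults generates the full class for the example in the text under these assumptions.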
- An active molecule descriptor space is then established, as shown at step 18.
- When the gene-family source set includes source molecules, the established active molecule descriptor space includes each member of the class of structurally-abstract molecule descriptor that is present in (represented in) a predetermined number (e.g., more than ten) of the plurality of source molecules of the gene-family source set.
- The active molecule descriptor "space" is thus defined by the members of the structurally-abstract molecule descriptor class (e.g., pharmacophores and/or subshape features) that are frequently present in the source molecules.
- When the gene-family source set includes target structures, the established active molecule descriptor space includes each member of the class of structurally-abstract molecule descriptor that is correlated with a predetermined number (e.g., two) of the plurality of target structures of the gene-family source set.
- A member of the class of structurally-abstract molecule descriptors can be considered "correlated" with a receptor of the gene-family source set if the member is correlated with (i.e., corresponds to or is "present" in) a binding site of the receptor.
- Because the active molecule descriptor space is derived from (a) the source molecules of a specific gene-family source set and/or target structures of a specific gene family and (b) the members of the class of structurally-abstract molecule descriptors, it is a molecule design space specifically targeted toward multiple target substances within the predetermined gene family, rather than a broad, general design space.
- The active molecule descriptor space is, furthermore, the molecule design space that the library molecules of the gene-family screening library need to satisfy and cover.
- An exemplary two-step technique for establishing an active molecule descriptor space is described with reference to FIGs. 2 and 3.
- First, the presence/absence of each of the members of the at least one class of structurally-abstract molecule descriptor in the source molecules of a gene-family source set is encoded in a matrix of source molecule bit strings, with a bit of "1" representing the presence of a member and a bit of "0" representing its absence.
- A portion of such a matrix of source molecule bit strings is depicted in FIG. 2 (where a "1" bit is depicted as a light circle and a "0" bit is depicted as a dark circle).
- In the matrix, each row is the bit string of one source molecule, and each column is assigned to one of the members of the at least one class of structurally-abstract molecule descriptors.
- Next, the matrix of source molecule bit strings is analyzed to resolve each of the members of the at least one class of structurally-abstract molecule descriptor that is present in a predetermined number (e.g., more than ten) of the plurality of source molecules. Since the matrix is in an encoded bit format, this procedure can readily be automated using computer-automated, software-based techniques. Each of the members thus resolved establishes (i.e., defines) the active molecule descriptor space.
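The two-step technique reduces to a per-column count over the bit matrix. The sketch below assumes the matrix is a list of equal-length 0/1 lists laid out as in FIG. 2 (rows are source molecules, columns are descriptor members); the helper names are hypothetical, and the default `min_count=11` simply reads the "more than ten" example literally.

```python
from typing import List, Sequence


def active_descriptor_space(matrix: Sequence[Sequence[int]], min_count: int = 11) -> List[int]:
    """Return the column indices (descriptor members) present in at least
    `min_count` source molecules.

    matrix[i][j] is 1 if descriptor member j is present in source molecule i, else 0.
    """
    if not matrix:
        return []
    n_cols = len(matrix[0])
    counts = [0] * n_cols
    for row in matrix:
        if len(row) != n_cols:
            raise ValueError("all source molecule bit strings must have the same length")
        for j, bit in enumerate(row):
            counts[j] += bit
    return [j for j, count in enumerate(counts) if count >= min_count]


def active_space_bit_string(active: Sequence[int], n_cols: int) -> List[int]:
    """Encode the active space itself as a bit string over all descriptor members."""
    bits = [0] * n_cols
    for j in active:
        bits[j] = 1
    return bits


# Toy 3-molecule x 4-descriptor matrix; with min_count=2 only columns 0 and 2 qualify.
source = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
]
active = active_descriptor_space(source, min_count=2)
print(active, active_space_bit_string(active, 4))  # prints [0, 2] [1, 0, 1, 0]
```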
- A group of candidate molecules is then identified.
- The group of candidate molecules is a subset of all possible molecules and can be chosen by any known means, including medicinal chemistry intuition, selection from molecule inventories, synthetic accessibility, or computer design (i.e., virtual libraries of candidate molecules).
- The group of candidate molecules is encoded in candidate bit strings.
- Encoding the candidate molecules in this manner records, in the candidate bit strings, the presence/absence of each of the members of the at least one class of structurally-abstract molecule descriptor in each of the group of candidate molecules. The overlap of each candidate bit string with the active space bit string may subsequently be ascertained and analyzed to aid in selecting library molecules that provide optimal coverage.
- Library molecules for inclusion in a gene-family screening library are selected from the group of candidate molecules using a library design technique that optimizes coverage of the active molecule descriptor space (see step 22 of FIG. 1). The result of this selection is a gene-family screening library that includes a plurality of library molecules specifically directed toward multiple target substances of the predetermined gene family.
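The text leaves the coverage-optimizing library design technique open. As one possible stand-in, the sketch below uses greedy maximum coverage (a substitute technique, not the patent's informative design): each candidate is represented by the set of active-space members it presents, and the candidate adding the most not-yet-covered members is picked at each step. It optimizes coverage only and ignores the information content of the overlaps. Molecule ids and the function name are hypothetical.

```python
from typing import Dict, List, Set


def greedy_cover(candidates: Dict[str, Set[int]], active: Set[int], max_size: int) -> List[str]:
    """Greedy maximum-coverage selection over the active descriptor space.

    candidates maps a (hypothetical) molecule id to the set of descriptor members it presents.
    Stops when the active space is fully covered, no candidate adds a new member,
    or max_size molecules have been chosen.
    """
    uncovered = set(active)
    remaining = dict(candidates)
    chosen: List[str] = []
    while uncovered and remaining and len(chosen) < max_size:
        # max() keeps the first maximum, so iterating in sorted order breaks ties by id.
        best = max(sorted(remaining), key=lambda m: len(remaining[m] & uncovered))
        gain = remaining.pop(best) & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


pool = {"m1": {0, 1}, "m2": {1, 2, 3}, "m3": {3}, "m4": {5}}
print(greedy_cover(pool, active={0, 1, 2, 3}, max_size=10))  # prints ['m2', 'm1']
```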
- An informative library design technique is especially beneficial for efficiently providing optimal coverage of the active molecule descriptor space and for selecting library molecules that will provide the maximum amount of information when used in screening studies. If desired, the coverage of the active molecule descriptor space obtained using an informative library design technique can be further optimized using Shannon entropy, as discussed in detail below in connection with FIG. 7.
- An informative library design technique selects library molecules from the group of candidate molecules that cover the active molecule descriptor space in such a way that a maximum number of conclusions can be drawn from the results of subsequent screening experiments regarding which structurally-abstract molecule descriptors are preferred for activity toward the predetermined gene family (hence the term "informative design").
- Informative design in this context means that the library molecules selected for inclusion in the gene-family screening library sample multiple combinations of the members in the active molecule descriptor space in an overlapping fashion. This overlap allows a maximum number of conclusions to be drawn from gene-family screening experiments conducted using the library molecules.
- One informative library design technique that can be utilized in processes according to the present invention includes encoding, in active space bit strings, the members of the at least one class of structurally-abstract molecule descriptor that are included in the active molecule descriptor space. Next, the presence/absence of each of the members of the at least one class of structurally-abstract molecule descriptor in each of the group of candidate molecules is encoded in candidate bit strings. The overlap of the candidate bit string and the active space bit string, as well as the overlap of the active bit strings themselves, is subsequently ascertained.
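With the bit strings held as 0/1 lists, ascertaining overlap amounts to a position-wise AND. In this minimal sketch (helper names hypothetical), `active_overlap` returns the active-space members that a candidate presents, and `shared_active_members` shows where two candidates sample the same members, i.e. the overlapping sampling that informative design relies on.

```python
from typing import Sequence, Set


def active_overlap(candidate: Sequence[int], active_bits: Sequence[int]) -> Set[int]:
    """Positions set in both the candidate bit string and the active space bit string."""
    if len(candidate) != len(active_bits):
        raise ValueError("bit strings must have the same length")
    return {j for j, (c, a) in enumerate(zip(candidate, active_bits)) if c and a}


def shared_active_members(cand_a: Sequence[int], cand_b: Sequence[int],
                          active_bits: Sequence[int]) -> Set[int]:
    """Active-space members presented by both candidates (their overlap within the active space)."""
    return active_overlap(cand_a, active_bits) & active_overlap(cand_b, active_bits)


active_bits = [1, 0, 1, 1, 0]
c1 = [1, 1, 0, 1, 0]
c2 = [0, 0, 1, 1, 1]
print(sorted(active_overlap(c1, active_bits)), sorted(active_overlap(c2, active_bits)))  # prints [0, 3] [2, 3]
print(sorted(shared_active_members(c1, c2, active_bits)))  # prints [3]
```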
- Equation (1): $H = -\sum_{i=1}^{C} \frac{c_i}{E} \ln \frac{c_i}{E}$, where:
- H: the Shannon entropy of the set of molecules with respect to the active molecule descriptor space;
- E: the number of descriptors in the active molecule descriptor space;
- C: the number of distinct clusters (i.e., unique descriptor bit string patterns);
- $c_i$: the size of cluster i (the number of descriptors having the unique bit string pattern of that cluster).
- Figure 7 shows two potential libraries 700 and 702 comprising three molecules (rows) selected to cover a molecular descriptor space comprising six structurally-abstract descriptors (columns) A-G.
- Each row thus represents the bit string for a molecule, with a "1" indicating the presence of a particular descriptor and a "0" indicating the absence of a particular descriptor in that molecule.
- Each column represents a bit string for a particular descriptor, again using "1" and "0” to denote the presence and absence of that descriptor in a particular molecule.
- The descriptors A-G can be grouped based on the patterns in their bit strings (i.e., columns). Clusters of descriptors are defined by the unique column bit patterns, with descriptors having the same pattern belonging to the same cluster. Thus, in library 700 of Figure 7, molecules 704, 706, and 708 create four unique patterns, or clusters 700a-d, for the six descriptors A-G. Descriptors B and C have the same pattern, forming cluster 700b with a cluster size ($c_i$) of 2. Descriptors A and D each have a unique pattern, forming clusters 700a and 700d, each with a cluster size of 1.
- Figure 7 illustrates the difference between optimizing for coverage alone, and optimizing for information and coverage with Shannon entropy.
- Both library 700 and library 702 cover the molecular descriptor space (i.e., each of the six descriptors A-G occurs at least once in the set of molecules).
- However, the Shannon entropy of the two libraries 700 and 702 is not the same.
- The Shannon entropy for library 700 can be calculated using equation (1), based on the four clusters and their sizes, as follows:
- $H = -\frac{1}{6}\ln\frac{1}{6} - \frac{2}{6}\ln\frac{2}{6} - \frac{1}{6}\ln\frac{1}{6} - \frac{2}{6}\ln\frac{2}{6}$, where the first and third terms are the contributions of the two clusters having a cluster size of one (700a and 700d), and the second and fourth terms are the contributions of the two clusters having a cluster size of two (700b and 700c).
- The resulting Shannon entropy for library 700 is approximately 1.33.
- The value of H reaches its maximum, ln E, when each descriptor (column) has a unique pattern, thereby returning the most information.
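Equation (1) can be evaluated directly from a library's bit matrix by grouping identical columns into clusters. The matrix below is hypothetical (FIG. 7 is not reproduced here), chosen so that its six descriptor columns form four clusters of sizes 1, 2, 1 and 2; the natural logarithm is used, as in equation (1). The function name is likewise hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def library_entropy(library: Sequence[Sequence[int]]) -> float:
    """Shannon entropy H of a library per equation (1).

    library[i][j] is 1 if molecule i presents descriptor j. Descriptors (columns) with
    identical bit patterns form one cluster of size c_i; E is the number of columns.
    """
    columns = list(zip(*library))  # one tuple (bit pattern) per descriptor
    E = len(columns)
    cluster_sizes = Counter(columns).values()
    return -sum((c / E) * math.log(c / E) for c in cluster_sizes)


# Hypothetical 3-molecule x 6-descriptor library: columns 2 and 3 match, columns 5 and 6
# match, and columns 1 and 4 are unique, giving cluster sizes 1, 2, 1, 2.
library = [
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
print(round(library_entropy(library), 2))  # prints 1.33
```

A library in which every column has a distinct pattern reaches the maximum, ln E; one in which all columns share a single pattern scores 0 even if it covers every descriptor.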
- The library property M thus reflects both the Shannon entropy of the library (H) and how well the library matches specified property distribution constraints (D).
- The term D may also be characterized as a penalty incurred where the library fails to match specified property distribution constraints.
- The total cost term (D) may be calculated as a summation (Σ) of terms according to equation (3) as follows:
- b: the number of different bins or property ranges considered in a particular distribution;
- Each distribution may be scaled by a weight term, as shown in the outer summation of equation (3).
- This weight term allows the user to control the relative importance of the various properties. For example, it might be much more significant to match the molecular weight distribution than to match the number of rotatable bonds.
- D: the total cost of matching both of the distributions of Table 1
- The first term represents the contribution from the rotatable-bond distribution, and the second term represents the contribution from the molecular weight distribution.
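Equation (3) itself is not reproduced here, so the following is only a sketch under a stated assumption: each distribution's contribution is taken to be the sum, over its b bins, of the absolute difference between the library's fraction and the target fraction, and the contributions are combined in a weighted sum as described above. The per-bin form, the function name and the example numbers are all hypothetical.

```python
from typing import Dict, Sequence


def distribution_cost(
    observed: Dict[str, Sequence[float]],
    target: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted mismatch D between a library's property distributions and target ones.

    observed[k] and target[k] hold the fraction of molecules in each of the b bins of
    property k. ASSUMPTION: the per-bin term is |observed - target|; the text only states
    that weighted per-distribution terms are summed.
    """
    total = 0.0
    for prop, weight in weights.items():
        obs, tgt = observed[prop], target[prop]
        if len(obs) != len(tgt):
            raise ValueError(f"{prop}: observed and target must use the same b bins")
        total += weight * sum(abs(o - t) for o, t in zip(obs, tgt))
    return total


# Hypothetical two-bin distributions; molecular weight is weighted 5x more than rotatable bonds.
observed = {"rotatable_bonds": [0.5, 0.5], "mol_weight": [0.2, 0.8]}
target = {"rotatable_bonds": [0.25, 0.75], "mol_weight": [0.5, 0.5]}
print(round(distribution_cost(observed, target, {"rotatable_bonds": 1.0, "mol_weight": 5.0}), 6))  # prints 3.5
```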
- At step 24 of FIG. 1, the coverage of the active molecule descriptor space by the library molecules is calculated. If the coverage is deemed sufficient, the process can be halted. If the coverage is deemed insufficient, then at step 26 another group of candidate molecules not yet represented in the screening library is identified. This group of candidate molecules is identified such that "holes" (i.e., portions of the active molecule descriptor space that are not represented by any of the library molecules) in the active molecule descriptor space are likely to be filled by the proper selection of library molecules from it.
- Additional library molecules for inclusion in the gene-family screening library are then selected from this group of candidate molecules using a library design technique (for example, an informative library design technique).
- This selection of additional library molecules creates a gene-family screening library with an improved (e.g., increased or more informative) coverage of the active molecule descriptor space and therefore a high likelihood of activity versus a family of target substances.
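The coverage check at step 24 and the search for hole-filling candidates at step 26 can be sketched with set operations. In this illustration (names and data hypothetical), each molecule is represented by the set of active-space members it presents; what counts as "sufficient" coverage is left to the user.

```python
from typing import Dict, List, Set, Tuple


def coverage_and_holes(library: Dict[str, Set[int]], active: Set[int]) -> Tuple[float, Set[int]]:
    """Fraction of the active space covered by the library, plus the uncovered members ("holes")."""
    covered = set().union(*library.values()) & active
    holes = active - covered
    fraction = len(covered) / len(active) if active else 1.0
    return fraction, holes


def rank_by_holes_filled(candidates: Dict[str, Set[int]], holes: Set[int]) -> List[str]:
    """Candidates that fill at least one hole, most holes first (ties broken by id)."""
    useful = [m for m in candidates if candidates[m] & holes]
    return sorted(useful, key=lambda m: (-len(candidates[m] & holes), m))


active = {0, 1, 2, 3}
library = {"L1": {0, 1}, "L2": {1, 9}}
fraction, holes = coverage_and_holes(library, active)
print(fraction, sorted(holes))  # prints 0.5 [2, 3]
print(rank_by_holes_filled({"c1": {2}, "c2": {2, 3}, "c3": {7}}, holes))  # prints ['c2', 'c1']
```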
- one criterion for identifying additional candidate molecules is that the candidate molecule must reflect the product of reaction between a combinatorial template and a reagent containing carboxylic acid or a bioisotere thereof (i.e. another functional group recognized as acting biologically like a carboxylic acid group).
- Such an additional candidate molecule with the appropriate feature types should still present those features in the appropriate relative locations for the pharmacophore to be "covered".
- Knowledge of the pharmacophores and their associated features can thus direct the search for candidate molecules.
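The iterate-until-covered cycle described above (check coverage, identify holes, select additional molecules, repeat) can be sketched as a loop. The informative library design technique itself is not specified in this text, so the sketch swaps in a plain greedy maximum-coverage heuristic as a stand-in; it is not the patent's selection method. Molecules are assumed to be sets of descriptor IDs, coverage here uses the simple "present in at least one molecule" rule, and `greedy_batch` and `iterative_design` are hypothetical names.

```python
from typing import Dict, List, Set

def greedy_batch(holes: Set[int], pool: Dict[str, Set[int]],
                 batch_size: int) -> List[str]:
    """Stand-in selector: repeatedly take the candidate that fills the most
    remaining holes (ties broken by candidate ID for determinism)."""
    remaining = set(holes)
    available = dict(pool)
    batch: List[str] = []
    while available and remaining and len(batch) < batch_size:
        best = max(sorted(available),
                   key=lambda c: len(available[c] & remaining))
        gain = available.pop(best) & remaining
        if not gain:
            break  # no candidate fills any remaining hole
        batch.append(best)
        remaining -= gain
    return batch

def iterative_design(active: Set[int], candidates: Dict[str, Set[int]],
                     batch_size: int, target: float = 1.0,
                     max_rounds: int = 10) -> List[str]:
    """Alternate coverage checks with hole-directed batch selection."""
    library: List[str] = []
    pool = dict(candidates)
    for _ in range(max_rounds):
        covered = set().union(*(candidates[c] for c in library)) & active
        if not active or len(covered) / len(active) >= target:
            break  # coverage deemed sufficient: halt
        batch = greedy_batch(active - covered, pool, batch_size)
        if not batch:
            break  # the candidate pool cannot fill the remaining holes
        for cid in batch:
            library.append(cid)
            del pool[cid]
    return library
```

Each round only scores candidates against the current holes, which mirrors the idea that later iterations target the still-uncovered part of the active molecule descriptor space.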
- A benefit of processes according to the present invention is that the minimum size of the gene-family screening library can be estimated as the process progresses. For example, if 3000 source molecules are included in the gene-family source set, it can be deduced that 3000 library molecules could conceivably cover the active molecule descriptor space completely, provided those 3000 library molecules include the requisite structurally-abstract molecule descriptors.
- The iterative aspect of the present invention also provides an opportunity to estimate the size of the gene-family screening library. For example, if it is calculated that 5000 library molecules provide 29% coverage of the active molecule descriptor space, it can be estimated that, if additional candidate molecules covering a similar number of structurally-abstract molecule descriptors are selected, the final screening library should contain approximately 17,240 molecules (5000 / 0.29).
- In yet another alternative method of estimating the size of the gene-family screening library, if the exact nature of the candidate molecules is known, the average number of structurally-abstract molecule descriptors represented by each candidate molecule can be calculated.
- The estimated number of library molecules that need to be selected for a gene-family screening library in order to cover the active molecule descriptor space is then equal to the number of structurally-abstract molecule descriptors in the active molecule descriptor space divided by the average number of structurally-abstract molecule descriptors represented by each candidate molecule.
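Both size estimates described above reduce to one division each. The sketch below is illustrative; the function names are hypothetical, and the per-candidate descriptor counts in the second call are made-up values (only the 1,800,000 active descriptors figure comes from the GPCR example in this document).

```python
from typing import Sequence

def estimate_by_coverage_rate(n_library: int, coverage_fraction: float) -> float:
    """Linear extrapolation: n molecules covering fraction f suggests n / f
    molecules for full coverage, assuming later picks add new descriptors
    at the same rate."""
    if not 0.0 < coverage_fraction <= 1.0:
        raise ValueError("coverage_fraction must be in (0, 1]")
    return n_library / coverage_fraction

def estimate_by_average_descriptors(n_active_descriptors: int,
                                    per_candidate: Sequence[int]) -> float:
    """Active-space size divided by the mean descriptor count per candidate.
    Overlap between molecules is ignored, so the result is optimistic."""
    if not per_candidate:
        raise ValueError("need at least one candidate")
    average = sum(per_candidate) / len(per_candidate)
    return n_active_descriptors / average

print(round(estimate_by_coverage_rate(5000, 0.29)))  # 17241, i.e. about 17,240
# Hypothetical per-candidate descriptor counts.
print(estimate_by_average_descriptors(1_800_000, [400, 600]))
```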
- FIG. 6 is a flow chart showing the steps of a method 600 for applying a gene family screening library in accordance with the present invention to identify drug leads.
- First, a gene-family screening library in accordance with an embodiment of the present invention is constructed as described in detail above.
- The first step 602 of the method 600 shown in FIG. 6 thus corresponds to step 22 or 28 of FIG. 1.
- In a second step, members of the gene-family screening library are procured.
- Members of the library can be procured in a number of ways. One approach is to synthesize in the laboratory one or more of the molecules comprising the library. Such synthesis can comprise conventional techniques, or more efficiently can employ combinatorial synthesis strategies wherein large numbers of organic compounds are created in parallel by linking chemical building blocks in all possible combinations.
- Such combinatorial synthesis approaches may involve solid phase synthesis wherein the molecules are anchored to beads, or may involve solution phase synthesis wherein the molecules are present in solution. Either or both solid or solution phase combinatorial synthesis techniques could be utilized to procure members of a gene family screening library created in accordance with embodiments of the present invention.
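Linking building blocks "in all possible combinations" is a Cartesian product over the reagent sets. A minimal sketch of enumerating such a virtual combinatorial library is shown below; the reagent names are hypothetical placeholders, and the 10 × 20 × 25 sizing simply mirrors the reagent matrix used in the first GPCR example later in this document.

```python
from itertools import product
from typing import List, Sequence, Tuple

def enumerate_library(*reagent_sets: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every combination taking one building block from each set."""
    return list(product(*reagent_sets))

# Hypothetical reagent names sized like a 10 x 20 x 25 reagent matrix.
reagent1 = [f"R1_{i}" for i in range(10)]
reagent2 = [f"R2_{i}" for i in range(20)]
reagent3 = [f"R3_{i}" for i in range(25)]
library = enumerate_library(reagent1, reagent2, reagent3)
print(len(library))  # 5000
```

The library size is the product of the reagent-set sizes, which is why small reagent lists quickly yield thousands of products.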
- Another alternative approach for procuring members of the gene family screening library is to purchase existing molecules from commercial sources. Examples of commercial sources of molecules suitable for procuring members of a gene family screening library created in accordance with an embodiment of the present invention include, but are not limited to, Pharmacopeia Inc. of Princeton, New Jersey, Sigma-Aldrich Corp. of St. Louis, Missouri, Maybridge Plc. of Tintagel, Cornwall, U.K., Chembridge Corp. of San Diego, CA, and Albany Molecular Research, of Albany, NY.
- Two- and three-dimensional molecular representations of gene family library members generated utilizing various software packages may be stored in a number of standardized formats, including but not limited to the SMILES format from Daylight Chemical Information Systems, Mission Viejo, California, and described by Weininger, in "SMILES 1. Introduction and Encoding Rules", J. Chem. Inf. Comput. Sci. 28, 31 (1988), incorporated herein by reference, the MOL2 format by Tripos Inc. of St. Louis, Missouri, the MOL and SDF formats of MDL of San Leandro, California, and the PDB format of the Protein Data Bank, http://www.rcsb.org/pdb/, incorporated herein by reference.
- In a third step 606 of method 600, some or all of the procured members of the gene-family library can be screened for activity.
- Examples of screening in which novel drug leads have been successfully generated from combinatorial libraries include the identification of novel cholecystokinin receptor antagonists, herpes simplex virus inhibitors, carbonic anhydrase II inhibitors, and peroxisome proliferator-activated receptor ligands. See Bunin et al., "Chapter 27. Application of Combinatorial and Parallel Synthesis to Medicinal Chemistry", Ann Rep Med Chem, 34, 267-286 (1999), incorporated by reference herein for all purposes.
- Screening of the gene-family library can take the form of biological assays conducted outside of living tissue (in vitro).
- Examples of assay formats for measuring enzyme activity or receptor binding include, but are not limited to, electrophoresis, scintillation proximity, ELISAs, immunoprecipitation, western blotting, and bead-based methods.
- Examples of detection techniques for use with biological assays include, but are not limited to, time-resolved fluorescence, fluorescence resonance energy transfer (FRET), fluorescence polarization, radioisotopic tracers, and chemiluminescent or colorimetric substrates.
- Examples of in vitro screening techniques for use with gene-family screening libraries created in accordance with the present invention include, but are not limited to, binding assays, enzyme activity assays, and cell-based assays such as functional assays and metabolism assays.
- One or more of the screening techniques described above can be performed with different levels of throughput.
- High-throughput screening of compound libraries is a standard approach in pharmaceutical research to discover new lead compounds for drug design.
- High-throughput screening typically involves the use of plates with 96 or more wells.
- Such high-throughput screening methods have discovered novel molecules, dissimilar to known ligands, that nevertheless bind to the target receptor at micromolar or submicromolar concentrations.
- Alternatively, members of a gene-family library created in accordance with embodiments of the present invention can be subjected to screening in living tissue (in vivo).
- Examples of in vivo assays include, but are not limited to, evaluation of the activity of gene-family screening library members in rodents, dogs, primates, or any other species. This evaluation may include testing the library molecules in a suitable pharmacological model of a particular disease state, in which physiological or behavioral changes in an animal are monitored. Such animals may be normal (wild-type) or genetically modified, or may be subject to a particular experimental protocol.
- Data produced from in vivo assays may include, but are not limited to, physical examinations, histological (organ/tissue) or behavioral observations, post-mortem examinations, and gene-expression analyses of tissue samples from animals exposed to library molecules.
- For example, library molecules may effectively reduce the size, weight, and/or adipose tissue density of animals fed a high-fat diet (a model for human obesity and diabetes), may produce a response associated with reduced anxiety in a behavioral test, or may alter normal gene expression in a given tissue as a result of interacting with an appropriate biological target.
- Screening "in silico" (that is, within the silicon of the integrated circuits making up a computer processor or memory) is emerging as an increasingly useful technique.
- In silico screening, also known as virtual screening, relies upon electronic representations of the molecules in two or three dimensions, rather than upon the physical molecules themselves.
- In silico screening may permit a researcher to rapidly compare and evaluate the similarity between candidate molecules from the library and other structures, such as receptors or other molecules with previously demonstrated activity against a particular receptor.
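One common way to compare molecules rapidly in silico is the Tanimoto coefficient on binary fingerprints; this document does not name a specific similarity measure, so Tanimoto is a swapped-in example rather than the method described here. The sketch assumes fingerprints are stored as Python integers used as bit sets, and `tanimoto` and `rank_by_similarity` are hypothetical names.

```python
from typing import Dict, List

def tanimoto(a: int, b: int) -> float:
    """Shared set bits divided by total set bits across both fingerprints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union

def rank_by_similarity(query: int, library: Dict[str, int]) -> List[str]:
    """Library IDs ordered from most to least similar to the query."""
    return sorted(library, key=lambda k: tanimoto(query, library[k]),
                  reverse=True)

# Hypothetical 4-bit fingerprints for a known active and three library members.
known_active = 0b1100
print(rank_by_similarity(known_active, {"x": 0b1100, "y": 0b0011, "z": 0b1000}))
```

Real fingerprints are far longer, but the comparison is the same bitwise AND/OR counting, which is why this kind of screen scales to large libraries.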
- Processes according to the present invention can be implemented using computer-automated techniques that involve custom and/or commercial software routines implemented in a single application program or implemented as multiple programs in a distributed computing environment, such as a workstation, personal computer or remote terminal in a client-server relationship.
- FIG. 5A is a simplified diagram of a computing device for processing information according to an embodiment of the present invention. This diagram is merely an example which should not limit the scope of the claims herein. One skilled in the art would recognize many other variations, modifications and alternatives. Embodiments according to the present invention can be implemented in a single application program such as a browser, or can be implemented as multiple programs in a distributed computing environment, such as a workstation, personal computer or a remote terminal in a client-server relationship.
- FIG. 5A shows a computer system 510 including a display device 520, a display screen 530, a cabinet 540, a keyboard 550, and a mouse 570.
- Mouse 570 and keyboard 550 are representative "user input devices."
- Mouse 570 includes buttons 580 for selecting buttons on a graphical user interface.
- Other examples of user input devices are a touch screen, light pen, track ball, data glove, microphone, and so forth.
- FIG. 5A is representative of but one type of system for embodying the present invention. It will be readily apparent to one of ordinary skill in the art that many system types and configurations are suitable for use in conjunction with the present invention.
- Computer system 510 includes a Pentium-class computer running the Windows NT operating system from Microsoft Corporation. However, the apparatus is easily adapted to other operating systems and architectures by those skilled in the art without departing from the scope of the present invention.
- Mouse 570 can have one or more buttons, such as buttons 580.
- Cabinet 540 houses familiar computer components such as disk drives, a processor, storage device, etc. Storage devices include, but are not limited to, disk drives, magnetic tape, solid state memory, bubble memory, etc.
- Cabinet 540 can include additional hardware, such as input/output (I/O) interface cards, for connecting computer system 510 to external devices such as external storage, other computers, or additional peripherals, as further described below.
- FIG. 5B is an illustration of basic subsystems in computer system 510 of FIG. 5A. This diagram is merely an illustration and should not limit the scope of the claims herein. One skilled in the art will recognize other variations, modifications and alternatives.
- The subsystems are interconnected via a system bus 575.
- Peripherals and input/output (I/O) devices, which couple to an I/O controller 571, can be connected to the computer system by any number of means known in the art, such as a serial port 577.
- For example, serial port 577 can be used to connect the computer system to a modem 581, which in turn connects to a wide area network such as the Internet, or to a mouse input device or a scanner.
- System memory and the fixed disk are examples of tangible media for storing computer programs. Other types of tangible media include floppy disks, removable hard disks, optical storage media such as CD-ROMs and bar codes, and semiconductor memories such as flash memory, read-only memories (ROM), and battery-backed memory.
- In a first example, the 3321 source molecules of a G-protein coupled receptor (GPCR) gene-family source set were selected from version 2000.1 of the MDL Drug Data Report (MDDR) database using the selection criteria of (i) membership in a class that exhibits activity towards the GPCR gene family; (ii) in vivo activity towards the GPCR gene family; and (iii) a molecular weight between 0 and 700.
- FIG. 4 is a bar chart illustrating the total number of molecules with activity towards a given GPCR gene-family receptor grouping (i) and those that were selected based on criteria (ii) and (iii) above.
- The two classes of pharmacophore descriptors were 3-point and 4-point pharmacophore descriptors, each including 3 or 4 features, respectively, selected from: (i) at most 2 positive charge features; (ii) at most 2 negative charge features; (iii) hydrogen bond donor features; (iv) hydrogen bond acceptor features; (v) at most 2 hydrophobic features; and (vi) aromatic ring features.
- The two classes of pharmacophore descriptors each included 25 distance bins in the range of 1.6 Angstroms to 24 Angstroms.
- All members of these two classes of pharmacophore descriptors were then generated (enumerated); they numbered approximately 35,000,000.
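A single 3-point pharmacophore descriptor can be thought of as three feature types plus three binned inter-feature distances, keyed so that the same triangle gives the same descriptor regardless of the order in which its points are listed. The sketch below shows one way to build such a key. Several details are assumptions not given in the text: the feature labels are hypothetical, the 25 bins are taken as equal-width (the actual bin edges are not stated), and canonicalization is done by taking the smallest key over all six point orderings. It does not reproduce the enumeration that yielded roughly 35,000,000 descriptors.

```python
import math
from itertools import permutations
from typing import Optional, Sequence, Tuple

# Hypothetical labels for the six feature types listed in the text.
FEATURES = ("POS", "NEG", "HBD", "HBA", "HYD", "ARO")
MAX_COUNT = {"POS": 2, "NEG": 2, "HYD": 2}  # "at most 2" limits from the text
N_BINS, D_MIN, D_MAX = 25, 1.6, 24.0        # 25 bins spanning 1.6-24 Angstroms

Point = Tuple[float, float, float]
Key = Tuple[Tuple[str, ...], Tuple[int, ...]]

def distance_bin(d: float) -> Optional[int]:
    """ASSUMPTION: equal-width bins; the real bin edges are not given."""
    if not D_MIN <= d <= D_MAX:
        return None
    width = (D_MAX - D_MIN) / N_BINS
    return min(int((d - D_MIN) / width), N_BINS - 1)

def three_point_key(features: Sequence[str],
                    coords: Sequence[Point]) -> Optional[Key]:
    """Order-independent key for one 3-point pharmacophore, or None when a
    feature limit is exceeded or a distance lies outside the binned range."""
    if len(features) != 3 or len(coords) != 3:
        raise ValueError("a 3-point pharmacophore needs 3 features and 3 points")
    if any(f not in FEATURES for f in features):
        raise ValueError("unknown feature label")
    if any(list(features).count(f) > n for f, n in MAX_COUNT.items()):
        return None
    best: Optional[Key] = None
    # Try all 6 orderings and keep the smallest; relabeling the input
    # yields the same set of candidate keys, so the minimum is canonical.
    for p in permutations(range(3)):
        a, b, c = (coords[i] for i in p)
        bins = tuple(distance_bin(math.dist(u, v))
                     for u, v in ((a, b), (b, c), (c, a)))
        if None in bins:
            return None
        key = (tuple(features[i] for i in p), bins)
        if best is None or key < best:
            best = key
    return best
```

A 4-point version would follow the same pattern with four features and six pairwise distances.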
- The active molecule descriptor space included every member of the two classes of pharmacophore descriptors that was present in eleven or more of the 3321 source molecules of the gene-family source set.
- The active molecule descriptor space was established by encoding the presence or absence of each member of the two classes of pharmacophore descriptors in the 3321 source molecules as a matrix of source-molecule bit strings, and then conducting a computer-based analysis of that matrix.
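The matrix analysis described above amounts to counting, for each descriptor, how many source-molecule bit strings have that bit set, and keeping descriptors that reach the threshold of eleven. The sketch assumes each source molecule's bit string is a Python integer in which bit i marks descriptor i; the function names are hypothetical. It walks only the set bits of each string, since pharmacophore fingerprints over tens of millions of descriptors are typically sparse.

```python
from collections import Counter
from typing import Iterable, Iterator, Set

def set_bits(bits: int) -> Iterator[int]:
    """Yield the indices of the set bits of a non-negative int, lowest first."""
    while bits:
        low = bits & -bits          # isolate the lowest set bit
        yield low.bit_length() - 1  # its index
        bits ^= low                 # clear it and continue

def active_descriptor_space(bit_strings: Iterable[int],
                            min_molecules: int = 11) -> Set[int]:
    """Descriptor indices present in at least `min_molecules` source molecules."""
    counts: Counter = Counter()
    for bits in bit_strings:
        counts.update(set_bits(bits))
    return {i for i, c in counts.items() if c >= min_molecules}
```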
- The active molecule descriptor space established in this manner included approximately 1,800,000 members (i.e., individual pharmacophore descriptors) of the two classes of pharmacophore descriptors.
- Candidate molecules were identified from available chemistries selected by one skilled in the art. From one set of approximately 160,000 candidate molecules, a matrix of 5000 library molecules (10 reagent1 × 20 reagent2 × 25 reagent3) was selected to optimize coverage of the active molecule descriptor space. These 5000 library molecules provided 29% coverage of the active molecule descriptor space (calculated as the percentage of pharmacophore members of the active molecule descriptor space that were represented in more than 10 of the 5000 molecules).
- At this point, 71% of the active molecule descriptor space remained uncovered, so a second iteration of the process would be performed.
- In the second iteration, informative design would be used to select another set of library molecules (e.g., 5000) that optimized coverage of the remaining 71% of the active molecule descriptor space.
- The candidate pool can be the same as in the first iteration or supplemented with additional compounds.
- The cumulative 10,000-member library would then be checked for coverage of the active molecule descriptor space, and another design iteration would be pursued if coverage was not complete.
- A second experimental example, illustrating the use of a gene-family screening library created in accordance with an embodiment of the present invention, is as follows.
- A GPCR-targeted gene-family library comprising 13,769 molecules was constructed in a manner similar to the procedure outlined in the first example. Specifically, the known GPCR ligands collected in the MDDR were used to derive a pharmacophore space comprising 3- and 4-point pharmacophores. Small combinatorial libraries were selected, synthesized, and purified on the basis of 50 to 60 chemical scaffolds.
- Each of the 13,769 molecules of the GPCR gene-family library was then screened against the μ-opioid receptor.
- The percentage hit rate obtained using a gene-family screening library created in accordance with an embodiment of the present invention may fairly be contrasted with that of a conventional diverse, drug-like library of 10,560 molecules reported by Poulain et al., "From Hit to Lead, Combining Two Complementary Methods for Focused Library Design, Application to μ Opiate Ligands", Journal of Medicinal Chemistry, 44, 3378-3390 (2001), hereby incorporated by reference for all purposes. Screening the conventional library against the μ-opioid receptor resulted in a hit rate of only 1.7%.
- Moreover, the enhanced accuracy of screening with the GPCR library created in accordance with an embodiment of the present invention was affirmed by subsequent research. Specifically, the largest number of active molecules from the GPCR gene-family library of this example corresponded to a combinatorial synthesis template, or scaffold, present in the known compound spiroxatrine.
- The MDDR represents a compilation of the consensus in the scientific and patent literature regarding the chemical and biological activity of the compounds listed therein.
- In the MDDR, spiroxatrine was designated as exhibiting biological activity as an antagonist at the dopamine D2 receptor and activity against the serotonin 5-HT1A receptor, both of which are G-protein coupled receptors (GPCRs). Therefore, spiroxatrine or a compound of similar structure might be expected to exhibit activity against other members of the GPCR family.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US31461601P | 2001-08-23 | 2001-08-23 | |
| US60/314,616 | 2001-08-23 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2003019183A1 (fr) | 2003-03-06 |
| WO2003019183A9 (fr) | 2004-01-15 |
Family
ID=23220675
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2002/026842 Ceased WO2003019183A1 (fr) | 2001-08-23 | 2002-08-22 | Procede d'elaboration informative et iterative d'une bibliotheque combinatoire de familles de genes |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2003019183A1 (fr) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6090912A (en) * | 1993-05-27 | 2000-07-18 | Selectide Corporation | Topologically segregated, encoded solid phase libraries comprising linkers having an enzymatically susceptible bond |
| US6185506B1 (en) * | 1996-01-26 | 2001-02-06 | Tripos, Inc. | Method for selecting an optimally diverse library of small molecules based on validated molecular structural descriptors |
| US6240374B1 (en) * | 1996-01-26 | 2001-05-29 | Tripos, Inc. | Further method of creating and rapidly searching a virtual library of potential molecules using validated molecular structural descriptors |
- 2002-08-22 WO PCT/US2002/026842 patent/WO2003019183A1/fr not_active Ceased
Non-Patent Citations (3)
| Title |
|---|
| MANALLACK ET AL.: "Selecting screening candidates for kinase and G protein-coupled receptor targets using neural networks", JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, vol. 42, 2002, pages 1256 - 1262, XP002958747 * |
| XUE ET AL.: "Database searching for compounds with similar biological activity using short binary bit string representation of molecules", JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, vol. 39, 1999, pages 881 - 886, XP002958748 * |
| XUE ET AL.: "Mini-fingerprints detect similar activity of receptor ligands previously recognized only by three-dimensional pharmacophore-based methods", JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, vol. 41, 2001, pages 394 - 401, XP002958749 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10800742B2 (en) | 2015-04-24 | 2020-10-13 | The Johns Hopkins University | Small molecule compounds targeting PBX1 transcriptional complex |
| CN111445954A (zh) * | 2020-04-01 | 2020-07-24 | 广州基迪奥生物科技有限公司 | 一种多基因家族鉴定及进化分析的方法 |
| CN111445954B (zh) * | 2020-04-01 | 2023-09-01 | 广州基迪奥生物科技有限公司 | 一种多基因家族鉴定及进化分析的方法 |
| WO2023102923A1 (fr) * | 2021-12-10 | 2023-06-15 | 深圳晶泰科技有限公司 | Procédé et appareil de détermination pour un programme de conception moléculaire, dispositif et support de stockage |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2003019183A9 (fr) | 2004-01-15 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KR LC LK LR LS LT LU LV MA MD MG MN MW MX MZ NO NZ OM PH PL PT RU SD SE SG SI SK SL TJ TM TN TR TZ UA UG UZ VC VN YU ZA ZM |
| | AK | Designated states | Kind code of ref document: A1. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| | COP | Corrected version of pamphlet | Free format text: PAGES 1/8-8/8, DRAWINGS, REPLACED BY NEW PAGES 1/8-8/8 |
| | 122 | EP: PCT application non-entry in European phase | |
| | NENP | Non-entry into the national phase | Ref country code: JP |
| | WWW | WIPO information: withdrawn in national office | Country of ref document: JP |