Disclosure of Invention
In order to achieve the above purpose, the invention is realized by the following technical scheme: a ground state structure prediction method under substituent engineering comprises the following steps: obtaining candidate seed isomers; performing energy prediction on the candidate seed isomer based on a mortise and tenon model; and establishing a database and storing the structures corresponding to the candidate seed isomers after energy prediction.
Further, obtaining the candidate seed isomers comprises: constructing a basic skeleton; acquiring initial seed isomers based on the basic skeleton; and screening the initial seed isomers within the required energy range as candidate seed isomers.
Further, acquiring the initial seed isomers based on the basic skeleton comprises: taking the substituents as ligands and adding them to the basic skeleton one by one in a cluster growth mode to obtain the initial seed isomers.
Further, the required energy range is within 50 kcal/mol.
Further, the energy prediction for the candidate seed isomers based on the mortise and tenon model comprises: obtaining, based on a gene sequence, the energy of the model with weight x under A-G and the energy of the model with weight y under A-A, wherein the first letter (A) denotes the seed isomer and the second letter (G or A) denotes the substituent; acquiring the energy of the regular octahedron skeleton; and obtaining the energy of the candidate seed isomer based on the energy of the model with weight x under A-G, the energy of the model with weight y under A-A and the energy of the regular octahedron skeleton.
Further, the process of obtaining the energy of the model with weight x under A-G and the energy of the model with weight y under A-A is as follows: determining the candidate seed isomer used as the skeleton; determining the substituents contained in the candidate seed isomer; determining the site occupied by each substituent; searching the database for the model with weight x under A-G and the model with weight y under A-A based on the determined candidate seed isomer, the substituents it contains and the site occupied by each substituent; and obtaining the energy of the model with weight x under A-G and the energy of the model with weight y under A-A.
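Further, the energy of the candidate seed isomer is calculated as follows:
$$E_{\mathrm{pre}} = E_{\mathrm{model}}(\text{A-G}, x) + E_{\mathrm{model}}(\text{A-A}, y) + E_{\mathrm{skeleton}}$$
In the formula, $E_{\mathrm{pre}}$ is the energy of the candidate seed isomer, $E_{\mathrm{model}}(\text{A-G}, x)$ is the energy of the model with weight x under A-G, $E_{\mathrm{model}}(\text{A-A}, y)$ is the energy of the model with weight y under A-A, and $E_{\mathrm{skeleton}}$ is the energy of the regular octahedron skeleton.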
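Further, the energy of the model with weight x under A-G is calculated as follows:
$$E_{\mathrm{model}}(\text{A-G}, x) = E_{\mathrm{ISO\text{-}initial}}(\text{A-G}, x) - E_{\mathrm{skeleton}}$$
In the formula, $E_{\mathrm{ISO\text{-}initial}}(\text{A-G}, x)$ is the energy actually calculated for the model with weight x under A-G, and $E_{\mathrm{skeleton}}$ is the energy of the regular octahedron skeleton.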
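Further, the energy of the model with weight y under A-A is calculated as follows:
$$E_{\mathrm{model}}(\text{A-A}, y) = E_{\mathrm{ISO\text{-}initial}}(\text{A-A}, y) - E_{\mathrm{skeleton}}$$
In the formula, $E_{\mathrm{ISO\text{-}initial}}(\text{A-A}, y)$ is the energy actually calculated for the model with weight y under A-A, and $E_{\mathrm{skeleton}}$ is the energy of the regular octahedron skeleton.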
The invention has the following beneficial effects:
the ground state structure prediction method under substituent engineering not only considers all possible ground state structures (based on a double potential energy surface 'seed' strategy), but also considers diversity of substituents and prediction reliability of large-scale calculation (based on detailed searching of 'seed' and 'mortise and tenon' strategies). Therefore, the research paradigm which can be derived by the invention has universality and can be applied to the structural rule research of various clusters under substituent engineering.
Of course, it is not necessary for any one product to practice the invention to achieve all of the advantages set forth above at the same time.
Detailed Description
According to the embodiment of the application, through the ground state structure prediction method under substituent engineering, all possible ground state structures (based on a seed strategy of a double potential energy surface) are considered, and diversity of substituents and prediction reliability of large-scale calculation (based on detailed searching of a seed strategy and a mortise and tenon strategy) are also considered. Therefore, the research paradigm which can be derived by the application has universality and can be applied to the structural rule research of various clusters under substituent engineering.
The general idea behind the embodiments of the application is as follows:
The structure predicted by the structural rule should be the ground state structure (i.e. the structure with the lowest energy) of each system, which involves the following difficulties: 1. The same number and kinds of atoms can combine in three-dimensional space into numerous different isomers, so the first problem is how to identify the isomers with excellent thermodynamic competitiveness (referred to as "seed isomers"). 2. Once the "seed isomers" are obtained, the second problem is how to calculate the energies of the massive number of isomers (isomers substituted by different numbers and kinds of substituents) that arise under substituent engineering.
The Seed and Mortise-Tenon (SMT) method addresses these two key problems. The "Seed" method performs a global search on the all-H and all-F systems to obtain a large number of isomers and ranks them by energy; for each of the all-H and all-F systems, the 5 lowest-energy isomers are selected as "seed isomers" (10 in total). The "Mortise-Tenon" method, relying on prior modeling and database construction, uses an automated program to predict the virtual energies of the 10 "seed isomers" under different substitution systems, automatically screens and compares them, and picks out the ground state structure of each system.
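As a non-limiting illustration, the final screening step of the SMT method (picking the lowest-energy predicted structure for each substitution system) may be sketched as follows. The `Prediction` record, the structure identifiers and all energy values are hypothetical and serve only to show the comparison logic; they are not taken from the database of the embodiment.

```python
from typing import Dict, Iterable, NamedTuple


class Prediction(NamedTuple):
    system: str     # substitution system, e.g. "C2B4H4(SiH3)2"
    seed: str       # seed isomer label, e.g. "A"
    structure: str  # hypothetical identifier of the substituted structure
    energy: float   # predicted energy, in any single consistent unit


def select_ground_states(predictions: Iterable[Prediction]) -> Dict[str, Prediction]:
    """Keep, for each substitution system, the prediction with the lowest energy."""
    best: Dict[str, Prediction] = {}
    for p in predictions:
        current = best.get(p.system)
        if current is None or p.energy < current.energy:
            best[p.system] = p
    return best


# Hypothetical predicted energies for two systems built on different seeds.
predictions = [
    Prediction("C2B4H4(SiH3)2", "A", "A-1", -3.2),
    Prediction("C2B4H4(SiH3)2", "C", "C-1", -1.0),
    Prediction("C2B4H4(CH3)2", "A", "A-2", -0.4),
    Prediction("C2B4H4(CH3)2", "D", "D-1", -0.9),
]
ground = select_ground_states(predictions)
```

Here `ground` maps each system to its lowest-energy candidate, which is the structure the method would report as the predicted ground state of that system.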
Referring to fig. 1, the embodiment of the invention provides a technical scheme: a ground state structure prediction method under substituent engineering, comprising the following steps: obtaining candidate seed isomers; performing energy prediction on the candidate seed isomers based on a mortise and tenon model; and establishing a database that stores the candidate seed isomers after energy prediction together with their corresponding structures. Fig. 7 shows the seed isomer structures; the energies of the module units of each seed isomer are stored in the database. Examples of "seed isomers" are the regular octahedron A, the benzvalene-type ("basin benzene") structure C, the trigonal bipyramid D and the butterfly E. Because each "seed isomer" is structurally symmetrical, placing substituents at different positions may produce duplicate structures; the module units are the structurally unique units obtained under the substitution C2B4H6-nRn (n = 1 to 6). As shown in fig. 8, for the module units corresponding to C2B4H4(SiH3)2, the "seed isomer" regular octahedron A is the initial skeleton, the substituent is <SiH3> and n = 2, which gives four module units: [A-SiH3-2-1], [A-SiH3-2-2], [A-SiH3-2-3] and [A-SiH3-2-4].
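By way of example only, the module unit labels above can be treated as database keys. The sketch below assumes that a label has the form seed-substituent-n-index, which is an interpretation of the four labels listed for C2B4H4(SiH3)2; the `ModuleUnit` type, the regular expression and the stored energy values are hypothetical.

```python
import re
from typing import Dict, NamedTuple


class ModuleUnit(NamedTuple):
    seed: str         # seed isomer label, e.g. "A" (regular octahedron)
    substituent: str  # substituent formula, e.g. "SiH3"
    n: int            # number of substituents placed on the skeleton
    index: int        # running index of the unique structure for this n


# Assumed label layout: [seed-substituent-n-index], brackets optional.
_LABEL = re.compile(r"^\[?\s*([A-Z])-([A-Za-z0-9]+)-(\d+)-(\d+)\s*\]?$")


def parse_label(label: str) -> ModuleUnit:
    """Split a label such as "[A-SiH3-2-1]" into its four fields."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised module unit label: {label!r}")
    seed, substituent, n, index = match.groups()
    return ModuleUnit(seed, substituent, int(n), int(index))


# Hypothetical module unit energies keyed by the parsed label.
labels = ["[A-SiH3-2-1]", "[A-SiH3-2-2]", "[A-SiH3-2-3]", "[A-SiH3-2-4]"]
energies = [-0.10, -0.20, -0.15, -0.05]
database: Dict[ModuleUnit, float] = {
    parse_label(label): energy for label, energy in zip(labels, energies)
}
```

Keying the database by the parsed fields rather than by the raw string makes it straightforward to select, for instance, all module units of seed A with n = 2.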
Specifically, obtaining the candidate seed isomers includes: constructing a basic skeleton; acquiring initial seed isomers based on the basic skeleton; and screening the initial seed isomers within the required energy range as candidate seed isomers.
Acquiring the initial seed isomers based on the basic skeleton includes: taking the substituents as ligands and adding them to the basic skeleton one by one in a cluster growth mode to obtain the initial seed isomers.
The required energy range is within 50 kcal/mol.
In this embodiment, to fully show the effect of significantly different substituents on the relative energy of C2B4R6, a "down-top" strategy was employed: the initial structures were obtained by a global isomer search of the two clusters C2B4H6 (down) and C2B4F6 (top), whose substituents have contrasting electronegativity. In most chemical experience, many substituents R are rigid enough to maintain the configuration and properties of the low-energy isomers. We therefore made the reasonable assumption that the C2B4 fragment is the "skeleton" and that all substituents R are "ligands". Using this "skeleton-ligand cluster growth" method, we performed a global isomer search on C2B4R6 (R = H/F). To determine all possible forms of the C2B4 skeleton, we constructed the C2B4 skeleton using a "grid-based comprehensive isomerization strategy" and then added H/F atoms one by one to the optimized C2B4 core in a "cluster growth" mode. Finally, at the B3LYP/6-31G(d) level, 4081 C2B4H6 and 2520 C2B4F6 isomers were obtained as local energy minima. The isomers of the parent C2B4R6 (R = H/F) within 50 kcal/mol were refined at the composite CBS-QB3 level, see fig. 2. Ranked by the energy of C2B4R6 (R = H/F), the 10 seeds are the 5 lowest-energy structures of each system, as shown in fig. 3, in which the outermost sphere in each molecular structure represents a substituent site where one of the 17 substituents is attached to a carbon or boron atom.
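As a minimal sketch of the screening described above, the following code keeps the isomers lying within an energy window of the lowest-energy isomer and then takes the lowest few as seeds. It assumes that all energies are already expressed in kcal/mol at a single level of theory, so it collapses the two-stage B3LYP/CBS-QB3 procedure into one ranking; the function names, isomer names and energy values are hypothetical.

```python
from typing import Dict, List, Tuple


def screen_candidates(
    energies: Dict[str, float], window: float = 50.0
) -> List[Tuple[str, float]]:
    """Return (isomer, relative energy) pairs within `window` kcal/mol
    of the lowest-energy isomer, sorted from lowest to highest."""
    e_min = min(energies.values())
    relative = [(name, e - e_min) for name, e in energies.items()]
    kept = [(name, de) for name, de in relative if de <= window]
    return sorted(kept, key=lambda item: item[1])


def pick_seeds(
    energies: Dict[str, float], window: float = 50.0, top_n: int = 5
) -> List[str]:
    """Select the `top_n` lowest-energy isomers inside the window."""
    return [name for name, _ in screen_candidates(energies, window)[:top_n]]


# Hypothetical energies (kcal/mol) for a handful of all-H isomers.
h_system = {
    "H-iso1": 0.0, "H-iso2": 12.5, "H-iso3": 48.0, "H-iso4": 63.2,
    "H-iso5": 7.1, "H-iso6": 30.0, "H-iso7": 49.9,
}
seeds_h = pick_seeds(h_system)
```

Running the same function on the all-F system and joining the two lists would give the 10 seeds referred to above.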
Mortise and tenon joints have long been widely used in traditional wood construction for joining two or more components. The concave part is called the mortise (corresponding to the skeleton) and the protruding part of the component is called the tenon (corresponding to the ligand); this is similar to the operating principle of our model, which is therefore called the "mortise and tenon" model. The "mortise and tenon" model establishes a complete database for each seed isomer, and the database records the energies of the structurally unique module units. It should be noted that in constructing the "mortise and tenon" model we consider only the influence of the substituents on the skeleton, not the interactions between substituents. As can be seen from fig. 4 to fig. 6, the mortise and tenon model enables automatic construction of the structure and prediction of its energy.
Fig. 6 is a detailed flowchart of the ground state structure prediction method. In the mortise and tenon model, the program needs the following three pieces of information: [1] which seed isomer is used as the skeleton; [2] which substituents are present under the current substitution; [3] which site each substituent occupies. With this information, the program can search the database for the exact module unit energies and predict the energy of the structure. Regarding [1], there is a relevant option in the program initialization settings: at the start of a run, the program must be told which seed isomer will be used for energy prediction under the substituent engineering model. For [2] and [3], the program extracts the relevant information from the "gene" sequence (mentioned in 1.0).
Specifically, as shown in fig. 4-5, the energy prediction of the candidate seed isomer based on the mortise and tenon model includes: obtaining, based on a gene sequence, the energy of the model with weight x under A-G and the energy of the model with weight y under A-A, wherein the first letter (A) denotes the seed isomer and the second letter (G or A) denotes the substituent; acquiring the energy of the regular octahedron skeleton; and obtaining the energy of the candidate seed isomer based on the energy of the model with weight x under A-G, the energy of the model with weight y under A-A and the energy of the regular octahedron skeleton.
Fig. 4 is a flowchart of structure construction and energy prediction for the ground state structure prediction method. Taking the gene sequence <AAAAGG> as an example: (a) sites 1, 2, 3 and 4 are occupied by the A substituent CH3, and (b) sites 5 and 6 are occupied by the G substituent SiH3.
AAAA: extract the module unit energy with weight 11 under A (seed isomer)-A (substituent). GG: extract the module unit energy with weight 4 under A (seed isomer)-G (substituent).
The energy of the constructed structure is the sum of the all-H system energy, the module unit energy with weight 11 and the module unit energy with weight 4.
Taking the gene sequence AAAAGG as an example, the substituent engineering model program automatically builds the corresponding structure from the gene sequence: substitution sites <1 2 3 4> receive the A substituent <CH3>, and substitution sites <5 6> receive the G substituent <SiH3>.
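The decoding of a gene sequence into substituents and occupied sites can be illustrated with the following sketch. The letter codes A = CH3 and G = SiH3 follow the AAAAGG example; the function name and the dictionary layout are assumptions, and the mapping from occupied sites to the weight values (11 and 4 in the example) is not reproduced here because the rule for assigning weights is not specified in this passage.

```python
from collections import defaultdict
from typing import Dict, List

# Letter codes taken from the AAAAGG example; codes for the other
# substituents would have to be added and are not known here.
SUBSTITUENT_CODES: Dict[str, str] = {"A": "CH3", "G": "SiH3"}


def decode_gene(
    sequence: str, codes: Dict[str, str] = SUBSTITUENT_CODES
) -> Dict[str, List[int]]:
    """Map each substituent to the 1-based sites it occupies."""
    sites: Dict[str, List[int]] = defaultdict(list)
    for position, letter in enumerate(sequence.upper(), start=1):
        if letter not in codes:
            raise ValueError(f"unknown substituent code {letter!r} at site {position}")
        sites[codes[letter]].append(position)
    return dict(sites)


placements = decode_gene("AAAAGG")
```

For the example sequence, `placements` assigns CH3 to sites 1 to 4 and SiH3 to sites 5 and 6, which is the information items [2] and [3] of the flow require.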
And extracting energy from the database according to the corresponding weight.
As shown in FIG. 5, to predict the energy $E_{\mathrm{pre}}$ of structure I, only the energies of two module units, $E_{\mathrm{model}}(\mathrm{II})$ and $E_{\mathrm{model}}(\mathrm{III})$, need to be selected from the database and added to the all-H system energy of the regular octahedron seed structure A. The calculation formula is $E_{\mathrm{pre}}(\mathrm{I}) = E_{\mathrm{model}}(\mathrm{II}) + E_{\mathrm{model}}(\mathrm{III}) + E_{\mathrm{skeleton}}(\mathrm{A})$. In the above formula, $E_{\mathrm{model}} = E_{\mathrm{ISO\text{-}initial}} - E_{\mathrm{skeleton}}$, where $E_{\mathrm{ISO\text{-}initial}}$ is the actually calculated energy of the isomer and $E_{\mathrm{skeleton}}(\mathrm{A})$ is the all-H system energy of the regular octahedron seed structure A.
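The two formulas above can be written out as a short worked example. The function names and all numerical values are hypothetical (chosen so the arithmetic is easy to follow) and the energy unit is arbitrary as long as it is used consistently.

```python
from typing import Iterable


def model_energy(e_iso_initial: float, e_skeleton: float) -> float:
    """E_model = E_ISO-initial - E_skeleton: the contribution of one
    module unit relative to the all-H skeleton."""
    return e_iso_initial - e_skeleton


def predict_energy(model_energies: Iterable[float], e_skeleton: float) -> float:
    """E_pre = sum of module unit energies + E_skeleton."""
    return sum(model_energies) + e_skeleton


# Hypothetical values: all-H skeleton A and the calculated energies of
# the two module unit structures II and III.
e_skeleton_a = -100.0
e_model_ii = model_energy(-140.5, e_skeleton_a)    # -40.5
e_model_iii = model_energy(-180.25, e_skeleton_a)  # -80.25

e_pre_i = predict_energy([e_model_ii, e_model_iii], e_skeleton_a)
```

With these inputs the predicted energy of structure I is -220.75. Because each module unit is stored relative to the skeleton, the skeleton energy is added back exactly once, which reflects the stated simplification that interactions between substituents are neglected.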
As shown in fig. 6, the process of obtaining the energy of the model with weight x under A-G and the energy of the model with weight y under A-A is as follows: determining the candidate seed isomer used as the skeleton; determining the substituents contained in the candidate seed isomer; determining the site occupied by each substituent; searching the database for the model with weight x under A-G and the model with weight y under A-A based on the determined candidate seed isomer, the substituents it contains and the site occupied by each substituent; and obtaining the energy of the model with weight x under A-G and the energy of the model with weight y under A-A.
In this embodiment, the candidate seed isomer used as the skeleton is determined by a relevant option in the program initialization settings, which specifies the seed isomer for which the energy is to be predicted; the substituents contained in the candidate seed isomer and the site occupied by each substituent are extracted from the gene sequence.
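One possible way to organise the database search is sketched below. It assumes, purely for illustration, that each module unit is stored under a key made of the seed isomer, the substituent and the tuple of occupied sites; the actual embodiment indexes module units by weight value, and symmetry-equivalent site sets would first have to be reduced to a single canonical form. The key layout, function name and stored energies are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed key: (seed isomer, substituent, occupied sites in ascending order).
Key = Tuple[str, str, Tuple[int, ...]]


def lookup_model_energies(
    seed: str,
    placements: Dict[str, List[int]],
    database: Dict[Key, float],
) -> List[float]:
    """Fetch one module unit energy per substituent for the chosen seed."""
    energies: List[float] = []
    for substituent, sites in placements.items():
        key: Key = (seed, substituent, tuple(sorted(sites)))
        if key not in database:
            raise KeyError(f"no module unit stored for {key}")
        energies.append(database[key])
    return energies


# Hypothetical module unit energies for seed A.
database: Dict[Key, float] = {
    ("A", "CH3", (1, 2, 3, 4)): -0.75,
    ("A", "SiH3", (5, 6)): -0.5,
}
placements = {"CH3": [1, 2, 3, 4], "SiH3": [5, 6]}
model_energies = lookup_model_energies("A", placements, database)
```

The returned list holds the module unit energies that are then summed with the skeleton energy; a missing entry raises an error rather than silently contributing zero, so gaps in the database are noticed.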
Specifically, the energy of the candidate seed isomer is calculated as follows:
$$E_{\mathrm{pre}} = E_{\mathrm{model}}(\text{A-G}, x) + E_{\mathrm{model}}(\text{A-A}, y) + E_{\mathrm{skeleton}}$$
In the formula, $E_{\mathrm{pre}}$ is the energy of the candidate seed isomer, $E_{\mathrm{model}}(\text{A-G}, x)$ is the energy of the model with weight x under A-G, $E_{\mathrm{model}}(\text{A-A}, y)$ is the energy of the model with weight y under A-A, and $E_{\mathrm{skeleton}}$ is the energy of the regular octahedron skeleton. The energy of each model and the energy of the regular octahedron skeleton are known: because the molecular structure of each model is known, its energy can be obtained directly by an existing method for determining molecular structure energy, such as a thermochemical method or a spectroscopic method, and the model results and corresponding energies are stored in the database.
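The energy of the model with weight x under A-G is calculated as follows:
$$E_{\mathrm{model}}(\text{A-G}, x) = E_{\mathrm{ISO\text{-}initial}}(\text{A-G}, x) - E_{\mathrm{skeleton}}$$
In the formula, $E_{\mathrm{ISO\text{-}initial}}(\text{A-G}, x)$ is the energy actually calculated for the model with weight x under A-G, and $E_{\mathrm{skeleton}}$ is the energy of the regular octahedron skeleton.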
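The energy of the model with weight y under A-A is calculated as follows:
$$E_{\mathrm{model}}(\text{A-A}, y) = E_{\mathrm{ISO\text{-}initial}}(\text{A-A}, y) - E_{\mathrm{skeleton}}$$
In the formula, $E_{\mathrm{ISO\text{-}initial}}(\text{A-A}, y)$ is the energy actually calculated for the model with weight y under A-A, and $E_{\mathrm{skeleton}}$ is the energy of the regular octahedron skeleton.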
In this embodiment, the core of the energy calculation and prediction is to combine and analyze the energy changes from the basic structure to the complex molecular model, so as to predict the energy of each substituted structure; that is, the ground state characteristics of the compound are optimized and predicted through structure and energy analysis.
In summary, the present application has at least the following effects:
Not only are all possible ground state structures considered (based on the "seed" strategy of the double potential energy surface), but the diversity of substituents and the reliability of prediction in large-scale calculations are also considered (based on the exhaustive search of the "seed" and "mortise and tenon" strategies). Therefore, the research paradigm that can be derived from the invention is universal and can be applied to the study of structural rules of various clusters under substituent engineering.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of systems, apparatuses (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.