Disclosure of Invention
The invention provides a high-throughput reaction screening system and method based on computer control and data processing. It addresses the following technical problems: the importance of experimental parameters is difficult to quantify, so optimal experimental conditions are difficult to screen efficiently when multiple groups of experimental data are processed; when experimental parameters are optimized, the influence of parameter weights and deviations is estimated only from the reaction result, so a sub-optimal solution may be selected by mistake or the effect of changes in key parameters may be underestimated; and during optimization, insufficient mutation may reduce population diversity while excessive mutation may prevent convergence to the optimal solution, either of which degrades the optimization of chemical experimental conditions.
The high-throughput reaction screening system and method based on computer control and data processing of the invention specifically comprise the following technical solutions:
A high-throughput reaction screening method based on computer control and data processing comprises the following steps:
S1, performing parallel chemical reaction experiments to obtain experimental parameters and experimental reaction results, standardizing the experimental parameters and the experimental reaction results to obtain standardized parameter values and standardized experimental reaction results, and calculating the relative weights of the experimental parameters by analyzing the correlation between the experimental parameters and the experimental reaction results;
S2, based on the standardized parameter values and the relative weights of the experimental parameters, selecting an optimal parameter vector through a high-throughput reaction parameter screening algorithm, and performing inverse standardization to obtain actual parameter values for guiding subsequent experiments.
Preferably, S1 specifically includes:
calculating the product of the standardized parameter value and the deviation of the experimental reaction result, and normalizing the products to obtain the relative weights of the experimental parameters.
Preferably, S2 specifically includes:
the high-throughput reaction parameter screening algorithm combines a genetic algorithm with weighted fitness evaluation and, by simulating a biological evolution process, iteratively optimizes the experimental parameters to obtain the optimal parameter vector.
Preferably, S2 specifically includes:
in the high-throughput reaction parameter screening algorithm, taking the standardized experimental parameter data set as the initial population; for each experimental parameter, calculating the absolute deviation between the standardized parameter value and the mean of the standardized parameter; applying an exponential function to the absolute deviation; and calculating the fitness of each individual by combining the individual's predicted reaction result with the relative weights of the experimental parameters;
the predicted reaction result of an individual is calculated with a linear regression model, and at initialization the actual standardized experimental reaction result is used for the fitness calculation.
Preferably, S2 specifically includes:
pairing the individuals of the current population two by two, performing a crossover operation to obtain crossed individuals, and calculating the fitness of the crossed individuals.
Preferably, S2 specifically includes:
introducing an adaptive mutation mechanism and performing a mutation operation on the crossed individuals to obtain mutated individuals.
Preferably, S2 specifically includes:
in the adaptive mutation mechanism, the mutation probability is calculated from the relative weights of the experimental parameters, the current iteration progress and the degree of dispersion of the population, according to the following formula:
$$p_j = \lambda \cdot \frac{w_j \, \sin\!\left(\pi \cdot \frac{t}{T} \cdot \frac{\frac{1}{n}\sum_{i=1}^{n}\left|z_{ij}^{(t)} - \bar{z}_j\right|}{\sigma_j}\right)}{\sum_{k=1}^{m} w_k}$$
where $p_j$ denotes the mutation probability of the $j$-th experimental parameter; $w_j$ denotes the relative weight of the $j$-th experimental parameter; $\sin(\cdot)$ denotes the sine function; $\pi$ denotes the circle constant; $T$ denotes the total number of iterations; $t/T$ denotes the iteration progress ratio, with $t$ the current iteration number; $n$ denotes the number of groups of parallel chemical reaction experiments; $z_{ij}^{(t)}$ denotes the $j$-th standardized parameter value of the $i$-th individual of the $t$-th generation population; $\bar{z}_j$ denotes the initial mean of the $j$-th standardized parameter; $\frac{1}{n}\sum_{i=1}^{n}\left|z_{ij}^{(t)} - \bar{z}_j\right| / \sigma_j$ denotes the standardized degree of dispersion of the population; $\sigma_j$ denotes the standard deviation of the $j$-th parameter over the $n$ groups of parallel chemical reaction experiments; $m$ denotes the number of parameters in each group of experimental data; and $\lambda$ denotes the global mutation probability adjustment coefficient.
Preferably, S2 specifically includes:
when the mutation probability is greater than a preset mutation probability threshold, the experimental parameter is mutated: the mutated experimental parameter is calculated by adding a random perturbation to the current standardized parameter value, giving a mutated individual, and the fitness of the mutated individual is then re-evaluated.
Preferably, S2 specifically includes:
comparing the fitness of the original, crossed and mutated individuals and selecting the individual with the highest fitness as the individual of the next generation; repeating the crossover, mutation, selection and update steps until the preset number of iterations is reached; and traversing all individuals in the final-generation population, selecting the individual with the highest fitness as the elite individual, and taking the parameter vector of the elite individual as the optimal parameter vector.
A high throughput reaction screening system based on computer control and data processing comprising the following:
The system comprises a reaction device, a data acquisition module, a data processing module, an optimization screening module and a control module;
The reaction device is used as a physical hardware part for executing parallel chemical reaction experiments to generate experimental data, wherein the experimental data comprises experimental parameters and corresponding experimental reaction results;
The data acquisition module is used for acquiring experimental data from the reaction device and outputting the experimental data to the data processing module;
The data processing module is used for carrying out standardized processing on the experimental data from the data acquisition module to obtain standardized parameter values and standardized experimental reaction results, and calculating the relative weights of the experimental parameters;
The optimization screening module is used for selecting an optimal parameter vector through the high-throughput reaction parameter screening algorithm, based on the standardized parameter values, the standardized experimental reaction results and the relative weights of the experimental parameters from the data processing module, and for performing inverse standardization to convert the optimal parameter vector into actual parameter values;
The control module is used for receiving the actual parameter values from the optimization screening module, generating control signals for guiding subsequent experiments and outputting the control signals to the reaction device.
The technical scheme of the invention has the beneficial effects that:
1. Standardization eliminates dimensional differences, improves the efficiency and accuracy of data processing, and makes the data more comparable and consistent.
2. Calculating the relative weights of the experimental parameters quantifies how strongly each experimental parameter influences the experimental reaction result, so that the key factors affecting the chemical reaction are identified accurately, experimental cost is reduced and experimental efficiency is improved.
3. The high-throughput reaction parameter screening algorithm takes the standardized experimental parameter data set as the initial population and, combined with a genetic algorithm, simulates a biological evolution process through crossover, mutation and selection operations. The parameter combinations are optimized step by step and the experimental conditions that maximize the experimental reaction result (such as yield) are screened out automatically. This avoids the resources wasted on large numbers of invalid experiments in the traditional trial-and-error approach, reduces manual intervention, and makes the experimental design more rigorous and efficient.
4. Through the global optimization capability of the genetic algorithm and the combination of the relative weights of experimental parameters, the optimization efficiency and the reliability of the result are improved.
5. The self-adaptive mutation mechanism balances exploration and convergence, enhances the robustness and applicability of the high-throughput reaction parameter screening algorithm, and is suitable for different chemical reaction scenes.
Detailed Description
In order to further illustrate the technical means and effects adopted by the present invention to achieve the preset purpose, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The high-throughput reaction screening system and method based on computer control and data processing provided by the invention are described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, a block diagram of a high-throughput reaction screening system based on computer control and data processing according to one embodiment of the present invention is shown, the system being structured as follows:
The system comprises a reaction device, a data acquisition module, a data processing module, an optimization screening module and a control module;
The reaction device is the physical hardware that performs the parallel chemical reaction experiments: multiple parallel reaction units (such as microreactors or multi-channel reactors) run multiple groups of parallel chemical reaction experiments simultaneously under different parameter conditions (such as temperature, pressure and concentration) to generate experimental data, wherein the experimental data comprises experimental parameters and the corresponding experimental reaction results;
The data acquisition module is used for acquiring experimental data from the reaction device and outputting the experimental data to the data processing module;
The data processing module is used for carrying out standardized processing on the experimental data from the data acquisition module to obtain standardized parameter values and standardized experimental reaction results, and calculating the relative weights of the experimental parameters;
The optimization screening module is used for selecting an optimal parameter vector from the multiple groups of chemical reaction experimental data through the high-throughput reaction parameter screening algorithm, based on the standardized parameter values, the standardized experimental reaction results and the relative weights of the experimental parameters from the data processing module, and for performing inverse standardization to convert the optimal parameter vector into actual parameter values;
The control module is used for receiving the actual parameter values from the optimization screening module, generating control signals for guiding subsequent experiments and outputting the control signals to the reaction device.
Referring to FIG. 2, a flowchart of a high-throughput reaction screening method based on computer control and data processing according to one embodiment of the present invention is shown, the method comprising the following steps:
S1, performing a parallel chemical reaction experiment to obtain experimental parameters and experimental reaction results, performing standardization treatment on the experimental parameters and the experimental reaction results to obtain standardized parameter values and standardized experimental reaction results, and calculating the relative weights of the experimental parameters by analyzing the correlation between the experimental parameters and the experimental reaction results;
The reaction device carries out $n$ groups of parallel chemical reaction experiments to generate experimental data. Each group of experimental data provides $m$ parameters (such as temperature, pressure and concentration) and the corresponding experimental reaction result. The experimental parameters of the $i$-th group are expressed as the vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$, where $x_{ij}$ denotes the $j$-th parameter value of the $i$-th group of experiments;
To eliminate dimensional differences between parameters, the experimental parameters and experimental reaction results of each group are standardized to obtain standardized parameter values and standardized experimental reaction results. Taking the experimental parameters as an example, the standardization formula is:
$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}$$
where $z_{ij}$ denotes the $j$-th standardized parameter value of the $i$-th group of experiments; $x_{ij}$ denotes the $j$-th parameter value of the $i$-th group of experiments; $\mu_j$ denotes the mean of the $j$-th parameter over the $n$ groups of experiments, calculated as $\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$; and $\sigma_j$ denotes the standard deviation of the $j$-th parameter over the $n$ groups of experiments, calculated as $\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_{ij} - \mu_j\right)^2}$;
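By way of illustration only, the standardization step can be sketched in Python as follows. The function name `standardize`, the sample data and the use of the population standard deviation (division by the number of groups) are assumptions made for this sketch and are not prescribed by the embodiment.

```python
import math

def standardize(rows):
    """Z-score every parameter column of an n x m table of raw values.

    Returns the standardized table plus the per-column means and standard
    deviations, which are needed later for inverse standardization.
    """
    n = len(rows)
    m = len(rows[0])
    means, stds = [], []
    for j in range(m):
        column = [row[j] for row in rows]
        mu = sum(column) / n
        # Assumption: population standard deviation (divide by n).
        sigma = math.sqrt(sum((v - mu) ** 2 for v in column) / n)
        if sigma == 0.0:
            raise ValueError(f"parameter {j} is constant and cannot be standardized")
        means.append(mu)
        stds.append(sigma)
    z = [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in rows]
    return z, means, stds

# Hypothetical data: 4 parallel runs, parameters = (temperature, pressure).
raw = [[60.0, 1.0], [70.0, 2.0], [80.0, 3.0], [90.0, 4.0]]
z, means, stds = standardize(raw)
```

The same routine can be applied to the experimental reaction results by passing them as a single-column table.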
To identify key parameters quickly, reduce invalid experiments and improve efficiency, the degree of influence of each experimental parameter on the experimental reaction result is quantified by analyzing the correlation between the experimental parameters and the experimental reaction results, which yields a set of weight values;
For each experimental parameter, each group of experimental data is traversed and the deviation of the experimental reaction result is calculated from the standardized experimental reaction result. The product of the standardized parameter value and this deviation is then calculated to reflect the association between changes in the experimental parameter and changes in the experimental reaction result: if the experimental reaction result deviates markedly from its mean when the experimental parameter increases, the experimental parameter has a large influence on the experimental reaction result;
To convert the degree of influence of each experimental parameter into a relative weight, normalization is required. The sum of the absolute values of the products of all standardized parameter values and experimental reaction result deviations is calculated, that is, the contributions of all experimental parameters are added to obtain a global total. The total contribution of each experimental parameter is then divided by the global total to obtain a ratio between 0 and 1, which is taken as the relative weight of that experimental parameter and represents its relative importance to the experimental reaction result: the larger the relative weight, the more pronounced the influence of the experimental parameter on the experimental reaction result, and vice versa. The relative weight is calculated as:
$$w_j = \frac{\sum_{i=1}^{n}\left|z_{ij}\,(y_i - \bar{y})\right|}{\sum_{i=1}^{n}\sum_{k=1}^{m}\left|z_{ik}\,(y_i - \bar{y})\right|}$$
where $w_j$ denotes the relative weight of the $j$-th experimental parameter, used to quantify the degree of influence of each experimental parameter on the experimental reaction result; $\bar{y}$ denotes the mean of the standardized experimental reaction results of all groups, calculated as $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$; $y_i - \bar{y}$ denotes the deviation of the experimental reaction result; $z_{ij}\,(y_i - \bar{y})$ denotes the product of the standardized parameter value and the deviation of the experimental reaction result, reflecting the contribution of the $j$-th experimental parameter in the $i$-th group of experiments to the deviation of the experimental reaction result from the mean; $y_i$ denotes the standardized experimental reaction result of the $i$-th group; and $\sum_{i=1}^{n}\sum_{k=1}^{m}$ denotes the double summation that adds the absolute values of the products of standardized parameter values and experimental reaction result deviations over all experiments, used for normalization;
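The weighting step described above can be sketched as follows. This is a minimal illustration that assumes each parameter's contribution is the sum, over all groups, of the absolute value of the product of the standardized parameter value and the reaction result deviation; the function name `relative_weights` and the sample values are hypothetical.

```python
def relative_weights(z, y):
    """Relative weight of each parameter from standardized inputs.

    z: n x m table of standardized parameter values
    y: list of n standardized reaction results
    """
    n, m = len(z), len(z[0])
    y_mean = sum(y) / n
    deviations = [yi - y_mean for yi in y]
    # Contribution of parameter j: sum over groups of |z_ij * deviation_i|.
    contributions = [
        sum(abs(z[i][j] * deviations[i]) for i in range(n)) for j in range(m)
    ]
    total = sum(contributions)
    if total == 0.0:
        raise ValueError("all contributions are zero; weights are undefined")
    return [c / total for c in contributions]

# Hypothetical standardized data: parameter 0 moves together with the result,
# parameter 1 barely changes.
z = [[-1.0, 0.2], [0.0, -0.4], [1.0, 0.2]]
y = [-1.0, 0.0, 1.0]
weights = relative_weights(z, y)
```

Under this assumption the weights are non-negative and sum to 1, so they can be used directly as importance factors in the later fitness calculation.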
S2, based on the standardized parameter values and the relative weights of the experimental parameters, selecting an optimal parameter vector through a high-throughput reaction parameter screening algorithm, and performing inverse standardization to obtain actual parameter values for guiding subsequent experiments;
Based on the standardized parameter values and the relative weights of the experimental parameters, the optimal experimental conditions are screened out from the multiple groups of parallel chemical reaction experimental data through the high-throughput reaction parameter screening algorithm;
The high-throughput reaction parameter screening algorithm combines a genetic algorithm and weighted fitness evaluation, gradually iterates and optimizes experimental parameters through simulating a biological evolution process, and finally outputs an optimal parameter vector capable of maximizing experimental reaction results;
The standardized experimental parameter data set is taken as the initial population $G^{(0)}$. Each individual in the initial population is a parameter vector representing one set of experimental conditions, which is progressively optimized by the subsequent evolutionary process. To make the changes in each iteration easy to track, the population of each generation is labeled separately, starting from the first generation; each generation is denoted $G^{(t)}$, where $t$ denotes the current iteration number, $t = 1, 2, \ldots, T$, and $T$ denotes the total number of iterations, which can be set according to the specific implementation scenario and is not limited here;
The fitness of each individual is calculated to evaluate its quality. The fitness calculation considers not only the experimental reaction result but also an additional weighting term, so that the potential of each individual is evaluated comprehensively; the higher the fitness, the more likely the individual's set of experimental parameters is close to the optimal solution. The deviation of each experimental parameter from the mean of the initial population is then analyzed: for each experimental parameter, the absolute deviation between the standardized parameter value and the mean of the standardized parameter is calculated, and the negative of this absolute deviation is used as the input of an exponential function, so that the larger the deviation, the smaller the function value, which penalizes individuals that move far from the initial state. Each term is scaled by the importance of the experimental parameter (its relative weight), so that experimental parameters with larger relative weights contribute more to the fitness;
The fitness is calculated by the following formula:
$$F_i^{(t)} = \alpha \sum_{j=1}^{m} w_j \exp\!\left(-\left|z_{ij}^{(t)} - \bar{z}_j\right|\right) + \hat{y}_i^{(t)}$$
where $F_i^{(t)}$ denotes the fitness of the $i$-th individual in the $t$-th generation; $\alpha$ denotes the adjustment coefficient used to balance the experimental reaction result against the relative weights of the experimental parameters, which can be set according to the specific implementation scenario and is not limited here; $w_j$ denotes the relative weight of the $j$-th experimental parameter; $\sum_{j=1}^{m}$ denotes summation over the $m$ experimental parameters; $\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as its base; $z_{ij}^{(t)}$ denotes the $j$-th standardized parameter value of the $i$-th individual of the $t$-th generation population; $\bar{z}_j$ denotes the initial mean of the $j$-th standardized parameter, $\bar{z}_j = \frac{1}{n}\sum_{i=1}^{n} z_{ij}$, taken over the standardized parameter values of the initial population; $\left|z_{ij}^{(t)} - \bar{z}_j\right|$ denotes the absolute deviation of the $j$-th standardized parameter value of the $i$-th individual of the $t$-th generation population from the initial mean; and $\hat{y}_i^{(t)}$ denotes the predicted reaction result of the $i$-th individual in the $t$-th generation, obtained by prediction with an existing linear regression model and calculated as:
$$\hat{y}_i^{(t)} = \beta_0 + \sum_{j=1}^{m} \beta_j\, z_{ij}^{(t)}$$
where $\hat{Y} = \{\hat{y}_1^{(t)}, \hat{y}_2^{(t)}, \ldots, \hat{y}_n^{(t)}\}$ denotes the set of predicted reaction results, and the regression coefficient vector $\beta = (\beta_0, \beta_1, \ldots, \beta_m)$ is calculated by the regression coefficient formula, which is a technical means well known to those skilled in the art and is not described in detail here;
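The fitness evaluation can be illustrated with the following sketch. It assumes an additive form in which the predicted reaction result is added to the adjustment coefficient multiplied by the weighted sum of exponential terms, and a linear predictor with an intercept; both forms, the function names and all numerical values are assumptions made for illustration. The regression coefficients are taken as given rather than fitted.

```python
import math

def predict_linear(individual, beta0, beta):
    """Predicted reaction result from an already fitted linear model."""
    return beta0 + sum(b * zj for b, zj in zip(beta, individual))

def fitness(individual, predicted_result, weights, initial_means, alpha):
    """Assumed fitness: prediction + alpha * sum_j w_j * exp(-|z_j - mean_j|)."""
    closeness = sum(
        w * math.exp(-abs(zj - zbar))
        for zj, w, zbar in zip(individual, weights, initial_means)
    )
    return predicted_result + alpha * closeness

# Hypothetical values for two parameters.
weights = [0.8, 0.2]
initial_means = [0.0, 0.0]
alpha = 0.5
beta0, beta = 0.0, [0.9, 0.1]

at_mean = [0.0, 0.0]
shifted = [1.0, 0.0]
f_at_mean = fitness(at_mean, predict_linear(at_mean, beta0, beta),
                    weights, initial_means, alpha)
f_shifted = fitness(shifted, predict_linear(shifted, beta0, beta),
                    weights, initial_means, alpha)
```

In this sketch an individual that moves away from the initial mean loses part of the exponential term, so it scores higher only when the gain in predicted reaction result outweighs that penalty.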
To generate a new generation of the population, the individuals of the current population are paired two by two for the crossover operation. The pairing rule ensures that each individual is used only once per iteration: if the population size is even, the individuals divide exactly into pairs; if it is odd, one randomly selected individual is carried over directly to the next generation and the remaining individuals are paired;
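The crossover operation is expressed as follows:
$$\tilde{z}_{ij}^{(t)} = \begin{cases} z_{ij}^{(t)}, & j \le c \\ z_{kj}^{(t)}, & j > c \end{cases} \qquad \tilde{z}_{kj}^{(t)} = \begin{cases} z_{kj}^{(t)}, & j \le c \\ z_{ij}^{(t)}, & j > c \end{cases}$$
where $\tilde{z}_{ij}^{(t)}$ and $\tilde{z}_{kj}^{(t)}$ respectively denote the $j$-th standardized parameter value of the $i$-th individual and of the $k$-th individual of the $t$-th generation population after crossover; $c$ denotes the random crossover point, chosen among the $m$ parameter positions; and $z_{ij}^{(t)}$ and $z_{kj}^{(t)}$ respectively denote the $j$-th standardized parameter value of the randomly selected $i$-th individual and $k$-th individual of the $t$-th generation population;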
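The pairing rule and the crossover operation can be sketched as follows, assuming a single-point crossover in which the two paired individuals exchange the parameters located after a random crossover point. The function names are hypothetical, and the sketch requires at least two parameters per individual.

```python
import random

def pair_indices(population_size, rng):
    """Shuffle indices and pair them; with an odd size one index is carried over."""
    indices = list(range(population_size))
    rng.shuffle(indices)
    carried = indices.pop() if population_size % 2 == 1 else None
    pairs = [(indices[p], indices[p + 1]) for p in range(0, len(indices), 2)]
    return pairs, carried

def single_point_crossover(a, b, rng):
    """Swap the tails of two parameter vectors after a random point c."""
    m = len(a)
    c = rng.randint(1, m - 1)  # inclusive bounds; needs m >= 2
    child_a = a[:c] + b[c:]
    child_b = b[:c] + a[c:]
    return child_a, child_b, c

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
pairs, carried = pair_indices(5, rng)
parent_a = [1.0, 2.0, 3.0, 4.0]
parent_b = [5.0, 6.0, 7.0, 8.0]
child_a, child_b, c = single_point_crossover(parent_a, parent_b, rng)
```

Because the two children only exchange values position by position, every parameter value present in the pair before crossover is still present afterwards.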
The crossed individuals are not added to the population directly; instead, the preceding fitness calculation is repeated to obtain the fitness of the crossed individuals, $\tilde{F}_i^{(t)}$ and $\tilde{F}_k^{(t)}$, so that the crossed individuals can also be compared fairly;
To further increase population diversity, an adaptive mutation mechanism is introduced and a mutation operation is performed on the crossed individuals. The mutation probability is adjusted dynamically according to the relative weights of the experimental parameters, the current iteration progress and the degree of dispersion of the population. Specifically, for each experimental parameter, the relative weight of the experimental parameter is multiplied by a sine function whose input is formed from the current iteration number, the total number of iterations, and the ratio of the deviation of the standardized parameter values in the population to the standard deviation; the product is divided by the sum of the relative weights of all experimental parameters and multiplied by a global mutation probability adjustment coefficient to obtain the mutation probability. If the mutation probability is greater than a preset mutation probability threshold, the experimental parameter is mutated: a random perturbation (the mutation probability multiplied by a random number in the range -1 to 1) is added to the current standardized parameter value; otherwise the parameter value is left unchanged. The adaptive mutation mechanism ensures broad exploration of the search space in the early stage and gradual convergence in the later stage;
The mutation probability is calculated as:
$$p_j = \lambda \cdot \frac{w_j \, \sin\!\left(\pi \cdot \frac{t}{T} \cdot \frac{\frac{1}{n}\sum_{i=1}^{n}\left|z_{ij}^{(t)} - \bar{z}_j\right|}{\sigma_j}\right)}{\sum_{k=1}^{m} w_k}$$
where $p_j$ denotes the mutation probability of the $j$-th experimental parameter; $w_j$ denotes the relative weight of the $j$-th experimental parameter; $\sin(\cdot)$ denotes the sine function, used to adjust the mutation probability periodically; $\pi$ denotes the circle constant; $t/T$ denotes the iteration progress ratio, with $t$ the current iteration number and $T$ the total number of iterations, so that the mutation probability changes with the iteration number; $\frac{1}{n}\sum_{i=1}^{n}\left|z_{ij}^{(t)} - \bar{z}_j\right| / \sigma_j$ denotes the standardized degree of dispersion, which reflects population diversity and influences the mutation probability; $\sigma_j$ denotes the standard deviation of the $j$-th parameter over the $n$ groups of parallel chemical reaction experiments; and $\lambda$ denotes the global mutation probability adjustment coefficient, used to control the mutation intensity, which can be set according to the specific implementation scenario and is not limited here;
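The adaptive mutation probability can be illustrated with the sketch below. It assumes that the input of the sine function is the product of the circle constant, the iteration progress ratio and the standardized degree of dispersion (mean absolute deviation from the initial mean, divided by the standard deviation); this exact combination, the function name and the numerical values are assumptions for illustration only.

```python
import math

def mutation_probabilities(population, weights, initial_means, stds,
                           t, total_iterations, lam):
    """Assumed form: p_j = lam * w_j * sin(pi * (t/T) * D_j) / sum(w)."""
    n = len(population)
    weight_sum = sum(weights)
    progress = t / total_iterations
    probabilities = []
    for j, w in enumerate(weights):
        # D_j: mean absolute deviation from the initial mean, scaled by sigma_j.
        mean_abs_dev = sum(abs(ind[j] - initial_means[j]) for ind in population) / n
        dispersion = mean_abs_dev / stds[j]
        angle = math.pi * progress * dispersion
        probabilities.append(lam * w * math.sin(angle) / weight_sum)
    return probabilities

# Hypothetical population of two individuals with two parameters each.
population = [[-1.0, 0.5], [1.0, -0.5]]
probs = mutation_probabilities(
    population, weights=[0.75, 0.25], initial_means=[0.0, 0.0],
    stds=[1.0, 1.0], t=5, total_iterations=10, lam=1.0,
)
```

With these values the heavily weighted, widely dispersed first parameter receives a mutation probability of 0.75 at the midpoint of the run, while the second parameter receives a much smaller one.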
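The experimental parameters after mutation are calculated as:
$$z_{ij}^{*(t)} = \begin{cases} \tilde{z}_{ij}^{(t)} + p_j \cdot r, & p_j > \theta \\ \tilde{z}_{ij}^{(t)}, & p_j \le \theta \end{cases}$$
where $z_{ij}^{*(t)}$ denotes the $j$-th parameter value of the $i$-th individual of the $t$-th generation population after mutation; $\tilde{z}_{ij}^{(t)}$ denotes the $j$-th parameter value of the $i$-th individual of the $t$-th generation population after crossover; $p_j$ denotes the mutation probability of the $j$-th experimental parameter; $r$ denotes a random number in the interval $[-1, 1]$; and $\theta$ denotes the mutation probability threshold, which can be set according to the specific implementation scenario and is not limited here;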
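The mutation operation itself can be sketched as follows: a parameter is perturbed only when its mutation probability exceeds the threshold, and the perturbation is the mutation probability multiplied by a random number between -1 and 1. The function name `mutate` and the sample values are hypothetical.

```python
import random

def mutate(individual, probabilities, threshold, rng):
    """Perturb each parameter whose mutation probability exceeds the threshold."""
    mutated = []
    for zj, pj in zip(individual, probabilities):
        if pj > threshold:
            # Random perturbation: probability times a random number in [-1, 1].
            mutated.append(zj + pj * rng.uniform(-1.0, 1.0))
        else:
            mutated.append(zj)
    return mutated

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
crossed = [0.3, -0.2]
mutated = mutate(crossed, probabilities=[0.75, 0.1], threshold=0.5, rng=rng)
```

Since the perturbation is bounded by the mutation probability, a parameter with a low probability either stays unchanged or moves only slightly.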
The fitness of the mutated individuals is likewise re-evaluated, giving $F_i^{*(t)}$ and $F_k^{*(t)}$; the calculation is the same as before;
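For each pair of individuals, the fitness values of the original, crossed and mutated individuals are compared and the individual with the highest fitness is selected for the next generation. This ensures that the individuals in each generation are the best found so far and drives the optimization steadily forward. The formula is:
$$F\!\left(z_i^{(t+1)}\right) = \max\!\left(F_i^{(t)},\ \tilde{F}_i^{(t)},\ F_i^{*(t)}\right), \qquad F\!\left(z_k^{(t+1)}\right) = \max\!\left(F_k^{(t)},\ \tilde{F}_k^{(t)},\ F_k^{*(t)}\right)$$
where $z_i^{(t+1)}$ and $z_k^{(t+1)}$ denote the updated next-generation individuals, each being the candidate that attains the maximum; $F_i^{(t)}$, $\tilde{F}_i^{(t)}$ and $F_i^{*(t)}$ denote the fitness of the original, crossed and mutated $i$-th individual respectively (and likewise for the $k$-th individual); and $\max(\cdot)$ denotes the maximum operation used to select the individual with the highest fitness;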
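The selection step can be sketched as follows, assuming that each population slot keeps the fittest of its own original, crossed and mutated candidates. The fitness function used here is a toy stand-in chosen only to make the example self-contained.

```python
def select_next(original, crossed, mutated, fitness_fn):
    """Return the fittest of the three candidates for one population slot.

    On ties, max() keeps the first candidate, so the original is retained.
    """
    return max((original, crossed, mutated), key=fitness_fn)

def toy_fitness(individual):
    # Hypothetical stand-in: higher when the vector is closer to the origin.
    return -sum(v * v for v in individual)

next_individual = select_next([1.0, 1.0], [0.5, 1.0], [0.2, 0.1], toy_fitness)
```

Because the original individual is always among the candidates, the fitness of a slot never decreases from one generation to the next in this sketch.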
The crossover, mutation, selection and update steps are repeated until the preset number of iterations is reached. All individuals in the final-generation population are then traversed and the individual with the highest fitness is selected as the final solution. This individual is called the elite individual; its parameter vector represents the optimal experimental conditions obtained through multiple rounds of evolutionary screening, and is inverse-standardized into actual parameter values that guide subsequent experiments or applications.
In summary, the high-throughput reaction screening system and method based on computer control and data processing are thereby realized.
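The final step, choosing the elite individual and converting it back to actual parameter values, can be sketched as follows. Inverse standardization multiplies each standardized value by the standard deviation of its parameter and adds the mean; the function names and the numerical values are hypothetical.

```python
def pick_elite(population, fitness_values):
    """Return the individual with the highest fitness in the final generation."""
    best = max(range(len(population)), key=lambda i: fitness_values[i])
    return population[best]

def inverse_standardize(z_vector, means, stds):
    """Map standardized values back to physical units: x = z * sigma + mu."""
    return [zj * s + mu for zj, mu, s in zip(z_vector, means, stds)]

# Hypothetical final generation with parameters (temperature, pressure).
final_population = [[0.0, 0.0], [0.5, -1.0]]
final_fitness = [0.4, 0.9]
elite = pick_elite(final_population, final_fitness)
actual = inverse_standardize(elite, means=[75.0, 2.5], stds=[10.0, 1.0])
```

The means and standard deviations must be the ones computed during the original standardization, otherwise the recovered values will not correspond to the physical settings of the reaction device.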
The sequence of the embodiments of the invention is merely for description and does not represent the advantages or disadvantages of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and the same or similar parts of each embodiment are referred to each other, and each embodiment mainly describes differences from other embodiments.
The foregoing embodiments are merely for illustrating the technical solution of the present invention, but not for limiting the same, and although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that the technical solution described in the foregoing embodiments may be modified or substituted for some of the technical features thereof, and that such modifications or substitutions do not depart from the spirit and scope of the technical solution of the embodiments of the present invention and are intended to be included in the scope of the present invention.