
CN119025844A - Optimization method of flocculation flotation algae removal process based on machine learning - Google Patents


Info

Publication number
CN119025844A
Authority
CN
China
Prior art keywords
flocculation
flotation
model
algae removal
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410915748.8A
Other languages
Chinese (zh)
Inventor
侯俊
赵骁
杨梓俊
苗令占
吴军
尤国祥
张鸣智
周宁远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU
Priority to CN202410915748.8A
Publication of CN119025844A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Separation Of Suspended Particles By Flocculating Agents (AREA)

Abstract

The present invention provides a machine learning-based method for optimizing the flocculation and flotation algae removal process. It uses the H2O automatic machine learning platform (H2O AutoML) to optimize the process, thereby improving the efficiency of algae-laden water treatment and water quality safety. In the method, historical flocculation and flotation algae removal treatment data are first obtained and divided into a training set and a test set; the H2O AutoML platform automatically trains multiple machine learning algorithms and selects the optimal model according to the evaluation index; hyperparameter optimization of the optimal model then yields the flocculation and flotation algae removal process optimization control model; this control model is deployed to the control system of the water treatment facility, where it dynamically predicts and adjusts the flocculant dosage, flocculation conditions, and flotation conditions according to real-time water quality data. Automated model selection and parameter tuning thus adapt the flocculant dosage and operating parameters to water quality conditions that change in real time.

Description

Flocculation air floatation algae removal process optimization method based on machine learning
Technical Field
The invention relates to the technical field of water treatment, in particular to a flocculation air floatation algae removal process optimization method based on machine learning.
Background
Flocculation air flotation is a common water treatment technique in which a flocculant is added to agglomerate algae and other suspended matter into larger particles, which are then separated from the water by air flotation. In this process, parameters such as the type and dosage of the flocculant and the pressure and duration of air flotation usually have to be adjusted to the specific water quality and treatment target. Traditionally, these operating parameters are chosen by experience or repeated experiments. Such selection lacks precise control and is difficult to adapt in real time to changes in water quality; the optimization process is tedious and easily disturbed by water quality fluctuations, so treatment efficiency and water quality safety are hard to guarantee.
Conventional algae removal by flocculation air flotation is therefore time-consuming and labor-intensive. The process is very sensitive to changes in water quality: variations in the concentration and type of pollutants can destabilize the treatment effect, so the parameters must be re-optimized and readjusted, which consumes time and resources. Subjective judgment by operators easily leads to inconsistent treatment results, and the treated water quality fluctuates with unstable operating parameters and raw water quality, which challenges both the efficiency of water supply treatment and the safety of the treated water.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a machine learning-based flocculation air flotation algae removal process optimization method, which uses the H2O automatic machine learning platform (H2O AutoML) to optimize the flocculation air flotation algae removal process, thereby improving algae-laden water treatment efficiency and water quality safety.
In this method, automatic machine learning model selection and parameter tuning adapt to water quality conditions that change in real time, so that the flocculant dosage and the operating parameters are optimized on the basis of data.
According to a first aspect of the present invention, a machine learning-based flocculation air flotation algae removal process optimization method is provided, comprising the following steps:
acquiring historical flocculation air flotation algae removal treatment data, which comprise historical water quality data, flocculation air flotation process parameters, and algae treatment data, and preprocessing the historical data to obtain a training data set;
dividing the training data set into a training set and a test set according to a preset ratio;
automatically training a plurality of machine learning algorithms on the H2O AutoML platform using the training set and the test set, and selecting an optimal model according to a preset evaluation index;
performing hyperparameter optimization on the optimal model to obtain a flocculation air flotation algae removal process optimization control model;
deploying the flocculation air flotation algae removal process optimization control model into the control system of a water treatment facility, dynamically predicting the flocculation air flotation operating parameters from water quality data acquired in real time, and adjusting the actual flocculant dosage, flocculation conditions, and air flotation conditions; and
evaluating the algae removal effect of the treatment carried out with the actual flocculation air flotation operating parameters.
In a further embodiment, the flocculation air flotation algae removal treatment history data includes:
Historical water quality data: turbidity, algae species and quantity, pH, water temperature, conductivity;
Flocculation air floatation process parameters: air floatation time, coagulant addition amount and stirring intensity;
Algae treatment data: algae species and quantity, turbidity, and physicochemical indicators of the algae, including soluble and bound extracellular polymers and the Zeta potential of algal cells.
In a further embodiment, the machine learning algorithms executed on the H2O AutoML platform include GLM (generalized linear model), DRF (distributed random forest), XRT (extremely randomized trees), DeepLearning (deep learning), XGBoost, GBM (gradient boosting machine), and the Stacked Ensemble model, and the optimal model is selected based on a predetermined evaluation index.
In a further embodiment, configuring the training process in which the H2O AutoML platform automatically trains a plurality of machine learning algorithms includes:
selecting the variable that the model predicts as output, i.e., setting the response_column value;
setting the maximum AutoML run time to 300 s, i.e., the max_runtime_secs parameter is set to 300;
setting the maximum number of models that AutoML explores before stopping to 10, i.e., the max_models parameter is set to 10;
providing a random seed for the AutoML run to ensure the repeatability of the experiment, the seed being 1234, i.e., the seed parameter is set to 1234;
selecting AUTO as the distribution function supported by each machine learning algorithm, i.e., the distribution parameter is set to AUTO;
setting the number of cross-validation folds to 5, which helps to evaluate the stability and generalization ability of the model, i.e., the nfolds parameter is set to 5;
setting the training stop condition, the stopping_rounds parameter, to 3, i.e., training stops when model performance has not improved within the specified number of rounds;
setting the keep_cross_validation_models parameter to True, so that the cross-validated models are retained.
In a further embodiment, performing the hyperparameter optimization on the optimal model includes:
automatically adjusting the model parameters using the grid search or random search techniques of the H2O AutoML platform.
In a further embodiment, the method further comprises the steps of:
analyzing the key factors that influence the algae removal efficiency, and their weights, using the model interpretation function of the H2O AutoML platform.
In a further embodiment, the model interpretation methods used include residual analysis, variable importance analysis, the Shapley interpretation method, and partial dependence curves, which are used to interpret the prediction results of the optimal model obtained by training on the H2O AutoML platform.
In a further embodiment, the method further comprises the steps of:
acquiring water quality data, actual flocculation air flotation operating parameters, and the algae removal effect at a preset interval, dynamically retraining and updating the flocculation air flotation algae removal process optimization control model, and redeploying the updated model.
With the automatic machine learning-based flocculation air flotation algae removal process optimization method, the prediction model can predict indicators such as the algae count and turbidity of the water from the input water quality parameters and process parameters. The flocculant dosage and air flotation operating conditions are therefore no longer set blindly, rapidly changing water quality conditions can be handled, and the water quality continues to meet safety standards. The method offers good social and economic benefits and helps water plant process managers quickly make appropriate flocculation air flotation process decisions.
By introducing an automatic machine learning module into flocculation air flotation process prediction, the invention realizes model selection and hyperparameter optimization automatically and integrates a plurality of basic machine learning algorithms, which improves the accuracy and generalization ability of the model. Automated model development reduces errors caused by human factors, such as errors in manual data processing and parameter selection, and improves the repeatability and reliability of the whole model development process.
Compared with the prior art, the implementation of the flocculation air flotation algae removal process optimization method has the remarkable beneficial effects that:
1. Providing automated and intelligent decision support: the H2O AutoML platform automatically trains a plurality of machine learning algorithms and can select an optimal model from a large number of candidate models; automated decision optimization reduces the workload of technical staff in model selection and parameter tuning and improves the accuracy and efficiency of decision making;
2. Providing a data-driven optimization process: using historical and real-time data, the method dynamically adjusts flocculation air flotation operating parameters such as the flocculant dosage, flocculation conditions, and air flotation conditions; the data-driven approach makes the water treatment process more precise and allows a quick response to real-time changes in water quality;
3. Improving treatment efficiency and economy: reasonably optimized flocculation air flotation operating parameters improve the algae removal effect, reduce unnecessary chemical use, and lower operating costs;
4. Enhancing the stability and reliability of the system: by continuously monitoring and adjusting the operating parameters, the method copes with fluctuations in raw water quality and keeps the treatment effect stable and reliable; in particular, under seasonal changes or sudden water quality events, the operating parameters are adjusted dynamically in real time according to the water quality, ensuring efficient treatment and water quality safety;
5. Continuous performance monitoring and evaluation: by continuously evaluating the actual flocculation air flotation operating parameters and the algae removal effect, the method monitors the performance of the whole system in real time; continuous monitoring helps to find problems promptly and make the necessary adjustments, ensuring the water treatment effect;
6. Easy to extend and suitable for different scenarios: through online updating and deployment of the machine learning model, the optimization method adapts readily to water treatment facilities of different scales and to different water quality conditions. As more data accumulate, the predictive power and accuracy of the model increase further.
It should be understood that all combinations of the foregoing concepts, as well as additional concepts described in more detail below, may be considered a part of the inventive subject matter of the present disclosure as long as such concepts are not mutually inconsistent. In addition, all combinations of claimed subject matter are considered part of the disclosed inventive subject matter.
The foregoing and other aspects, embodiments, and features of the present teachings will be more fully understood from the following description, taken together with the accompanying drawings. Other additional aspects of the invention, such as features and/or advantages of the exemplary embodiments, will be apparent from the description which follows, or may be learned by practice of the embodiments according to the teachings of the invention.
Drawings
The drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures may be represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. Embodiments of various aspects of the invention will now be described, by way of example, with reference to the accompanying drawings.
Fig. 1 is a flow chart of the machine learning-based flocculation air flotation algae removal process optimization method according to an embodiment of the present invention.
Fig. 2 is an exploratory data analysis plot of the raw data provided by the present invention.
Fig. 3 is a Pearson product-moment correlation coefficient matrix of the raw data provided by the present invention.
Fig. 4 shows the candidate models and their accuracy during training on the H2O AutoML platform provided by the invention.
Fig. 5 is a residual analysis plot of the best training model provided by the invention.
Fig. 6 is a learning curve of the stacked ensemble training model provided by the present invention.
Fig. 7 is a variable importance plot of the best training model provided by the present invention.
Fig. 8 is a SHAP summary plot of the best training model provided by the present invention.
Fig. 9 is a partial dependence plot of the variables of the top six models provided by the present invention.
Detailed Description
For a better understanding of the technical content of the present invention, specific examples are set forth below, along with the accompanying drawings.
Aspects of the invention are described in this disclosure with reference to the drawings, in which are shown a number of illustrative embodiments. The embodiments of the present disclosure are not necessarily intended to include all aspects of the invention. It should be understood that the various concepts and embodiments described above, as well as those described in more detail below, may be implemented in any of a number of ways, as the disclosed concepts and embodiments are not limited to any implementation. Additionally, some aspects of the disclosure may be used alone or in any suitable combination with other aspects of the disclosure.
{Flocculation air flotation algae removal process optimization method based on machine learning}
With reference to the embodiment shown in Fig. 1, the machine learning-based flocculation air flotation algae removal process optimization method is implemented through the following steps:
acquiring historical flocculation air flotation algae removal treatment data, which comprise historical water quality data, flocculation air flotation process parameters, and algae treatment data, and preprocessing the historical data to obtain a training data set;
dividing the training data set into a training set and a test set according to a preset ratio;
automatically training a plurality of machine learning algorithms on the H2O AutoML platform using the training set and the test set, and selecting an optimal model according to a preset evaluation index;
performing hyperparameter optimization on the optimal model to obtain a flocculation air flotation algae removal process optimization control model;
deploying the flocculation air flotation algae removal process optimization control model into the control system of a water treatment facility, dynamically predicting the flocculation air flotation operating parameters from water quality data acquired in real time, and adjusting the actual flocculant dosage, flocculation conditions, and air flotation conditions; and
evaluating the algae removal effect of the treatment carried out with the actual flocculation air flotation operating parameters.
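One way to read the deployment step is as a search over candidate operating parameters, each scored by the trained model for the current water quality. The following is a minimal sketch under that assumption only; `predict_removal`, the feature names, and the candidate grids are hypothetical stand-ins and are not the model or values of this disclosure.

```python
from itertools import product


def predict_removal(water, dose_mg_l, flotation_min, stir_rpm):
    """Hypothetical stand-in for the trained model: returns a removal score in [0, 1]."""
    # Toy response surface whose best dose grows with turbidity (illustration only).
    best_dose = 5.0 + 0.5 * water["turbidity_ntu"]
    dose_term = max(0.0, 1.0 - abs(dose_mg_l - best_dose) / 20.0)
    time_term = min(1.0, flotation_min / 10.0)
    stir_term = max(0.0, 1.0 - abs(stir_rpm - 200) / 400.0)
    return dose_term * time_term * stir_term


def recommend_setpoints(water, doses, times, speeds, predictor=predict_removal):
    """Score every candidate combination and return the best one with its score."""
    best = max(product(doses, times, speeds), key=lambda c: predictor(water, *c))
    return best, predictor(water, *best)


# Real-time water quality reading (hypothetical field names and values).
water_now = {"turbidity_ntu": 20.0, "ph": 7.8, "temp_c": 24.0}
setpoints, score = recommend_setpoints(
    water_now, doses=[5, 10, 15, 20, 25], times=[4, 8, 12], speeds=[100, 200, 300]
)
print(setpoints, round(score, 3))
```

In a real deployment the predictor would be the model exported from the training step, and the candidate grids would be bounded by the plant's equipment limits.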
In a further embodiment, the flocculation air flotation algae removal treatment history data includes:
Historical water quality data: turbidity, algae species and quantity, pH, water temperature, conductivity;
Flocculation air floatation process parameters: air floatation time, coagulant addition amount and stirring intensity;
Algae treatment data: algae species and quantity, turbidity, and physicochemical indicators of the algae, including soluble and bound extracellular polymers and the Zeta potential of algal cells.
The acquired historical flocculation air flotation algae removal treatment data are preprocessed to obtain the training data set.
In an embodiment of the present invention, the preprocessing includes data normalization and outlier processing.
Normalizing the historical data converts data of different dimensions or numerical ranges to the same scale. This balances the contribution of each feature to the model, improves convergence during training by preventing problems caused by differences in feature scale, and improves the precision of the trained model, which can then capture slight changes in the data more easily, improving its accuracy and robustness.
As an example, the normalization in this embodiment uses Min-Max normalization or Z-score standardization.
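The two scaling options named above can be sketched in a few lines of standard-library Python. The function names and the sample turbidity values are illustrative assumptions; the population standard deviation is used for the Z-score, which is one common convention.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Min-Max normalization: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Z-score standardization: zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


turbidity = [2.0, 4.0, 6.0, 8.0]  # illustrative values only
scaled = min_max_scale(turbidity)
standardized = z_score_scale(turbidity)
print(scaled)
print(standardized)
```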
Outlier processing means identifying and handling values that differ markedly from most of the data or are clearly unreasonable. The influence of outliers on model training is eliminated or mitigated by direct or conditional deletion, substitution, clipping, transformation, and similar methods. In the embodiment of the invention, direct deletion is selected: outlier data are removed to eliminate noise and errors in the data, which improves the overall quality and signal-to-noise ratio of the data set, improves the accuracy and reliability of model training, and removes the model and prediction errors that outliers would cause.
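The text does not state how outliers are detected before deletion. As a hedged illustration, the sketch below assumes the common interquartile-range (IQR) rule and deletes any row whose value falls outside the fences; the rule, the factor `k = 1.5`, and the sample pH values are assumptions, not details from this disclosure.

```python
from statistics import quantiles


def drop_outlier_rows(rows, column, k=1.5):
    """Delete rows whose `column` value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [row[column] for row in rows]
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the column
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if lo <= row[column] <= hi]


# Illustrative records; 14.9 is an implausible pH reading.
rows = [{"ph": v} for v in [7.1, 7.3, 7.2, 7.4, 7.0, 7.5, 14.9]]
clean = drop_outlier_rows(rows, "ph")
print(len(rows), "->", len(clean))
```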
In the embodiment of the invention, the data dividing ratio of the training set to the test set is 0.8:0.2.
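A 0.8:0.2 division can be sketched as a seeded shuffle followed by a cut. The helper name, the fixed seed, and the placeholder records are assumptions made for illustration.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=1234):
    """Shuffle reproducibly, then cut into training and test subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local generator, global state untouched
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


records = list(range(400))  # placeholder records
train, test = split_rows(records)
print(len(train), len(test))
```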
In a further embodiment, the machine learning algorithms executed on the H2O AutoML platform include GLM (generalized linear model), DRF (distributed random forest), XRT (extremely randomized trees), DeepLearning (deep learning), XGBoost, GBM (gradient boosting machine), and the Stacked Ensemble model, and the optimal model is selected based on a predetermined evaluation index.
In an embodiment of the invention, the training set obtained from the split is selected as the model training data, i.e., it is specified by the training_frame parameter.
In a further embodiment, configuring the training process in which the H2O AutoML platform automatically trains a plurality of machine learning algorithms includes:
selecting the variable that the model predicts as output, i.e., setting the response_column value;
setting the maximum AutoML run time to 300 s, i.e., the max_runtime_secs parameter is set to 300;
setting the maximum number of models that AutoML explores before stopping to 10, i.e., the max_models parameter is set to 10;
providing a random seed for the AutoML run to ensure the repeatability of the experiment, the seed being 1234, i.e., the seed parameter is set to 1234;
selecting AUTO as the distribution function supported by each machine learning algorithm, i.e., the distribution parameter is set to AUTO;
setting the number of cross-validation folds to 5, which helps to evaluate the stability and generalization ability of the model, i.e., the nfolds parameter is set to 5;
setting the training stop condition, the stopping_rounds parameter, to 3, i.e., training stops when model performance has not improved within the specified number of rounds;
setting the keep_cross_validation_models parameter to True, so that the cross-validated models are retained.
In an embodiment of the invention, the indices used to evaluate model performance during training are the mean square error (RMSE), the mean absolute error (MAE), or the coefficient of determination (R²).
The mean square error is the average of the squares of the differences between the predicted and actual values in the regression task and is used to evaluate the accuracy of the model predictions. Note that the abbreviation RMSE conventionally denotes the square root of this average.
The mean absolute error (MAE) is the average of the absolute values of the deviations of all individual observations from the true value (or arithmetic mean).
The coefficient of determination R² expresses the degree to which the independent variables explain the dependent variable. It lies between 0 and 1, and the higher the value, the better the fit of the model.
In the method of the invention, the mean square error, the mean absolute error, and the coefficient of determination are selected to evaluate the performance of each model on the test set.
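The three indices can be computed directly from paired actual and predicted values. The sketch below uses the standard textbook definitions and returns both the mean square error and its square root, since the text uses the label RMSE for the former; the function name and the sample values are illustrative.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R^2 for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    return {"mse": mse, "rmse": sqrt(mse), "mae": mae, "r2": 1.0 - ss_res / ss_tot}


# Illustrative removal rates (fractions), not data from the patent.
m = regression_metrics([0.9, 0.8, 0.6, 0.7], [0.85, 0.8, 0.65, 0.7])
print(m)
```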
In a further embodiment, performing the hyperparameter optimization on the optimal model includes:
automatically adjusting the model parameters using the grid search or random search techniques of the H2O AutoML platform.
In a further embodiment, the method further comprises the steps of:
analyzing the key factors that influence the algae removal efficiency, and their weights, using the model interpretation function of the H2O AutoML platform.
In a further embodiment, the model interpretation methods include residual analysis, variable importance analysis, the Shapley interpretation method, and partial dependence curves, which are used to interpret the prediction results of the optimal model obtained by training on the H2O AutoML platform.
Residual analysis examines the residuals, i.e., the differences between the actual observed values and the model predictions; the information they provide can be used to analyze the reliability of the data, periodicity, or other interference.
Variable importance analysis measures how strongly each input feature of the model influences the prediction result.
The Shapley interpretation measures the average contribution of a single feature to the model predictions while accounting for interactions with other features.
Partial dependence curve analysis shows how the feature variables affect the model predictions by calculating the marginal effect of one (or two) input parameters on the prediction model.
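The marginal-effect calculation behind a partial dependence curve can be sketched independently of any platform: fix one feature at each grid value for every row, predict, and average. `toy_model` and the feature names below are hypothetical; this is an illustration of the technique, not the platform's own implementation.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve


def toy_model(row):
    """Hypothetical linear predictor used only to make the sketch runnable."""
    return 0.02 * row["dose"] + 0.01 * row["ph"]


rows = [{"dose": 10, "ph": 7.0}, {"dose": 20, "ph": 8.0}]
curve = partial_dependence(toy_model, rows, "dose", [0, 10, 20])
print(curve)
```

For a linear predictor the curve is a straight line whose slope equals the coefficient of the varied feature, which makes the toy case easy to check by hand.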
In a further embodiment, the method further comprises the steps of:
acquiring water quality data, actual flocculation air flotation operating parameters, and the algae removal effect at a preset interval, dynamically retraining and updating the flocculation air flotation algae removal process optimization control model, and redeploying the updated model.
{Example 1}
To further illustrate the practice of the foregoing method, a specific embodiment is described in more detail below with reference to Figs. 2 to 9.
Step 1: 400 historical records are obtained from laboratory experiments; they comprise historical water quality data, flocculation air flotation process parameters, and algae treatment data.
Water quality data: turbidity, algae species and quantity, pH, water temperature, conductivity, and dissolved oxygen.
Flocculation air flotation process parameters: air flotation time, coagulant dosage, and stirring intensity.
Algae treatment data: algae species and quantity, turbidity, and physicochemical indicators of the algae, including soluble and bound extracellular polymers and the Zeta potential of algal cells.
Step 2: Data exploration. The seaborn package is imported in the Python script, and the corr and pairplot functions are called to inspect the distribution of the original data set and the correlations between variables, as shown in Figs. 2 and 3.
Specifically, the packages required by seaborn (pandas and matplotlib) are downloaded through pip and configured in a conda environment; the CSV data are read with pandas into a DataFrame tabular data structure; exploratory data analysis is carried out with the pairplot function, with parameters such as kind="scatter" and diag_kind="kde"; the corr function is then used to calculate the Pearson correlation coefficients between all numerical columns of the DataFrame; finally, the figures are displayed and saved with the show and savefig functions of matplotlib.
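As a dependency-free illustration of what the corr step computes for each pair of numeric columns, the sketch below implements the Pearson product-moment coefficient and assembles a small correlation matrix. The column names and values are made up for the example and are not the experimental data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of named numeric columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


data = {
    "dose": [5, 10, 15, 20],
    "removal": [0.4, 0.6, 0.8, 0.9],
    "turbidity": [9.0, 7.0, 4.0, 3.0],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```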
Step 3: The H2O AutoML platform is initialized and the data read in step 2 are input. The data are split into a training set and a validation set (the test set referred to below) at a ratio of 0.8:0.2. A target column for prediction (blue algae removal rate) and the feature columns for training (the water quality parameters, flocculation air flotation process parameters and algae treatment data mentioned in step 1) are then designated.
The H2O AutoML platform automatically trains multiple models. After training, the test set is used to establish the flocculation air flotation algae removal process prediction model and to compare the performance of the models, as shown in figure 4.
Specifically, the framework required for model training is downloaded and installed through pip in the conda virtual environment; H2O AutoML is initialized in the Python script with the init function; the upload_file function reads the data, and the split_frame function splits the dataset with a training-to-validation ratio of 0.8:0.2, i.e., ratios=[0.8, 0.2]; the feature columns for training are extracted from train.columns and specified; the H2OAutoML function then starts building the models; finally, five-fold cross-validation is used to ensure the reliability of the models.
The parameter max_runtime_secs of the H2OAutoML function is set to 300, i.e., the maximum running time of AutoML is 300 s; max_models is set to 10, which limits the maximum number of models AutoML can explore before stopping; seed is set to 1234, providing a random seed for the AutoML run and ensuring the repeatability of model building; the distribution function supported by the various algorithms is chosen automatically, i.e., distribution is set to AUTO; the number of cross-validation folds is set to 5, i.e., nfolds is set to 5; stopping_rounds is set to 3; keep_cross_validation_models is set to True, retaining the cross-validated models.
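The configuration described in step 3 might look like the sketch below. It is an illustration under stated assumptions, not the embodiment's script: the names `AUTOML_PARAMS` and `train_automl`, the CSV path and the target column name are hypothetical, and whether every keyword (notably `distribution`) is accepted depends on the installed H2O version. The text gives the split as ratios=[0.8, 0.2]; to the best of our understanding `split_frame` expects ratios that sum to less than 1 and returns the remainder as the last frame, so the sketch passes `[0.8]`.

```python
# Parameter values as listed in the description.
AUTOML_PARAMS = {
    "max_runtime_secs": 300,               # stop after 300 s
    "max_models": 10,                      # explore at most 10 models
    "seed": 1234,                          # reproducible runs
    "distribution": "AUTO",                # let each algorithm pick
    "nfolds": 5,                           # five-fold cross-validation
    "stopping_rounds": 3,                  # early-stopping patience
    "keep_cross_validation_models": True,  # retain the CV models
}


def train_automl(csv_path, target, params=AUTOML_PARAMS):
    """Train H2O AutoML on a CSV file; returns the AutoML object and test frame.

    Requires the h2o package and a Java runtime; imported lazily so the
    parameter dictionary can be inspected without them.
    """
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    frame = h2o.upload_file(csv_path)
    # 80 % training, remaining 20 % held out (assumption: ratios sum to < 1)
    train, test = frame.split_frame(ratios=[0.8], seed=params["seed"])
    features = [c for c in train.columns if c != target]

    aml = H2OAutoML(**params)
    aml.train(x=features, y=target, training_frame=train,
              leaderboard_frame=test)
    return aml, test
```

After `aml, test = train_automl("algae_data.csv", "removal_rate")` (both names assumed), `aml.leaderboard` lists the trained models and `aml.leader` is the best one.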
In this embodiment, the mean square error (equation 1), the mean absolute error (equation 2) and the coefficient of determination R² (equation 3) are selected to evaluate the performance of each trained model on the test set.
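The three metrics are conventionally written as below. The forms are the standard textbook definitions, given here as an assumption about what equations 1 to 3 contain; the symbols n (number of test samples), y_i (observed value), \hat{y}_i (predicted value) and \bar{y} (mean of the observed values) are notation introduced for this sketch.

```latex
\begin{align}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2 \tag{1}\\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \tag{2}\\
R^2 &= 1-\frac{SS_{res}}{SS_{tot}},\qquad
SS_{res}=\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2,\qquad
SS_{tot}=\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2 \tag{3}
\end{align}
```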
The best-performing trained model was Stack_1, with a mean square error of 0.05164, a mean absolute error of 0.05308 and a coefficient of determination of 0.972.
Here SS_res (residual sum of squares) is the residual variation, i.e., the sum of squares of the differences between the observed and predicted values; SS_tot (total sum of squares) is the total variation, i.e., the sum of squares of the differences between the observed values and their mean.
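As a small worked computation of these metrics, the sketch below evaluates them on made-up numbers. The function names and the sample values are hypothetical and are not taken from the embodiment's data.

```python
def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative removal rates (fractions), not measured data.
observed = [0.6, 0.8, 1.0]
predicted = [0.7, 0.8, 0.9]
scores = (mse(observed, predicted), mae(observed, predicted),
          r2(observed, predicted))
```

For these sample values SS_res is 0.02 and SS_tot is 0.08, so the coefficient of determination is 0.75.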
Step 4: After the training in step 3 is finished, the best-performing model obtained by the H2O AutoML platform is interpreted through a combination of methods: residual analysis, learning curves, variable importance, Shapley additive explanations (SHAP) summary and partial dependence plots (PDP).
Specifically, the aforementioned interpretation methods are run directly in the Python script.
The residual analysis (fig. 5) shows the residual distribution of the test dataset under the stacked ensemble model. Residuals cluster around zero, especially for higher algae removal rates (60%-100%), indicating that the trained stacked ensemble model has captured sufficient predictive information.
The learning curve (fig. 6) of the stacked ensemble model shows that all three curves (training, test and cross-validation) drop rapidly within the first 20 iterations, reflecting fast initial learning and a rapid improvement in model performance. The small gap between the training curve and the cross-validation curve indicates that the model shows no significant overfitting. The small difference between the test curve and the training curve indicates good generalization ability.
The variable importance of the optimal model (fig. 7) evaluates the importance of each input variable to the algae removal rate. In the resulting ranking, air flotation time, dosage, pH, turbidity and Zeta potential each rank above bEPS and dEPS, and air flotation time has the greatest influence on the algae removal rate.
The SHAP summary of the optimal model (fig. 8) further evaluates the marginal effect of each input variable on the algae removal rate. SHAP values are obtained on the test dataset for the numerical variables air flotation time, dosage, bEPS, DO, algae density, pH, turbidity, Zeta potential and dEPS, and each varies within a certain range. Points with similar variable values are spread over different horizontal positions, which indicates that the effect of an input variable is determined not only by its own value but also by the other variables. The wide spread of air flotation time in the SHAP summary plot confirms its importance. Meanwhile, low-value points of air flotation time, dosage and bEPS lie mainly on the left, and high-value points mainly on the right. These results further indicate that longer air flotation times and higher dosage and bEPS favor algae removal.
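The idea behind the Shapley values summarized above can be illustrated with a toy computation. This is not the SHAP routine used by the H2O platform: it is an exact enumeration over feature orderings, feasible only for a handful of features, and the model `toy_model`, the feature names `time` and `dose`, and all numbers are hypothetical. The interaction term shows why a variable's contribution depends on the values of the other variables.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    For every ordering of the features, each feature is switched from its
    baseline value to its value in x, and the change in the prediction is
    credited to that feature; the credits are averaged over all orderings.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(features):
    """Hypothetical response with an interaction between the two inputs."""
    return (2 * features["time"] + 3 * features["dose"]
            + features["time"] * features["dose"])


phi = shapley_values(toy_model, {"time": 2, "dose": 1},
                     {"time": 0, "dose": 0})
```

The contributions in `phi` add up to the difference between the prediction at the sample and the prediction at the baseline, which is the property that makes a summary plot of such values interpretable.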
The partial dependence plots (fig. 9) visualize the dependence of the algae removal rate on the most important input variables (air flotation time, dosage, bEPS and DO, one per panel). A PDP is drawn by varying the variable of interest while keeping the other variables fixed. Curves of different colors represent the results of different models. For air flotation time, the dependence rises markedly over the first 40-60 s and the slope then gradually decreases, so the air flotation time for algae removal can be optimized accordingly. For dosage, the dependence rises markedly with increasing dosage up to 20 mg/L and more slowly with further dosing, which shows that beyond a certain dosage, additional dosing has little effect on flocculation-air flotation algae removal efficiency. For bEPS, the dependence increases markedly above 7.8 mg/L, indicating that a high bEPS concentration benefits the flocculation-air flotation process. For DO, the dependence is strongest at low and high concentrations.
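How a partial dependence curve is obtained can be sketched in a few lines. The sketch uses the usual definition, in which the variable of interest is set to each grid value in every data row, the other variables keep their observed values, and the predictions are averaged. The model `toy_removal`, its saturation at 60 s, the rows and the grid are hypothetical and only mimic the flattening described for air flotation time.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction as one feature is swept over a grid of values."""
    curve = []
    for value in grid:
        # Overwrite the feature of interest, leave the others untouched.
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve


def toy_removal(row):
    """Hypothetical removal rate in percent; gains stop after 60 s."""
    return min(row["time"], 60) + row["dose"]


rows = [{"time": 30, "dose": 10}, {"time": 90, "dose": 20}]
curve = partial_dependence(toy_removal, rows, "time", [20, 60, 100])
```

In this toy case the curve rises between 20 s and 60 s and is flat afterwards, the kind of shape from which an operating point for air flotation time would be read off.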
While the invention has been described with reference to preferred embodiments, it is not intended to be limiting. Those skilled in the art will appreciate that various modifications and adaptations can be made without departing from the spirit and scope of the present invention. Accordingly, the scope of the invention is defined by the appended claims.

Claims (8)

1. A flocculation flotation algae removal process optimization method based on machine learning, characterized in that it comprises:
obtaining historical flocculation flotation algae removal treatment data, including historical water quality data, flocculation flotation process parameters and algae treatment data, and preprocessing the historical data to obtain a training data set;
dividing the training data set into a training set and a test set according to a preset ratio;
automatically training multiple machine learning algorithms on the H2O AutoML platform using the training set and the test set, and selecting the optimal model according to predetermined evaluation indicators;
performing hyperparameter optimization on the optimal model to obtain a flocculation flotation algae removal process optimization control model;
deploying the flocculation flotation algae removal process optimization control model into the control system of a water treatment facility, dynamically predicting the flocculation flotation operating parameters from water quality data collected in real time, and adjusting the actual flocculant dosage, flocculation conditions and flotation conditions; and
evaluating the algae removal effect based on the algae removal treatment performed with the actual flocculation flotation operating parameters.
2. The flocculation flotation algae removal process optimization method based on machine learning according to claim 1, characterized in that the historical flocculation flotation algae removal treatment data include:
historical water quality data: turbidity, algae species and quantity, pH, water temperature and conductivity;
flocculation flotation process parameters: flotation time, coagulant dosage and stirring intensity;
algae treatment data: effluent algae species and quantity, turbidity, and physical and chemical indicators of the algae, including soluble and bound extracellular polymers and algal cell Zeta potential.
3. The flocculation flotation algae removal process optimization method based on machine learning according to claim 1, characterized in that the machine learning algorithms executed on the H2O AutoML platform include GLM random forest, DRF distributed random forest, XRT extreme random forest, DeepLearning deep learning, XGboost, GBM gradient boosting machine and the Stack stacked ensemble model, and the optimal model is selected based on predetermined evaluation indicators.
4. The flocculation flotation algae removal process optimization method based on machine learning according to claim 1, characterized in that the configuration of the training process in which the H2O AutoML platform automatically trains multiple machine learning algorithms includes:
selecting the variable output by the model prediction, i.e., setting the response_column value;
setting the maximum running time of AutoML to 300 s, i.e., the max_runtime_secs parameter is set to 300;
setting the maximum number of models that AutoML explores before stopping to 10, i.e., the max_models parameter is set to 10;
providing a random seed of 1234 for the AutoML run to ensure the repeatability of the experiment, i.e., the seed parameter is set to 1234;
selecting AUTO as the distribution function supported by each machine learning algorithm, i.e., the distribution parameter is set to AUTO;
setting the number of cross-validation folds to 5, which helps to evaluate the stability and generalization ability of the model, i.e., the nfolds parameter is set to 5;
setting the training stopping condition stopping_rounds to 3, i.e., the training process stops if the model performance does not improve within the specified number of rounds;
setting the keep_cross_validation_models parameter to True to retain the cross-validated models.
5. The flocculation flotation algae removal process optimization method based on machine learning according to claim 1, characterized in that the hyperparameter optimization of the optimal model comprises:
automatically tuning the model parameters using the grid search or random search techniques of the H2O AutoML platform.
6. The flocculation flotation algae removal process optimization method based on machine learning according to claim 1, characterized in that the method further comprises the following step:
using the model interpretation function of the H2O AutoML platform to analyze the key factors that affect the algae removal efficiency and their weights.
7. The flocculation flotation algae removal process optimization method based on machine learning according to claim 6, characterized in that the model interpretation methods used include residual analysis, variable importance analysis, the Shapley interpretation method and partial dependence curves, which are used to interpret the prediction results of the optimal model obtained by H2O AutoML platform training.
8. The flocculation flotation algae removal process optimization method based on machine learning according to any one of claims 1 to 7, characterized in that the method further comprises the following step:
collecting water quality data, actual flocculation flotation operating parameters and algae removal effects at a preset period, dynamically training and updating the flocculation flotation algae removal process optimization control model, and redeploying the updated flocculation flotation algae removal process optimization control model.
CN202410915748.8A 2024-07-09 2024-07-09 Optimization method of flocculation flotation algae removal process based on machine learning Pending CN119025844A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410915748.8A CN119025844A (en) 2024-07-09 2024-07-09 Optimization method of flocculation flotation algae removal process based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410915748.8A CN119025844A (en) 2024-07-09 2024-07-09 Optimization method of flocculation flotation algae removal process based on machine learning

Publications (1)

Publication Number Publication Date
CN119025844A true CN119025844A (en) 2024-11-26

Family

ID=93537913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410915748.8A Pending CN119025844A (en) 2024-07-09 2024-07-09 Optimization method of flocculation flotation algae removal process based on machine learning

Country Status (1)

Country Link
CN (1) CN119025844A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119397212A (en) * 2025-01-03 2025-02-07 西安理工大学 Prediction and optimization method of microalgae flotation recovery efficiency based on machine learning
CN119397212B (en) * 2025-01-03 2025-03-18 西安理工大学 Prediction and optimization method of microalgae flotation recovery efficiency based on machine learning

Similar Documents

Publication Publication Date Title
Mehrani et al. Application of a hybrid mechanistic/machine learning model for prediction of nitrous oxide (N2O) production in a nitrifying sequencing batch reactor
CN115375009B (en) Method for establishing intelligent monitoring linkage system for coagulation
CN119025844A (en) Optimization method of flocculation flotation algae removal process based on machine learning
CN116589078B (en) Intelligent sewage treatment control method and system based on data fusion
CN110991495A (en) Method, system, medium, and apparatus for predicting product quality in manufacturing process
Szeląg et al. An algorithm for selecting a machine learning method for predicting nitrous oxide emissions in municipal wastewater treatment plants
CN119626387B (en) An accurate prediction method for dual coagulants in water plants based on automatic machine learning
CN113793645A (en) Compost maturity prediction method based on machine learning model
Tomperi et al. Modelling effluent quality based on a real-time optical monitoring of the wastewater treatment process
CN119717744B (en) Intelligent smelting process parameter optimization management method and system
CN119470349A (en) A method, system, device and medium for detecting water turbidity using multi-light source scattering
Yusoff et al. Artificial intelligence in color classification of 3D-printed enhanced adsorbent in textile wastewater
CN119742001A (en) A method for predicting nitrous oxide release from sewage treatment plants based on long short-term memory neural network
CN119830051A (en) Organic fertilizer production parameter optimization control method for composite microbial agent
CN119742003A (en) A method for predicting soil acidification sensitivity based on regional big data
CN119225318A (en) Intelligent process optimization system and method for extracting plant calcium
CN119226976A (en) Sewage treatment effect evaluation method and system based on data analysis
CN118761509A (en) Method and system for predicting trend of forage element content based on animal husbandry
CN112651173A (en) Agricultural product quality nondestructive testing method based on cross-domain spectral information and generalizable system
CN119280904B (en) A method, system, medium and equipment for intelligent sludge discharge scheduling in sedimentation tanks of water supply plants
CN119668125B (en) A sewage phosphorus removal control system
CN118551209B (en) A method for evaluating the accuracy of measurement methods based on machine learning
CN119397212B (en) Prediction and optimization method of microalgae flotation recovery efficiency based on machine learning
CN119646763A (en) Method and device for predicting flocculation efficiency of microbial flocculants based on decision tree regression
CN116757074B (en) An intelligent optimization method for grazing strategies with different soil chemical properties

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination