Query optimization system, method and device based on Monte Carlo tree search and reinforcement learning
Technical Field
The invention relates to a query optimization system and a query optimization method, and belongs to the technical field of computers.
Background
Currently, query optimization algorithms fall into two broad categories: traditional query optimization algorithms and AI-based query optimization algorithms. The former typically use cardinality/cost estimation techniques to select a query plan from several candidate plans, or use predefined rules to optimize an initial query plan. This type of technique is currently the most widely used, but is comparatively dated. The latter use AI techniques for cardinality/cost estimation, or directly use machine learning techniques to generate an optimized query plan. Among such methods, the most advanced and representative are the NEO and BAO techniques.
The NEO technology is an end-to-end learning-based query optimization method that can optimize join order, index selection and physical operator selection in a relational database. The method models the query plan generation problem as a Markov model, trains a deep learning model based on a tree convolutional neural network using a reinforcement learning algorithm, and uses the model's output to gradually generate a query plan tree equivalent to the input SQL query. The input to the Markov model (i.e., the "state" in reinforcement learning) is divided into two parts: a query-level encoding and a plan-level encoding; the encoding schemes are shown in fig. 1 and 2. See the paper "Neo: A Learned Query Optimizer", or descriptions in related articles such as https://blog.csdn.net/cuiris/article/details/111466631 and https://zhuanlan.zhihu.com/p/80233443.
The query-level encoding contains the join relationships (red vectors) of the various tables in the database and the predicates (blue vectors) used in the joins. The plan-level encoding contains the structure of the whole query (i.e., the structure of the query plan tree) and the access-mode information (index access or direct table access) of each operator.
NEO predicts the "value" of a query plan through a tree convolutional neural network, using the two encodings as input. The value model is trained on historical query plans stored in an experience pool together with their corresponding costs, using MSE as the loss function; the label of each sample is the minimum cost among the different plans of the same query in the experience pool.
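Written out (the symbols here are introduced for exposition and do not appear in the NEO paper): let $E$ be the experience pool, $V_\theta(p)$ the value model's prediction for plan $p$, and $c_{\min}(q_p)$ the minimum recorded cost among the plans of the same query $q_p$; the loss is then

$$\mathcal{L}(\theta) = \frac{1}{|E|} \sum_{p \in E} \left( V_\theta(p) - c_{\min}(q_p) \right)^2 .$$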
As shown in fig. 3, the main flow of NEO is as follows:
1. An initial execution plan, called the bootstrap, is generated using a "weak" optimizer, such as that of PostgreSQL.
2. Experience (Experience) denotes a set of (execution plan -> time cost) pairs, which serve as the training samples of the value model.
3. Featurization (Featurizer) extracts two types of features: query features and execution plan features.
4. The value model is trained with the extracted features, establishing a query plan -> time cost prediction model.
5. A query plan search is performed against the model; here a simple greedy search is used to select an execution plan.
The query plan search maintains a min-heap whose elements are sub-query plans and whose priority is given by the value model. Each time, the sub-plan with the minimum predicted cost is taken from the top of the heap (initially only the root node of the query tree); all of its legal sub-plans (child nodes in the query plan tree) are expanded; the minimum cost (here, query execution time) of the complete query plans that each expanded sub-plan could generate is predicted; and the sub-plan with the minimum predicted cost is selected and inserted into the min-heap. This process is repeated until a sub-plan is expanded into a complete query plan, as sketched below.
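A minimal Python sketch of this greedy search follows; `expand_fn` (returns all legal child sub-plans), `is_complete` (tests whether a sub-plan is a complete query plan) and `value_model` (predicts the minimum completion cost) are hypothetical callables, not identifiers from the NEO code base:

```python
import heapq
from itertools import count

def greedy_plan_search(root_plan, expand_fn, is_complete, value_model):
    """NEO-style greedy search over sub-plans, kept in a min-heap ordered
    by the value model's predicted minimum completion cost."""
    tie = count()  # tie-breaker so the heap never compares plans directly
    heap = [(value_model(root_plan), next(tie), root_plan)]
    while heap:
        _, _, plan = heapq.heappop(heap)  # cheapest sub-plan so far
        if is_complete(plan):
            return plan
        children = expand_fn(plan)  # all legal child sub-plans
        if children:
            # Predict each child's minimum completion cost; as described
            # above, only the cheapest child is inserted into the heap.
            best = min(children, key=value_model)
            heapq.heappush(heap, (value_model(best), next(tie), best))
    return None
```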
NEO therefore has the following problems:
1. The technique supports only a limited subset of relational database operations, specifically only selection, projection, equi-join and aggregate queries.
2. The performance is unstable: on some query workloads, the query plans optimized by this scheme have longer execution times and exhibit a long-tail distribution.
Disclosure of Invention
The invention aims to solve the problems of weak compatibility and poor stability of the existing NEO query optimization method.
The query optimization system based on Monte Carlo tree search and reinforcement learning adopts the same framework as the NEO query optimization model, and comprises a value model unit and a query plan search unit;
a value model unit: predicting the cost of a query plan from the features corresponding to the query plan, based on the value model;
the value model is a neural network model; the input of the value model is a vector tree used for representing a query plan needing to estimate the overhead, the topological structure of the vector tree is a binary tree structure, and the node codes are sequentially spliced according to the sequence traversal order of the tree; the node characteristics of the nodes consist of the codes of the node information; the node information comprises a node type, a table involved in operation, an index type involved in operation, a predicate involved in operation and a cardinality estimation result of the node;
a query plan search unit: performing query plan search according to the query plan -> time cost prediction, using the Monte Carlo tree search method, and generating an execution plan from the search space.
Further, the node information is encoded as follows:
the node type is one-hot coding; the table involved in the operation is one-hot encoding; the index type involved in the operation is one-hot encoding; the predicate involved in the operation is one-hot encoding; the cardinality estimation result of the node is an integer value.
Further, the neural network model serving as the value model adopts a structure of multiple stacked tree convolution layers.
Further, when the query plan search unit searches using the Monte Carlo tree search method, the process of generating a better query plan is modeled as a Markov process; the Monte Carlo tree search is used to expand the sub-query plans; the vector tree encodings corresponding to the semantically consistent sub-query plans serve as the tree nodes in the Monte Carlo tree search; and the value model is used to evaluate the cost of the sub-query plans at each search step, thereby realizing the Monte Carlo tree search.
Further, the process by which the query plan search unit searches using the Monte Carlo tree search method comprises the following steps:
(1) Selection: starting from the root node R, recursively select the optimal child node using the UCT algorithm until a leaf node L is reached;
(2) Expansion: if L is not a terminal node, create one or more child nodes that do not contradict the semantics of the query, and select one of them, C;
(3) Simulation: starting from C, randomly select non-contradictory child nodes until the current query plan is completely expressed;
(4) Back-propagation: predict the cost of the simulated complete query plan using the value model, and, starting from the leaf node, reversely update the simulation counts and rewards of all its parent nodes;
(5) Steps (1) to (4) are executed cyclically, continuously expanding the sub-query plans, until a complete query plan is generated.
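For reference, the UCT criterion in step (1) is the standard upper confidence bound applied to trees; in one common form (the symbols are introduced here for exposition), the child $j$ maximizing

$$UCT_j = \bar{X}_j + c \sqrt{\frac{\ln N}{n_j}}$$

is selected, where $\bar{X}_j$ is the average reward of child $j$, $n_j$ its simulation count, $N$ the simulation count of its parent, and $c$ an exploration constant.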
The query optimization method based on Monte Carlo tree search and reinforcement learning comprises the following steps:
the value model serves as the query plan -> time cost prediction model; a query plan search is performed against the query plan -> time cost prediction model;
the value model is a neural network model; the input of the value model is a vector tree representing the query plan whose cost is to be estimated; the topological structure of the vector tree is a binary tree, and the node encodings are spliced sequentially in the level-order traversal of the tree; the node features of a node consist of the encodings of its node information; the node information comprises the node type, the table involved in the operation, the index type involved in the operation, the predicate involved in the operation, and the cardinality estimation result of the node;
the query plan search uses the Monte Carlo tree search method, performing the search according to the query plan -> time cost prediction, and generating an execution plan from the search space.
Further, the training process of the value model comprises the following steps:
s1, constructing an experience set by taking the execution plan and the time overhead corresponding to the query as experience; taking an execution plan and time expenditure corresponding to each experience in the experience combination as a training sample of the value model;
s2, characterizing the training samples, namely extracting features, wherein the features comprise query features and execution plan features;
s3, training a value model by using the extracted features, wherein the trained value model is an established query plan- > time overhead prediction model.
The query optimization device based on Monte Carlo tree search and reinforcement learning is a storage medium in which at least one instruction is stored; the at least one instruction is loaded and executed by a processor to implement the query optimization method based on Monte Carlo tree search and reinforcement learning.
The query optimization device based on Monte Carlo tree search and reinforcement learning comprises a processor and a memory; at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the query optimization method based on Monte Carlo tree search and reinforcement learning.
Beneficial effects:
the invention realizes the compatibility of all relation operations in a relation database by improving the Value Model used in the NEO and improving the greedy-based Plan Search method in the NEO, and has higher and more stable performance compared with the NEO.
Drawings
FIG. 1 is a schematic illustration of NEO query level encoding;
FIG. 2 is a schematic illustration of NEO plan level encoding;
FIG. 3 is a schematic diagram of a NEO system framework;
FIG. 4 is an example of query plan vector tree encoding according to the present invention;
FIG. 5 is an example of a value model structure of the present invention;
FIG. 6 is a schematic diagram of a Monte Carlo tree search process.
Detailed Description
The first embodiment is as follows:
the embodiment is a query optimization method based on Monte Carlo tree search and reinforcement learning, and the query optimization process is realized by a query optimization system based on Monte Carlo tree search and reinforcement learning. The invention is a query optimization method based on cost and AI together with NEO, so the framework of the system (the query optimization system based on Monte Carlo tree search and reinforcement learning) of the invention is also shown in FIG. 3, and both the cost of the query plan is predicted by using a pre-trained Value Model (namely Value Model) and the search of the query plan is guided. The difference between them is the network structure of the value model and the query Plan Search (Plan Search) method; namely: the system of the invention has the same framework as the NEO query optimization model, and comprises a value model unit and a query plan search unit;
a value model unit: predicting the cost of a query plan from the features corresponding to the query plan, based on the value model;
the value model is a neural network model; the input of the value model is a vector tree representing the query plan whose cost is to be estimated; the topological structure of the vector tree is a binary tree, and the node encodings are spliced sequentially in the level-order traversal of the tree; the node features of a node consist of the encodings of its node information; the node information comprises the node type, the table involved in the operation, the index type involved in the operation, the predicate involved in the operation, and the cardinality estimation result of the node;
a query plan search unit: performing query plan search according to the query plan -> time cost prediction, using the Monte Carlo tree search method, and generating an execution plan from the search space.
The query optimization method based on Monte Carlo tree search and reinforcement learning comprises the following steps:
1. An initial execution plan, called the bootstrap, is generated using a "weak" optimizer, such as that of PostgreSQL.
2. Experience (Experience) denotes a set of (execution plan -> time cost) pairs, which serve as the training samples of the value model.
3. Featurization (Featurizer) extracts the features: query features and execution plan features.
4. The neural network provided by the invention serves as the value model; the value model is trained with the extracted features, establishing a query plan -> time cost prediction model.
5. Unlike NEO, the Monte Carlo tree search method is adopted to generate an execution plan from the search space.
In terms of the value model, since query cardinality and query cost are generally in a linear relationship, and an accurate cardinality estimation result can effectively improve the prediction accuracy of the query cost, a cardinality estimation strategy based on the data distribution is introduced into the value model as a feature; that is, when estimating the cost of a given query plan, the Naru scheme, currently the most effective cardinality estimation technique, is used to estimate the cardinality of each node in the query plan. The estimation results are encoded into the vector tree as part of the input to the value model.
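Schematically, the per-node annotation can be pictured as below; the `estimator.estimate` interface is a hypothetical wrapper around a Naru-style estimator, not Naru's actual API:

```python
def annotate_cardinalities(plan, estimator):
    """Attach a cardinality estimate to every node of the plan tree before
    encoding; the estimate is later spliced into the node's feature vector."""
    stack = [plan]
    while stack:
        node = stack.pop()
        node.card_estimate = estimator.estimate(node)  # data-distribution based
        stack.extend(c for c in (node.left, node.right) if c is not None)
    return plan
```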
The input of the value model is a vector tree, which represents the query plan whose cost is to be estimated. As shown in FIG. 4, the input of the value model is composed as follows:
Topological structure: the node encodings are spliced sequentially in the level-order traversal of the tree;
Node features: a one-dimensional vector per node, consisting of the encodings of the following information:
Node type: one-hot encoding (supporting all physical relational operations, including null nodes);
Table involved in the operation: one-hot encoding;
Index type involved in the operation: one-hot encoding;
Predicate involved in the operation: one-hot encoding;
Cardinality estimation result of the node: an integer value.
Through this encoding, the query-level encoding and the plan-level encoding in NEO are represented in a unified, combined manner, facilitating feature extraction by the machine learning model. Because the input changes, the network architecture of the value model also differs from NEO's: it adopts an architecture of multiple stacked tree convolution layers, and introduces residual connections and batch normalization (BN) layers for stable training, as shown in fig. 5.
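A simplified PyTorch sketch of such an architecture follows; the triple-weight form of tree convolution and the index-based tree representation are assumptions made for illustration, and the actual network of fig. 5 may differ in depth and dimensions:

```python
import torch
import torch.nn as nn

class TreeConv(nn.Module):
    """One tree convolution layer: each node is combined with its left and
    right children through separate weight matrices."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_left = nn.Linear(in_dim, out_dim, bias=False)
        self.w_right = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, left, right):
        # h: (n_nodes, dim) node features in level order; left/right: child
        # indices per node, with index 0 used as a designated null node.
        return torch.relu(self.w_self(h) + self.w_left(h[left])
                          + self.w_right(h[right]))

class ValueModel(nn.Module):
    """Stacked tree convolution layers with residual connections and batch
    normalization, read out at the root node as a scalar cost."""
    def __init__(self, dim, n_layers=3):
        super().__init__()
        self.convs = nn.ModuleList(TreeConv(dim, dim) for _ in range(n_layers))
        self.norms = nn.ModuleList(nn.BatchNorm1d(dim) for _ in range(n_layers))
        self.head = nn.Linear(dim, 1)

    def forward(self, h, left, right):
        for conv, norm in zip(self.convs, self.norms):
            h = norm(conv(h, left, right)) + h  # residual connection
        return self.head(h[1])  # node 1 is the root; node 0 is the null node
```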
Finally, the prediction objective of the value model also differs from NEO's: NEO predicts the minimum cost of the complete query plans that each expanded sub-plan can generate, whereas the method simply predicts the cost of the input query plan itself.
In terms of query plan search (Plan Search), the invention models the process of generating a better query plan as a Markov process, expands the sub-query plans using Monte Carlo tree search (MCTS), and employs the value model as the heuristic function of the Monte Carlo tree search. Note: during the search, the query optimizer of a conventional database (e.g., PostgreSQL) is used to determine the search space at each step, i.e., which expandable child nodes are searched, so that the finally generated query plan remains semantically consistent with the initial query plan. All current cost-based query optimizers support this function, so it is not described in detail. The vector tree encodings corresponding to the semantically consistent sub-query plans serve as the tree nodes in the Monte Carlo tree search, and the value model evaluates the cost of the sub-query plans at each search step.
As shown in fig. 6, the Monte Carlo tree search process is as follows (a code sketch follows the steps):
(1) Selection: starting from the root node R, the optimal child node is recursively selected using the UCT algorithm until a leaf node L is reached.
(2) Expansion: if L is not a terminal node (i.e., the entire query plan cannot yet be fully expressed), one or more child nodes that do not contradict the semantics of the query are created based on the rules, and one of them, C, is selected.
(3) Simulation: starting from C, non-contradictory child nodes are randomly selected, using the same rules as in step (2), until the current query plan is completely expressed.
(4) Back-propagation: the cost of the simulated complete query plan is predicted using the value model, and, starting from this leaf node, the simulation counts and rewards (i.e., average cost) of all its parent nodes are updated in reverse.
(5) Steps (1) to (4) are executed cyclically, continuously expanding the sub-query plans, until a complete query plan is generated.
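The following Python sketch puts steps (1) to (4) together. As in the greedy sketch earlier, `expand_fn` (the rule-based, semantics-preserving expansions supplied by the traditional optimizer), `is_complete` and `value_model` are hypothetical callables standing in for the components described above:

```python
import math
import random

class Node:
    """A Monte Carlo search tree node wrapping a (partial) sub-query plan."""
    def __init__(self, plan, parent=None):
        self.plan, self.parent = plan, parent
        self.children, self.visits, self.reward = [], 0, 0.0

    def uct(self, c=1.41):
        if self.visits == 0:
            return float("inf")  # always try unvisited children first
        return (self.reward / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root_plan, expand_fn, is_complete, value_model, n_iterations=200):
    root = Node(root_plan)
    for _ in range(n_iterations):
        node = root
        # (1) Selection: descend by UCT until a leaf of the search tree.
        while node.children:
            node = max(node.children, key=Node.uct)
        # (2) Expansion: create semantically consistent children.
        if not is_complete(node.plan):
            node.children = [Node(p, node) for p in expand_fn(node.plan)]
            if node.children:
                node = random.choice(node.children)
        # (3) Simulation: random rollout until the plan is fully expressed.
        plan = node.plan
        while not is_complete(plan):
            plan = random.choice(expand_fn(plan))
        # (4) Back-propagation: score with the value model, update ancestors.
        reward = -value_model(plan)  # lower predicted cost => higher reward
        while node is not None:
            node.visits += 1
            node.reward += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).plan

# In practice the search is re-run from the chosen child, mirroring step (5),
# until the sub-query plan has been expanded into a complete query plan.
```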
The second embodiment is as follows:
the embodiment is a query optimization device based on monte carlo tree search and reinforcement learning, which is a storage medium having at least one instruction stored therein, and the at least one instruction is loaded and executed by a processor to implement the query optimization method based on monte carlo tree search and reinforcement learning according to one of claims 6 to 8.
The storage medium in this embodiment includes, but is not limited to, a USB flash drive, a hard disk, and the like.
The third embodiment is as follows:
the embodiment is a query optimization device based on monte carlo tree search and reinforcement learning, the device comprises a processor and a memory, the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to realize the query optimization method based on monte carlo tree search and reinforcement learning according to one of claims 6 to 8.
The devices in this embodiment include, but are not limited to, mobile terminals, PCs, servers, workstations, and the like.
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.