CN105630789B - Query plan conversion method and device - Google Patents
Query plan conversion method and device
- Publication number
- CN105630789B (application CN201410588240.8A)
- Authority
- CN
- China
- Prior art keywords
- query
- operator
- partition
- query operator
- attribute
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention disclose a query plan conversion method and device, which relate to the field of computers and can reduce, to a greater extent, the number of physical query tasks constituting a physical query plan. The specific solution is: extract a first query operator and a second query operator from a logical query plan; if the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, rewrite the partition attribute of the second query operator in the logical query plan so that it is the same as the partition attribute of the first query operator; delete the partition operator of the second query operator from the logical query plan, and generate one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, to form the physical query plan. The present invention is used in the process of converting a logical query plan into a physical query plan.
Description
Technical Field
The present invention relates to the field of computers, and in particular to a query plan conversion method and device.
Background
A logical query plan is a tree-shaped query structure parsed from a query statement, whose nodes are the query operators in that statement. Each query operator in the logical query plan has a partition attribute and a partition operator. The partition attribute is an attribute value shared by all the data that the query operator needs to operate on. The partition operator sits between the query operator and another query operator and separates the two. A logical query plan can be converted into a physical query plan.
A logical query plan is converted into a physical query plan as follows: a partition operator marks the start of a physical query task; the query operators between that partition operator and the next partition operator are generated as one physical query task; the next partition operator then marks the start of the next physical query task, and generation of that task begins. The physical query plan is a tree-shaped query structure whose nodes are the physical query tasks generated in this way.
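The conversion rule described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes a linear chain of operators instead of a tree, and the names `Node`, `split_into_tasks`, `P1`, `P2`, `op1`, and `op2` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    kind: str  # "partition" for a partition operator, "operator" for a query operator
    name: str


def split_into_tasks(chain: List[Node]) -> List[List[str]]:
    """Start a new physical query task at every partition operator."""
    tasks: List[List[str]] = []
    for node in chain:
        if node.kind == "partition" or not tasks:
            tasks.append([])
        tasks[-1].append(node.name)
    return tasks


# Assumed example: two query operators, each preceded by its partition operator.
chain = [
    Node("partition", "P1"), Node("operator", "op1"),
    Node("partition", "P2"), Node("operator", "op2"),
]
print(split_into_tasks(chain))  # two physical query tasks

# Deleting the partition operator P2 leaves a single physical query task.
merged_chain = [node for node in chain if node.name != "P2"]
print(split_into_tasks(merged_chain))
```

In this sketch, removing a partition operator is the only way to reduce the task count, which is why the rest of the description concentrates on when a partition operator can be deleted.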
When fewer physical query tasks are needed to form the physical query plan, fewer data read and write operations are needed to execute them, the time overhead introduced by those reads and writes decreases, and the physical query plan therefore takes less time to execute.
To reduce the number of physical query tasks needed to form a physical query plan, the prior art searches the logical query plan for two query operators that are separated by one partition operator and have exactly the same partition attribute, and deletes the partition operator between them. During conversion of the logical query plan into the physical query plan, the two query operators are then generated as one physical query task instead of two different ones, which reduces the number of physical query tasks.
However, this technique requires the two query operators to have exactly the same partition attribute and to be in a direct predecessor-successor relationship, which severely limits the scenarios in which it can be applied.
Summary of the Invention
Embodiments of the present invention provide a query plan conversion method and device that can reduce, to a greater extent, the number of physical query tasks constituting a physical query plan.
To achieve the above object, the embodiments of the present invention adopt the following technical solutions:
A first aspect of the embodiments of the present invention provides a query plan conversion method, including:
extracting a first query operator and a second query operator from a logical query plan, where the first query operator is a predecessor operator of the second query operator;
if the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, rewriting the partition attribute of the second query operator in the logical query plan so that the partition attribute of the second query operator is the same as the partition attribute of the first query operator;
deleting the partition operator of the second query operator from the logical query plan, and generating one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, to form the physical query plan.
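The three steps of the first aspect can be sketched as below. This is an illustrative sketch under stated assumptions, not the claimed implementation: a partition attribute is modeled as a tuple of column names, and `QueryOperator`, `is_prefix`, `try_merge`, and the column names `attr3` and `attr5` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class QueryOperator:
    name: str
    partition_attrs: Tuple[str, ...]      # assumed model: ordered column names
    has_partition_operator: bool = True


def is_prefix(first: Tuple[str, ...], second: Tuple[str, ...]) -> bool:
    """True if `first` is a non-empty leading part of `second` (equality included)."""
    return 0 < len(first) <= len(second) and second[:len(first)] == first


def try_merge(first: QueryOperator, second: QueryOperator) -> Optional[List[str]]:
    """Rewrite, delete, and generate one task; return None if the rule does not apply."""
    if not is_prefix(first.partition_attrs, second.partition_attrs):
        return None
    second.partition_attrs = first.partition_attrs   # rewrite the partition attribute
    second.has_partition_operator = False            # delete its partition operator
    # One physical query task: first's partition operator, first, second.
    return ["partition(" + ", ".join(first.partition_attrs) + ")", first.name, second.name]


op4 = QueryOperator("op4", ("attr3",))
op5 = QueryOperator("op5", ("attr3", "attr5"))
print(try_merge(op4, op5))
print(op5.partition_attrs, op5.has_partition_operator)
```

Treating equality as a special case of the prefix test lets the same function cover the prior-art situation in which the two partition attributes are already identical.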
With reference to the first aspect, in a first possible implementation, the step of rewriting the partition attribute of the second query operator in the logical query plan, if the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, so that the partition attribute of the second query operator is the same as the partition attribute of the first query operator, includes:
if the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, a third query operator lies between the first query operator and the second query operator, and the third query operator can be implemented using a broadcast query algorithm, rewriting the third query operator as a broadcast query operator, where the broadcast query operator has no partition operator;
rewriting the partition attribute of the second query operator in the logical query plan so that the partition attribute of the second query operator is the same as the partition attribute of the first query operator;
the step of deleting the partition operator of the second query operator from the logical query plan and generating one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, to form the physical query plan, includes:
deleting the partition operator of the second query operator from the logical query plan, and generating one physical query task from the second query operator, the broadcast query operator, the first query operator, and the partition operator of the first query operator, to form the physical query plan;
where the first query operator is a predecessor operator of the third query operator, and the third query operator is a predecessor operator of the second query operator.
With reference to the first aspect, in a second possible implementation, if the partition attribute of the first query operator is the same as the partition attribute of the second query operator, a third query operator lies between the first query operator and the second query operator, and the third query operator can be implemented using a broadcast query algorithm, the third query operator is rewritten as a broadcast query operator, where the broadcast query operator has no partition operator;
the partition operator of the second query operator is deleted from the logical query plan, and one physical query task is generated from the second query operator, the broadcast query operator, the first query operator, and the partition operator of the first query operator, to form the physical query plan;
where the first query operator is a predecessor operator of the third query operator, and the third query operator is a predecessor operator of the second query operator.
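The two implementations above, in which a third query operator lies between the first and second, can be sketched together as follows. This is a hedged illustration only: the `broadcastable` flag stands in for "can be implemented using a broadcast query algorithm", and `Op`, `merge_across_third`, and the column names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Op:
    name: str
    partition_attrs: Tuple[str, ...]
    broadcastable: bool = False           # assumed flag: a broadcast algorithm is available
    is_broadcast: bool = False
    has_partition_operator: bool = True


def merge_across_third(first: Op, third: Op, second: Op) -> Optional[List[str]]:
    """Chain is first -> third -> second. Returns the merged task, or None."""
    n = len(first.partition_attrs)
    # Covers both cases: first's attribute is a prefix of, or equal to, second's.
    if n == 0 or second.partition_attrs[:n] != first.partition_attrs:
        return None
    if not third.broadcastable:
        return None
    third.is_broadcast = True                        # rewrite as a broadcast query operator
    third.has_partition_operator = False             # which has no partition operator
    second.partition_attrs = first.partition_attrs   # no-op when already equal
    second.has_partition_operator = False            # delete second's partition operator
    label = "partition(" + ", ".join(first.partition_attrs) + ")"
    return [label, first.name, third.name, second.name]


first = Op("first", ("user_id",))
third = Op("third", ("dim_id",), broadcastable=True)
second = Op("second", ("user_id", "day"))
print(merge_across_third(first, third, second))
```

If the third operator cannot be broadcast, the sketch leaves all three operators untouched, so the plan keeps its original task boundaries.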
With reference to the first aspect, the first possible implementation, and the second possible implementation, in a third possible implementation, before the partition operator of the second query operator is deleted from the logical query plan, the method further includes:
rewriting the sort attribute of the first query operator so that the sort attribute of the first query operator is the same as the sort attribute of the second query operator;
where the sort attribute of the first query operator before rewriting is the same as the partition attribute of the first query operator, and the sort attribute of the second query operator is the same as the partition attribute of the second query operator before rewriting;
the sort attribute is used to perform partition sorting on the data in the data table operated on by the query operator of the logical query plan.
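One plausible reading of the sort-attribute rewrite, offered here as an interpretation rather than something the text states, is that partitioning on the shorter attribute while sorting on the longer one still keeps every group of the second query operator contiguous inside a single partition. The sketch below illustrates that property with hypothetical data, where column `a` plays the role of the first operator's partition attribute and `(a, b)` the second operator's original partition attribute.

```python
from itertools import groupby
from typing import List, Tuple

Row = Tuple[str, int, int]  # (a, b, value); hypothetical columns

rows: List[Row] = [("x", 2, 10), ("y", 1, 5), ("x", 1, 7), ("x", 2, 1), ("y", 1, 3)]


def partition_on_a(data: List[Row], num_partitions: int) -> List[List[Row]]:
    """Partition on the prefix attribute `a` only (deterministic toy hash)."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in data:
        parts[sum(map(ord, row[0])) % num_partitions].append(row)
    return parts


def sum_by_a_b(data: List[Row], num_partitions: int = 2) -> List[Tuple[Tuple[str, int], int]]:
    """Sort each partition on (a, b), then aggregate value per (a, b) group."""
    results: List[Tuple[Tuple[str, int], int]] = []
    for part in partition_on_a(data, num_partitions):
        part.sort(key=lambda r: (r[0], r[1]))  # rewritten sort attribute: (a, b)
        for key, group in groupby(part, key=lambda r: (r[0], r[1])):
            results.append((key, sum(r[2] for r in group)))
    return results


totals = sum_by_a_b(rows)
print(sorted(totals))
```

Because rows that agree on `(a, b)` necessarily agree on `a`, no group is split across partitions, and each group key is emitted once.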
With reference to the first aspect, the first possible implementation, and the second possible implementation, in a fourth possible implementation, the step of generating one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, to form the physical query plan, includes:
using the task-flow correlation optimization (JFC) technique to generate one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, to form the physical query plan.
A second aspect of the embodiments of the present invention further provides a query plan conversion device, including:
an extraction unit, configured to extract a first query operator and a second query operator from a logical query plan, where the first query operator is a predecessor operator of the second query operator;
a first rewriting unit, configured to, if the partition attribute of the first query operator extracted by the extraction unit is a prefix of the partition attribute of the second query operator extracted by the extraction unit, rewrite the partition attribute of the second query operator in the logical query plan so that the partition attribute of the second query operator is the same as the partition attribute of the first query operator;
a deletion unit, configured to delete the partition operator of the second query operator from the logical query plan after the first rewriting unit rewrites the partition attribute of the second query operator in the logical query plan;
a generation unit, configured to, after the deletion unit deletes the partition operator of the second query operator, generate one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, to form the physical query plan.
With reference to the second aspect, in a first possible implementation, the first rewriting unit includes:
a first rewriting module, configured to, if the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, a third query operator lies between the first query operator and the second query operator, and the third query operator can be implemented using a broadcast query algorithm, rewrite the third query operator as a broadcast query operator, where the broadcast query operator has no partition operator;
a second rewriting module, configured to rewrite the partition attribute of the second query operator in the logical query plan so that the partition attribute of the second query operator is the same as the partition attribute of the first query operator;
the generation unit is further configured to generate one physical query task from the second query operator, the broadcast query operator, the first query operator, and the partition operator of the first query operator, to form the physical query plan;
where the first query operator is a predecessor operator of the third query operator, and the third query operator is a predecessor operator of the second query operator.
With reference to the second aspect, in a second possible implementation, a second rewriting unit is configured to, if the partition attribute of the first query operator extracted by the extraction unit is the same as the partition attribute of the second query operator extracted by the extraction unit, a third query operator lies between the first query operator and the second query operator, and the third query operator can be implemented using a broadcast query algorithm, rewrite the third query operator as a broadcast query operator, where the broadcast query operator has no partition operator;
the deletion unit is further configured to, after the second rewriting unit rewrites the third query operator as the broadcast query operator, delete the partition operator of the second query operator from the logical query plan;
the generation unit is further configured to, after the deletion unit deletes the partition operator of the second query operator, generate one physical query task from the second query operator, the broadcast query operator, the first query operator, and the partition operator of the first query operator, to form the physical query plan;
where the first query operator is a predecessor operator of the third query operator, and the third query operator is a predecessor operator of the second query operator.
With reference to the second aspect, the first possible implementation, and the second possible implementation, in a third possible implementation, a third rewriting unit is configured to, before the deletion unit deletes the partition operator of the second query operator from the logical query plan, rewrite the sort attribute of the first query operator so that the sort attribute of the first query operator is the same as the sort attribute of the second query operator;
where the sort attribute of the first query operator before rewriting is the same as the partition attribute of the first query operator, and the sort attribute of the second query operator is the same as the partition attribute of the second query operator before rewriting;
the sort attribute is used to perform partition sorting on the data in the data table operated on by the query operator of the logical query plan.
With reference to the second aspect, the first possible implementation, and the second possible implementation, in a fourth possible implementation, the generation unit is specifically configured to use the task-flow correlation optimization (JFC) technique to generate one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, to form the physical query plan.
With the query plan conversion method and device provided by the embodiments of the present invention, as long as the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, the partition attribute of the second query operator in the logical query plan can be rewritten so that, once the two partition attributes are the same, the partition operator of the second query operator can be deleted and one physical query task generated from the first query operator and the second query operator, reducing the number of physical query tasks constituting the physical query plan.
In the prior art, the partition operator of the second query operator can be deleted only when the partition attribute of the first query operator is exactly the same as the partition attribute of the second query operator. By comparison, rewriting the partition attribute of the second query operator when the partition attribute of the first query operator is a prefix of it yields more pairs of query operators (such as the first query operator and the second query operator) that satisfy the predecessor-successor relationship and have exactly the same partition attribute, and so reduces the number of physical query tasks constituting the physical query plan to a greater extent.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a query plan conversion method in an embodiment of the present invention;
FIG. 2 is a schematic diagram of an example of a logical query plan in an embodiment of the present invention;
FIG. 3 is a schematic flowchart of another query plan conversion method in an embodiment of the present invention;
FIG. 4 is a schematic diagram of an example of another logical query plan in an embodiment of the present invention;
FIG. 5 is a schematic diagram of an example of another logical query plan in an embodiment of the present invention;
FIG. 6 is a schematic diagram of an example of another logical query plan in an embodiment of the present invention;
FIG. 7 is a schematic flowchart of another query plan conversion method in an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a query plan conversion device in an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of another query plan conversion device in an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of another query plan conversion device in an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of another query plan conversion device in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In addition, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
实施例1Example 1
本发明实施例提供一种查询计划转化方法,可以应用于将逻辑查询计划转化为物理查询计划的过程中,如图1所示,该查询计划转化方法包括:An embodiment of the present invention provides a query plan conversion method, which can be applied to the process of converting a logical query plan into a physical query plan. As shown in FIG. 1 , the query plan conversion method includes:
S101、查询计划转化装置从逻辑查询计划中提取第一查询操作符和第二查询操作符,第一查询操作符为第二查询操作符的前驱操作符。S101. The query plan conversion apparatus extracts a first query operator and a second query operator from the logical query plan, where the first query operator is a precursor operator of the second query operator.
其中,查询计划转化装置从逻辑查询计划中提取第一查询操作符和第二查询操作符的方法可以包括:查询计划转化装置从逻辑查询计划中查找存在前驱-后继关系的两个查询操作符,并分别确定存在前驱-后继关系的两个查询操作符的分区属性;当存在前驱-后继关系的两个查询操作符中的前驱查询操作符的分区属性为后继查询操作符的分区属性的前缀,或者前驱查询操作符的分区属性与后继查询操作符的分区属性相同时,则可以将这存在前驱-后继关系的两个查询操作符确定为第一查询操作符和第二查询操作符。具体的,存在前驱-后继关系的两个查询操作符中,前驱查询操作符作为第一查询操作符,后继查询操作符作为第二查询操作符。The method for the query plan transformation device to extract the first query operator and the second query operator from the logical query plan may include: the query plan transformation device searches the logical query plan for two query operators that have a predecessor-successor relationship, and respectively determine the partition attributes of the two query operators with the predecessor-successor relationship; when the partition attribute of the predecessor query operator in the two query operators with the predecessor-successor relationship is the prefix of the partition attribute of the successor query operator, Or when the partition attribute of the predecessor query operator is the same as the partition attribute of the successor query operator, the two query operators with a predecessor-successor relationship can be determined as the first query operator and the second query operator. Specifically, among the two query operators that have a predecessor-successor relationship, the predecessor query operator is used as the first query operator, and the successor query operator is used as the second query operator.
需要说明的是,在本发明实施例中,逻辑查询计划中的两个查询操作符存在前驱-后继关系可以包括:这两个查询操作符存在直接前驱-后继关系。It should be noted that, in this embodiment of the present invention, the existence of a predecessor-successor relationship between two query operators in the logical query plan may include: the two query operators have a direct predecessor-successor relationship.
具体的,当逻辑查询计划中的两个查询操作符之间存在直接依赖关系,即这两个查询操作符中的一个查询操作符所需操作的数据是另一个查询操作符所操作数据的结果时,则可以认为这两个查询操作符存在直接前驱-后继关系。例如,假设两个查询操作符为查询操作符1和查询操作符2,当查询操作符2所需操作的数据是查询操作符1操作数据的结果时,则可以认为查询操作符1与查询操作符2存在直接前驱-后继关系,且查询操作符1为查询操作符2的直接前驱操作符,查询操作符2为查询操作符1的直接后继操作符。Specifically, when there is a direct dependency between two query operators in the logical query plan, that is, the data required to be operated by one of the two query operators is the result of the data operated by the other query operator , it can be considered that there is a direct predecessor-successor relationship between these two query operators. For example, assuming that the two query operators are query operator 1 and query operator 2, when the data required to be operated by query operator 2 is the result of the operation data of query operator 1, it can be considered that query operator 1 and query operator Operator 2 has a direct predecessor-successor relationship, and query operator 1 is the direct predecessor operator of query operator 2, and query operator 2 is the direct successor operator of query operator 1.
进一步的,逻辑查询计划中的两个查询操作符存在前驱-后继关系还可以包括:这两个查询操作符存在间接前驱-后继关系。Further, the existence of a predecessor-successor relationship between the two query operators in the logical query plan may further include: the two query operators have an indirect predecessor-successor relationship.
具体的,当逻辑查询计划中的两个查询操作符之间存在间接依赖关系,即两个查询操作符之间间隔有至少一个其他查询操作符时,若这两个查询操作符中的一个查询操作符所需操作的数据是该其他查询操作符所操作数据的结果,且该其他查询操作符所操作数据为这两个查询操作符中的另一个查询操作符所操作数据的结果时,则可以认为这两个查询操作符存在间接前驱-后继关系。例如,假设两个查询操作符为查询操作符3和查询操作符4,查询操作符3和查询操作符4之间间隔有查询操作符5,当查询操作符4所需操作的数据是查询操作符5操作数据的结果,且查询操作符5所需操作的数据是查询操作符3操作数据的结果时,则可以认为查询操作符3与查询操作符4存在间接前驱-后继关系,且查询操作符3为查询操作符4的间接前驱操作符,查询操作符4为查询操作符3的间接后继操作符。Specifically, when there is an indirect dependency between two query operators in the logical query plan, that is, when there is at least one other query operator between the two query operators, if one of the two query operators queries When the data to be operated by the operator is the result of the data operated by the other query operator, and the data operated by the other query operator is the result of the data operated by the other query operator among the two query operators, then These two query operators can be considered to have an indirect predecessor-successor relationship. For example, suppose the two query operators are query operator 3 and query operator 4. There is query operator 5 between query operator 3 and query operator 4. When query operator 4 needs to operate the data, it is a query operation. When the result of the operation of data operator 5, and the data required to operate by the query operator 5 is the result of the operation data of the query operator 3, it can be considered that the query operator 3 and the query operator 4 have an indirect predecessor-successor relationship, and the query operation Operator 3 is the indirect predecessor operator of query operator 4, and query operator 4 is the indirect successor operator of query operator 3.
相应的,当第一查询操作符与第二查询操作符之间存在直接依赖关系时,第一查询操作符则为第二查询操作符的直接前驱操作符;当第一查询操作符与第二查询操作符之间存在间接依赖关系,即第一查询操作符与第二查询操作符之间间隔有第三查询操作符时,第一查询操作符则为第二查询操作符的间接前驱操作符。Correspondingly, when there is a direct dependency between the first query operator and the second query operator, the first query operator is the direct predecessor operator of the second query operator; There is an indirect dependency between query operators, that is, when there is a third query operator between the first query operator and the second query operator, the first query operator is the indirect predecessor operator of the second query operator .
示例性的,在图2所示的逻辑查询计划中,查询操作符2所需操作的数据是查询操作符1操作数据的结果,则可以确定查询操作符1与查询操作符2之间存在直接前驱-后继关系,查询操作符1为查询操作符2的直接前驱操作符(查询操作符1的分区属性“分区属性1”与查询操作符2的分区属性“分区属性1”相同,则可以确定查询操作符1作为第一查询操作符,查询操作符2作为第二查询操作符);查询操作符5所需操作的数据是查询操作符4操作数据的结果,则可以确定查询操作符5与查询操作符4之间存在直接前驱-后继关系,查询操作符4为查询操作符5的直接前驱操作符(查询操作符4的分区属性“分区属性3”为查询操作符5的分区属性“分区属性3,分区属性5”的前缀,则可以确定查询操作符4作为第一查询操作符,查询操作符5作为第二查询操作符)。Exemplarily, in the logical query plan shown in FIG. 2 , the data to be operated by query operator 2 is the result of the data operated by query operator 1, so it can be determined that there is a direct relationship between query operator 1 and query operator 2. Predecessor-successor relationship, query operator 1 is the direct predecessor operator of query operator 2 (the partition attribute "partition attribute 1" of query operator 1 is the same as the partition attribute "partition attribute 1" of query operator 2, then it can be determined that Query operator 1 is used as the first query operator, and query operator 2 is used as the second query operator); the data to be operated by query operator 5 is the result of the operation data of query operator 4, then it can be determined that query operator 5 and There is a direct predecessor-successor relationship between query operator 4, and query operator 4 is the direct predecessor operator of query operator 5 (the partition attribute "partition attribute 3" of query operator 4 is the partition attribute "partition attribute" of query operator 5. attribute 3, partition attribute 5” prefix, query operator 4 can be determined as the first query operator, and query operator 5 as the second query operator).
For example, in the logical query plan shown in FIG. 2, the data on which query operator 5 operates is the output of query operator 4, and the data on which query operator 4 operates is the output of query operator 3, so an indirect predecessor-successor relationship exists between query operator 3 and query operator 5, and query operator 3 is the indirect predecessor operator of query operator 5. (The partition attribute of query operator 3, "partition attribute 3", is a prefix of the partition attribute of query operator 5, "partition attribute 3, partition attribute 5", so query operator 3 can be taken as the first query operator, query operator 5 as the second query operator, and query operator 4 as the third query operator.)
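The direct and indirect predecessor relationships described above can be sketched as a small graph walk. This is only an illustration: the dictionary encoding of the plan and the function name `predecessor_kind` are assumptions, not part of the described method, and the walk accepts any number of intermediate operators, whereas the text only discusses a single third operator.

```python
# Hypothetical encoding of a logical plan: each operator maps to the
# operators whose output it consumes (its direct predecessors).
def predecessor_kind(inputs, first, second):
    """Return 'direct', 'indirect', or None for `first` relative to `second`."""
    if first in inputs.get(second, []):
        return "direct"
    # Walk upstream from `second`; a match beyond one hop is indirect.
    stack = list(inputs.get(second, []))
    seen = set()
    while stack:
        op = stack.pop()
        if op in seen:
            continue
        seen.add(op)
        if first in inputs.get(op, []):
            return "indirect"
        stack.extend(inputs.get(op, []))
    return None

# Chain taken from the FIG. 2 example: operator 3 -> operator 4 -> operator 5.
plan_inputs = {"op4": ["op3"], "op5": ["op4"]}
kind_4_5 = predecessor_kind(plan_inputs, "op4", "op5")
kind_3_5 = predecessor_kind(plan_inputs, "op3", "op5")
```

Here `kind_4_5` corresponds to the direct case (operator 4 feeds operator 5) and `kind_3_5` to the indirect case (operator 4 sits between operators 3 and 5).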
It should be noted that the query plan conversion method provided in the embodiments of the present invention can be applied to a device with a data query function (such as a background server or storage device of a communication system or data processing system). The query plan conversion device in the embodiments of the present invention may be a processor of such a device, for example a central processing unit (CPU), or it may be a functional module within that processor. The embodiments of the present invention do not limit the specific form of the query plan conversion device.
S102. If the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, the query plan conversion device rewrites the partition attribute of the second query operator in the logical query plan so that it is the same as the partition attribute of the first query operator.
For example, suppose the partition attribute of the first query operator is "partition attribute 1" and the partition attribute of the second query operator is "partition attribute 1, partition attribute 2". Then the partition attribute of the first query operator, "partition attribute 1", is a prefix of the partition attribute of the second query operator, "partition attribute 1, partition attribute 2".
The statement that the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator can be understood as follows. Partitioning by the partition attribute of the first query operator places all rows containing partition attribute 1, in the data table on which the first query operator operates, into one region. Partitioning by the partition attribute of the second query operator places all rows containing both partition attribute 1 and partition attribute 2, in the data table on which the second query operator operates, into one region. It follows that, when the partition attribute of the extracted first query operator is a prefix of the partition attribute of the extracted second query operator, the data in a region partitioned by the first query operator's partition attribute contains the data in the corresponding region partitioned by the second query operator's partition attribute.
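The prefix condition and the containment property it implies can be illustrated with a short sketch. The row layout, the attribute names `attr1` and `attr2`, and the helper names are hypothetical; grouping by attribute values is used here as a simple stand-in for partitioning.

```python
from collections import defaultdict

def is_prefix(first_attrs, second_attrs):
    """True if first_attrs is a proper leading sub-sequence of second_attrs."""
    n = len(first_attrs)
    return 0 < n < len(second_attrs) and list(second_attrs[:n]) == list(first_attrs)

def partition(rows, attrs):
    """Group rows (dicts) by the values of the given partition attributes."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].append(row)
    return dict(groups)

rows = [
    {"attr1": "x", "attr2": 1},
    {"attr1": "x", "attr2": 2},
    {"attr1": "y", "attr2": 1},
]
coarse = partition(rows, ["attr1"])          # first operator's attribute
fine = partition(rows, ["attr1", "attr2"])   # second operator's attributes

# Every region of the finer partitioning lies inside one coarse region.
contained = all(
    all(row in coarse[key[:1]] for row in group)
    for key, group in fine.items()
)
```

Because each fine-grained region is a subset of a coarse region, rows that the second operator needs together are already together after partitioning by the shorter attribute list, which is what makes the rewrite in S102 plausible.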
It should be noted that rewriting the partition attribute of the second query operator in the logical query plan means replacing it with the partition attribute of the first query operator while leaving the partition attribute of the first query operator unchanged, so that after the rewrite the two partition attributes are the same. The pair then satisfies the condition required by the prior art, namely that the partition attributes of a query operator pair be exactly the same, so that the first query operator and the second query operator can be generated as one physical query task.
S103. The query plan conversion device deletes the partition operator of the second query operator from the logical query plan, and generates one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, so as to form the physical query plan.
After the rewrite, the partition attribute of the second query operator is the same as that of the first query operator, so the two operators satisfy the prior-art condition that a query operator pair have exactly the same partition attribute. Once the partition operator of the second query operator is deleted, a predecessor-successor relationship still exists between the first and second query operators and no partition operator remains between them. With the partition operator of the first query operator serving as the marker that starts a new physical query task, the first and second query operators can be generated as one physical query task. The query plan conversion device then constructs a physical query plan with the generated physical query tasks as its nodes.
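Steps S102 and S103 can be sketched as follows for a directly connected pair. The `Op` class, its fields, and the rule that each remaining partition operator starts one task are a simplified model assumed for illustration; they are not the actual data structures of the described device.

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    partition_attrs: tuple
    has_partition_op: bool = True  # a partition operator precedes this operator

def merge_pair(first, second):
    """Apply S102 and S103 to a predecessor/successor pair, in place.

    Returns True if the pair can now be emitted as one physical query task.
    """
    a, b = first.partition_attrs, second.partition_attrs
    same = a == b
    prefix = 0 < len(a) < len(b) and b[:len(a)] == a
    if not (same or prefix):
        return False
    second.partition_attrs = a       # S102: rewrite to the first operator's attribute
    second.has_partition_op = False  # S103: delete the second's partition operator
    return True

def count_tasks(ops):
    """Assumed rule: each remaining partition operator starts a new task."""
    return sum(op.has_partition_op for op in ops)

# Operators 4 and 5 from the FIG. 2 example.
op4 = Op("op4", ("attr3",))
op5 = Op("op5", ("attr3", "attr5"))
tasks_before = count_tasks([op4, op5])
merged = merge_pair(op4, op5)
tasks_after = count_tasks([op4, op5])
```

In this model the pair starts as two tasks and ends as one, because only the first operator's partition operator is left.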
With the query plan conversion method provided by this embodiment of the present invention, as long as the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, the partition attribute of the second query operator in the logical query plan can be rewritten so that it equals that of the first query operator. The partition operator of the second query operator can then be deleted, and one physical query task can be generated from the first and second query operators, reducing the number of physical query tasks that make up the physical query plan.
In the prior art, the partition operator of the second query operator can be deleted only when the partition attribute of the first query operator is exactly the same as that of the second query operator. By comparison, rewriting the partition attribute of the second query operator whenever the first query operator's partition attribute is a prefix of it yields more query operator pairs (such as the first and second query operators) that satisfy the predecessor-successor relationship and have exactly the same partition attribute, and therefore reduces the number of physical query tasks in the physical query plan to a greater extent.
Embodiment 2
This embodiment of the present invention provides a query plan conversion method that can be applied when converting a logical query plan into a physical query plan. The query plan conversion method includes:
S201. The query plan conversion device extracts a first query operator and a second query operator from the logical query plan, where the first query operator is a predecessor operator of the second query operator.
That the first query operator is a predecessor operator of the second query operator may specifically mean either that the first query operator is the direct predecessor operator of the second query operator, or that a third query operator lies between the first query operator and the second query operator.
In the first application scenario of this embodiment, as shown in FIG. 3, after extracting the first query operator and the second query operator, the query plan conversion device may first determine whether a third query operator lies between them, that is, execute S202:
S202. The query plan conversion device determines whether a third query operator lies between the first query operator and the second query operator.
Here, a third query operator lying between the first query operator and the second query operator means that the first query operator is an indirect predecessor operator of the second query operator.
Specifically, in the first application scenario of this embodiment, if the first query operator is the direct predecessor operator of the second query operator, that is, no third query operator lies between them, S204 and the subsequent steps are executed. If the first query operator is an indirect predecessor operator of the second query operator, that is, a third query operator lies between them, S203 and the subsequent steps are executed.
S203. The query plan conversion device determines whether the third query operator can be implemented using a broadcast query algorithm.
Specifically, if the third query operator can be implemented using a broadcast query algorithm, the query plan conversion device continues with S204. If it cannot, the query plan conversion device cannot rewrite the third query operator as a broadcast query operator, and therefore cannot delete the partition operator of the second query operator in order to merge the first and second query operators.
S204. The query plan conversion device rewrites the third query operator as a broadcast query operator. A broadcast query operator has no partition operator.
When a third query operator lies between the first and second query operators, and the query plan conversion device is to generate only one physical query task from the first and second query operators while converting the logical query plan into a physical query plan, the third query operator must be rewritten before the physical query task is generated. Otherwise, because the third query operator has its own partition operator, a partition operator sits between the first and second query operators. The query plan conversion device would then generate at least two physical query tasks from the first and second query operators, and the goal of reducing the number of physical query tasks in the physical query plan would not be met.
Accordingly, when a third query operator lies between the first and second query operators, the query plan conversion device can rewrite the third query operator as a broadcast query operator, which has no partition operator. After the partition operator of the second query operator is also deleted, no partition operator remains between the first and second query operators, so the query plan conversion device generates only one physical query task from them, and the number of physical query tasks in the physical query plan is reduced.
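The effect of S203 and S204 on the task count can be sketched as below. The dictionary fields, the `can_broadcast` flag, and the counting rule (one task per remaining partition operator) are assumptions made for illustration; how the device actually decides that an operator admits a broadcast algorithm is not specified here.

```python
def rewrite_third_as_broadcast(third):
    """S203/S204: return a broadcast copy of `third`, or None if not possible."""
    if not third.get("can_broadcast", False):   # S203 (hypothetical flag)
        return None
    rewritten = dict(third)
    rewritten["kind"] = "broadcast"
    rewritten["has_partition_op"] = False       # broadcast operators have none
    return rewritten

def count_tasks(chain):
    """Assumed rule: each partition operator in the chain starts one task."""
    return sum(op["has_partition_op"] for op in chain)

first = {"name": "op1", "has_partition_op": True}
third = {"name": "op3", "has_partition_op": True, "can_broadcast": True}
# The second operator's partition operator is taken as already deleted.
second = {"name": "op2", "has_partition_op": False}

tasks_without_rewrite = count_tasks([first, third, second])
tasks_with_rewrite = count_tasks([first, rewrite_third_as_broadcast(third), second])
```

Without the rewrite the third operator's partition operator still splits the chain into two tasks; with it, only the first operator's partition operator remains.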
For example, as shown in FIG. 4 or FIG. 5, the data on which query operator 2 operates is the output of query operator 3, and the data on which query operator 3 operates is the output of query operator 1, so an indirect predecessor-successor relationship exists between query operator 1 and query operator 2. In the logical query plan before rewriting in FIG. 4, the partition attribute of query operator 1 is a prefix of the partition attribute of query operator 2; in the logical query plan before rewriting in FIG. 5, the partition attribute of query operator 1 is the same as that of query operator 2.
S205. The query plan conversion device determines the relationship between the partition attribute of the first query operator and the partition attribute of the second query operator.
The relationship between the two partition attributes is one of the following: the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, or the partition attribute of the first query operator is the same as the partition attribute of the second query operator.
Specifically, if the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, S206 and the subsequent steps are executed. If the two partition attributes are the same, S207 and the subsequent steps are executed.
S206. The query plan conversion device rewrites the partition attribute of the second query operator in the logical query plan so that it is the same as the partition attribute of the first query operator.
The query plan conversion device rewrites the partition attribute of the second query operator in the logical query plan by replacing it with the partition attribute of the first query operator, while leaving the partition attribute of the first query operator unchanged, so that after the rewrite the two are the same. The first and second query operators then have exactly the same partition attribute and satisfy the condition the prior art requires of a query operator pair when generating a physical query task, so that the first and second query operators can be generated as one physical query task when the logical query plan is converted into a physical query plan.
For example, as shown in FIG. 4, the query plan conversion device may rewrite the partition attribute of query operator 2 in the "logical query plan after rewriting the broadcast query operator", changing it from "partition attribute 1, partition attribute 2" to the partition attribute of query operator 1, "partition attribute 1", while leaving the partition attribute of query operator 1 unchanged. The partition attributes of query operator 1 and query operator 2 are then the same, which gives the "logical query plan after rewriting the partition attribute" shown in FIG. 4.
S207. The query plan conversion device deletes the partition operator of the second query operator from the logical query plan.
It should be noted that at this point the partition attribute of the second query operator is exactly the same as that of the first query operator, which meets the prior-art requirement that the partition attributes of the query operators be identical. After the partition operator of the second query operator is deleted from the logical query plan, a predecessor-successor relationship still exists between the first and second query operators and no partition operator remains between them. Therefore, with the partition operator of the first query operator serving as the marker that starts a new physical query task, the first and second query operators can be generated as one physical query task.
For example, the query plan conversion device may take the "logical query plan after rewriting the partition attribute" shown in FIG. 4, or the "rewritten logical query plan" shown in FIG. 5, as the "logical query plan before deleting the partition operator" shown in FIG. 6, and delete the partition operator of query operator 2 from it to obtain the "logical query plan after deleting the partition operator" shown in FIG. 6.
S208. The query plan conversion device generates one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, so as to form the physical query plan.
When a third query operator lies between the first and second query operators, the query plan conversion device generates the physical query task specifically from the second query operator, the broadcast query operator, the first query operator, and the partition operator of the first query operator.
For example, the query plan conversion device may use the job-flow correlation (Job-flow Correlation, JFC) optimization technique to generate one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, so as to form the physical query plan. The way the query plan conversion device uses JFC to generate physical query tasks and form a physical query plan is similar to the way JFC is used for this purpose in the prior art, and is not described again here.
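The first application scenario (S202 through S208) can be summarized in one small function. This is a sketch under stated assumptions: operators are plain dictionaries with hypothetical keys `name` and `attrs`, the broadcast capability is passed in as a flag, and the returned list of names merely stands in for the single physical query task; it does not implement JFC.

```python
def convert_pair(first, second, third=None, third_can_broadcast=False):
    """Follow S202-S208 for one operator pair.

    Returns the names of the operators fused into one physical query task,
    or None when the pair cannot be fused.
    """
    chain = [first]
    if third is not None:                        # S202: indirect predecessor
        if not third_can_broadcast:              # S203
            return None
        third = {**third, "kind": "broadcast", "has_partition_op": False}  # S204
        chain.append(third)
    a, b = tuple(first["attrs"]), tuple(second["attrs"])
    if a != b:                                   # S205
        if not (0 < len(a) < len(b) and b[:len(a)] == a):
            return None                          # neither same nor prefix
        b = a                                    # S206
    second = {**second, "attrs": list(b), "has_partition_op": False}      # S207
    chain.append(second)
    return [op["name"] for op in chain]          # S208: one task

# FIG. 4 style example: operator 1 -> operator 3 -> operator 2.
op1 = {"name": "op1", "attrs": ["attr1"]}
op3 = {"name": "op3", "attrs": []}
op2 = {"name": "op2", "attrs": ["attr1", "attr2"]}
fused = convert_pair(op1, op2, third=op3, third_can_broadcast=True)
blocked = convert_pair(op1, op2, third=op3, third_can_broadcast=False)
```

The function copies the operators it rewrites rather than mutating its arguments, so the same inputs can be tried with and without the broadcast rewrite.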
In the second application scenario of this embodiment, as shown in FIG. 7, after extracting the first query operator and the second query operator, the query plan conversion device may first determine the relationship between the partition attribute of the first query operator and the partition attribute of the second query operator, that is, execute S205.
In the second application scenario, as shown in FIG. 7, after S205 is executed: if the query plan conversion device determines that the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, it continues with S206, that is, it rewrites the partition attribute of the second query operator in the logical query plan so that it is the same as that of the first query operator; if it determines that the two partition attributes are the same, it executes S207 directly, that is, it deletes the partition operator of the second query operator from the logical query plan. The query plan conversion device then determines whether a third query operator lies between the first and second query operators, that is, executes S202. If a third query operator lies between them, it continues with S204; if not, it executes S208 directly.
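The step ordering of the second scenario can be written out as a small helper that returns the step labels in sequence. The function name and arguments are hypothetical. Two transitions are assumptions not spelled out in the paragraph above: that S206 is followed by S207, and that S204 is followed by S208.

```python
def scenario_two_steps(relation, has_third):
    """List step labels in the order described for the second scenario.

    relation: "prefix" or "same" (the outcome of S205).
    has_third: True if a third operator lies between the first and second.
    """
    if relation not in ("prefix", "same"):
        raise ValueError("relation must be 'prefix' or 'same'")
    steps = ["S205"]
    if relation == "prefix":
        steps.append("S206")      # rewrite the second operator's attribute
    steps += ["S207", "S202"]     # delete partition operator, then check for a third
    if has_third:
        steps.append("S204")      # rewrite the third operator as broadcast
    steps.append("S208")          # generate the physical query task
    return steps

longest_path = scenario_two_steps("prefix", True)
shortest_path = scenario_two_steps("same", False)
```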
This embodiment of the present invention uses the following formulas to estimate the benefit brought by the query plan conversion method provided herein.
The time taken to execute one physical query task consists of four parts: the time spent reading data, the time spent writing data, the time spent transmitting data over the network, and the time spent on CPU computation. The read time is calculated from the measured time to read a unit amount of data and the amount of data input to the query operator in the current physical query task before it operates. The write time is calculated from the measured time to write a unit amount of data and the amount of data output by the query operator in the current physical query task after it operates. The CPU time is calculated from the measured time for the CPU to process a unit amount of data, the percentage of the input data used by the query operator in the current physical query task, and the amount of data input to that query operator before it operates. The network transmission time is calculated from the measured time for the network to transmit a unit amount of data, the percentage of the input data used by the query operator in the current physical query task, and the amount of data input to that query operator before it operates.
Consider three query operators $OP_i$, $OP_j$ and $OP_k$ in a logical query plan such that $OP_i$ is the direct predecessor operator of $OP_j$, $OP_j$ is the direct predecessor operator of $OP_k$, the partition attribute of $OP_i$ is a prefix of the partition attribute of $OP_k$, and $OP_j$ can be implemented as a broadcast query operator.
Before the query plan conversion method provided by this embodiment is applied, these three query operators generate three physical query tasks. The time taken to execute them is the sum of the times taken to execute each of the three physical query tasks, and can be calculated according to Formula 1:
$$
\begin{aligned}
Cost_{before} ={}& H_r I_i + v I_i + \mu a_i I_i + v a_i I_i + H_w M_i \\
&+ H_r I_j + v I_j + \mu a_j I_j + v a_j I_j + H_w M_j \\
&+ H_r I_k + v I_k + \mu a_k I_k + v a_k I_k + H_w M_k
\end{aligned}
$$
Formula 1
Here, $Cost_{before}$ is the time taken to execute the three physical query tasks before the query plan conversion method provided by this embodiment is applied; $H_r$ is the time taken to read a unit amount of data; $H_w$ is the time taken to write a unit amount of data; $v$ is the time taken by the CPU to process a unit amount of data; $\mu$ is the time taken to transmit a unit amount of data over the network; $I_i$, $I_j$ and $I_k$ are the amounts of data input to the query operators $OP_i$, $OP_j$ and $OP_k$ in the three physical query tasks before they operate; $M_i$, $M_j$ and $M_k$ are the amounts of data output by $OP_i$, $OP_j$ and $OP_k$ after they operate; and $a_i$, $a_j$ and $a_k$ are the percentages of the input data used by $OP_i$, $OP_j$ and $OP_k$.
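Formula 1 is the same five-term expression repeated for each of the three tasks, which a short sketch makes explicit. The function names and the sample numbers are illustrative assumptions, not measurements from the source.

```python
def task_cost(h_r, h_w, v, mu, inp, out, a):
    """Cost of one standalone physical query task, term by term as in Formula 1."""
    return (h_r * inp        # read the input
            + v * inp        # CPU work over the input
            + mu * a * inp   # network transfer of the used fraction
            + v * a * inp    # CPU work over the used fraction
            + h_w * out)     # write the output

def cost_before(h_r, h_w, v, mu, inputs, outputs, fractions):
    """Formula 1: sum of the three independent task costs (i, j, k)."""
    return sum(
        task_cost(h_r, h_w, v, mu, inp, out, a)
        for inp, out, a in zip(inputs, outputs, fractions)
    )

# Hypothetical unit costs and data sizes, chosen only to exercise the formula.
total_before = cost_before(
    h_r=2.0, h_w=3.0, v=0.5, mu=1.5,
    inputs=(100.0, 80.0, 60.0),
    outputs=(50.0, 40.0, 30.0),
    fractions=(0.5, 0.25, 0.1),
)
```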
After the query plan conversion method provided by this embodiment is applied, the query operators $OP_k$ and $OP_i$ have the same partition attribute and $OP_j$ is rewritten as a broadcast query operator, so the three query operators can be generated as one physical query task. Because the three query operators are in the same physical query task, executing that task requires only one read operation and one write operation. Although the broadcast query operator in this task does not need to read the output data of $OP_i$, it still needs to read the data from its other input sources. The time taken by the system to execute this single physical query task can therefore be calculated according to Formula 2:
$$
\begin{aligned}
Cost_{after} ={}& H_r I_i + v I_i + \mu a_i I_i + v a_i I_i \\
&+ H_r (I_j - M_i) + v\, r\, I_j + v a_j I_j + v a_k I_k + H_w M_k
\end{aligned}
$$
Formula 2
Here, $Cost_{after}$ is the time taken to execute the single physical query task generated from the three query operators $OP_i$, $OP_j$ and $OP_k$ after the query plan conversion method provided by this embodiment is applied; $r$ is the number of processes of $M_i$; $H_r$ is the time taken to read a unit amount of data; $H_w$ is the time taken to write a unit amount of data; $v$ is the time taken by the CPU to process a unit amount of data; $\mu$ is the time taken to transmit a unit amount of data over the network; $I_i$, $I_j$ and $I_k$ are the amounts of data input to $OP_i$, $OP_j$ and $OP_k$ in the physical query task before they operate; $M_i$ and $M_k$ are the amounts of data output by $OP_i$ and $OP_k$ after they operate; and $a_i$, $a_j$ and $a_k$ are the percentages of the input data used by $OP_i$, $OP_j$ and $OP_k$.
The time saved by the query plan conversion method and apparatus provided by this embodiment of the present invention is the difference between the time spent before they are applied and the time spent after they are applied. The time saved can be calculated according to Formula 3:
$$
\begin{aligned}
Cost_{saved} &= Cost_{before} - Cost_{after} \\
&= H_w \times M_i + H_r \times M_i - v \times (r-1) \times I_j + \mu \times a_j \times I_j + H_w \times M_j + H_r \times I_k + v \times I_k + \mu \times a_k \times I_k \\
&= (H_w + H_r) \times M_i + (\mu \times a_j - v \times (r-1)) \times I_j + H_w \times M_j + (H_r + v + \mu \times a_k) \times I_k
\end{aligned}
$$
Formula 3
Here, $Cost_{saved}$ denotes the time that can be saved by applying the query plan conversion method provided by this embodiment; $r$ denotes the number of processes of $M_i$; $H_r$ denotes the time needed to read one unit of data; $H_w$ denotes the time needed to write one unit of data; $v$ denotes the time the CPU needs to process one unit of data; $\mu$ denotes the time needed to transmit one unit of data over the network; $I_j$ and $I_k$ denote the amount of data input to the query operators $OP_j$ and $OP_k$, respectively; $M_i$ and $M_j$ denote the amount of data output by the query operators $OP_i$ and $OP_j$, respectively; and $a_j$ and $a_k$ denote the percentage of the input data used by the query operators $OP_j$ and $OP_k$, respectively.
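The two right-hand sides of Formula 3 are algebraically the same expression, once expanded and once grouped by data volume. The following sketch checks this numerically; the function names and the sample values are assumptions made purely for illustration.

```python
def cost_saved_expanded(H_r, H_w, v, mu, r, I_j, I_k, M_i, M_j, a_j, a_k):
    """Formula 3, expanded form (one term per saved or added cost)."""
    return (
        H_w * M_i + H_r * M_i          # OP_i output no longer written and re-read
        - v * (r - 1) * I_j            # extra CPU work introduced by broadcasting
        + mu * a_j * I_j               # network transfer saved for OP_j
        + H_w * M_j                    # OP_j output no longer written
        + H_r * I_k + v * I_k          # OP_k input no longer read and re-processed
        + mu * a_k * I_k               # network transfer saved for OP_k
    )


def cost_saved_factored(H_r, H_w, v, mu, r, I_j, I_k, M_i, M_j, a_j, a_k):
    """Formula 3, grouped by M_i, I_j, M_j and I_k."""
    return (
        (H_w + H_r) * M_i
        + (mu * a_j - v * (r - 1)) * I_j
        + H_w * M_j
        + (H_r + v + mu * a_k) * I_k
    )


# Illustrative values only (assumed, not measured).
example = dict(H_r=2.0, H_w=3.0, v=1.0, mu=4.0, r=2,
               I_j=80.0, I_k=40.0, M_i=50.0, M_j=20.0, a_j=0.5, a_k=0.5)
expanded = cost_saved_expanded(**example)
factored = cost_saved_factored(**example)
print(expanded, factored)
```

Note that the only negative term is $v \times (r-1) \times I_j$, so under this cost model the saving shrinks as the number of processes $r$ grows.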
When read and write operations dominate the time overhead of the system, that is, when they account for a larger share of the time overhead than the other costs, the coefficient $\mu$ (the time needed to transmit one unit of data over the network) and the coefficient $v$ (the time the CPU needs to process one unit of data) in Formula 3 are both negligible, and the time saved by the query plan conversion method provided by this embodiment is:
$$
Cost_{saved} = (H_w + H_r) \times M_i + H_w \times M_j + H_r \times I_k
$$
Here, $Cost_{saved}$ denotes the time that can be saved by applying the query plan conversion method provided by this embodiment; $H_r$ denotes the time needed to read one unit of data; $H_w$ denotes the time needed to write one unit of data; $I_k$ denotes the amount of data input to the query operator $OP_k$; and $M_i$ and $M_j$ denote the amount of data output by the query operators $OP_i$ and $OP_j$, respectively.
When network transmission dominates the time overhead of the system, that is, when it accounts for a larger share of the time overhead than the other costs, the coefficient $H_r$ (the time needed to read one unit of data), the coefficient $H_w$ (the time needed to write one unit of data), and the coefficient $v$ (the time the CPU needs to process one unit of data) in Formula 3 are all negligible, and the time saved by the query plan conversion method provided by this embodiment is:
$$
Cost_{saved} = \mu \times a_j \times I_j + \mu \times a_k \times I_k
$$
Here, $Cost_{saved}$ denotes the time that can be saved by applying the query plan conversion method provided by this embodiment; $\mu$ denotes the time needed to transmit one unit of data over the network; $I_j$ and $I_k$ denote the amount of data input to the query operators $OP_j$ and $OP_k$, respectively; and $a_j$ and $a_k$ denote the percentage of the input data used by the query operators $OP_j$ and $OP_k$, respectively.
When read and write operations and network transmission account for comparable shares of the time overhead of the system, that is, when the share taken by read and write operations equals the share taken by network transmission, only the coefficient $v$ (the time the CPU needs to process one unit of data) in Formula 3 is negligible, and the time saved by the query plan conversion method provided by this embodiment is:
$$
Cost_{saved} = (H_w + H_r) \times M_i + \mu \times a_j \times I_j + H_w \times M_j + (H_r + \mu \times a_k) \times I_k
$$
Here, $Cost_{saved}$ denotes the time that can be saved by applying the query plan conversion method provided by this embodiment; $H_r$ denotes the time needed to read one unit of data; $H_w$ denotes the time needed to write one unit of data; $\mu$ denotes the time needed to transmit one unit of data over the network; $I_j$ and $I_k$ denote the amount of data input to the query operators $OP_j$ and $OP_k$, respectively; $M_i$ and $M_j$ denote the amount of data output by the query operators $OP_i$ and $OP_j$, respectively; and $a_j$ and $a_k$ denote the percentage of the input data used by the query operators $OP_j$ and $OP_k$, respectively.
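The three special cases above all follow from Formula 3 by setting the negligible coefficients to zero. A minimal sketch of that substitution is given below; the regime labels `"io"`, `"network"`, and `"balanced"`, the function names, and the sample values are assumptions introduced only for this illustration.

```python
def cost_saved(H_r, H_w, v, mu, r, I_j, I_k, M_i, M_j, a_j, a_k):
    """Formula 3 in its grouped form."""
    return (
        (H_w + H_r) * M_i
        + (mu * a_j - v * (r - 1)) * I_j
        + H_w * M_j
        + (H_r + v + mu * a_k) * I_k
    )


def cost_saved_in_regime(regime, **coefficients):
    """Zero out the coefficients that are negligible in the given regime."""
    params = dict(coefficients)
    if regime == "io":            # read/write dominates: ignore mu and v
        params.update(mu=0.0, v=0.0)
    elif regime == "network":     # network dominates: ignore H_r, H_w and v
        params.update(H_r=0.0, H_w=0.0, v=0.0)
    elif regime == "balanced":    # I/O and network comparable: ignore only v
        params.update(v=0.0)
    else:
        raise ValueError(f"unknown regime: {regime}")
    return cost_saved(**params)


# Illustrative values only (assumed, not measured).
example = dict(H_r=2.0, H_w=3.0, v=1.0, mu=4.0, r=2,
               I_j=80.0, I_k=40.0, M_i=50.0, M_j=20.0, a_j=0.5, a_k=0.5)
savings = {name: cost_saved_in_regime(name, **example)
           for name in ("io", "network", "balanced")}
print(savings)
```

With $v = 0$ the term multiplying $I_k$ reduces to $H_r + \mu \times a_k$, which is what the balanced case evaluates.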
Further optionally, to improve the query efficiency of the data query system, before the partition operator of the second query operator is deleted from the logical query plan, the query plan conversion method may further include: rewriting the sorting attribute of the first query operator so that it is the same as the sorting attribute of the second query operator.
Here, the sorting attribute of the first query operator before rewriting is the same as the partition attribute of the first query operator, and the sorting attribute of the second query operator is the same as the partition attribute of the second query operator before rewriting. The sorting attribute is used to perform partition sorting on the data in the data table operated on by the query operators of the logical query plan.
It should be noted that the partition operator sorts the data in the data table operated on by a query operator of the logical query plan according to the sorting attribute of that query operator.
When the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator before rewriting, the sorting attribute of the first query operator is likewise a prefix of the sorting attribute of the second query operator. Suppose the sorting attribute of the first query operator is "sorting attribute 1" and the sorting attribute of the second query operator is "sorting attribute 1, sorting attribute 2". When the partition operator sorts the data table operated on by the first and second query operators according to the sorting attribute of the first query operator, all rows in the table with the same value of sorting attribute 1 are sorted together. When the partition operator performs partition sorting on that table according to the sorting attribute of the second query operator, all rows with the same value of sorting attribute 1 and the same value of sorting attribute 2 are sorted together.
In the scenario where the sorting attribute is not rewritten, when the logical query plan is converted into the physical query plan, the first and second query operators are generated as one physical query task because the partition operator of the second query operator has been deleted, and the partition operator sorts the data table that both operators work on according to the sorting attribute of the first query operator. Because the partition attribute and the sorting attribute of the first query operator are the same, the result of partition sorting by the sorting attribute of the first query operator is the same as before the partition sorting. When the second query operator then operates on the table sorted in this way, it still has to scan the whole table to find all rows containing sorting attribute 2, which is time-consuming.
Rewriting the sorting attribute of the first query operator means replacing it with the sorting attribute of the second query operator. The partition operator then performs partition sorting on the data table that the first and second query operators work on according to the sorting attribute of the second query operator, so that all rows with the same value of sorting attribute 1 and the same value of sorting attribute 2 are sorted together. The second query operator can find the data it needs without scanning the whole table again, which saves the scanning time that would be required if the sorting attribute were not rewritten.
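The effect of rewriting the sorting attribute can be seen on a toy table. The sketch below is an illustration under assumed data: the rows, the column layout `(attribute 1, attribute 2, payload)`, and the helper `count_runs` are all hypothetical. It counts how many contiguous runs of identical keys appear after each sort; fewer runs means that rows sharing a key sit next to each other.

```python
from itertools import groupby

# Toy rows: (sorting attribute 1, sorting attribute 2, payload). Assumed data.
rows = [
    ("a", 2, "r1"), ("b", 1, "r2"), ("a", 1, "r3"),
    ("a", 2, "r4"), ("b", 1, "r5"), ("a", 1, "r6"),
]


def key_attr1(row):
    return row[0]


def key_attr1_attr2(row):
    return (row[0], row[1])


def count_runs(sorted_rows, key):
    """Number of contiguous runs of equal keys in an already ordered list."""
    return sum(1 for _ in groupby(sorted_rows, key=key))


# Without rewriting: sort only by the first operator's sorting attribute.
by_attr1 = sorted(rows, key=key_attr1)
# With rewriting: sort by the second operator's sorting attribute.
by_attr1_attr2 = sorted(rows, key=key_attr1_attr2)

# Rows with equal (attribute 1, attribute 2) are scattered in the first case
# and contiguous in the second case.
runs_without_rewrite = count_runs(by_attr1, key_attr1_attr2)
runs_with_rewrite = count_runs(by_attr1_attr2, key_attr1_attr2)
# The first operator is unaffected: rows are still grouped by attribute 1.
runs_attr1_after_rewrite = count_runs(by_attr1_attr2, key_attr1)
print(runs_without_rewrite, runs_with_rewrite, runs_attr1_after_rewrite)
```

Because sorting by the longer key also orders the rows by its prefix, the first query operator still sees its groups intact, which is why the rewrite is safe in the prefix case.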
With the query plan conversion method provided by this embodiment of the present invention, as long as the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, the partition attribute of the second query operator in the logical query plan can be rewritten. Once the partition attribute of the second query operator is the same as that of the first query operator, the partition operator of the second query operator can be deleted, and one physical query task is generated from the first and second query operators, which reduces the number of physical query tasks that make up the physical query plan.
In the prior art, the partition operator of the second query operator can be deleted only when the partition attributes of the first and second query operators are exactly the same. By comparison, rewriting the partition attribute of the second query operator when the partition attribute of the first query operator is a prefix of it yields more predecessor-successor pairs of query operators (such as the first and second query operators) whose partition attributes are exactly the same, and therefore reduces the number of physical query tasks that make up the physical query plan to a greater extent.
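The prefix test and the subsequent rewrite can be sketched in a few lines. This is a simplified model, not the embodiment's implementation: the `QueryOperator` class, its fields, and the function `to_physical_tasks` are hypothetical names, and a physical query task is represented simply as a list of operators.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class QueryOperator:
    """Hypothetical model of a query operator in a logical query plan."""
    name: str
    partition_attrs: Tuple[str, ...]
    has_partition_operator: bool = True


def is_prefix(short: Tuple[str, ...], long: Tuple[str, ...]) -> bool:
    """True if `short` is a prefix of `long` (equality counts as a prefix)."""
    return len(short) <= len(long) and long[:len(short)] == short


def to_physical_tasks(first: QueryOperator,
                      second: QueryOperator) -> List[List[QueryOperator]]:
    """Merge a predecessor/successor pair into one task when the prefix rule holds."""
    if is_prefix(first.partition_attrs, second.partition_attrs):
        # Rewrite the second operator's partition attribute to match the first,
        # then drop its partition operator so both run in one task.
        rewritten = replace(second,
                            partition_attrs=first.partition_attrs,
                            has_partition_operator=False)
        return [[first, rewritten]]
    # Otherwise each operator keeps its own partition step and its own task.
    return [[first], [second]]


op1 = QueryOperator("OP1", ("A",))
merged = to_physical_tasks(op1, QueryOperator("OP2", ("A", "B")))
separate = to_physical_tasks(op1, QueryOperator("OP2", ("B", "A")))
print(len(merged), len(separate))
```

Treating equality as a special case of the prefix relation means the same check also covers the prior-art situation in which the two partition attributes are already identical.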
In addition, with this solution, when the first query operator is an indirect predecessor of the second query operator, that is, when a third query operator lies between the first and second query operators, and the third query operator can be implemented with a broadcast query algorithm, the third query operator can be rewritten as a broadcast query operator. Once the partition attribute of the second query operator is the same as that of the first query operator, the partition operator of the second query operator can be deleted, and the first query operator, the second query operator, and the broadcast query operator are generated as one physical query task, which reduces the number of physical query tasks that make up the physical query plan.
In the prior art, the partition operator of the second query operator can be deleted, and a first and second query operator with identical partition attributes generated as one physical query task, only when the first query operator is a direct predecessor of the second query operator. By comparison, when a third query operator lies between the first and second query operators, rewriting the third query operator as a broadcast query operator allows the broadcast query operator, together with the first and second query operators that have the same partition attribute, to be generated as one physical query task, which reduces the number of physical query tasks that make up the physical query plan to a greater extent.
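The indirect-predecessor case can be sketched in the same simplified style. Everything below is an assumption made for illustration: the `QueryOperator` fields `broadcastable` and `is_broadcast`, the function `merge_chain`, and the choice to leave the three operators in separate tasks whenever the rule does not apply.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class QueryOperator:
    """Hypothetical model of a query operator in a logical query plan."""
    name: str
    partition_attrs: Tuple[str, ...]
    has_partition_operator: bool = True
    broadcastable: bool = False   # can be implemented with a broadcast algorithm
    is_broadcast: bool = False    # has been rewritten as a broadcast operator


def merge_chain(first: QueryOperator, third: QueryOperator,
                second: QueryOperator) -> List[List[QueryOperator]]:
    """Handle the chain first -> third -> second described in the text."""
    n = len(first.partition_attrs)
    prefix_ok = second.partition_attrs[:n] == first.partition_attrs
    if prefix_ok and third.broadcastable:
        # A broadcast query operator has no partition operator of its own.
        broadcast = replace(third, partition_attrs=(),
                            has_partition_operator=False, is_broadcast=True)
        # Align the second operator with the first and drop its partition operator.
        aligned = replace(second, partition_attrs=first.partition_attrs,
                          has_partition_operator=False)
        return [[first, broadcast, aligned]]
    # Simplification: otherwise keep one task per operator.
    return [[first], [third], [second]]


op_i = QueryOperator("OPi", ("A",))
op_k = QueryOperator("OPk", ("A", "B"))
one_task = merge_chain(op_i, QueryOperator("OPj", ("C",), broadcastable=True), op_k)
three_tasks = merge_chain(op_i, QueryOperator("OPj", ("C",)), op_k)
print(len(one_task), len(three_tasks))
```

The prefix check here also accepts identical partition attributes, so the sketch covers both the case where the second operator's attribute must be rewritten and the case where only the third operator needs to become a broadcast operator.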
It should be noted that the logical query plan instances given in the embodiments of the present invention are only examples for the purpose of illustration, and do not mean that the embodiments of the present invention can be applied only to these logical query plan instances.
Embodiment 3
This embodiment of the present invention provides a query plan conversion apparatus. As shown in FIG. 8, the apparatus includes an extraction unit 31, a first rewriting unit 32, a deletion unit 33, and a generation unit 34.
The extraction unit 31 is configured to extract a first query operator and a second query operator from a logical query plan, where the first query operator is a predecessor operator of the second query operator.
The first rewriting unit 32 is configured to: if the partition attribute of the first query operator extracted by the extraction unit 31 is a prefix of the partition attribute of the second query operator extracted by the extraction unit 31, rewrite the partition attribute of the second query operator in the logical query plan so that it is the same as the partition attribute of the first query operator.
The deletion unit 33 is configured to delete the partition operator of the second query operator from the logical query plan after the first rewriting unit 32 rewrites the partition attribute of the second query operator in the logical query plan.
The generation unit 34 is configured to: after the deletion unit 33 deletes the partition operator of the second query operator, generate one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, so as to form a physical query plan.
Further, as shown in FIG. 9, the first rewriting unit 32 may include a first rewriting module 321 and a second rewriting module 322.
The first rewriting module 321 is configured to: if the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, a third query operator lies between the first and second query operators, and the third query operator can be implemented with a broadcast query algorithm, rewrite the third query operator as a broadcast query operator, where the broadcast query operator has no partition operator.
The second rewriting module 322 is configured to rewrite the partition attribute of the second query operator in the logical query plan so that it is the same as the partition attribute of the first query operator.
The generation unit 34 is further configured to generate one physical query task from the second query operator, the broadcast query operator, the first query operator, and the partition operator of the first query operator, so as to form the physical query plan.
Here, the first query operator is the direct predecessor operator of the third query operator, and the third query operator is the direct predecessor operator of the second query operator.
Further, as shown in FIG. 10, the query plan conversion apparatus provided by this embodiment of the present invention may further include a second rewriting unit 35.
The second rewriting unit 35 is configured to: if the partition attribute of the first query operator extracted by the extraction unit 31 is the same as the partition attribute of the second query operator extracted by the extraction unit 31, a third query operator lies between the first and second query operators, and the third query operator can be implemented with a broadcast query algorithm, rewrite the third query operator as a broadcast query operator, where the broadcast query operator has no partition operator.
The deletion unit 33 is further configured to delete the partition operator of the second query operator from the logical query plan after the second rewriting unit 35 rewrites the third query operator as a broadcast query operator.
The generation unit 34 is further configured to: after the deletion unit 33 deletes the partition operator of the second query operator, generate one physical query task from the second query operator, the broadcast query operator, the first query operator, and the partition operator of the first query operator, so as to form the physical query plan.
Here, the first query operator is the direct predecessor operator of the third query operator, and the third query operator is the direct predecessor operator of the second query operator.
Further, the generation unit 34 is specifically configured to use the task-flow correlation optimization (JFC) technique to generate one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, so as to form the physical query plan.
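One way to picture how units 31 to 34 cooperate is to map each unit onto a method of a single class. The sketch below is only such a picture under strong simplifying assumptions: the class `QueryPlanConverter`, the `Op` record, the representation of a plan as a list of operators, and the rule that the extraction step simply takes the first two operators are all hypothetical, and the JFC technique itself is not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Op:
    """Hypothetical query operator record."""
    name: str
    partition_attrs: Tuple[str, ...]
    has_partition_operator: bool = True


class QueryPlanConverter:
    """Hypothetical mapping of units 31-34 onto methods."""

    def extract(self, plan: List[Op]) -> Tuple[Op, Op]:
        # Extraction unit 31: pick a predecessor/successor pair.
        # Simplification: the plan is a chain and the pair is its first two operators.
        return plan[0], plan[1]

    def rewrite(self, first: Op, second: Op) -> bool:
        # First rewriting unit 32: rewrite only if first's attribute is a prefix.
        n = len(first.partition_attrs)
        if second.partition_attrs[:n] != first.partition_attrs:
            return False
        second.partition_attrs = first.partition_attrs
        return True

    def delete(self, second: Op) -> None:
        # Deletion unit 33: remove the second operator's partition operator.
        second.has_partition_operator = False

    def generate(self, first: Op, second: Op) -> List[Op]:
        # Generation unit 34: one physical query task holding both operators.
        return [first, second]

    def convert(self, plan: List[Op]) -> List[List[Op]]:
        first, second = self.extract(plan)
        if not self.rewrite(first, second):
            return [[op] for op in plan]  # no merge: one task per operator
        self.delete(second)
        return [self.generate(first, second)]


plan = [Op("OP1", ("A",)), Op("OP2", ("A", "B"))]
tasks = QueryPlanConverter().convert(plan)
print(len(tasks), plan[1].partition_attrs, plan[1].has_partition_operator)
```

The method order in `convert` mirrors the order stated for the units: the deletion step runs only after the rewriting step has succeeded, and the generation step runs only after the deletion step.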
With the query plan conversion apparatus provided by this embodiment of the present invention, if the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, the partition attribute of the second query operator in the logical query plan can be rewritten. Once the partition attribute of the second query operator is the same as that of the first query operator, the partition operator of the second query operator can be deleted, and one physical query task is generated from the first and second query operators, which reduces the number of physical query tasks that make up the physical query plan.
In the prior art, the partition operator of the second query operator can be deleted only when the partition attributes of the first and second query operators are exactly the same. By comparison, rewriting the partition attribute of the second query operator when the partition attribute of the first query operator is a prefix of it yields more predecessor-successor pairs of query operators (such as the first and second query operators) whose partition attributes are exactly the same, and therefore reduces the number of physical query tasks that make up the physical query plan to a greater extent.
In addition, with this solution, when the first query operator is an indirect predecessor of the second query operator, that is, when a third query operator lies between the first and second query operators, and the third query operator can be implemented with a broadcast query algorithm, the third query operator can be rewritten as a broadcast query operator. Once the partition attribute of the second query operator is the same as that of the first query operator, the partition operator of the second query operator can be deleted, and the first query operator, the second query operator, and the broadcast query operator are generated as one physical query task, which reduces the number of physical query tasks that make up the physical query plan.
In the prior art, the partition operator of the second query operator can be deleted, and a first and second query operator with identical partition attributes generated as one physical query task, only when the first query operator is a direct predecessor of the second query operator. By comparison, when a third query operator lies between the first and second query operators, rewriting the third query operator as a broadcast query operator allows the broadcast query operator, together with the first and second query operators that have the same partition attribute, to be generated as one physical query task, which reduces the number of physical query tasks that make up the physical query plan to a greater extent.
Embodiment 4
This embodiment of the present invention provides a query plan conversion apparatus. As shown in FIG. 11, the apparatus includes a memory 41 and a processor 42, and the memory 41 is connected to the processor 42.
The memory 41 is configured to store a set of program code. The memory 41 is a computer storage medium of the query plan conversion apparatus, and the computer storage medium includes a non-volatile storage medium.
The processor 42 is configured to execute the program code stored in the memory 41, and is specifically configured to perform the following operations: extracting a first query operator and a second query operator from a logical query plan, where the first query operator is a predecessor operator of the second query operator; if the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, rewriting the partition attribute of the second query operator in the logical query plan so that it is the same as the partition attribute of the first query operator; and deleting the partition operator of the second query operator from the logical query plan, and generating one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, so as to form a physical query plan.
The memory 41 and the processor 42 are connected by a bus and communicate with each other over it.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is drawn as a single thick line in FIG. 11, but this does not mean that there is only one bus or only one type of bus.
Further, the processor 42 is also configured to: if the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, a third query operator lies between the first and second query operators, and the third query operator can be implemented with a broadcast query algorithm, rewrite the third query operator as a broadcast query operator, where the broadcast query operator has no partition operator; rewrite the partition attribute of the second query operator in the logical query plan so that it is the same as the partition attribute of the first query operator; and delete the partition operator of the second query operator from the logical query plan, and generate one physical query task from the second query operator, the broadcast query operator, the first query operator, and the partition operator of the first query operator, so as to form the physical query plan.
Here, the first query operator is a predecessor operator of the third query operator, and the third query operator is a predecessor operator of the second query operator.
Further, the processor 42 is also configured to: if the partition attribute of the first query operator is the same as the partition attribute of the second query operator, a third query operator lies between the first and second query operators, and the third query operator can be implemented with a broadcast query algorithm, rewrite the third query operator as a broadcast query operator, where the broadcast query operator has no partition operator; and delete the partition operator of the second query operator from the logical query plan, and generate one physical query task from the second query operator, the broadcast query operator, the first query operator, and the partition operator of the first query operator, so as to form the physical query plan.
Here, the first query operator is a predecessor operator of the third query operator, and the third query operator is a predecessor operator of the second query operator.
Further, the processor 42 is also configured to rewrite the sorting attribute of the first query operator before the partition operator of the second query operator is deleted from the logical query plan, so that the sorting attribute of the first query operator is the same as the sorting attribute of the second query operator.
Here, the sorting attribute of the first query operator before rewriting is the same as the partition attribute of the first query operator, and the sorting attribute of the second query operator is the same as the partition attribute of the second query operator before rewriting. The sorting attribute is used to perform partition sorting on the data in the data table operated on by the query operators of the logical query plan.
Further, the processor 42 is also configured to use the task-flow correlation optimization (JFC) technique to generate one physical query task from the second query operator, the first query operator, and the partition operator of the first query operator, so as to form the physical query plan.
It should be noted that, for a specific description of some of the functional modules in the query plan conversion device provided by this embodiment of the present invention, reference may be made to the corresponding content in the method embodiments; the details are not repeated here.
With the query plan conversion device provided by this embodiment of the present invention, if the partition attribute of the first query operator is a prefix of the partition attribute of the second query operator, the partition attribute of the second query operator in the logical query plan can be rewritten to match that of the first query operator. Once the two partition attributes are the same, the partition operator of the second query operator can be deleted and one physical query task can be generated from the first query operator and the second query operator, which reduces the number of physical query tasks that make up the physical query plan.
In the prior art, the partition operator of the second query operator can be deleted only when the partition attribute of the first query operator is exactly the same as that of the second query operator. By contrast, rewriting the partition attribute of the second query operator when the first query operator's partition attribute is a prefix of it yields more pairs of query operators (such as the first query operator and the second query operator) that satisfy the predecessor-successor relationship and have identical partition attributes, and therefore reduces the number of physical query tasks that make up the physical query plan to a greater extent.
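The prefix rule can be illustrated with a short sketch. The `Operator` class and the functions `is_prefix` and `merge_by_prefix` are hypothetical names chosen for this example, and the operators are assumed to form a direct predecessor-successor pair.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Operator:
    """Hypothetical query operator with its partition attribute."""
    name: str
    partition_attrs: Tuple[str, ...]
    has_partition_op: bool = True


def is_prefix(short: Tuple[str, ...], long: Tuple[str, ...]) -> bool:
    """True when `short` is a non-empty leading part of `long`."""
    return 0 < len(short) <= len(long) and long[: len(short)] == short


def merge_by_prefix(first: Operator, second: Operator) -> Optional[List[str]]:
    """Apply the prefix rewrite to a predecessor (first) and successor (second).

    Returns the names of the operators merged into one physical task,
    or None when the first partition attribute is not a prefix of the second.
    """
    if not is_prefix(first.partition_attrs, second.partition_attrs):
        return None
    # Rewrite the second operator's partition attribute to equal the first's.
    second.partition_attrs = first.partition_attrs
    # The second operator no longer needs its own partition operator.
    second.has_partition_op = False
    return [first.name, second.name]


group_a = Operator("group_by_a", ("a",))
group_ab = Operator("group_by_a_b", ("a", "b"))
merged = merge_by_prefix(group_a, group_ab)
```

The rewrite is safe in this example because rows that agree on `(a, b)` also agree on `a`, so partitioning on `a` alone already places them in the same partition.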
In addition, with this solution, when the first query operator is an indirect predecessor of the second query operator, that is, when a third query operator lies between the first query operator and the second query operator, and the third query operator can be implemented using a broadcast query algorithm, the third query operator can be rewritten as a broadcast query operator. Then, when the partition attribute of the second query operator is the same as that of the first query operator, the partition operator of the second query operator can be deleted, and the first query operator, the second query operator, and the broadcast query operator can be generated as one physical query task, which reduces the number of physical query tasks that make up the physical query plan.
In the prior art, the partition operator of the second query operator can be deleted, so that a first query operator and a second query operator with identical partition attributes form one physical query task, only when the first query operator is the direct predecessor of the second query operator. By contrast, when a third query operator lies between the first query operator and the second query operator, rewriting the third query operator as a broadcast query operator allows the broadcast query operator, together with the first query operator and the second query operator that share the same partition attribute, to form one physical query task, and therefore reduces the number of physical query tasks that make up the physical query plan to a greater extent.
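To make the reduction concrete, the sketch below counts physical tasks for a linear chain of operators. It rests on a simplifying assumption that is not stated in the patent: each operator that keeps its partition operator starts a new physical task, and operators without one join the task of their predecessor. The function name is hypothetical.

```python
from typing import List


def count_physical_tasks(has_partition_op: List[bool]) -> int:
    """Count tasks for a linear operator chain under the assumed model.

    The first operator always starts a task; every later operator that
    still has a partition operator starts another one.
    """
    if not has_partition_op:
        return 0
    return 1 + sum(1 for flag in has_partition_op[1:] if flag)


# Chain first -> third -> second before and after the broadcast rewrite.
tasks_before = count_physical_tasks([True, True, True])
tasks_after = count_physical_tasks([True, False, False])
```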
From the description of the above embodiments, those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the functional modules above is used only as an example. In practical applications, the functions may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. For the specific working processes of the system, device, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; the details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201410588240.8A CN105630789B (en) | 2014-10-28 | 2014-10-28 | A kind of inquiry plan method for transformation and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN105630789A (en) | 2016-06-01 |
| CN105630789B (en) | 2019-07-12 |
Family
ID=56045743
Family Applications (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN201410588240.8A | Expired - Fee Related | CN105630789B (en) | 2014-10-28 | 2014-10-28 | A kind of inquiry plan method for transformation and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN105630789B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11599539B2 (en) | 2018-12-26 | 2023-03-07 | Palantir Technologies Inc. | Column lineage and metadata propagation |
| CN109902101B (en) * | 2019-02-18 | 2021-04-02 | 国家计算机网络与信息安全管理中心 | Transparent partitioning method and device based on spark SQL |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101436192A (en) * | 2007-11-16 | 2009-05-20 | 国际商业机器公司 | Method and apparatus for optimizing inquiry aiming at vertical storage type database |
| CN102323946A (en) * | 2011-09-05 | 2012-01-18 | 天津神舟通用数据技术有限公司 | Implementation method for operator reuse in parallel database |
| CN102831139A (en) * | 2011-03-25 | 2012-12-19 | 微软公司 | Co-range partition for query plan optimization and data-parallel programming model |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9251207B2 (en) * | 2007-11-29 | 2016-02-02 | Microsoft Technology Licensing, Llc | Partitioning and repartitioning for data parallel operations |
Also Published As
| Publication number | Publication date |
|---|---|
| CN105630789A (en) | 2016-06-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112269789B (en) | Method and device for storing data, and method and device for reading data | |
| US8396852B2 (en) | Evaluating execution plan changes after a wakeup threshold time | |
| JP7047228B2 (en) | Data query methods, devices, electronic devices, readable storage media, and computer programs | |
| CN107463693A (en) | A kind of data processing method, device, terminal and computer-readable recording medium | |
| CN110807002B (en) | A workflow-based report generation method, system, device and storage medium | |
| CN105740405B (en) | Method and apparatus for storing data | |
| CN107526746B (en) | Method and apparatus for managing document index | |
| CN104881466A (en) | Method and device for processing data fragments and deleting garbage files | |
| JP2014149564A (en) | Information processing apparatus, information processing method and program | |
| WO2017096892A1 (en) | Index construction method, search method, and corresponding device, apparatus, and computer storage medium | |
| CN112528067B (en) | Storage method, reading method, device and equipment of graph database | |
| CN107784030A (en) | A kind of method and device for handling Connection inquiring | |
| CN108205593A (en) | A kind of method and device of inquiry | |
| CN114443699A (en) | Information query method, apparatus, computer equipment, and computer-readable storage medium | |
| CN102932416B (en) | A kind of intermediate data storage method of information flow task, processing method and device | |
| CN111400301A (en) | Data query method, device and equipment | |
| CN105630789B (en) | A kind of inquiry plan method for transformation and device | |
| CN113407565B (en) | Cross-database data query method, device and equipment | |
| US9201937B2 (en) | Rapid provisioning of information for business analytics | |
| CN118138601A (en) | A data processing method, distributed storage system, device and storage medium | |
| JP5867208B2 (en) | Data model conversion program, data model conversion method, and data model conversion apparatus | |
| CN110727672A (en) | Data mapping relation query method and device, electronic equipment and readable medium | |
| CN118069024A (en) | Data storage method and device, storage medium and electronic equipment | |
| CN107844546A (en) | A kind of file system metadata management system and method | |
| CN115062044A (en) | A data query method, device, equipment and storage medium |
Legal Events
| Code | Title | Description |
|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2019-07-12; termination date: 2020-10-28 |