CN114911817A - A data processing method, device, electronic device and storage medium

Info

Publication number: CN114911817A
Application number: CN202210412172.4A
Authority: CN
Inventors: 米书杰; 孟海峰; 李云天; 马闯
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-04-19
Filing date: 2022-04-19
Publication date: 2022-08-16
Anticipated expiration: 2042-04-19
Also published as: CN114911817B

Abstract

The disclosure provides a data processing method and apparatus, an electronic device, and a storage medium, and relates to the field of data processing, in particular to the field of databases. The specific implementation is as follows: determining, from at least one query statement, at least one target query statement that meets a specified condition; performing privacy-removal processing on each of the at least one target query statement to obtain at least one to-be-utilized statement; clustering the at least one to-be-utilized statement according to its text content to obtain at least one statement cluster; and outputting a corresponding processing task for each statement cluster in the at least one statement cluster. With this solution, slow query statements can be processed efficiently while user information is kept from being leaked.

Description

A data processing method, device, electronic device and storage medium

Technical Field

The present disclosure relates to the field of data processing, in particular to the field of databases, and specifically to a data processing method, apparatus, electronic device, and storage medium.

Background

A business system can generate a query statement for a database, such as an SQL (Structured Query Language) statement, based on query information given by a user, and send the generated query statement to a database system so that the database system performs the data query.

However, database queries suffer from the slow-query problem, for example the slow-SQL problem. Because a slow query takes too long to execute, slow queries have long been an important factor affecting database performance, and they greatly degrade the user experience.

Summary

The present disclosure provides a data processing method, apparatus, electronic device, and storage medium.

According to one aspect of the present disclosure, a data processing method is provided, comprising:

determining, from at least one query statement, at least one target query statement that meets a specified condition, wherein the specified condition includes an execution duration exceeding a predetermined duration threshold;

performing privacy-removal processing on each of the at least one target query statement to obtain at least one to-be-utilized statement;

clustering the at least one to-be-utilized statement according to the text content of the at least one to-be-utilized statement to obtain at least one statement cluster; and

outputting a corresponding processing task for each statement cluster in the at least one statement cluster.

According to a second aspect of the present disclosure, a data processing apparatus is provided, comprising:

a first determination module, configured to determine, from at least one query statement, at least one target query statement that meets a specified condition, wherein the specified condition includes an execution duration exceeding a predetermined duration threshold;

a first processing module, configured to perform privacy-removal processing on each of the at least one target query statement to obtain at least one to-be-utilized statement;

a clustering module, configured to cluster the at least one to-be-utilized statement according to the text content of the at least one to-be-utilized statement to obtain at least one statement cluster; and

an output module, configured to output a corresponding processing task for each statement cluster in the at least one statement cluster.

According to a third aspect of the present disclosure, an electronic device is provided, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform any one of the data processing methods.

According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform any one of the data processing methods.

According to a fifth aspect of the present disclosure, a computer program product is further provided, comprising a computer program that, when executed by a processor, implements any one of the data processing methods.

It should be understood that the content described in this section is not intended to identify key or critical features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Brief Description of the Drawings

The accompanying drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:

FIG. 1 is a flowchart of a data processing method provided by an embodiment of the present disclosure;

FIG. 2 is another flowchart of the data processing method provided by an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a processing interface for a processing task provided by an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a system architecture to which the data processing method provided by an embodiment of the present disclosure is applicable;

FIG. 5 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present disclosure;

FIG. 6 is another schematic structural diagram of the data processing apparatus provided by an embodiment of the present disclosure;

FIG. 7 is a block diagram of an electronic device used to implement the data processing method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to facilitate understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.

A business system, as a user-facing system, can generate a query statement for a database, such as an SQL (Structured Query Language) statement, based on query information given by a user, and send the generated query statement to a database system so that the database system performs the data query. By way of example, the business system may be a document library platform, a cloud storage platform, any of various information processing systems, and so on.

However, for database queries, factors such as insufficiently optimized business-system code may cause slow queries, for example the slow-SQL problem. Because a slow query takes too long to execute, slow queries have long been an important factor affecting database performance, and they greatly degrade the user experience. Moreover, as database architectures grow more complex and data volumes increase, the problems caused by slow queries become increasingly prominent: synchronization delay across the entire database cluster and master-slave delay for the business; heavy consumption of host and database resources, which affects the performance of other queries and reduces database throughput; and accumulation of query requests, which eventually makes the database service slow to respond or even completely unresponsive; and so on.

Simple ways of handling slow SQL currently exist, but they only identify slow SQL in the development environment and notify the corresponding handler so that the handler can deal with it. They do not address slow-SQL problems that already exist in an online business system. In addition, because the identified slow SQL may contain user information, user information may be leaked.

In view of the above problems, embodiments of the present disclosure provide a data processing method, apparatus, electronic device, and storage medium, so as to process slow query statements efficiently while keeping user information from being leaked.

A data processing method provided by an embodiment of the present disclosure is introduced first.

The data processing method provided by the embodiments of the present disclosure may be applied to an electronic device, which may be a terminal device or a server; the present disclosure does not limit the specific form of the electronic device. In addition, the method may be applied to cluster, distributed, or any other application scenarios that need to process slow query statements; the embodiments of the present disclosure do not limit the specific scenario.

The method may be executed by a data processing apparatus. By way of example, when the method is applied to a terminal device, the data processing apparatus may be functional software running on the terminal device, for example tool software for processing slow query statements; the data processing apparatus may also be a plug-in in an existing client, for example a plug-in in a client used for software processing. When the method is applied to a server, the data processing apparatus may be a computer program running on the server, and the computer program may be used to initiate slow-query-statement processing.

In the present disclosure, a slow query is a query whose duration exceeds a predetermined duration threshold; correspondingly, a slow query statement is a query statement whose query duration exceeds the predetermined duration threshold. The predetermined duration threshold can be set according to the actual situation. For query scenarios that require an immediate response, the threshold may be, for example, 0.5 seconds, 1 second, or 2 seconds; for scenarios with lower response-speed requirements, the threshold may be, for example, 0.5 minutes or 1 minute.
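The threshold test described above can be illustrated with a minimal sketch. The record type `QueryRecord`, the scenario names, and the helper functions are assumptions introduced only for illustration; the threshold values are taken from the examples given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    """One logged query; the field names are illustrative assumptions."""
    sql: str
    duration_s: float  # execution duration in seconds


# Hypothetical per-scenario thresholds, using example values from the text.
THRESHOLDS_S = {"interactive": 1.0, "batch": 60.0}


def is_slow(record: QueryRecord, scenario: str) -> bool:
    """True when the execution duration exceeds the scenario's threshold."""
    return record.duration_s > THRESHOLDS_S[scenario]


def select_slow(records, scenario):
    """Keep only the records that count as slow queries for the scenario."""
    return [r for r in records if is_slow(r, scenario)]
```

The same statement can therefore be a slow query in an interactive scenario and an acceptable one in a scenario with looser response-time requirements.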

In addition, by way of example, for a database queried with SQL statements, the slow-query problem may be referred to as the slow-SQL problem. The present disclosure does not specifically limit the type of query statement involved in a slow query.

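A data processing method provided by an embodiment of the present disclosure may include the following steps:

determining, from at least one query statement, at least one target query statement that meets a specified condition, wherein the specified condition includes an execution duration exceeding a predetermined duration threshold;

performing privacy-removal processing on each of the at least one target query statement to obtain at least one to-be-utilized statement;

clustering the at least one to-be-utilized statement according to the text content of the at least one to-be-utilized statement to obtain at least one statement cluster; and

outputting a corresponding processing task for each statement cluster in the at least one statement cluster.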

In this solution, after the at least one target query statement that meets the specified condition is determined from the at least one query statement, privacy-removal processing is performed on the at least one target query statement, which prevents information from being leaked. Furthermore, because the to-be-utilized statements obtained after privacy-removal processing include statements with common content, the to-be-utilized statements are clustered, and a processing task is output for each of the resulting statement clusters. Compared with setting a processing task for every target query statement, this greatly reduces the number of processing tasks. This solution can therefore process slow query statements efficiently while keeping user information from being leaked.

A data processing method provided by the present disclosure is described below by way of example with reference to the accompanying drawings.

As shown in FIG. 1, a data processing method provided by the present disclosure may include the following steps:

S101: determining, from at least one query statement, at least one target query statement that meets a specified condition;

wherein the specified condition includes an execution duration exceeding a predetermined duration threshold.

In the embodiments of the present disclosure, a target business system generates log data while running, and the log data includes at least one query statement. At least one target query statement that meets the specified condition can therefore be determined from the at least one query statement and passed on for subsequent processing. Because the specified condition includes an execution duration exceeding a predetermined duration threshold, a target query statement may be a slow query statement. In practice, a slow query statement may also be called a statement in a slow-query state or a statement with a slow-query problem. By way of example, the business system may be a document library platform, a cloud storage platform, any of various information processing systems, and so on.

The embodiments of the present disclosure do not limit the specific way in which the at least one target query statement is determined; any way of identifying whether a query statement meets the specified condition can be applied. By way of example, determining the at least one target query statement may include: obtaining at least one query statement; and, for each query statement, determining the query statement as a target query statement if it is identified, based on its execution duration and/or its EXPLAIN result, as meeting the specified condition.

If identification is based on the execution duration, a predetermined duration threshold can be set in advance, and the execution duration of the query statement is compared with the threshold; if the duration exceeds the threshold, the query statement is determined to meet the specified condition. If identification is based on the EXPLAIN result, the EXPLAIN command (used to inspect the query execution plan of a query statement) can be executed on the query statement to obtain an analysis result, that is, the EXPLAIN result. The EXPLAIN result is then scored, and whether the query statement meets the specified condition is determined by whether the evaluation score satisfies the score condition set for slow queries, that is, the score condition set for the specified condition. If identification is based on both the execution duration and the EXPLAIN result, the query statement can be determined to meet the specified condition when the execution duration exceeds the predetermined duration threshold and the evaluation score satisfies the score condition set for slow queries.

It can be understood that an EXPLAIN result may include multiple columns, such as the query type (select_type), the scan method (type), the number of rows scanned (rows), and the index actually used (key). Different evaluation scores can be assigned to different values of the columns that affect query efficiency, so that whether the specified condition is met is identified from the total evaluation score.
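As a hedged illustration of scoring an EXPLAIN result, the sketch below assigns penalties to the columns named above and combines the score with the duration test. The penalty weights, the row-count cutoff, the score limit, and the function names `explain_score` and `meets_condition` are all assumptions; the access-type values follow those reported by MySQL's EXPLAIN, and the disclosure itself does not prescribe any particular weights.

```python
# Penalty per access method ("type" column); the weights are assumptions.
TYPE_PENALTY = {"ALL": 40, "index": 25, "range": 10, "ref": 5, "eq_ref": 2, "const": 0}


def explain_score(rows):
    """Sum penalties over the rows of one EXPLAIN result (a list of dicts)."""
    score = 0
    for row in rows:
        score += TYPE_PENALTY.get(row.get("type"), 0)
        if row.get("key") is None:              # no index actually used
            score += 20
        if (row.get("rows") or 0) > 100_000:    # many rows scanned
            score += 20
        if row.get("select_type") == "DEPENDENT SUBQUERY":
            score += 20
    return score


def meets_condition(duration_s, rows, *, threshold_s=1.0, score_limit=50, mode="and"):
    """Apply the duration test, the EXPLAIN-score test, or both."""
    slow_by_time = duration_s > threshold_s
    slow_by_plan = explain_score(rows) >= score_limit
    if mode == "time":
        return slow_by_time
    if mode == "explain":
        return slow_by_plan
    return slow_by_time and slow_by_plan
```

The `mode` argument mirrors the three options in the text: duration only, EXPLAIN result only, or both together.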

In practice, the target business system can call the database system in various operating environments and thereby generate log data that includes at least one query statement. The query statements of a desired operating environment can therefore be selected flexibly, so that slow queries are handled for a specified operating environment. Accordingly, determining the at least one target query statement may include: determining, from at least one query statement generated in a specified operating environment, at least one target query statement that meets the specified condition, wherein the specified operating environment includes at least one of a development environment, a test environment, a heterogeneous environment, and a production environment.

It should be noted that the embodiments of the present disclosure may collect the log data generated in the specified operating environment either in real time or on a schedule; both are reasonable. The development environment (also called the R&D environment) is where code is developed or programmed. The test environment is where the code produced in the development environment is tested. The heterogeneous environment, also called the admission environment, provides further testing beyond the test environment: it fully simulates the production environment in order to further test the code produced in the development environment. In the production environment, the developed code has been released and is in use by users.

Based on the above implementation, target query statements that meet the specified condition can be handled flexibly in any specified operating environment, which ultimately allows the business code to be optimized flexibly. Moreover, multiple operating environments can be selected so that slow queries are intercepted and optimized layer by layer, achieving comprehensive handling.

It should be noted that the above descriptions of the specified operating environment and of the way each target query statement is determined are merely examples and should not be construed as limiting the present disclosure.

S102: performing privacy-removal processing on each of the at least one target query statement to obtain at least one to-be-utilized statement;

Each target query statement may contain content related to user information. To keep information secure and protect personal privacy, privacy-removal processing can be performed on the target query statement, that is, the user information in the target query statement is de-identified. It can be understood that de-identifying the user information in the target query statement achieves data desensitization of the target query statement.

By way of example, privacy-removal processing of a target query statement may take the form of hiding key information or replacing key information, where the key information is the user information to be de-identified. For example, if the target query statement contains information such as a personal name or address, that information can be hidden so that it is not displayed, or it can be replaced with a fixed character such as "?" or "*".

In addition, target fields that involve user information can be determined in advance, and during privacy-removal processing the content corresponding to those target fields is de-identified. For example, suppose a slow query statement is SELECT * FROM Persons WHERE city = 'Beijing' and the preset target fields are FROM and WHERE; after privacy-removal processing, the statement becomes SELECT * FROM ? WHERE %? = '?'.
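A minimal sketch of the key-information replacement idea follows. It is a simplification and an assumption rather than the disclosed procedure: it replaces only literal values (quoted strings and numbers) with a fixed placeholder, whereas the example above also masks the table and column names selected by the target fields. The function name `mask_literals` and the regular expressions are introduced only for illustration.

```python
import re

# Single-quoted string literals, allowing '' as an escaped quote.
_STRING = re.compile(r"'(?:[^']|'')*'")
# Standalone integer or decimal literals (not digits inside identifiers).
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def mask_literals(sql: str, placeholder: str = "?") -> str:
    """Replace literal values with a fixed placeholder and normalize spaces."""
    masked = _STRING.sub(f"'{placeholder}'", sql)  # mask strings first
    masked = _NUMBER.sub(placeholder, masked)      # then bare numbers
    return re.sub(r"\s+", " ", masked).strip()
```

Because every literal collapses to the same placeholder, two queries that differ only in user-supplied values produce identical text, which is what makes the later clustering step effective.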

It should be emphasized that the above specific implementations of privacy-removal processing are merely examples and should not be construed as limiting the embodiments of the present disclosure. In addition, in the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user information involved all comply with relevant laws and regulations and do not violate public order and good morals.

S103: clustering the at least one to-be-utilized statement according to the text content of the at least one to-be-utilized statement to obtain at least one statement cluster;

A query statement is formed by filling specific query information into a statement of a certain query type, and the query information usually involves user information. Therefore, to prevent user information from being leaked, and considering that the target query statements obtained after privacy-removal processing include statements with common content, the at least one to-be-utilized statement can be clustered to obtain at least one statement cluster. Subsequent data processing is then performed according to the clustering result, which ultimately allows slow query statements to be processed efficiently.

It can be understood that if all of the to-be-utilized statements are identical, a single statement cluster is obtained; if some of the to-be-utilized statements are identical, the number of statement clusters obtained is smaller than the number of to-be-utilized statements.

It should be noted that the at least one to-be-utilized statement can be clustered according to its text content in a variety of ways.

By way of example, in one implementation, the character strings of the to-be-utilized statements can be compared, and statements with identical character strings are grouped into one cluster, thereby obtaining at least one statement cluster.

By way of example, in another implementation, clustering the at least one to-be-utilized statement according to its text content to obtain at least one statement cluster may include: performing a hash operation on each to-be-utilized statement to obtain a signature corresponding to that statement; and obtaining at least one statement cluster according to the signatures corresponding to the to-be-utilized statements.

The signature obtained by hashing a to-be-utilized statement may be a string of 8, 16, or 32 characters. The signature can serve as the identifier of the target query statement and can therefore be used to retrieve and analyze the to-be-utilized statement. A hash operation yields a fixed-size string, so the to-be-utilized statements can be clustered more efficiently according to the hash results. The condition for grouping statements into a cluster according to their signatures may be that the signatures are identical, or that the signature similarity is greater than a predetermined threshold, and so on.
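The signature-based grouping can be sketched as follows, covering only the "signatures are identical" condition. The choice of SHA-256, the truncation of the hex digest to a fixed length, and the function names are assumptions; the disclosure does not name a hash function.

```python
import hashlib
from collections import defaultdict


def signature(statement: str, length: int = 16) -> str:
    """Fixed-length signature: a truncated SHA-256 hex digest (an assumption)."""
    digest = hashlib.sha256(statement.encode("utf-8")).hexdigest()
    return digest[:length]


def cluster_by_signature(statements):
    """Group statements whose signatures are identical into one cluster."""
    clusters = defaultdict(list)
    for statement in statements:
        clusters[signature(statement)].append(statement)
    return dict(clusters)
```

Grouping by a short fixed-length key avoids comparing full statement strings pairwise. The similarity-based condition mentioned above would need a different kind of signature, since digests of similar inputs from a cryptographic hash are unrelated.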

S104: outputting a corresponding processing task for each statement cluster in the at least one statement cluster;

wherein the processing task is a task instructing that specified processing be performed for the statement cluster.

After the privacy-removed target query statements have been clustered, a processing task can be output for each statement cluster obtained. Unlike the prior art, this solution does not output a task for every slow query statement, which avoids spending a large amount of manpower on handling each slow query statement individually and thus allows slow query statements to be processed efficiently. It should be noted that, in practice, the number of statement clusters obtained after clustering is far smaller than the number of statements before clustering.

The specified processing may be optimization of the business code corresponding to the statements in the statement cluster; correspondingly, the processing task may be a task instructing a handler to optimize that business code. By outputting a processing task for a statement cluster, the handler can locate the code corresponding to the statements in the cluster and then optimize it. The specified processing may also be another type of processing, for example merely locating the business code corresponding to the statements in the statement cluster; the embodiments of the present disclosure do not limit this.

By way of example, in one implementation, outputting a corresponding processing task for each statement cluster in the at least one statement cluster includes: for each statement cluster, determining a statement identifier of the statement cluster as the task content of that cluster, for example using the signature of the statements in the cluster as the task content; and, for each statement cluster, outputting a processing task that contains the statement identifier of the cluster. The statement identifier of each statement cluster is associated with a to-be-utilized statement of that cluster. A handler can therefore use the statement identifier to look up a to-be-utilized statement of the cluster in a data table that holds the mapping between statement identifiers and statement clusters, and then use the located statement to locate and optimize the corresponding code.
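The identifier-based task output can be sketched as below. The `ProcessingTask` type, the `build_tasks` function, and the use of an in-memory dictionary in place of the data table are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingTask:
    """A task that carries only the cluster's statement identifier."""
    statement_id: str


def build_tasks(clusters):
    """Return one task per cluster plus an identifier-to-statement lookup.

    `clusters` maps a signature to the list of privacy-removed statements
    that share it; the lookup stands in for the mapping data table.
    """
    lookup = {sig: statements[0] for sig, statements in clusters.items()}
    tasks = [ProcessingTask(statement_id=sig) for sig in clusters]
    return tasks, lookup
```

A handler who receives a task resolves `statement_id` through the lookup to obtain a representative statement, so the task itself never needs to carry statement text.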

Exemplarily, in one implementation, outputting a corresponding processing task for each statement cluster of the at least one statement cluster includes: for each statement cluster, determining a processing object corresponding to the statement cluster, and outputting the corresponding processing task, where the processing object is one to-be-utilized statement in the statement cluster. It should be noted that, in order to process slow query statements efficiently and intuitively, the statement content involved in each statement cluster may be output when the processing task for that cluster is output. Moreover, since the statements contained in the same statement cluster are identical, any to-be-utilized statement in the cluster, that is, any query statement after privacy removal, may be used to form the task content for that cluster.

In addition, where the at least one target query statement meeting the specified condition is determined from at least one query statement generated in a specified operating environment, outputting a corresponding processing task for each obtained statement cluster includes:

for each statement cluster of the at least one statement cluster, outputting the processing task corresponding to the statement cluster to a processing terminal corresponding to the specified operating environment. The processing terminal corresponding to the specified operating environment is the terminal used by the handlers of that environment. Outputting the processing task to this terminal lets the handlers of the specified operating environment learn of the task and handle it accordingly.

Based on the above data processing method, another embodiment of the present disclosure provides a further data processing method which, as shown in FIG. 2, may include the following steps:

S201: determining, from at least one query statement, at least one target query statement that meets a specified condition;

S202: performing privacy-removal processing on the at least one target query statement respectively to obtain at least one to-be-utilized statement;

S203: clustering the at least one to-be-utilized statement according to its text content to obtain at least one statement cluster;

S204: for each statement cluster of the at least one statement cluster, outputting a corresponding processing task;

where the processing task is a task instructing that specified processing be performed on the statement cluster.

Steps S201-S204 are similar to steps S101-S104 described above and are not repeated here.

S205: determining a query statement to be executed;

While the system is running, there are not only query statements that have already been executed but also new, unexecuted query statements. Therefore, when any unexecuted query statement is received, it may be treated as the to-be-executed query statement, and the subsequent data processing steps are performed on it.

It should be noted that the to-be-executed query statement may be a query statement that has never been executed, or an improved version of a previously executed query statement; no limitation is imposed here.

S206: performing privacy-removal processing on the to-be-executed query statement to obtain a to-be-analyzed statement;

When the target business system generates a business request carrying a query statement, that is, when a to-be-executed query statement is produced, the user information in that statement may be subjected to privacy-removal processing before the statement is delivered to the database system. This yields the to-be-analyzed statement, on which the subsequent steps of the method are performed, thereby achieving real-time control of slow query statements.

The privacy-removal processing applied to the user information in the to-be-executed query statement is the same as that applied to the user information in the target query statements described above. Exemplarily, in one implementation, the processing hides or replaces the parts of the to-be-executed query statement that contain user information, for example, replacing parts containing a personal name, address, or similar information with "?" or "*"; of course, the processing is not limited to this.
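One way to picture the replacement-based privacy removal is the sketch below. It assumes that user information appears only as quoted string literals and bare numeric literals in the SQL text, which the disclosure does not state; the regular expressions and the function name `deidentify` are illustrative assumptions, and a production system would more likely rely on a real SQL parser.

```python
import re

# Assumed patterns: quoted string literals and standalone numeric literals.
_STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")
_WHITESPACE = re.compile(r"\s+")


def deidentify(sql: str) -> str:
    """Replace literal values with '?' and collapse runs of whitespace."""
    masked = _STRING_LITERAL.sub("?", sql)      # mask names, addresses, etc.
    masked = _NUMBER_LITERAL.sub("?", masked)   # mask numeric values
    return _WHITESPACE.sub(" ", masked).strip()


print(deidentify("SELECT * FROM users WHERE name = 'Alice' AND age > 30"))
```

Because every literal collapses to the same placeholder, two queries that differ only in user-supplied values produce the same de-identified text, which is what makes the later clustering step possible.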

In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user information involved all comply with relevant laws and regulations and do not violate public order and good morals.

S207: if it is detected that the to-be-analyzed statement belongs to any one of the at least one statement cluster, intercepting the to-be-executed query statement and/or raising an alarm for it;

It should be noted that the at least one statement cluster is obtained by clustering the target query statements as described above, and the target query statements are slow query statements, that is, statements meeting the specified condition. Therefore, if the to-be-analyzed statement belongs to any one of the at least one statement cluster, the to-be-executed query statement corresponding to it is a slow query statement, that is, a statement meeting the specified condition, and may be directly intercepted and/or an alarm may be raised for it. It can be understood that interception means blocking the to-be-executed query statement outside the database system, so that the database system does not receive it. In practical applications, the alarm may take the form of sending an alarm notification, sending an alarm signal, sounding an alert, or any other notification capable of alerting the corresponding handler. Exemplarily, whether the to-be-analyzed statement belongs to any one of the at least one statement cluster may be detected by checking whether the character string of the to-be-analyzed statement is identical to the character string of any one of the statement clusters, or by checking whether the hash of the to-be-analyzed statement is identical to the hash result of any one of the statement clusters; if they are identical, the to-be-analyzed statement is detected as belonging to that statement cluster.

In addition, intercepting and/or raising an alarm for every slow query statement could cause losses to the target business system. To avoid this, the to-be-analyzed statement may be examined a second time before deciding whether to intercept and/or raise an alarm, so that slow query statements are filtered flexibly and the production environment is not harmed by mistake.

Based on this idea, exemplarily, in one implementation, intercepting the to-be-executed query statement and/or raising an alarm if it is detected that the to-be-analyzed statement belongs to any one of the at least one statement cluster may include:

if it is detected that the to-be-analyzed statement belongs to any one of the at least one statement cluster, and the specified evaluation index corresponding to that statement cluster meets a predetermined threshold condition, intercepting the to-be-executed query statement and/or raising an alarm for it.

Exemplarily, the specified evaluation index may be the execution duration, the locking duration (the time for which a slow query statement holds a lock), the EXPLAIN result, and so on. These values may be recorded in advance, or obtained by statistics or calculation; either is reasonable. For example, when the specified evaluation index for slow queries is the execution duration, the to-be-executed query statement may be intercepted and/or an alarm raised if the execution duration corresponding to the statement cluster exceeds a preset duration threshold. It can be understood that the execution duration corresponding to a statement cluster may be obtained by averaging the execution durations of the target query statements included in that cluster. It can also be understood that, if the to-be-analyzed statement does not belong to any of the at least one statement cluster, or if it belongs to one of them but the specified evaluation index for slow queries corresponding to that cluster does not meet the predetermined threshold condition, the to-be-executed query statement corresponding to the to-be-analyzed statement may be delivered to the database system, so that the database system responds to it.
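The two-stage decision described above (cluster membership first, then a threshold on the cluster's evaluation index) can be sketched as follows. The sketch assumes the evaluation index is the average execution duration and that clusters are stored as a dictionary from signature to recorded durations; the function name `decide` and the return values are hypothetical.

```python
from statistics import mean


def decide(signature, cluster_durations, threshold_seconds):
    """Return "intercept" or "forward" for a to-be-analyzed statement.

    cluster_durations: dict mapping a cluster signature to the execution
    durations (seconds) of the slow queries in that cluster (assumed layout).
    """
    durations = cluster_durations.get(signature)
    if not durations:
        # Not a known slow-query cluster: deliver to the database as usual.
        return "forward"
    if mean(durations) > threshold_seconds:
        # Known slow query whose average duration exceeds the threshold.
        return "intercept"
    # Known slow query, but below the interception threshold.
    return "forward"


known = {"sig-a": [4.0, 6.0], "sig-b": [1.2, 1.4]}
print(decide("sig-a", known, threshold_seconds=3.0))
```

Keeping the interception threshold separate from the threshold used to identify slow queries is what allows mildly slow statements to pass while only the costly ones are blocked.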

The solution provided by this embodiment not only processes slow query statements efficiently while preventing user information from being leaked, but also filters slow query statements newly generated by the business system, thereby controlling slow query statements in real time. In addition, setting a predetermined threshold condition allows slow query statements to be filtered reasonably, preventing accidental harm to the production environment.

Optionally, based on the above embodiments, in a data processing method provided by another embodiment of the present disclosure, after outputting the corresponding processing task for each statement cluster of the at least one statement cluster, the method may further include:

when a specified control instruction for any processing task is received, responding to the specified control instruction for that processing task.

The specified control instruction for a processing task is an instruction used to adjust the handler, status information, and the like of that processing task. Exemplarily, the specified control instruction may be an instruction to hand the task over to another person, an instruction to mark the task as exempt, an instruction to set the status of the task to resolved, an instruction to set remark information for the task, and so on. Regarding marking as exempt: once a task is marked exempt, signatures of this kind no longer trigger alarm notifications; the marking requires review by a handler.
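As an illustration only, the control instructions listed above could be modeled as a small enumeration applied to a task record. The class names, field names, and status strings below are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class Instruction(Enum):
    TRANSFER = "transfer"   # hand the task over to another person
    EXEMPT = "exempt"       # mark the task as exempt (no further alarms)
    RESOLVE = "resolve"     # set the task status to resolved
    REMARK = "remark"       # attach remark information


@dataclass
class Task:
    signature: str
    owner: str
    status: str = "open"
    exempt: bool = False
    remark: str = ""


def apply_instruction(task: Task, instruction: Instruction, argument: str = "") -> Task:
    """Update the task in place according to the received instruction."""
    if instruction is Instruction.TRANSFER:
        task.owner = argument
    elif instruction is Instruction.EXEMPT:
        task.exempt = True
    elif instruction is Instruction.RESOLVE:
        task.status = "resolved"
    elif instruction is Instruction.REMARK:
        task.remark = argument
    return task


task = Task(signature="sig-a", owner="alice")
apply_instruction(task, Instruction.TRANSFER, "bob")
```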

It can be understood that the output interface of a processing task may be provided with a trigger button for each specified control instruction. By clicking a trigger button, a handler can issue the corresponding specified control instruction, and the data processing apparatus, upon receiving the instruction, responds to it for that processing task.

To aid understanding, FIG. 3 shows a schematic diagram of the interface for a processing task. As shown in FIG. 3, for a given processing task, the output interface may display the task content together with other information, including the current owner, the status, the type, and the trigger buttons corresponding to the specified control instructions. If the processing task was determined from log data generated in the production environment, its type is online; if it was determined from log data generated in another operating environment, its type is offline.

It can be seen that responding to specified control instructions for any processing task allows handlers to deal with processing tasks flexibly.

The principle of a slow query statement processing method provided by the present disclosure is described in detail below with reference to another specific embodiment.

As shown in the system architecture diagram of FIG. 4, the system comprises the following five parts:

Simplified database architecture 410: the database system includes master and slave nodes (a master node and slave nodes), primary and standby nodes, and proxy instances. The proxy instance passes external write traffic to the master node, so that the master node completes the data writing process, and passes external read traffic to the slave nodes slave-01 to slave-n, so that the slave nodes complete the data reading process. While the business system accesses the database, log files (which contain the at least one query statement described above) may be generated.

Slow query statement collection and processing 420: logs are collected in real time from the log files generated in the simplified database architecture; slow query statements are then identified in the collected logs (corresponding to determining, from at least one query statement, at least one target query statement meeting the specified condition); signature information is then generated (corresponding to performing privacy removal on the target query statement and then hashing it to obtain the signature corresponding to the target query statement); and finally the related information is stored. This part may store information such as the execution duration of the query statement; the number of parsed rows, that is, the number of code lines parsed for the query statement; the number of returned rows, that is, the number of rows of data returned for the query statement; the source IP, that is, the IP address from which the query statement originated; and the request time, that is, the time at which the query statement was requested.
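The information stored by the collection and processing part can be pictured as one record per slow query. The sketch below is an assumption-laden illustration: the class name `SlowQueryRecord`, its field names, and the helper `select_slow` are hypothetical, and the identification rule shown is only the execution-duration threshold named in the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SlowQueryRecord:
    signature: str            # hash of the de-identified statement
    execution_seconds: float  # execution duration
    parsed_rows: int          # number of parsed rows
    returned_rows: int        # number of rows returned
    source_ip: str            # where the query came from
    request_time: datetime    # when the query was requested


def select_slow(records, threshold_seconds):
    """Keep only the records whose execution duration exceeds the threshold."""
    return [r for r in records if r.execution_seconds > threshold_seconds]


records = [
    SlowQueryRecord("sig-a", 5.2, 100000, 10, "10.0.0.1", datetime(2022, 4, 19, 12, 0)),
    SlowQueryRecord("sig-b", 0.1, 20, 20, "10.0.0.2", datetime(2022, 4, 19, 12, 1)),
]
slow = select_slow(records, threshold_seconds=1.0)
```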

Slow query statement governance 430: this part receives the data pushed by the slow query statement collection and processing part, aggregates slow query statements of the same kind (corresponding to clustering the at least one to-be-utilized statement to obtain at least one statement cluster), stores the aggregated information, that is, each statement cluster obtained by aggregation, and then handles each statement cluster according to the aggregation result (corresponding to outputting a corresponding processing task for each statement cluster of the at least one statement cluster). The handling may include: alarm notification, which raises an alarm for the corresponding slow query statement and notifies the corresponding handler; a claim function, with which a handler claims the processing tasks that fall to them; a handover function, which hands a processing task that does not fall to the current handler over to the appropriate handler; an assignment function, which assigns a processing task to the appropriate handler; schedule setting, which sets the deadline of a processing task; and remark setting, which sets remarks on a processing task, for example a special requirement that the task must be completed the same day.

Slow query statement identification and interception 440: when any business request arrives (corresponding to determining the to-be-executed query statement), privacy-removal processing is performed on the user information in the query statement of that request to obtain the to-be-analyzed statement (corresponding to performing privacy-removal processing on the to-be-executed query statement to obtain the to-be-analyzed statement). The data of the slow query statement governance part is then requested and used to identify whether the to-be-analyzed statement is a slow query statement (corresponding to the above: if the to-be-analyzed statement is detected as belonging to any one of the at least one statement cluster, the corresponding to-be-executed query statement is a slow query statement). If it is not a slow query statement, it is delivered to the database system as usual. If it is a slow query statement, it is further determined whether the threshold is reached (corresponding to the above: the to-be-analyzed statement belongs to any one of the at least one statement cluster, and the specified evaluation index for slow queries corresponding to that cluster meets the predetermined threshold condition). If the threshold is reached, the slow query statement is intercepted and an alarm is raised; otherwise, the slow query statement is delivered to the database as usual.

Slow query statement interception 450: the development environment is the first line of interception, the test environment the second, the heterogeneous environment the third, and the production environment the fourth. This part extends interception across multiple environments, so that slow query statements are intercepted before they reach the production environment wherever possible and the business code is optimized in time. Multi-environment interception may be allocated according to business needs. For example, it may be applied only to the development and test environments, or it may be applied to the development, test, heterogeneous, and production environments, with the development and test environments accounting for the larger share.

In this embodiment, the log data generated by the business can be monitored dynamically, giving better real-time performance. Privacy-removal processing of slow query statements not only allows slow query statements of the same kind to be clustered, but also keeps user data confidential and avoids the security risks of user data leakage. Each kind of slow query statement in any environment can be handled efficiently, avoiding the situation in which online slow queries are left unattended. Setting a threshold allows slow query statements to be filtered more flexibly and reasonably, preventing accidental harm to the production environment. The multi-environment, multi-stage interception strategy intercepts slow query statements before they reach the production environment and prompts timely optimization of business code, avoiding impact on the business. Moreover, the improvement in slow query processing efficiency is particularly evident for complex master-slave or distributed architectures.

According to an embodiment of the present disclosure, the present disclosure further provides a data processing apparatus. As shown in FIG. 5, the apparatus includes:

a first determining module 510, configured to determine, from at least one query statement, at least one target query statement that meets a specified condition, where the specified condition includes the execution duration exceeding a predetermined duration threshold;

a first processing module 520, configured to perform privacy-removal processing on the at least one target query statement respectively to obtain at least one to-be-utilized statement;

a clustering module 530, configured to cluster the at least one to-be-utilized statement according to its text content to obtain at least one statement cluster;

an output module 540, configured to output a corresponding processing task for each statement cluster of the at least one statement cluster.

In this solution, after the at least one target query statement meeting the specified condition is determined from the at least one query statement, privacy-removal processing is performed on it so that information is not leaked. Furthermore, because the to-be-utilized statements obtained after privacy removal include statements with common content, they are clustered, and one processing task is output for each resulting statement cluster. Compared with creating a processing task for every target query statement, this greatly reduces the number of processing tasks. It can be seen that this solution processes slow query statements efficiently while preventing user information from being leaked.

Optionally, the output module is specifically configured to:

for each statement cluster, determine a processing object corresponding to the statement cluster, and output the corresponding processing task, where the processing object is one to-be-utilized statement in the statement cluster.

Optionally, the determining module is specifically configured to:

determine, from at least one query statement generated in a specified operating environment, at least one target query statement that meets the specified condition;

and the output module is specifically configured to:

for each statement cluster of the at least one statement cluster, output the processing task corresponding to the statement cluster to the processing terminal corresponding to the specified operating environment.

Optionally, the clustering module is specifically configured to:

perform a hash operation on each to-be-utilized statement of the at least one to-be-utilized statement to obtain a signature corresponding to that statement;

obtain at least one statement cluster according to the signatures corresponding to the to-be-utilized statements.
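The signature-based clustering performed by the clustering module can be sketched as grouping statements by their hash. The disclosure does not name a hash function, so SHA-256 is an assumption here, as are the function names `signature` and `cluster_by_signature`.

```python
import hashlib
from collections import defaultdict


def signature(statement: str) -> str:
    """Hash a de-identified statement; SHA-256 is an assumed choice."""
    return hashlib.sha256(statement.encode("utf-8")).hexdigest()


def cluster_by_signature(statements):
    """Group statements that share a signature into one statement cluster."""
    clusters = defaultdict(list)
    for statement in statements:
        clusters[signature(statement)].append(statement)
    return dict(clusters)


statements = [
    "SELECT * FROM users WHERE name = ?",
    "SELECT * FROM users WHERE name = ?",
    "SELECT * FROM orders WHERE id = ?",
]
clusters = cluster_by_signature(statements)
print(len(clusters))  # number of distinct statement clusters
```

Since identical de-identified text always yields the same signature, the number of processing tasks equals the number of distinct signatures rather than the number of slow queries.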

Optionally, as shown in FIG. 6, the apparatus further includes:

a second determining module 650, configured to determine a query statement to be executed;

a second processing module 660, configured to perform privacy-removal processing on the to-be-executed query statement to obtain a to-be-analyzed statement;

an interception module 670, configured to intercept the to-be-executed query statement and/or raise an alarm for it if it is detected that the to-be-analyzed statement belongs to any one of the at least one statement cluster.

Optionally, the interception module is specifically configured to:

Optionally, the apparatus further includes:

a response module, configured to, when a specified control instruction for any processing task is received, respond to the specified control instruction for that processing task.

According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.

An embodiment of the present disclosure provides an electronic device, including:

at least one processor; and

An embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause the computer to execute any one of the data processing methods.

The present disclosure provides a computer program product, including a computer program that, when executed by a processor, implements any one of the data processing methods.

图7示出了可以用来实施本公开的实施例的示例电子设备700的示意性框图。电子设备旨在表示各种形式的数字计算机，诸如，膝上型计算机、台式计算机、工作台、个人数字助理、服务器、刀片式服务器、大型计算机、和其它适合的计算机。电子设备还可以表示各种形式的移动装置，诸如，个人数字处理、蜂窝电话、智能电话、可穿戴设备和其它类似的计算装置。本文所示的部件、它们的连接和关系、以及它们的功能仅仅作为示例，并且不意在限制本文中描述的和/或者要求的本公开的实现。FIG. 7 shows a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.

如图7所示，设备700包括计算单元701，其可以根据存储在只读存储器(ROM)702中的计算机程序或者从存储单元708加载到随机访问存储器(RAM)703中的计算机程序，来执行各种适当的动作和处理。在RAM 703中，还可存储设备700操作所需的各种程序和数据。计算单元701、ROM 702以及RAM 703通过总线704彼此相连。输入/输出(I/O)接口705也连接至总线704。As shown in FIG. 7 , the device 700 includes a computing unit 701 that can be executed according to a computer program stored in a read only memory (ROM) 702 or loaded into a random access memory (RAM) 703 from a storage unit 708 Various appropriate actions and handling. In the RAM 703, various programs and data necessary for the operation of the device 700 can also be stored. The computing unit 701 , the ROM 702 , and the RAM 703 are connected to each other through a bus 704 . An input/output (I/O) interface 705 is also connected to bus 704 .

设备700中的多个部件连接至I/O接口705，包括：输入单元706，例如键盘、鼠标等；输出单元707，例如各种类型的显示器、扬声器等；存储单元708，例如磁盘、光盘等；以及通信单元709，例如网卡、调制解调器、无线通信收发机等。通信单元709允许设备700通过诸如因特网的计算机网络和/或各种电信网络与其他设备交换信息/数据。Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706, such as a keyboard, mouse, etc.; an output unit 707, such as various types of displays, speakers, etc.; a storage unit 708, such as a magnetic disk, an optical disk, etc. ; and a communication unit 709, such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 709 allows the device 700 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.

计算单元701可以是各种具有处理和计算能力的通用和/或专用处理组件。计算单元701的一些示例包括但不限于中央处理单元(CPU)、图形处理单元(GPU)、各种专用的人工智能(AI)计算芯片、各种运行机器学习模型算法的计算单元、数字信号处理器(DSP)、以及任何适当的处理器、控制器、微控制器等。计算单元701执行上文所描述的各个方法和处理，例如数据处理方法。例如，在一些实施例中，数据处理方法可被实现为计算机软件程序，其被有形地包含于机器可读介质，例如存储单元708。在一些实施例中，计算机程序的部分或者全部可以经由ROM 702和/或通信单元709而被载入和/或安装到设备700上。当计算机程序加载到RAM 703并由计算单元701执行时，可以执行上文描述的数据处理方法的一个或多个步骤。备选地，在其他实施例中，计算单元701可以通过其他任何适当的方式(例如，借助于固件)而被配置为执行数据处理方法。Computing unit 701 may be various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of computing units 701 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various specialized artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processing processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 executes the various methods and processes described above, such as data processing methods. For example, in some embodiments, a data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708 . In some embodiments, part or all of the computer program may be loaded and/or installed on device 700 via ROM 702 and/or communication unit 709 . When a computer program is loaded into RAM 703 and executed by computing unit 701, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the data processing method by any other suitable means (eg, by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be carried out. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer having a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.

A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solutions disclosed herein can be achieved; no limitation is imposed in this respect.

The specific embodiments described above do not limit the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its protection scope.

Claims

1. A data processing method, comprising:

determining, from at least one query statement, at least one target query statement that meets a specified condition, wherein the specified condition includes an execution duration exceeding a predetermined duration threshold;

performing privacy removal processing on each of the at least one target query statement to obtain at least one to-be-utilized statement;

clustering the at least one to-be-utilized statement according to the text content of the at least one to-be-utilized statement to obtain at least one statement cluster; and

outputting a corresponding processing task for each statement cluster in the at least one statement cluster.
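The first two steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `QueryRecord` log shape, the 1000 ms threshold, and the regular-expression masking rules are all assumptions chosen for illustration, and a production system would more likely mask literals with a real SQL parser.

```python
import re
from dataclasses import dataclass


@dataclass
class QueryRecord:
    """A logged query and its execution duration (hypothetical log shape)."""
    sql: str
    duration_ms: float


# Assumed masking rules: quoted string literals first, then standalone numbers.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def select_slow_queries(records, threshold_ms):
    """Keep only statements whose execution duration exceeds the threshold."""
    return [r.sql for r in records if r.duration_ms > threshold_ms]


def remove_private_values(sql):
    """Replace literal values with '?' and collapse runs of whitespace."""
    masked = _STRING_LITERAL.sub("?", sql)
    masked = _NUMBER_LITERAL.sub("?", masked)
    return " ".join(masked.split())


records = [
    QueryRecord("SELECT * FROM orders WHERE user_id = 42", 1800.0),
    QueryRecord("SELECT * FROM orders WHERE user_id = 7", 2500.0),
    QueryRecord("SELECT name FROM users WHERE phone = '13800000000'", 12.0),
]

# Threshold of 1000 ms is an assumed value for this example.
to_be_utilized = [
    remove_private_values(sql) for sql in select_slow_queries(records, 1000.0)
]
```

In this sketch the two slow `orders` queries reduce to the same masked text, so they can later be clustered without exposing the user identifiers they originally carried.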

2. The method according to claim 1, wherein outputting a corresponding processing task for each statement cluster in the at least one statement cluster comprises:

for each statement cluster, determining a processing object corresponding to the statement cluster, and outputting the corresponding processing task;

wherein the processing object is a to-be-utilized statement in the statement cluster.

3. The method according to claim 1, wherein determining, from at least one query statement, at least one target query statement that meets a specified condition comprises:

determining, from at least one query statement generated in a specified operating environment, at least one target query statement that meets the specified condition;

and wherein outputting a corresponding processing task for each statement cluster in the at least one statement cluster comprises:

for each statement cluster in the at least one statement cluster, outputting the processing task corresponding to the statement cluster to a processing terminal corresponding to the specified operating environment.

4. The method according to claim 1, wherein clustering the at least one to-be-utilized statement according to the text content of the at least one to-be-utilized statement to obtain at least one statement cluster comprises:

performing a hash operation on each to-be-utilized statement in the at least one to-be-utilized statement to obtain a signature corresponding to that to-be-utilized statement; and

obtaining at least one statement cluster according to the signatures corresponding to the to-be-utilized statements.
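The signature-based clustering of claim 4 might look like the following Python sketch. The claim does not name a hash function, so SHA-256 is an assumed choice here, as is the canonicalization step (lowercasing and whitespace collapsing), which presumes that literal values have already been masked by the privacy removal step.

```python
import hashlib
from collections import defaultdict


def signature(statement):
    """Hash a to-be-utilized statement into a signature.

    SHA-256 and the lowercase/whitespace canonicalization are assumptions;
    the claim only requires some hash operation on the statement.
    """
    canonical = " ".join(statement.lower().split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cluster_by_signature(statements):
    """Group statements that share a signature into one statement cluster."""
    clusters = defaultdict(list)
    for statement in statements:
        clusters[signature(statement)].append(statement)
    return dict(clusters)


statements = [
    "SELECT * FROM orders WHERE user_id = ?",
    "select *  from orders where user_id = ?",
    "SELECT name FROM users WHERE phone = ?",
]
clusters = cluster_by_signature(statements)
```

Grouping by an exact signature keeps clustering linear in the number of statements, at the cost of treating any textual difference that survives canonicalization as a separate cluster.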

5. The method according to any one of claims 1-4, wherein, after clustering the at least one to-be-utilized statement according to the text content of the at least one to-be-utilized statement to obtain at least one statement cluster, the method further comprises:

determining a to-be-executed query statement;

performing privacy removal processing on the to-be-executed query statement to obtain a to-be-analyzed statement; and

if it is detected that the to-be-analyzed statement belongs to any statement cluster in the at least one statement cluster, intercepting the to-be-executed query statement and/or raising an alarm for it.

6. The method according to claim 5, wherein, if it is detected that the to-be-analyzed statement belongs to any statement cluster in the at least one statement cluster, intercepting the to-be-executed query statement and/or raising an alarm for it comprises:

if it is detected that the to-be-analyzed statement belongs to any statement cluster in the at least one statement cluster, and a specified evaluation index corresponding to that statement cluster meets a predetermined threshold condition, intercepting the to-be-executed query statement and/or raising an alarm for it.
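The check described in claims 5 and 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: the evaluation index is taken to be a per-cluster occurrence count, the threshold condition is "at least `min_occurrences`", and the masking rules and SHA-256 signature are the same hypothetical choices as in any earlier sketch. None of these specifics come from the claims.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class ClusterStats:
    """Evaluation index for one cluster; an occurrence count is assumed."""
    occurrences: int


def signature_of(sql):
    """Mask literals (assumed rules) and hash the result into a signature."""
    masked = re.sub(r"'(?:[^']|'')*'", "?", sql)
    masked = re.sub(r"\b\d+(?:\.\d+)?\b", "?", masked)
    canonical = " ".join(masked.lower().split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def decide(sql, clusters, min_occurrences):
    """Return 'intercept' if the statement falls in a known cluster whose
    evaluation index meets the threshold; otherwise return 'allow'."""
    stats = clusters.get(signature_of(sql))
    if stats is not None and stats.occurrences >= min_occurrences:
        return "intercept"
    return "allow"


# Hypothetical clusters built earlier from slow query statements.
known = {
    signature_of("SELECT * FROM orders WHERE user_id = 1"): ClusterStats(25),
    signature_of("SELECT * FROM logs WHERE day = '2022-04-19'"): ClusterStats(2),
}
```

A caller would invoke `decide` before forwarding a statement to the database; an alarm could be raised instead of, or in addition to, interception when the result is `"intercept"`.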

7. The method according to claim 1, wherein, after outputting the corresponding processing task for each statement cluster in the at least one statement cluster, the method further comprises:

when a specified control instruction for any processing task is received, responding to the specified control instruction for that processing task.

8. A data processing device, comprising:

a first determination module, configured to determine, from at least one query statement, at least one target query statement that meets a specified condition; wherein the specified condition includes that the execution duration exceeds a predetermined duration threshold;

a first processing module, configured to perform privacy removal processing on each of the at least one target query statement to obtain at least one to-be-utilized statement;

a clustering module, configured to cluster the at least one to-be-utilized statement according to the text content of the at least one to-be-utilized statement to obtain at least one statement cluster; and

an output module, configured to output a corresponding processing task for each statement cluster in the at least one statement cluster.

9. The apparatus according to claim 8, wherein the output module is specifically configured to:

for each statement cluster, determine a processing object corresponding to the statement cluster, and output the corresponding processing task; wherein the processing object is a to-be-utilized statement in the statement cluster.

10. The apparatus according to claim 8, wherein the first determining module is specifically configured to:

The output module is specifically used for:

11. The apparatus according to claim 8, wherein the clustering module is specifically used for:

12. The apparatus of any one of claims 8-11, further comprising:

a second determination module, configured to determine a to-be-executed query statement;

a second processing module, configured to perform privacy removal processing on the to-be-executed query statement to obtain a to-be-analyzed statement; and

an interception module, configured to intercept the to-be-executed query statement and/or raise an alarm for it if it is detected that the to-be-analyzed statement belongs to any statement cluster in the at least one statement cluster.

13. The apparatus according to claim 12, wherein the interception module is specifically configured to:

14. The apparatus of claim 8, wherein the apparatus further comprises:

a response module, configured to respond, when a specified control instruction for any processing task is received, to the specified control instruction for that processing task.

15. An electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.

17. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-7.