CN117520306B

CN117520306B - Data verification method and device and electronic equipment

Info

Publication number: CN117520306B
Application number: CN202311556293.7A
Authority: CN
Inventors: 钟声振; 林帅均; 王一博; 安红新; 陈坚; 关矛; 张�杰; 余东辉
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Internet Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Internet Co Ltd
Priority date: 2023-11-20
Filing date: 2023-11-20
Publication date: 2024-10-29
Anticipated expiration: 2043-11-20
Also published as: CN117520306A

Abstract

The application provides a data verification method, a data verification device, and an electronic device. The method comprises: determining N first data tables to be verified and a second data table to be verified corresponding to each first data table to be verified, wherein N is a positive integer; dividing the N first data tables to be verified into M data table groups based on sampling differences among the N first data tables to be verified, wherein M is a positive integer greater than 1; allocating the verification tasks corresponding to the M data table groups to M thread groups in sequence; and performing, based on the M thread groups and according to the verification tasks, data verification on the data in the N first data tables to be verified and their corresponding second data tables to be verified. Because the first data tables to be verified are grouped according to the sampling differences among them, the verification tasks are distributed more evenly across the concurrent thread groups, which improves the overall efficiency of data verification.

Description

Data verification method, device and electronic equipment

Technical Field

The present application relates to the field of data processing technology, and in particular to a data verification method, a data verification device, and an electronic device.

Background Art

When data is synchronized or migrated between two databases, differences can arise between the data sources. To make the data consistent, the data tables need to be verified. However, data verification methods in the related art suffer from low processing efficiency and low resource utilization.

Summary of the Invention

To solve the above problems, the present application provides a data verification method, a data verification device, and an electronic device.

According to a first aspect of the present application, a data verification method is provided, comprising:

determining N first data tables to be verified, and a second data table to be verified corresponding to each of the first data tables to be verified, wherein N is a positive integer;

dividing the N first data tables to be verified into M data table groups based on sampling differences among the N first data tables to be verified, wherein M is a positive integer greater than 1;

allocating the verification tasks corresponding to the M data table groups to M thread groups in sequence; and

performing, based on the M thread groups and according to the verification tasks, data verification on the data in the N first data tables to be verified and their respective corresponding second data tables to be verified.

According to a second aspect of the present application, a data verification device is provided, comprising:

a determination module, configured to determine N first data tables to be verified, and a second data table to be verified corresponding to each of the first data tables to be verified, wherein N is a positive integer;

a grouping module, configured to divide the N first data tables to be verified into M data table groups based on sampling differences among the N first data tables to be verified, wherein M is a positive integer greater than 1;

an allocation module, configured to allocate the verification tasks corresponding to the M data table groups to M thread groups in sequence; and

a verification module, configured to perform, based on the M thread groups and according to the verification tasks, data verification on the data in the N first data tables to be verified and their respective corresponding second data tables to be verified.

According to a third aspect of the present application, an electronic device is provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the method described in the first aspect above.

According to a fourth aspect of the present application, a computer-readable storage medium is provided. When instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the method described in the first aspect above.

According to the technical solution of the present application, the N first data tables to be verified are divided into M data table groups based on the sampling differences among them, the verification tasks corresponding to the M data table groups are allocated to M thread groups in sequence, and the M thread groups perform data verification, according to the verification tasks, on the data in the N first data tables to be verified and their respective corresponding second data tables to be verified. Grouping the first data tables to be verified according to their sampling differences reduces the sampling differences between the verification tasks of the data table groups, so the verification tasks are distributed more evenly across the concurrent thread groups and the overall efficiency of data verification improves.

Additional aspects and advantages of the present application will be given in part in the description below, and in part will become apparent from the description below or be learned through practice of the present application.

Brief Description of the Drawings

The above and/or additional aspects and advantages of the present application will become apparent and easily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:

FIG. 1 is a flow chart of a data verification method provided in an embodiment of the present application;

FIG. 2 is a flow chart of another data verification method provided in an embodiment of the present application;

FIG. 3 is a flow chart of yet another data verification method provided in an embodiment of the present application;

FIG. 4 is an example diagram of dividing data table groups in an embodiment of the present application;

FIG. 5 is a flow chart of yet another data verification method provided in an embodiment of the present application;

FIG. 6 is an example diagram of data verification in an embodiment of the present application;

FIG. 7 is a structural block diagram of a data verification device provided in an embodiment of the present application;

FIG. 8 is a structural block diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals throughout represent the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary and are intended to explain the present application; they should not be construed as limiting it.

It should be noted that when data is synchronized or migrated between two databases, differences can arise between the data sources. To make the data consistent, the data tables need to be verified.

However, data verification methods in the related art verify the data tables serially. The entire verification process is time-consuming and inefficient, and the resources of the machine are not fully used, so resource utilization is low.

FIG. 1 is a flow chart of a data verification method provided in an embodiment of the present application. It should be noted that the data verification method provided in the embodiment of the present application can be applied to the data verification device of the embodiment of the present application, and the device can be configured in an electronic device. The electronic device may be a server, a server cluster, or the like. As shown in FIG. 1, the method may include the following steps:

Step 101: determine N first data tables to be verified, and a second data table to be verified corresponding to each first data table to be verified, wherein N is a positive integer.

In some embodiments of the present application, the first data tables to be verified and the second data table to be verified corresponding to each of them can be determined from a verification task. The verification task specifies the source data tables and target data tables to be verified, as well as the data range to be verified. As an example, the type of the verification task may be at least one of the following: specified-row verification, specified-time-period verification, incremental verification, and full verification.

A specified-row verification task suits cases where only a small amount of data is verified, namely where the range of inconsistent rows is already known. A specified-time-period verification task targets cases where the system data was faulty for a period of time, and verifies the data within that period. An incremental verification task verifies incremental data on a normally running system to keep the data in different databases consistent in real time, and suits real-time verification scenarios. A full verification task verifies the data of a newly created database or a fully migrated database.

In some embodiments of the present application, the first data table to be verified may be a source data table in the verification task, and the second data table to be verified may be a target data table in the verification task. Within a verification task, one first data table to be verified may correspond to one second data table to be verified, or to multiple second data tables to be verified. Usually, the first data table to be verified and the second data table to be verified belong to different databases.

Step 102: based on the sampling differences among the N first data tables to be verified, divide the N first data tables to be verified into M data table groups, wherein M is a positive integer greater than 1.

In some embodiments of the present application, M may be a preset value or a value determined from the current hardware conditions, which is not limited here. For example, the resources currently available for data verification are determined from the hardware resource conditions, and from them the number of thread groups currently available for data verification. It should be noted that the number of data table groups equals the number of thread groups used to perform the verification tasks.

A data table group is not a storage container holding multiple first data tables to be verified. It may simply be an intermediate construct used to group the N first data tables to be verified, for example a partition over the identification information of those tables. For instance, each data table group contains the identification information of at least one first data table to be verified.

In some embodiments of the present application, the sampling difference among the N first data tables to be verified refers to the difference in how hard each table is to sample. For example, it may be the difference in the time needed to take the same number of samples from each first data table to be verified, the difference between the data structures of the tables, the difference between their data volumes, or the difference in the number of samples that can be taken from each table within the same duration.

As one possible implementation, dividing the N first data tables to be verified into M data table groups based on their sampling differences may include: determining the amount of data to be verified in each first data table to be verified; and, according to those amounts, dividing the N first data tables to be verified into M data table groups so that the total amounts of data to be verified in the groups differ as little as possible, which brings the time consumed by the verification tasks of the groups closer together.
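The volume-based grouping above does not name a specific partitioning algorithm. The following is a minimal sketch of one common heuristic (largest table first, always into the currently lightest group), offered only as an illustration. The function name `group_by_volume`, the table names, and the row counts are hypothetical and do not come from the application.

```python
import heapq


def group_by_volume(row_counts, m):
    """Greedy partition (an assumed heuristic, not the application's method):
    take tables from largest to smallest and put each one into the group
    whose running total is currently the smallest."""
    heap = [(0, g) for g in range(m)]  # (running total, group index)
    groups = [[] for _ in range(m)]
    ordered = sorted(row_counts.items(), key=lambda kv: kv[1], reverse=True)
    for name, rows in ordered:
        total, g = heapq.heappop(heap)
        groups[g].append(name)
        heapq.heappush(heap, (total + rows, g))
    return groups


# Hypothetical amounts of data to be verified per table.
row_counts = {"orders": 900, "users": 500, "logs": 400, "items": 300, "tags": 100}
groups = group_by_volume(row_counts, 2)
totals = [sum(row_counts[t] for t in g) for g in groups]
```

With these sample numbers the two group totals come out as 1200 and 1000, which is close to the even split of 1100 per group.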

As another possible implementation, because data tables differ in structure, data volume, and so on, the time needed to extract a given number of data samples varies considerably from table to table. Dividing the N first data tables to be verified into M data table groups based on their sampling differences may therefore include: determining the sampling time of each first data table to be verified; and, according to those sampling times, dividing the N first data tables to be verified into M data table groups so that the total sampling times of the first data tables to be verified in the groups differ as little as possible.

Step 103: allocate the verification tasks corresponding to the M data table groups to M thread groups in sequence.

That is, the verification tasks corresponding to the M data table groups are assigned to the M thread groups one-to-one, so that each thread group performs the data verification task of one specific data table group. The M thread groups may be configured in advance based on hardware conditions.

Step 104: based on the M thread groups and according to the verification tasks, perform data verification on the data in the N first data tables to be verified and their respective corresponding second data tables to be verified.

That is, the M thread groups complete the verification tasks in parallel, thereby verifying the data in the N first data tables to be verified and their respective corresponding second data tables to be verified.
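As a rough illustration of steps 103 and 104, the sketch below hands each data table group to its own worker and runs the workers concurrently. It simplifies the text in one respect that should be read as an assumption: a single worker thread stands in for each "thread group". `run_groups` and `verify_group` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_groups(groups, verify_group):
    """Dispatch one verification task per data table group.

    `verify_group` is a caller-supplied function that verifies every table
    in one group. Results are returned in the same order as `groups`.
    """
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        return list(pool.map(verify_group, groups))


# Stand-in task: report how many tables each worker was given.
results = run_groups([["A", "B", "C"], ["D"]], len)
```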

As an example, suppose the data verification task of data table group 1 is assigned to thread group 1, and data table group 1 contains first data tables to be verified A, B, and C. According to the verification task, thread group 1 fetches, by row range, a first batch of first data to be verified from table A, and a first batch of second data to be verified from the second data table to be verified corresponding to table A. It compares the two batches row by row to obtain the verification result for the first batch. It then fetches and verifies a second batch from table A and its corresponding second data table in the same way, and so on, until the data in tables A, B, and C and their respective corresponding second data tables to be verified has all been verified.
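The batch-by-batch comparison in this example can be sketched as follows. In-memory lists stand in for the source and target tables, and rows are assumed to line up by position. Both are simplifications for illustration, and `verify_table` is a hypothetical name.

```python
def verify_table(source_rows, target_rows, batch_size):
    """Compare two tables in row-range batches.

    Returns a list of (row_index, source_row, target_row) for every row
    that differs. A row missing on one side is reported as None.
    """
    mismatches = []
    total = max(len(source_rows), len(target_rows))
    for start in range(0, total, batch_size):
        # Fetch the same row range from both sides.
        src_batch = source_rows[start:start + batch_size]
        tgt_batch = target_rows[start:start + batch_size]
        # Compare the two batches row by row.
        for offset in range(max(len(src_batch), len(tgt_batch))):
            src = src_batch[offset] if offset < len(src_batch) else None
            tgt = tgt_batch[offset] if offset < len(tgt_batch) else None
            if src != tgt:
                mismatches.append((start + offset, src, tgt))
    return mismatches


source = [(1, "a"), (2, "b"), (3, "c")]
target = [(1, "a"), (2, "x")]
diffs = verify_table(source, target, batch_size=2)
```

Here `diffs` records row 1 as changed and row 2 as missing from the target.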

It should be noted that the type of the verification task may be at least one of the following: specified-row verification, specified-time-period verification, incremental verification, and full verification.

For example, what "performing data verification on the data in the N first data tables to be verified and their respective corresponding second data tables to be verified" covers depends on the task type. For a specified-row verification task, the data in the rows to be verified in those tables is verified. For a specified-time-period verification task, the data within the specified time period in those tables is verified. For an incremental verification task, the newly added data in those tables is verified. For a full verification task, all the data in those tables is verified.

In some embodiments of the present application, the method may further include: obtaining the data verification result, and locating the differing data according to that result. The data verification result may include the identification information of the data table containing the differing data, together with the row identifier and column identifier within that table, so the location of the differing data can be determined from the result. In addition, the differing data can be repaired to make the data in the databases consistent.

According to the data verification method of the embodiment of the present application, the N first data tables to be verified are divided into M data table groups based on the sampling differences among them, the verification tasks corresponding to the M data table groups are allocated to M thread groups in sequence, and the M thread groups perform data verification, according to the verification tasks, on the data in the N first data tables to be verified and their respective corresponding second data tables to be verified. Grouping the first data tables to be verified according to their sampling differences reduces the sampling differences between the verification tasks of the data table groups, so the verification tasks are distributed more evenly across the concurrent thread groups and the overall efficiency of data verification improves.

Next, the process of dividing the data table groups is described in detail.

FIG. 2 is a flow chart of another data verification method provided in an embodiment of the present application. As shown in FIG. 2, based on the above embodiment, step 102 in FIG. 1 may include the following steps:

Step 201: determine the sampling time of each first data table to be verified.

The sampling time of a first data table to be verified is the time consumed in extracting a preset number of data samples from that table.

As one possible implementation, determining the sampling time of each first data table to be verified may include: for each first data table to be verified, simulating the process of taking the preset number of samples from the table, and using the time consumed by the simulation as the sampling time of that table.
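One way to picture the sampling-time measurement is to time a trial fetch of the preset number of rows. The sketch below uses an in-memory SQLite table purely as a stand-in data source. The table `demo`, the helper `sampling_time`, and the sample size of 100 are illustrative assumptions, not details from the application.

```python
import sqlite3
import time


def sampling_time(fetch_sample, sample_size):
    """Time one trial fetch of `sample_size` rows.

    `fetch_sample` is a caller-supplied function that returns the rows.
    Returns (elapsed_seconds, rows_fetched).
    """
    start = time.perf_counter()
    rows = fetch_sample(sample_size)
    return time.perf_counter() - start, len(rows)


# Stand-in table with 500 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO demo (val) VALUES (?)", [("x",)] * 500)

elapsed, fetched = sampling_time(
    lambda n: conn.execute("SELECT * FROM demo LIMIT ?", (n,)).fetchall(),
    100,
)
```

A single trial can be noisy, so averaging several runs per table may give a steadier estimate.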

Step 202: according to the sampling times, divide the N first data tables to be verified into M data table groups.

Because the first data tables to be verified differ in structure and data volume, the time needed to extract a given number of data samples differs from table to table. During data verification, the time spent fetching data from the tables in batches directly affects the time the verification takes. The N first data tables to be verified can therefore be divided into M data table groups according to their sampling times, so that the total sampling times of the groups differ as little as possible. The thread groups then take similar amounts of time to execute their verification tasks, which improves the overall efficiency of data verification.

As one possible implementation, dividing the N first data tables to be verified into M data table groups according to the sampling times may include the following steps:

Step 202-1: according to the sampling times, determine the average time of the M thread groups.

As one possible implementation, the sampling times of the N first data tables to be verified can be summed to obtain their total sampling time, and the total sampling time is divided by M to obtain the average time of the M thread groups.
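The average-time calculation is a single division, shown here as a small sketch with made-up sampling times. The function name `average_group_time` is hypothetical.

```python
def average_group_time(sampling_times, m):
    """Total sampling time of all N tables divided by the M thread groups."""
    return sum(sampling_times) / m


# Five tables with hypothetical sampling times (seconds), two thread groups.
average = average_group_time([8, 6, 5, 3, 2], 2)
```

With these numbers the total is 24 and the per-group target is 12.0.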

Step 202-2: according to the average time and the sampling times, divide the N first data tables to be verified into M data table groups.

That is, the N first data tables to be verified are divided into M data table groups so that the total sampling time of the first data tables to be verified in each data table group is as close as possible to the average time.

As an example, a first round of grouping can be performed in a preset manner, and the time difference of each data table group determined, where the time difference of a data table group is the difference between the total sampling time of the first data tables to be verified in that group and the average time. The time differences of the M data table groups are summed to obtain the total time difference. Based on the time difference of each data table group and the total time difference, the next round of grouping is performed, and the process iterates until the total time difference reaches its minimum and no longer changes.

According to the data verification method of the embodiment of the present application, when the N first data tables to be verified are divided into M data table groups, the sampling time of each first data table to be verified is determined first, and the tables are then grouped according to those sampling times so that the total sampling times of the groups are close to one another. As a result, the thread groups take similar amounts of time to verify their data, so concurrent processing makes full use of the resources and the overall verification efficiency improves.

Next, the process of dividing the N first data tables to be verified into M data table groups according to the average time and the sampling times is described in detail.

FIG. 3 is a flow chart of yet another data verification method provided in an embodiment of the present application. As shown in FIG. 3, based on the above embodiment, step 202-2 in FIG. 2 may include the following steps:

Step 301: sort the N first data tables to be verified in descending order of sampling time.

Step 302: for each data table group, assign the i-th and the (N-i+1)-th first data tables to be verified in the sorted result to the i-th data table group, wherein i is a positive integer less than or equal to M.

As shown in FIG. 4, the first and the N-th first data tables to be verified in the sorted result are assigned to the first data table group, the second and the (N-1)-th to the second data table group, the third and the (N-2)-th to the third data table group, and so on, until the M-th and the (N-M+1)-th first data tables to be verified are assigned to the M-th data table group.
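Steps 301 and 302 can be sketched as below: sort by sampling time in descending order, then pair the slowest remaining table with the fastest one for each group. The sketch assumes N is at least 2M so that the pairs never overlap. The text does not state this condition, and `pair_extremes`, the table names, and the times are hypothetical.

```python
def pair_extremes(sampling_times, m):
    """Return (groups, remaining).

    Group i (0-based) receives the i-th slowest and i-th fastest table.
    `remaining` holds the tables in the middle of the sorted order that
    were not assigned in this pass.
    """
    ordered = sorted(sampling_times, key=sampling_times.get, reverse=True)
    n = len(ordered)
    if n < 2 * m:
        # Assumption of this sketch, not a rule from the source text.
        raise ValueError("this sketch assumes N >= 2 * M")
    groups = [[ordered[i], ordered[n - 1 - i]] for i in range(m)]
    remaining = ordered[m:n - m]
    return groups, remaining


# Hypothetical sampling times for six tables, split across two groups.
times = {"A": 9, "B": 7, "C": 6, "D": 4, "E": 3, "F": 1}
groups, remaining = pair_extremes(times, 2)
```

For this input the first pass yields the groups A with F and B with E, and leaves C and D to be placed afterwards.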

Step 303: if any first data tables to be verified remain, continue to assign the remaining tables to the M data table groups according to the average time and the sampling times.

That is, if first data tables to be verified remain unassigned after step 302, they are distributed among the M data table groups so that, in the final division, the difference between the total sampling time of the tables in each data table group and the average time is minimized.

As a possible implementation, step 303 may include:

Step 303-1: determine the total sampling time of the first data tables to be verified contained in each data table group.

As an example, if after the division in step 302 the first data table group contains first data table to be verified A and first data table to be verified B, the total sampling time of the first data table group is the sum of the sampling times of table A and table B.

Step 303-2: determine S target data table groups from the M data table groups, where S is a positive integer less than or equal to M and the total sampling time of each target data table group is less than the average time.

As a possible implementation, the total sampling time of each data table group is compared with the average time; if the total sampling time of a data table group is less than the average time, that group is taken as a target data table group.

Step 303-3: for each target data table group, take one first data table to be verified from the remaining first data tables to be verified and assign it to that target data table group.

Step 303-4: return to the step of determining the total sampling time of the first data tables to be verified contained in each data table group, until all N first data tables to be verified have been assigned to the M data table groups.

That is, as shown in FIG. 4, distributing the remaining first data tables to be verified among the M data table groups may take one or more rounds. In each round, one remaining first data table to be verified is assigned to each target data table group, and the process then returns to step 303-1, until all N first data tables to be verified have been assigned to the M data table groups. If, for example, the total sampling time of the first data table group is greater than the average time, no further remaining tables are assigned to that group.

When step 303-3 is executed, for each target data table group, one first data table to be verified may be selected from the remaining tables according to the total sampling time of the tables already in that group, the average time, and the sampling time of each remaining table, and assigned to that group so that the group's total sampling time approaches the average time.
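The rounds of steps 303-1 to 303-4 can be sketched as follows. This is a minimal illustration under stated assumptions: the average time is taken to be the sum of all sampling times divided by M, the table chosen for a target group is the one that brings the group's total closest to the average, and a fallback to the least-loaded group is added for the degenerate case in which no group is below the average. The name `distribute_remaining` is hypothetical.

```python
def distribute_remaining(groups, remaining, sample_times):
    """Assign leftover tables to groups whose total is below the average.

    groups: list of lists of table names (result of the pairing step).
    remaining: table names not yet assigned.
    sample_times: dict mapping every table name to its sampling time.
    """
    groups = [list(g) for g in groups]
    remaining = list(remaining)
    m = len(groups)
    # Assumption: average time = total sampling time of all tables / M.
    average = sum(sample_times.values()) / m
    while remaining:
        # Step 303-1: total sampling time of each group.
        totals = [sum(sample_times[t] for t in g) for g in groups]
        # Step 303-2: target groups are those below the average.
        targets = [i for i, total in enumerate(totals) if total < average]
        if not targets:
            # Fallback of this sketch only (e.g. zero-cost tables remain).
            targets = [min(range(m), key=totals.__getitem__)]
        # Step 303-3: give each target group one remaining table.
        for i in targets:
            if not remaining:
                break
            best = min(
                remaining,
                key=lambda t: abs(totals[i] + sample_times[t] - average),
            )
            remaining.remove(best)
            groups[i].append(best)
        # Step 303-4: loop back and recompute the totals.
    return groups
```

With sampling times 9, 7, 5, 3, 2 and 1 and M = 2, the average is 13.5; starting from groups with totals 10 and 9, one round yields totals 13 and 14.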

According to the data verification method of this embodiment, when the data table groups are formed, the N first data tables to be verified are first sorted in descending order of sampling time, and for each data table group the i-th and the (N-i+1)-th first data tables to be verified in the sorted result are assigned to the i-th data table group. If first data tables to be verified remain, they are further distributed among the M data table groups according to the average time and the sampling times, so that the total sampling time of each data table group approaches the average time. The time each thread group spends on data verification is therefore close, which both makes full use of resources through concurrent processing and improves the overall verification efficiency.

Next, the data verification process is described in detail.

FIG. 5 is a flowchart of yet another data verification method provided by an embodiment of the present application. As shown in FIG. 5, step 104 in FIG. 1 may be implemented as follows:

Step 501: based on the current thread group and according to the verification task, perform batch data verification, along the column dimension, on the data to be verified corresponding to the current thread group.

It can be understood that a data table may contain tens of thousands of rows or more, so verifying the data by traversing it row by row makes the whole verification process slow. Batch verification along the column dimension greatly improves verification efficiency and shortens the verification process.

Here, batch data verification along the column dimension means verifying column by column, where a single verification pass completes the verification of one column of data.

As a possible implementation, step 501 may include:

Step 501-1: according to the verification task, take the j-th batch of first data to be verified from a first data table to be verified contained in the data table group corresponding to the current thread group, and at the same time take the j-th batch of second data to be verified from the corresponding second data table to be verified, where j is a positive integer.

In some embodiments of the present application, the j-th batch of first data to be verified may be taken by time range or by a preset row range; the present application does not limit this.

It should be noted that if the data table group corresponding to the current thread group contains multiple first data tables to be verified, the current thread group may verify those first data tables to be verified and their respective second data tables to be verified serially.

Step 501-2: sort the j-th batch of first data to be verified and the j-th batch of second data to be verified by primary key to obtain a sorted first result set and a sorted second result set.

It can be understood that after sorting by primary key, the rows of the first result set and the rows of the second result set correspond to each other, which facilitates verification.

Step 501-3: map each column of data in the first result set to a fixed-length first digit string.

The first digit string of a column contains the number corresponding to each value in that column. The length of the first digit string may be determined according to the actual scenario; the present application does not limit this.

In some embodiments of the present application, the first digit string of each column in the first result set is composed of the numbers corresponding to the values in that column. That is, mapping each column of the first result set to a fixed-length first digit string may include: mapping each value in each column of the first result set to a fixed-length number, and obtaining a fixed-length first digit string composed of the numbers corresponding to the values in that column. The length of the first digit string of a column equals the length of the fixed-length number assigned to each value multiplied by the number of rows in the column.

It should be noted that, within a column of the first result set, the mapping between values and fixed-length numbers is one-to-one: the same value always maps to the same number, and different values map to different numbers. If a column contains several identical values, they all map to the same number. If identical values appear in different columns of the first result set, they may map to the same number or to different numbers. In other words, the mapping may be built once for the entire first result set, or separately for each column of the first result set.

As a possible implementation, mapping each value in each column of the first result set to a fixed-length number includes: mapping each value to a fixed-length number based on a preset dictionary library, to obtain the first digit string of each column. The dictionary library contains the mapping between values and numbers.

As another possible implementation, if a separate mapping is built for each column of the first result set, mapping each value in each column to a fixed-length number includes the following. As shown in FIG. 6, for each column of the first result set, the values in the column are mapped, in primary-key order, to consecutive fixed-length numbers, for example 1001, 1002, 1003, and so on. If a value that has already been seen appears again, it is mapped to the same number according to the mapping already established. The numbers corresponding to the values in the column are then concatenated in order to obtain the first digit string of that column.
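A minimal sketch of this per-column mapping is given below. The function name `encode_column`, the starting code 1001 and the width of four digits are assumptions chosen to match the example numbers above; the input is assumed to be already sorted by primary key, and the values are assumed to be hashable.

```python
def encode_column(values, start=1001, width=4):
    """Map one column of the first result set to a fixed-length digit string.

    values: the column's values in primary-key order.
    Returns (digit_string, dictionary), where dictionary maps each distinct
    value to its fixed-length code.
    """
    dictionary = {}
    codes = []
    for value in values:
        if value not in dictionary:
            code = str(start + len(dictionary))
            if len(code) != width:
                raise ValueError("code does not fit the fixed width")
            dictionary[value] = code
        # Repeated values reuse the code that was already assigned.
        codes.append(dictionary[value])
    return "".join(codes), dictionary
```

For the column `["x", "y", "x", "z"]` the result is the 16-character string `1001100210011003`, that is, four characters per row.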

Step 501-4: based on the mapping of the first result set, map each column of data in the second result set to a fixed-length second digit string.

It can be understood that the purpose of data verification is to determine whether the data in the first result set and the second result set are consistent. Usually most of the data are consistent and only individual items differ, so to make the mapping efficient, each column of the second result set can be mapped directly with the mapping built for the first result set.

That is, when each column of the second result set is mapped, the columns of the first result set and their corresponding first digit strings serve as the mapping dictionary library.

As an example, if the mapping of the first result set is built over the entire first result set, then for each value in each column of the second result set, it is first determined whether the same value exists in the first result set. If it does, the target number to which that value is mapped in the first result set is determined, and the value in the second result set is mapped to the target number. If it does not, the value is mapped to a fixed-length number that does not appear in the mapping built for the first result set.

As another example, if the mapping of the first result set is built per column, then for each value in each column of the second result set, it is first determined whether the same value exists in the corresponding column of the first result set. If it does, the target number to which that value is mapped in the corresponding column is determined, and the value in the second result set is mapped to the target number. If it does not, the value is mapped to a fixed-length number that does not appear among the numbers used for the corresponding column of the first result set.
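The per-column variant can be sketched as follows. The function `encode_with_dictionary` is a hypothetical helper; choosing "largest existing code plus one" for a value not found in the dictionary is only one way, assumed here, of obtaining a number that has not been used for the corresponding column.

```python
def encode_with_dictionary(values, dictionary, width=4):
    """Encode a column of the second result set with the first set's codes.

    values: the column's values in primary-key order.
    dictionary: value -> fixed-length code built for the matching column of
    the first result set (left unmodified).
    """
    mapping = dict(dictionary)
    # Assumption: unseen values receive the next unused number.
    next_code = max((int(c) for c in mapping.values()), default=1000) + 1
    codes = []
    for value in values:
        if value not in mapping:
            code = str(next_code)
            if len(code) != width:
                raise ValueError("code does not fit the fixed width")
            mapping[value] = code
            next_code += 1
        codes.append(mapping[value])
    return "".join(codes)
```

Because an unseen value always receives a code that the first result set never used, any mismatch in the data shows up as a difference between the two digit strings.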

Step 501-5: compare the first digit string of each column in the first result set with the corresponding second digit string to verify the j-th batch of data.

It can be understood that if the first digit string of a column in the first result set is identical to the corresponding second digit string, the data in that column of the first result set is consistent with the data in the corresponding column of the second result set. If the two digit strings are not identical, the data in that column of the first result set is inconsistent with the data in the corresponding column of the second result set.

In some embodiments of the present application, if the first digit string of a column in the first result set differs from the corresponding second digit string, the differing data can be located from the position of the difference. For example, as shown in FIG. 6, if for a given column the first digit string and the second digit string differ at position pos, it can be determined that the first result set and the second result set differ in row (pos/N)+(pos%N>0?1:0) of that column, and the data at that row needs to be repaired. Here N denotes the length of the number to which each value is mapped (not the number of data tables), and pos/N is integer division: pos is divided by that length, and if there is no remainder the quotient is the row containing the difference; if there is a remainder, the quotient plus 1 is the row containing the difference.
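The row formula can be checked with a short sketch. Positions and rows are counted from 1 here, which is the convention under which the formula yields row 1 for positions 1 through N; the function name `diff_rows` and the parameter name `width` (the per-value code length called N above) are assumptions of this example.

```python
def diff_rows(first, second, width):
    """Return the 1-based rows at which two digit strings differ.

    first, second: digit strings of equal length for one column.
    width: length of the fixed-length code assigned to each value.
    """
    if len(first) != len(second):
        raise ValueError("digit strings must have the same length")
    rows = []
    for pos, (a, b) in enumerate(zip(first, second), start=1):
        if a != b:
            # Row = pos / width, rounded up, as in the formula above.
            row = pos // width + (1 if pos % width > 0 else 0)
            if not rows or rows[-1] != row:
                rows.append(row)
    return rows
```

For instance, with a width of 4, a difference at position 12 gives 12/4 = 3 with no remainder, so row 3; a difference at position 5 gives quotient 1 with remainder 1, so row 2.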

Step 501-6: set j to j+1 and return to the step of taking, according to the verification task, the j-th batch of first data to be verified from the first data table to be verified contained in the first data table group and, at the same time, taking the j-th batch of second data to be verified from the corresponding second data table to be verified, until the verification task corresponding to the first data table group is completed.

In some embodiments of the present application, step 501-3 may further include: determining whether the first result set and the second result set contain the same number of rows; and if they do, mapping each column of data in the first result set to a fixed-length first digit string. If the row counts differ, data may be missing from the first data table to be verified or from the corresponding second data table to be verified, so in that case row-by-row verification can be used instead.
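A minimal sketch of this check is shown below. The application only states that row-by-row verification can be used when the row counts differ; comparing rows by primary key, the tuple representation of rows, and the name `check_batch` are assumptions of this example.

```python
def check_batch(first_rows, second_rows, pk=0):
    """Choose the verification mode for one batch.

    first_rows, second_rows: lists of row tuples; pk is the index of the
    primary-key field. Returns ("column", []) when the row counts match and
    column-wise verification can proceed, otherwise ("row", keys) where keys
    are the primary keys that are missing or differ between the two sides.
    """
    if len(first_rows) == len(second_rows):
        return "column", []
    first = {row[pk]: row for row in first_rows}
    second = {row[pk]: row for row in second_rows}
    mismatched = sorted(
        key for key in first.keys() | second.keys()
        if first.get(key) != second.get(key)
    )
    return "row", mismatched
```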

According to the data verification method of this embodiment, when data verification is performed by the M thread groups, batch verification is carried out along the column dimension: each column of data is mapped to a fixed-length digit string, and each column is verified in bulk by comparing the corresponding digit strings, which further improves the efficiency of data verification.

To implement the above embodiments, the present application provides a data verification device.

FIG. 7 is a structural block diagram of a data verification device provided by an embodiment of the present application. As shown in FIG. 7, the device includes:

a determination module 701, configured to determine N first data tables to be verified and the second data table to be verified corresponding to each first data table to be verified, where N is a positive integer;

a grouping module 702, configured to divide the N first data tables to be verified into M data table groups based on sampling differences among the N first data tables to be verified, where M is a positive integer greater than 1;

an allocation module 703, configured to allocate the verification tasks corresponding to the M data table groups to M thread groups in sequence; and

a verification module 704, configured to verify, based on the M thread groups and according to the verification tasks, the data in the N first data tables to be verified and their respective second data tables to be verified.
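The cooperation of the allocation module 703 and the verification module 704 can be sketched as follows. This is an illustration under simplifying assumptions, not the device's actual implementation: each thread group is modeled as a single worker of Python's standard `concurrent.futures.ThreadPoolExecutor`, the tables of one group are processed serially by that worker, and `verify_table` is a hypothetical callable that verifies one first data table against its second data table.

```python
from concurrent.futures import ThreadPoolExecutor


def run_groups(groups, verify_table):
    """Run one verification task per data table group, concurrently.

    groups: list of M lists of table names.
    verify_table: callable taking a table name and returning its result.
    Returns one list of results per group, in group order.
    """
    def run_group(group):
        # Tables inside one group are verified serially.
        return [verify_table(table) for table in group]

    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        # The i-th task is handed to the pool for the i-th group.
        return list(pool.map(run_group, groups))
```

Because all groups run concurrently and the slowest one determines the total run time, groups with similar total sampling times keep the workers evenly loaded.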

In some embodiments of the present application, the grouping module 702 is specifically configured to:

determine the sampling time of each first data table to be verified; and

divide the N first data tables to be verified into M data table groups according to the sampling times.

As a possible implementation, the grouping module 702 is further configured to:

for each first data table to be verified, simulate the process of taking a preset number of samples from that table, and use the time taken by the simulation as the sampling time of that table.
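One way to measure such a sampling time is sketched below. The application does not specify the database or the query; the use of Python's built-in `sqlite3` module, a `SELECT ... LIMIT` query, and the names `sampling_time` and `sample_size` are assumptions for illustration only.

```python
import sqlite3
import time


def sampling_time(conn, table, sample_size=100):
    """Time how long it takes to fetch sample_size rows from a table.

    conn: an open sqlite3 connection.
    table: a trusted table name (it is interpolated into the SQL text).
    Returns the elapsed time in seconds.
    """
    start = time.perf_counter()
    conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (sample_size,)).fetchall()
    return time.perf_counter() - start
```

Calling this function once per first data table to be verified yields the per-table sampling times on which the grouping is based.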

In some embodiments of the present application, the grouping module 702 is further configured to:

determine the average time of the M thread groups according to the sampling times; and

divide the N first data tables to be verified into M data table groups according to the average time and the sampling times.

As an example, the grouping module 702 is further configured to:

sort the N first data tables to be verified in descending order of sampling time;

for each data table group, assign the i-th and the (N-i+1)-th first data tables to be verified in the sorted result to the i-th data table group, where i is a positive integer and i is less than or equal to M; and

if any first data tables to be verified remain, continue to assign the remaining tables to the M data table groups according to the average time and the sampling times.

As another example, the grouping module 702 is further configured to:

determine the total sampling time of the first data tables to be verified contained in each data table group;

determine S target data table groups from the M data table groups, where S is a positive integer less than or equal to M and the total sampling time of each target data table group is less than the average time;

for each target data table group, take one first data table to be verified from the remaining first data tables to be verified and assign it to that target data table group; and

return to determining the total sampling time of the first data tables to be verified contained in each data table group, until all N first data tables to be verified have been assigned to the M data table groups.

In some embodiments of the present application, the verification module 704 is specifically configured to:

based on the current thread group and according to the verification task, perform batch data verification, along the column dimension, on the data to be verified corresponding to the current thread group.

As a possible implementation, the verification module 704 is further configured to:

according to the verification task, take the j-th batch of first data to be verified from a first data table to be verified contained in the data table group corresponding to the current thread group, and at the same time take the j-th batch of second data to be verified from the corresponding second data table to be verified;

sort the j-th batch of first data to be verified and the j-th batch of second data to be verified by primary key to obtain a sorted first result set and a sorted second result set;

map each column of data in the first result set to a fixed-length first digit string;

based on the mapping of the first result set, map each column of data in the second result set to a fixed-length second digit string;

compare the first digit string of each column in the first result set with the corresponding second digit string to verify the j-th batch of data; and

set j to j+1 and return to the step of taking, according to the verification task, the j-th batch of first data to be verified from the first data table to be verified contained in the first data table group and, at the same time, taking the j-th batch of second data to be verified from the corresponding second data table to be verified, until the verification task corresponding to the first data table group is completed.

As another possible implementation, the verification module 704 is further configured to:

determine whether the first result set and the second result set contain the same number of rows; and

if they contain the same number of rows, map each column of data in the first result set to a fixed-length first digit string.

In some embodiments of the present application, the verification module 704 is further configured to:

obtain the data verification result; and

locate the differing data according to the data verification result.

It should be noted that the type of the verification task is at least one of the following: verification of specified rows, verification of a specified time period, incremental data verification, and full data verification.

According to the data verification device of this embodiment, the N first data tables to be verified are divided into M data table groups based on the sampling differences among them, the verification tasks corresponding to the M data table groups are allocated to M thread groups in sequence, and, based on the M thread groups and according to the verification tasks, the data in the N first data tables to be verified and their respective second data tables to be verified are verified. Grouping the first data tables to be verified by their sampling differences reduces the sampling differences between the verification tasks of the data table groups, so the verification tasks of the concurrent thread groups are distributed more evenly and the overall efficiency of data verification improves.

It should be noted that the explanations given for the data verification method embodiments also apply to the data verification device of this embodiment and are not repeated here.

To implement the above embodiments, the present application provides an electronic device.

FIG. 8 is a structural block diagram of an electronic device provided by an embodiment of the present application. The electronic device may be a server, a computer, or a similar device. As shown in FIG. 8, the electronic device includes:

a memory 810, a processor 820, and a bus 830 connecting different components (including the memory 810 and the processor 820), where the memory 810 stores instructions executable by the processor 820, and the processor 820 is configured to execute the instructions to implement the data verification method described in the embodiments of the present application.

The bus 830 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

The electronic device 800 typically includes a variety of media readable by the electronic device. These media may be any available media accessible to the electronic device 800, including volatile and non-volatile media and removable and non-removable media. The memory 810 may also include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 840 and/or a cache memory 850. The electronic device 800 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 860 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 8, commonly called a "hard disk drive"). Although not shown in FIG. 8, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (for example, a CD-ROM, DVD-ROM or other optical medium) may be provided. In these cases, each drive may be connected to the bus 830 through one or more data media interfaces. The memory 810 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.

A program/utility 880 having a set of (at least one) program modules 870 may be stored, for example, in the memory 810. Such program modules 870 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 870 generally perform the functions and/or methods of the embodiments described in the present application.

The electronic device 800 may also communicate with one or more external devices 890 (e.g., a keyboard, a pointing device, a display 891, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any device (e.g., a network card, a modem, etc.) that enables the electronic device 800 to communicate with one or more other computing devices. Such communication may take place via an input/output (I/O) interface 898. In addition, the electronic device 800 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 893. As shown, the network adapter 893 communicates with the other modules of the electronic device 800 via the bus 830. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

The processor 860 performs various functional applications and data processing by running the programs stored in the memory 810.

It should be noted that, for the implementation process and technical principles of the electronic device of this embodiment, reference may be made to the foregoing description of the data verification method of the embodiments of the present application, which is not repeated here.

To implement the above embodiments, the present application further provides a computer storage medium.

When the instructions in the storage medium are executed by a processor of a server, the server is enabled to perform the data verification method described above. Optionally, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present application. In this specification, such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of those different embodiments or examples.

In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, for example two or three, unless otherwise expressly and specifically defined.

Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logical function or process. The scope of the preferred embodiments of the present application includes alternative implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order, depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present application belong.

The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then compiling, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that the various parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.

Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, the functional units in the embodiments of the present application may be integrated into one processing module, may each exist physically on their own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

Although embodiments of the present application have been shown and described above, it should be understood that these embodiments are exemplary and are not to be construed as limiting the present application. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims

1. A data verification method, comprising:

determining N first data tables to be verified, and a second data table to be verified corresponding to each of the first data tables to be verified, wherein N is a positive integer;

dividing the N first data tables to be verified into M data table groups based on sampling differences among the N first data tables to be verified, wherein M is a positive integer greater than 1;

allocating the verification tasks corresponding to the M data table groups to M thread groups in sequence; and

performing, based on the M thread groups and according to the verification tasks, data verification on the data in the N first data tables to be verified and their respective corresponding second data tables to be verified;

wherein the dividing the N first data tables to be verified into M data table groups based on the sampling differences among the N first data tables to be verified comprises:

determining a sampling time of each of the first data tables to be verified, wherein the sampling time of each first data table to be verified is the time consumed in extracting a preset number of data samples from that first data table to be verified;

determining an average time consumption of the M thread groups according to the sampling times; and

dividing the N first data tables to be verified into the M data table groups according to the average time consumption and the sampling times.

2. The method according to claim 1, wherein the determining the sampling time of each of the first data tables to be verified comprises:

for each of the first data tables to be verified, simulating a process of taking a preset number of samples from the first data table to be verified, and using the duration of the simulated process as the sampling time of that first data table to be verified.
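
As an illustration of the sampling-time measurement in claim 2, the following is a minimal sketch rather than the claimed implementation. It assumes the first data table lives in a SQLite database and that "taking a preset number of samples" can be approximated by a `LIMIT` query; the function name `sampling_time` and its parameters are hypothetical.

```python
import sqlite3
import time


def sampling_time(conn, table, sample_size):
    """Return the seconds spent fetching `sample_size` rows from `table`.

    Assumption: a plain LIMIT query stands in for the sampling process.
    The table name is interpolated directly, so it must come from a
    trusted list of tables, never from user input.
    """
    start = time.perf_counter()
    conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (sample_size,)).fetchall()
    return time.perf_counter() - start
```

Calling this once per first data table to be verified yields the per-table sampling times that the grouping step consumes.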

3. The method according to claim 1, wherein the dividing the N first data tables to be verified into M data table groups according to the average time consumption and the sampling times comprises:

sorting the N first data tables to be verified in descending order of sampling time;

for each of the data table groups, placing the i-th first data table to be verified and the (N-i+1)-th first data table to be verified in the sorting result into the i-th data table group, wherein i is a positive integer and i is less than or equal to M; and

if first data tables to be verified remain, continuing to divide the remaining first data tables to be verified among the M data table groups according to the average time consumption and the sampling times.

4. The method according to claim 3, wherein the continuing to divide the remaining first data tables to be verified among the M data table groups according to the average time consumption and the sampling times comprises:

determining the total sampling time of the first data tables to be verified included in each of the data table groups;

determining S target data table groups from the M data table groups, wherein S is a positive integer less than or equal to M, and the total sampling time corresponding to each of the target data table groups is less than the average time consumption;

for each target data table group, taking one first data table to be verified from the remaining first data tables to be verified and placing it into that target data table group; and

returning to the step of determining the total sampling time of the first data tables to be verified included in each of the data table groups, until all of the N first data tables to be verified have been divided among the M data table groups.
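
The grouping procedure of claims 3 and 4 can be sketched as follows. This is a hedged illustration, not the patented implementation: it assumes the average time consumption is the total sampling time divided by M, that the slowest remaining table is taken first in the top-up pass, and that the least-loaded group is used when no group is below the average. The claims do not specify any of these three points, and the function name `group_tables` is hypothetical.

```python
def group_tables(sampling_times, m):
    """Split tables into m groups with roughly equal total sampling time.

    sampling_times: dict mapping table name -> sampling time.
    Returns a list of m lists of table names.
    """
    # Sort table names from slowest to fastest to sample.
    ordered = sorted(sampling_times, key=sampling_times.get, reverse=True)
    # Assumption: average = total sampling time / number of groups.
    average = sum(sampling_times.values()) / m
    groups = [[] for _ in range(m)]

    # Pairing pass: group i gets the i-th slowest and the i-th fastest table.
    lo, hi = 0, len(ordered) - 1
    for group in groups:
        if lo > hi:
            break
        group.append(ordered[lo])
        if hi != lo:
            group.append(ordered[hi])
        lo, hi = lo + 1, hi - 1
    remaining = ordered[lo:hi + 1]

    # Top-up pass: groups below the average each take one remaining table,
    # then the totals are recomputed, until no tables remain.
    while remaining:
        totals = [sum(sampling_times[t] for t in g) for g in groups]
        targets = [k for k, total in enumerate(totals) if total < average]
        if not targets:
            # Assumption: fall back to the least-loaded group.
            targets = [totals.index(min(totals))]
        for k in targets:
            if not remaining:
                break
            groups[k].append(remaining.pop(0))
    return groups
```

For six tables with sampling times 9, 7, 5, 4, 2, and 1 and M = 2, the pairing pass produces groups with totals 10 and 9, both below the average of 14, so each group then receives one of the two remaining tables, giving totals of 15 and 13.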

5. The method according to claim 1, wherein the performing data verification on the data in the N first data tables to be verified and their respective corresponding second data tables to be verified based on the M thread groups and according to the verification tasks comprises:

performing, based on a current thread group and according to the verification task, batch-wise data verification along the data-column dimension on the data to be verified corresponding to the current thread group.

6. The method according to claim 5, wherein the performing batch-wise data verification along the data-column dimension on the data to be verified corresponding to the current thread group according to the verification task comprises:

according to the verification task, taking the j-th batch of first data to be verified from a first data table to be verified included in the data table group corresponding to the current thread group, and taking the j-th batch of second data to be verified from the corresponding second data table to be verified, wherein j is a positive integer;

sorting the j-th batch of first data to be verified and the j-th batch of second data to be verified by primary key to obtain a sorted first result set and a sorted second result set;

mapping each column of data in the first result set to a first digital string of fixed length;

based on the mapping relationship of the first result set, mapping each column of data in the second result set to a second digital string of fixed length;

comparing the first digital string corresponding to each column of data in the first result set with the corresponding second digital string, so as to verify the j-th batch of data; and

setting j to j+1 and returning to the step of taking the j-th batch of first data to be verified from the first data table to be verified included in the data table group corresponding to the current thread group according to the verification task, and taking the j-th batch of second data to be verified from the corresponding second data table to be verified, until the verification task corresponding to that data table group is completed.

7. The method according to claim 6, wherein the mapping each column of data in the first result set to a first digital string of fixed length comprises:

determining whether the number of rows of data in the first result set is consistent with that in the second result set; and

if the number of rows of data in the first result set is consistent with that in the second result set, mapping each column of data in the first result set to a first digital string of fixed length.
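
The column-wise batch comparison of claims 6 and 7 can be illustrated with the sketch below. It is an assumption-laden example, not the claimed mapping: SHA-256 from Python's `hashlib` stands in for the unspecified "fixed-length digital string", rows are plain tuples, and the names `column_digests` and `verify_batch` are hypothetical.

```python
import hashlib


def column_digests(rows, key_index=0):
    """Sort rows by primary key and return one SHA-256 hex digest per column."""
    ordered = sorted(rows, key=lambda row: row[key_index])
    digests = []
    for column in zip(*ordered):
        h = hashlib.sha256()
        for value in column:
            # repr plus a separator keeps ("ab", "c") distinct from ("a", "bc").
            h.update(repr(value).encode("utf-8"))
            h.update(b"\x1f")
        digests.append(h.hexdigest())
    return digests


def verify_batch(first_rows, second_rows, key_index=0):
    """Return the indices of columns whose digests differ ([] means a match)."""
    # Row-count check before any mapping, as in claim 7.
    if len(first_rows) != len(second_rows):
        raise ValueError("row counts differ between the two result sets")
    first = column_digests(first_rows, key_index)
    second = column_digests(second_rows, key_index)
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]
```

Because both result sets pass through the same mapping after being sorted by primary key, a mismatch in one column's digest narrows the search for differing data to that column of the batch.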

8. The method according to claim 1, further comprising:

obtaining a data verification result; and

locating, according to the data verification result, the data in which differences exist.

9. The method according to any one of claims 1 to 8, wherein the type of the verification task is at least one of the following: a data verification type for specified rows, a data verification type for a specified time period, an incremental data verification type, and a full data verification type.

10. A data verification device, comprising:

a determination module, configured to determine N first data tables to be verified, and a second data table to be verified corresponding to each of the first data tables to be verified, wherein N is a positive integer;

a grouping module, configured to divide the N first data tables to be verified into M data table groups based on sampling differences among the N first data tables to be verified, wherein M is a positive integer greater than 1;

an allocation module, configured to allocate the verification tasks corresponding to the M data table groups to M thread groups in sequence;

A verification module, configured to perform data verification on the data in the N first to-be-verified data tables and their respective corresponding second to-be-verified data tables based on the M thread groups and according to the verification task;

11. An electronic device, comprising:

a processor; and

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the method according to any one of claims 1 to 9.

12. A computer-readable storage medium, wherein, when instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method according to any one of claims 1 to 9.