CN115168307A

CN115168307A - Data synchronization method, system, device and storage medium supporting resuming transmission from breakpoints

Info

Publication number: CN115168307A
Application number: CN202210869514.5A
Authority: CN
Inventors: 陈钟浩; 管瑞峰; 姚海杰; 刘晋昊; 钟远东; 胡钊滨
Original assignee: Shanghai Zhijing Information Technology Co ltd
Current assignee: Shanghai Zhijing Information Technology Co ltd
Priority date: 2022-07-22
Filing date: 2022-07-22
Publication date: 2022-10-11
Anticipated expiration: 2042-07-22
Also published as: CN115168307B

Abstract

The invention relates to a data synchronization method, system, device and storage medium supporting resumed transmission from breakpoints. The technical scheme is as follows: acquiring log file data of a source database; building a table according to the log file data; writing corresponding data nodes into the data of the table according to the data characteristics of the data in the table; transmitting the data into which the data nodes have been written to a target database; when a data node is transmitted during this process, judging whether feedback from the target database confirming receipt of that data node has been received, and, if so, storing the position information of the data node and the corresponding storage time. After a breakpoint, data that was not successfully transmitted can be quickly retransmitted, re-synchronization of previously transmitted data is avoided, and the efficiency of data synchronization is greatly improved.

Description

Data synchronization method, system, device and storage medium supporting resuming transmission from breakpoints

Technical field

The present invention relates to the technical field of data transmission, and more particularly to a data synchronization method, system, device and storage medium supporting resumed transmission from breakpoints.

Background

Existing big data platforms usually use offline data synchronization. However, business systems contain scenarios in which historical data must be recalculated, and the volume of historical data is huge. If the historical data is re-synchronized offline, a large amount of computing resources is occupied and the scope of the data synchronization is hard to define. As a result, the historical data stored in the data warehouse deviates from the actual business data, and its accuracy cannot support the business needs of certain scenarios. In addition, business applications, large-screen dashboards and other uses with high real-time requirements demand real-time data from the platform.

Summary of the invention

In view of the deficiencies of the prior art, the purpose of the present invention is to provide a data synchronization method, system, device and storage medium supporting resumed transmission from breakpoints, with the advantage that, after a breakpoint, data that was not successfully transmitted can be quickly retransmitted and re-synchronization of previously transmitted data is avoided.

The above technical purpose of the present invention is achieved through the following technical solutions:

A data synchronization method supporting resumed transmission from breakpoints, comprising:

acquiring log file data of a source database;

building a table according to the log file data;

writing corresponding data nodes into the data of the table according to the data characteristics of the data in the table;

transmitting the data into which the data nodes have been written to a target database; when a data node is transmitted during this process, judging whether feedback from the target database confirming receipt of that data node has been received; and, if the feedback has been received, storing the position information of the data node and the corresponding storage time;

receiving a breakpoint resume request and determining the request time of the breakpoint resume request;

determining, according to the request time, the target data node corresponding to the storage time closest to the request time;

determining, according to the target data node and the data into which the data nodes have been written, the data not yet transmitted to the target database, and continuing to transmit that data to the target database.

Optionally, the data characteristics include data peaks and data troughs, and writing corresponding data nodes into the data of the table according to the data characteristics of the data in the table comprises:

judging whether the data in the table has a data peak and, if so, writing a corresponding data node at the data peak;

judging whether the data in the table has a data trough and, if so, writing a corresponding data node at the data trough.

Optionally, the data characteristics further include a data consumption time, which is determined according to the real-time network transmission speed, and writing corresponding data nodes into the data of the table according to the data characteristics of the data in the table further comprises:

judging whether the data consumption time required for the data between adjacent data nodes in the table is greater than a preset time threshold; if it is, calculating a data length from the preset time threshold and the real-time network transmission speed, adding the data length to the position of the earlier of the two adjacent data nodes to obtain a data write point, and writing a corresponding data node at the data write point.

Optionally, building a table according to the log file data comprises:

parsing the log file data to obtain the database name, table name, operation type, primary key and all field values;

building the table according to the database name, table name, operation type, primary key and all field values.
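
The parsing step can be pictured with a small sketch. It assumes that each log event has already been decoded into a Python dictionary; the key names (`schema`, `table`, `type`, `pk`, `values`), the `ChangeRecord` class and the `build_tables` helper are hypothetical and are not taken from the patent, which does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class ChangeRecord:
    """One parsed log entry holding the five items the method extracts."""
    database: str
    table: str
    operation: str            # "insert", "update" or "delete"
    primary_key: Any
    fields: Dict[str, Any]


def parse_event(event: Dict[str, Any]) -> ChangeRecord:
    """Convert a decoded log event (hypothetical dict layout) into a ChangeRecord."""
    operation = event["type"].lower()
    if operation not in {"insert", "update", "delete"}:
        raise ValueError(f"unsupported operation type: {operation}")
    return ChangeRecord(
        database=event["schema"],
        table=event["table"],
        operation=operation,
        primary_key=event["pk"],
        fields=dict(event["values"]),
    )


def build_tables(
    events: Iterable[Dict[str, Any]],
) -> Dict[Tuple[str, str], List[ChangeRecord]]:
    """Group parsed records by (database, table), preserving log order."""
    tables: Dict[Tuple[str, str], List[ChangeRecord]] = {}
    for event in events:
        record = parse_event(event)
        tables.setdefault((record.database, record.table), []).append(record)
    return tables
```

Keeping the records in log order inside each table is a design choice of this sketch; it makes later position-based lookups straightforward.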

Optionally, after building the table according to the log file data, the method further comprises:

dividing the data in the table into a plurality of data rows according to the primary key;

dividing each data row into a plurality of data blocks according to a preset configuration.

Optionally, transmitting the data into which the data nodes have been written to the target database comprises:

reading all data blocks in each data row in sequence according to a timestamp, wherein the timestamp is obtained from the log file data;

when the front end of a data block is read, marking a first binlog position at the front end of the data block, wherein the first binlog position is the position information of the front end of the data block obtained from the log file data;

when the back end of a data block is read, marking a second binlog position at the back end of the data block, wherein the second binlog position is the position information of the back end of the data block obtained from the log file data;

transmitting the data block marked with the first binlog position and the second binlog position to the target data end.

Optionally, the method further comprises:

when the data in the source database changes while the data into which the data nodes have been written is being transmitted to the target database, determining the data row corresponding to the changed data according to the primary key of the changed data; comparing the position of the changed data with all first binlog positions and all second binlog positions of that data row to determine the data block corresponding to the changed data; and changing the data of that data block according to the primary key of the changed data.

A data synchronization system supporting resumed transmission from breakpoints, comprising:

a data acquisition module, configured to acquire log file data of a source database;

a table building module, configured to build a table according to the log file data;

a node writing module, configured to write corresponding data nodes into the data of the table according to the data characteristics of the data in the table;

a transmission judgment module, configured to transmit the data into which the data nodes have been written to a target database; when a data node is transmitted during this process, to judge whether feedback from the target database confirming receipt of that data node has been received; and, if the feedback has been received, to store the position information of the data node and the corresponding storage time;

a request receiving module, configured to receive a breakpoint resume request and determine the request time of the breakpoint resume request;

a node determination module, configured to determine, according to the request time, the target data node corresponding to the storage time closest to the request time;

a data resume module, configured to determine, according to the target data node and the data into which the data nodes have been written, the data not yet transmitted to the target database, and to continue transmitting that data to the target database.

A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the above method when executing the computer program.

A computer-readable storage medium on which a computer program is stored, wherein the computer program implements the steps of the above method when executed by a processor.

In summary, the present invention has the following beneficial effects. After a breakpoint, data that was not successfully transmitted can be quickly retransmitted, re-synchronization of previously transmitted data is avoided, and the efficiency of data synchronization is greatly improved. Moreover, compared with dividing the data into fixed segments, the method writes data nodes at data peaks and data troughs, determines the data consumption time from the real-time network transmission speed, and uses the data consumption time and the preset time threshold to keep the transmission time of the data between adjacent data nodes within the preset time threshold. The data is thus divided according to its own characteristics, the amount of data processed as one unit is decided dynamically, and the method is better suited to real-time synchronous transmission of data.

Description of drawings

FIG. 1 is a schematic flow chart of the data synchronization method supporting resumed transmission from breakpoints provided by the present invention;

FIG. 2 is a structural block diagram of the data synchronization system supporting resumed transmission from breakpoints provided by the present invention;

FIG. 3 is an internal structure diagram of a computer device in an embodiment of the present invention.

Detailed description

To make the objects, features and advantages of the present invention easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. Several embodiments of the invention are shown in the drawings. However, the present invention may be implemented in many different forms and is not limited to the embodiments described herein.

In the present invention, unless otherwise expressly specified and limited, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may expressly or implicitly include one or more of that feature.

The present invention is described in detail below with reference to the accompanying drawings and embodiments.

The present invention provides a data synchronization method supporting resumed transmission from breakpoints which, as shown in FIG. 1, comprises:

Step 100: acquiring log file data of a source database. Specifically, the source database is a relational database, and the log file data is obtained by directly monitoring the binlog of the relational database (that is, the source database). The binlog is a file in binary format that records the SQL statements by which users update the database.

Step 200: building a table according to the log file data.

Step 300: writing corresponding data nodes into the data of the table according to the data characteristics of the data in the table.

Step 400: transmitting the data into which the data nodes have been written to a target database; when a data node is transmitted during this process, judging whether feedback from the target database confirming receipt of that data node has been received. If the feedback has been received, all data before that data node has been transmitted to the target database, and the position information of the data node and the corresponding storage time are stored. The storage time is the point in time at which storage of the position information of the data node is completed.

Step 500: receiving a breakpoint resume request and determining the request time of the breakpoint resume request.

Step 600: determining, according to the request time, the target data node corresponding to the storage time closest to the request time. In practice, position information for several data nodes may already have been stored before the breakpoint occurs. The request time of the breakpoint resume request is used to find the data node whose storage time is closest to the request time, that is, the data node whose position information was stored last before the breakpoint, and this node is taken as the target data node.

Step 700: determining, according to the target data node and the data into which the data nodes have been written, the data not yet transmitted to the target database, and continuing to transmit that data to the target database. Here, the data into which the data nodes have been written means the data generated by writing corresponding data nodes into the data of the table according to the data characteristics of the data in the table.

In practice, steps 100 to 400 are the normal transmission steps, performed while data transmission is not interrupted by a fault such as a network outage or power failure. When data transmission is interrupted (that is, a breakpoint occurs) and the data transmission task is then restarted, that is, once a breakpoint resume request has been issued, steps 500 to 700 are executed. After a breakpoint, the data synchronization method of the present application can quickly retransmit data that was not successfully transmitted, avoids re-synchronizing previously transmitted data, and greatly improves the efficiency of data synchronization.
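
Steps 400 to 700 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: data node positions are modelled as record indices, the last node is assumed to sit at the end of the data, `send_segment` is a hypothetical callable that returns `True` when the target acknowledges the data node closing a segment, and `clock` is a hypothetical time source. "Closest storage time" is interpreted here as the latest storage time not after the request time.

```python
import bisect
from typing import Callable, List, Optional, Sequence


class CheckpointStore:
    """Keeps (storage_time, node_position) pairs for acknowledged data nodes."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._positions: List[int] = []

    def record_ack(self, node_position: int, storage_time: float) -> None:
        """Store a node position once the target has confirmed receiving the node.

        Assumes acknowledgements are recorded in non-decreasing time order.
        """
        self._times.append(storage_time)
        self._positions.append(node_position)

    def latest_before(self, request_time: float) -> Optional[int]:
        """Position of the node stored closest to, and not after, request_time."""
        index = bisect.bisect_right(self._times, request_time)
        return self._positions[index - 1] if index else None


def transmit(records: Sequence, node_positions: Sequence[int],
             send_segment: Callable[[Sequence], bool], store: CheckpointStore,
             clock: Callable[[], float], start: int = 0) -> bool:
    """Send records segment by segment and checkpoint each acknowledged node.

    Returns False as soon as a segment is not acknowledged (a breakpoint).
    """
    previous = start
    for position in node_positions:
        if position <= start:
            continue  # this node was already confirmed before the breakpoint
        if not send_segment(records[previous:position]):
            return False
        store.record_ack(position, clock())
        previous = position
    return True


def resume(records: Sequence, node_positions: Sequence[int],
           send_segment: Callable[[Sequence], bool], store: CheckpointStore,
           clock: Callable[[], float], request_time: float) -> bool:
    """Continue from the last node stored before the resume request."""
    start = store.latest_before(request_time) or 0
    return transmit(records, node_positions, send_segment, store, clock, start)
```

Because only acknowledged nodes are stored, a resume never skips unconfirmed data; at worst it resends the single segment that was in flight when the breakpoint occurred.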

Further, the data characteristics include data peaks and data troughs, and writing corresponding data nodes into the data of the table according to the data characteristics of the data in the table comprises:

judging whether the data in the table has a data peak and, if so, writing a corresponding data node at the data peak; specifically, when there are several data peaks, a corresponding data node is written at each data peak;

judging whether the data in the table has a data trough and, if so, writing a corresponding data node at the data trough; specifically, when there are several data troughs, a corresponding data node is written at each data trough.

In practice, compared with dividing the data into fixed segments, writing data nodes at data peaks and data troughs divides the data according to its own characteristics, which is better suited to real-time synchronous transmission of data.
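
The patent does not state which quantity the peaks and troughs are measured on. The sketch below therefore assumes a hypothetical numeric series (for example, the volume of changes per time slot) and treats strict local maxima and minima as the places where data nodes are written; the function name `peak_trough_nodes` is illustrative only.

```python
from typing import List, Sequence


def peak_trough_nodes(values: Sequence[float]) -> List[int]:
    """Return the indices of strict local maxima (peaks) and minima (troughs).

    Each returned index is a candidate position for a data node.
    Endpoints are skipped because they have only one neighbour.
    """
    nodes: List[int] = []
    for i in range(1, len(values) - 1):
        left, mid, right = values[i - 1], values[i], values[i + 1]
        is_peak = mid > left and mid > right
        is_trough = mid < left and mid < right
        if is_peak or is_trough:
            nodes.append(i)
    return nodes
```

Flat stretches produce no node in this sketch; a real system would need its own rule for plateaus.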

Further, the data characteristics also include a data consumption time, which is determined according to the real-time network transmission speed, and writing corresponding data nodes into the data of the table according to the data characteristics of the data in the table further comprises:

judging whether the data consumption time required for the data between adjacent data nodes in the table is greater than a preset time threshold, the preset time threshold denoting a length of time. If the data consumption time between adjacent data nodes is greater than the preset time threshold, then among the data nodes determined from data peaks and data troughs there is a pair of adjacent nodes whose intervening data takes too long to transmit, and a breakpoint between those nodes would hinder efficient data synchronization. Therefore, a data length is calculated from the preset time threshold and the real-time network transmission speed, the data length is added to the position of the earlier of the two adjacent data nodes to obtain a data write point, and a corresponding data node is written at the data write point.

Specifically, the data consumption time is determined from the real-time network transmission speed, and the data consumption time and the preset time threshold are then used to keep the transmission time of the data between adjacent data nodes within the preset time threshold. The amount of data processed as one unit is thus decided dynamically, so that after a transmission breakpoint little time is spent retransmitting already transmitted data from the target data node, which further improves the efficiency of data synchronization.
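
As a worked illustration of the threshold rule: with threshold T and measured speed v, the data length is L = T × v, and the new node is written at the earlier node's position plus L. The sketch assumes node positions are byte offsets, speed is in bytes per second, and that the rule is applied repeatedly until every gap fits within the threshold; the repetition and the name `add_time_bounded_nodes` are assumptions of this sketch.

```python
from typing import List, Sequence


def add_time_bounded_nodes(nodes: Sequence[int], speed: float,
                           threshold: float) -> List[int]:
    """Insert extra nodes so that no gap takes longer than `threshold` to send.

    nodes:     sorted byte offsets of the existing data nodes
    speed:     measured transfer speed in bytes per second
    threshold: preset time threshold in seconds
    """
    if speed <= 0 or threshold <= 0:
        raise ValueError("speed and threshold must be positive")
    max_length = int(speed * threshold)  # data length that fits in the threshold
    if max_length < 1:
        raise ValueError("threshold too small for the measured speed")
    result = list(nodes[:1])
    for node in nodes[1:]:
        # consumption time of a gap = gap length / speed
        while (node - result[-1]) / speed > threshold:
            result.append(result[-1] + max_length)  # data write point
        result.append(node)
    return result
```

For example, at 100 bytes per second and a 3 second threshold, a 1000 byte gap is cut into pieces of at most 300 bytes.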

Further, building a table according to the log file data comprises:

parsing the log file data to obtain the database name, table name, operation type, primary key and all field values;

building the table according to the database name, table name, operation type, primary key and all field values.

Further, after building the table according to the log file data, the method further comprises:

dividing the data in the table into a plurality of data rows according to the primary key;

dividing each data row into a plurality of data blocks according to a preset configuration.

In practice, the data in the table is divided into a plurality of data rows according to the primary key, so that during data synchronization the data in several data rows can be transmitted at the same time. This concurrent transmission further improves the performance of data synchronization. The preset configuration may be a preset data length, according to which each data row is divided into a plurality of data blocks. When data in the source database changes, the data block corresponding to the changed data can be corrected directly, so that the data is updated in real time.
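
The row and block division, together with concurrent transmission of rows, might look like the sketch below. It assumes that a "data row" is the list of records sharing one primary key and that the preset configuration is a block length counted in records; `send_block` is a hypothetical callable standing in for the real transfer, and all function names are illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Hashable, Iterable, List


def split_rows(records: Iterable[Any],
               key: Callable[[Any], Hashable]) -> Dict[Hashable, List[Any]]:
    """Group records into data rows by primary key, keeping their order."""
    rows: Dict[Hashable, List[Any]] = defaultdict(list)
    for record in records:
        rows[key(record)].append(record)
    return dict(rows)


def split_blocks(row: List[Any], block_size: int) -> List[List[Any]]:
    """Cut one data row into blocks of at most `block_size` records."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return [row[i:i + block_size] for i in range(0, len(row), block_size)]


def send_rows_concurrently(rows: Dict[Hashable, List[Any]], block_size: int,
                           send_block: Callable[[Hashable, List[Any]], Any],
                           max_workers: int = 4) -> Dict[Hashable, List[Any]]:
    """Send each row's blocks in order, with different rows sent in parallel."""
    def send_row(item):
        primary_key, row = item
        results = [send_block(primary_key, block)
                   for block in split_blocks(row, block_size)]
        return primary_key, results

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(send_row, rows.items()))
```

Blocks within a row stay in order while rows run in parallel, which matches the idea of transmitting several data rows at the same time without reordering any single row.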

Further, transmitting the data into which the data nodes have been written to the target database comprises:

reading all data blocks in each data row in sequence according to a timestamp, wherein the timestamp is obtained from the log file data;

when the front end of a data block is read, marking a first binlog position at the front end of the data block, wherein the first binlog position is the position information of the front end of the data block obtained from the log file data;

when the back end of a data block is read, marking a second binlog position at the back end of the data block, wherein the second binlog position is the position information of the back end of the data block obtained from the log file data;

transmitting the data block marked with the first binlog position and the second binlog position to the target data end.

Further, the method also comprises:

when the data in the source database changes while the data into which the data nodes have been written is being transmitted to the target database, the data changes including data addition, data deletion and data modification, determining the data row corresponding to the changed data according to the primary key of the changed data; comparing the position of the changed data with all first binlog positions and all second binlog positions of that data row to obtain a comparison result; and determining the data block corresponding to the changed data according to the comparison result. If the comparison result shows that the position of the changed data lies between a first binlog position and the second binlog position corresponding to it, the data block corresponding to the changed data is the data block bounded by that first binlog position and that second binlog position. The data of that data block is then changed according to the primary key of the changed data: if the change is an addition, deletion or modification, the corresponding addition, deletion or modification is performed on the data in the corresponding database.

The data blocks, together with the marked first binlog positions and second binlog positions, speed up locating the changed data in the table. This not only allows the data to be updated in real time but also improves the efficiency of data synchronization.
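
The position comparison can be done with a binary search. The sketch assumes binlog positions are integers and that the blocks of one data row are sorted by their first binlog position and do not overlap; the `Block` class and `locate_block` function are hypothetical names introduced for illustration.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, List, Optional


@dataclass
class Block:
    first_pos: int    # binlog position marked at the front end of the block
    second_pos: int   # binlog position marked at the back end of the block
    payload: List[Any] = field(default_factory=list)


def locate_block(rows: Dict[Hashable, List[Block]], primary_key: Hashable,
                 position: int) -> Optional[Block]:
    """Find the block of the row for `primary_key` whose range holds `position`.

    The row is selected by primary key first; the block is then found by
    comparing `position` with the marked first and second binlog positions.
    """
    blocks = rows.get(primary_key, [])
    starts = [block.first_pos for block in blocks]
    index = bisect.bisect_right(starts, position) - 1
    if index >= 0 and blocks[index].first_pos <= position <= blocks[index].second_pos:
        return blocks[index]
    return None
```

Once the block is found, the addition, deletion or modification can be applied to that block alone instead of rescanning the whole table.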

With the data synchronization method of the present invention supporting resumed transmission from breakpoints, data that was not successfully transmitted can be quickly retransmitted after a breakpoint, re-synchronization of previously transmitted data is avoided, and the efficiency of data synchronization is greatly improved. Moreover, compared with dividing the data into fixed segments, the method writes data nodes at data peaks and data troughs, determines the data consumption time from the real-time network transmission speed, and uses the data consumption time and the preset time threshold to keep the transmission time of the data between adjacent data nodes within the preset time threshold. The data is thus divided according to its own characteristics, the amount of data processed as one unit is decided dynamically, and the method is better suited to real-time synchronous transmission of data.

As shown in FIG. 2, the present invention also provides a data synchronization system supporting resumed transmission from breakpoints, comprising:

a data acquisition module 10, configured to acquire log file data of a source database;

a table building module 20, configured to build a table according to the log file data;

a node writing module 30, configured to write corresponding data nodes into the data of the table according to the data characteristics of the data in the table;

a transmission judgment module 40, configured to transmit the data into which the data nodes have been written to a target database; when a data node is transmitted during this process, to judge whether feedback from the target database confirming receipt of that data node has been received; and, if the feedback has been received, to store the position information of the data node and the corresponding storage time;

a request receiving module 50, configured to receive a breakpoint resume request and determine the request time of the breakpoint resume request;

a node determination module 60, configured to determine, according to the request time, the target data node corresponding to the storage time closest to the request time;

a data resume module 70, configured to determine, according to the target data node and the data into which the data nodes have been written, the data not yet transmitted to the target database, and to continue transmitting that data to the target database.

For specific limitations of the data synchronization system supporting resumed transmission from breakpoints, reference may be made to the limitations of the data synchronization method above, which are not repeated here. Each module of the above system may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device comprises a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements a data synchronization method supporting resumed transmission from breakpoints.

Those skilled in the art will understand that the structure shown in FIG. 3 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

在一个实施例中，提供了一种计算机设备,包括存储器和处理器，所述存储器存储有计算机程序，所述处理器执行计算机程序时实现以下步骤：In one embodiment, a computer device is provided, comprising a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:

在一个实施例中，所述数据特征包括：数据波峰和数据波谷；所述根据所述表中数据的数据特征在所述表的数据中写入对应的数据节点，包括：In one embodiment, the data features include: data peaks and data troughs; and writing corresponding data nodes in the data of the table according to the data features of the data in the table, including:

In one embodiment, the data characteristics further include a data consumption time, which is determined according to the real-time network transmission speed, and writing the corresponding data nodes into the data of the table according to the data characteristics of the data in the table further includes:

In one embodiment, building the table according to the log file data includes:

In one embodiment, after building the table according to the log file data, the method further includes:

In one embodiment, transmitting the data written with data nodes to the target database includes:

In one embodiment, the method further includes:

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps:

In one embodiment, the method further includes:

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.

Claims

1. A data synchronization method supporting breakpoint resume, characterized by comprising the following steps:

acquiring log file data of a source database;

establishing a table according to the log file data;

writing corresponding data nodes in the data of the table according to the data characteristics of the data in the table;

transmitting the data written with data nodes to a target database; during the transmission, each time a data node is transmitted, determining whether feedback from the target database indicating that it has received the data node has been received; and, if the feedback has been received, storing the position information of the data node and the corresponding storage time;

receiving a breakpoint resume request, and determining the request time of the breakpoint resume request;

determining, according to the request time, the target data node whose storage time is closest to the request time;

and determining, according to the target data node and the data written with data nodes, the data that has not been transmitted to the target database, and continuing to transmit that data to the target database.
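
By way of illustration and not limitation, the acknowledgement and resume steps of claim 1 could be sketched in Python as follows. The names `CheckpointStore`, `record_ack`, `resume_offset`, and `remaining_data` are hypothetical, positions are modeled as integer offsets into the node-marked data, and storage times are assumed to be recorded in ascending order; none of these details is specified by the claim.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Hypothetical store of acknowledged data nodes (storage time, position)."""
    times: list = field(default_factory=list)      # storage times, ascending
    positions: list = field(default_factory=list)  # node positions, same order

    def record_ack(self, position: int, storage_time: float) -> None:
        # Called only after the target database confirms it received the node.
        self.times.append(storage_time)
        self.positions.append(position)

    def resume_offset(self, request_time: float) -> int:
        # Pick the node whose storage time is closest to the request time.
        if not self.times:
            return 0  # nothing acknowledged yet: resend from the start
        i = bisect_left(self.times, request_time)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self.times)]
        best = min(candidates, key=lambda j: abs(self.times[j] - request_time))
        return self.positions[best]


def remaining_data(data: bytes, store: CheckpointStore, request_time: float) -> bytes:
    # Everything after the target node is treated as not yet transmitted.
    return data[store.resume_offset(request_time):]
```

In this sketch, a resume request arriving shortly after the last acknowledged node causes only the data following that node to be sent again.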

2. The data synchronization method supporting breakpoint resume according to claim 1, wherein the data characteristics include data peaks and data troughs, and writing the corresponding data nodes into the data of the table according to the data characteristics of the data in the table includes:

determining whether the data in the table has a data peak, and, if a data peak exists, writing a corresponding data node at the data peak;

and determining whether the data in the table has a data trough, and, if a data trough exists, writing a corresponding data node at the data trough.
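
By way of illustration and not limitation, peak and trough detection could be sketched in Python as below. The claim does not state which quantity exhibits the peaks and troughs, so this sketch assumes a numeric series `volumes` (for example, data volume per interval) and treats a strict local maximum or minimum as a peak or trough; the function name `find_node_positions` is hypothetical.

```python
def find_node_positions(volumes):
    """Return (index, kind) pairs for local peaks and troughs in a numeric series.

    `volumes` is an assumed stand-in for whatever quantity exhibits
    peaks and troughs in the table data.
    """
    nodes = []
    for i in range(1, len(volumes) - 1):
        prev, cur, nxt = volumes[i - 1], volumes[i], volumes[i + 1]
        if cur > prev and cur > nxt:
            nodes.append((i, "peak"))      # strict local maximum
        elif cur < prev and cur < nxt:
            nodes.append((i, "trough"))    # strict local minimum
    return nodes
```

A data node would then be written at each returned index.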

3. The data synchronization method supporting breakpoint resume according to claim 2, wherein the data characteristics further include a data consumption time, the data consumption time being determined according to the real-time network transmission speed, and writing the corresponding data nodes into the data of the table according to the data characteristics of the data in the table further includes:

determining whether the data consumption time required by the data between adjacent data nodes in the table is greater than a preset time threshold; if so, calculating a data length from the preset time threshold and the real-time network transmission speed, adding the data length to the position of the earlier of the two adjacent data nodes to obtain a data writing point, and writing a corresponding data node at the data writing point.
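
By way of illustration and not limitation, the gap-filling rule of claim 3 could be sketched in Python as follows. The sketch assumes node positions are byte offsets, the transmission speed is given in bytes per second, and the data length is the product of the time threshold and the speed. Repeating the insertion until every gap fits within the threshold is an extension assumed here, and the name `fill_long_gaps` is hypothetical.

```python
def fill_long_gaps(nodes, speed, max_seconds):
    """Insert extra node positions so that no gap takes longer than max_seconds.

    nodes: sorted byte offsets of existing data nodes (assumed representation)
    speed: measured network throughput in bytes per second
    max_seconds: the preset time threshold
    """
    if speed <= 0 or max_seconds <= 0:
        raise ValueError("speed and max_seconds must be positive")
    step = int(speed * max_seconds)  # data length transferable within the threshold
    if step < 1:
        raise ValueError("threshold and speed yield a data length below one byte")
    result = []
    for prev, nxt in zip(nodes, nodes[1:]):
        result.append(prev)
        pos = prev + step          # data writing point: earlier node + data length
        while pos < nxt:           # the remaining gap still exceeds the threshold
            result.append(pos)
            pos += step
    if nodes:
        result.append(nodes[-1])
    return result
```

For example, with nodes at offsets 0 and 250, a speed of 10 bytes per second, and a threshold of 10 seconds, the data length is 100 bytes, so nodes are added at offsets 100 and 200.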

4. The data synchronization method supporting breakpoint resume according to claim 1, wherein building the table according to the log file data includes:

parsing the log file data to obtain a database name, a table name, an operation type, a primary key, and all field values;

and building the table according to the database name, the table name, the operation type, the primary key, and all field values.
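
By way of illustration and not limitation, the parsing step of claim 4 could be sketched in Python as below. The claim does not specify the log format, so this sketch assumes each log entry has already been decoded into one JSON object per line; the key names (`database`, `table`, `type`, `pk`, `data`) and the names `ChangeRecord` and `parse_log_entry` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    database: str      # database name
    table: str         # table name
    operation: str     # operation type, e.g. "INSERT", "UPDATE", "DELETE"
    primary_key: str   # name of the primary-key column
    fields: dict       # all field values of the row


def parse_log_entry(line: str) -> ChangeRecord:
    # The JSON layout and key names are assumptions made for illustration.
    raw = json.loads(line)
    return ChangeRecord(
        database=raw["database"],
        table=raw["table"],
        operation=raw["type"].upper(),
        primary_key=raw["pk"],
        fields=dict(raw["data"]),
    )
```

The five parsed items are exactly those named in the claim; the table would then be built from a sequence of such records.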

5. The data synchronization method supporting breakpoint resume according to claim 4, further comprising, after building the table according to the log file data:

dividing the data in the table into a plurality of data rows according to the primary key;

and dividing each data row into a plurality of data blocks according to a preset configuration.
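
By way of illustration and not limitation, the division of claim 5 could be sketched in Python as follows. The sketch assumes records are dictionaries, a data row is the group of records sharing one primary-key value, and the preset configuration is a fixed block size; the name `split_into_blocks` and these representations are hypothetical.

```python
from collections import defaultdict


def split_into_blocks(records, pk, block_size):
    """Group records into data rows by primary key, then cut each row into blocks."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    rows = defaultdict(list)
    for rec in records:
        rows[rec[pk]].append(rec)  # one data row per primary-key value
    return {
        key: [items[i:i + block_size] for i in range(0, len(items), block_size)]
        for key, items in rows.items()
    }
```

Each value in the returned dictionary is the list of data blocks for one data row, in the order the records were encountered.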

6. The data synchronization method supporting breakpoint resume according to claim 4, wherein transmitting the data written with data nodes to the target database includes:

sequentially reading all data blocks in each data row in timestamp order, wherein the timestamps are obtained from the log file data;

when the front end of a data block is read, marking a first binlog position at the front end of the data block, wherein the first binlog position is the position information of the front end of the data block obtained from the log file data;

when the rear end of the data block is read, marking a second binlog position at the rear end of the data block, wherein the second binlog position is the position information of the rear end of the data block obtained from the log file data;

and transmitting the data block marked with the first binlog position and the second binlog position to a target data end.
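
By way of illustration and not limitation, the marking and transmission of claim 6 could be sketched in Python as below. The sketch assumes each block already carries the timestamp and the log positions of its front and rear ends as read from the log file data; the `Block` fields, the `mark_and_send` function, and the `send` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    timestamp: float                         # taken from the log file data
    payload: bytes
    log_start: int                           # log position of the block's front end
    log_end: int                             # log position of the block's rear end
    first_binlog_pos: Optional[int] = None   # marker set when the front end is read
    second_binlog_pos: Optional[int] = None  # marker set when the rear end is read


def mark_and_send(blocks, send):
    """Read blocks in timestamp order, mark both binlog positions, then send."""
    for block in sorted(blocks, key=lambda b: b.timestamp):
        block.first_binlog_pos = block.log_start
        block.second_binlog_pos = block.log_end
        send(block)  # `send` stands in for the transfer to the target data end
```

Passing `list.append` as `send` is a convenient way to inspect the order and markers without a real target.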

7. The data synchronization method supporting breakpoint resume according to claim 6, further comprising:

when data in the source database changes while the data written with data nodes is being transmitted to the target database, determining the data row corresponding to the changed data according to the primary key of the changed data; comparing the position of the changed data with all first binlog positions and all second binlog positions of that data row to determine the data block corresponding to the changed data; and changing the data in that data block according to the primary key of the changed data.
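
By way of illustration and not limitation, the block lookup of claim 7 could be sketched in Python as follows. The sketch assumes each data row is a list of `(first_binlog_pos, second_binlog_pos, data)` tuples keyed by primary-key value, and that a change belongs to the block whose positions enclose it, boundaries included; the names `find_block` and `apply_change` are hypothetical.

```python
def find_block(blocks, change_pos):
    """Return the data of the block whose binlog range contains change_pos."""
    for first, second, data in blocks:
        if first <= change_pos <= second:  # inclusive bounds are an assumption
            return data
    return None


def apply_change(rows, pk_value, change_pos, new_fields):
    """Locate the data row by primary key, then the block by position, and update it."""
    block = find_block(rows.get(pk_value, []), change_pos)
    if block is None:
        return False  # no matching row or block for this change
    block.update(new_fields)
    return True
```

Only the block that encloses the change position is modified; other blocks of the same data row are left untouched.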

8. A data synchronization system supporting breakpoint resume, comprising:

a data acquisition module for acquiring log file data of a source database;

a table building module for building a table according to the log file data;

a node writing module for writing corresponding data nodes into the data of the table according to the data characteristics of the data in the table;

a transmission judging module for transmitting the data written with data nodes to a target database; during the transmission, each time a data node is transmitted, determining whether feedback from the target database indicating that it has received the data node has been received; and, if the feedback has been received, storing the position information of the data node and the corresponding storage time;

a request receiving module for receiving a breakpoint resume request and determining the request time of the breakpoint resume request;

a node determining module for determining, according to the request time, the target data node whose storage time is closest to the request time;

and a data resume module for determining, according to the target data node and the data written with data nodes, the data that has not been transmitted to the target database, and continuing to transmit that data to the target database.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.