CN116610752A

CN116610752A - Transactional distributed data synchronization method, device, system and storage medium

Info

Publication number: CN116610752A
Application number: CN202310579061.7A
Authority: CN
Inventors: 高飞
Original assignee: New H3C Technologies Co Ltd
Current assignee: New H3C Technologies Co Ltd
Priority date: 2023-05-19
Filing date: 2023-05-19
Publication date: 2023-08-18

Abstract

The present invention provides a transactional distributed data synchronization method, device, system and storage medium, which are used to solve the technical problem that data consistency cannot be guaranteed in the process of using Flink to synchronize data from a data source to a database table without a primary key. The present invention divides the process of Flink cluster distributed data synchronization into a pre-submission stage and a formal submission stage. In the pre-submission stage, a snapshot is created and the snapshot data is submitted to the target database in the pre-submission mode of the database transaction. After the snapshot is successfully created, the formal In the submission phase, the snapshot data is formally submitted to the destination database table. In the technical solution of the present invention, not only the processing state of the snapshot is recorded, but also the submission state of the transaction is recorded, and in the case of a Flink failure, the data synchronization task is restored according to the recorded snapshot and transaction state. The invention can realize the consistency of data synchronization of Flink from a distributed data source to a relational database table without a primary key.

Description

Transactional distributed data synchronization method, device, system and storage medium

技术领域technical field

本发明涉及通信及云计算技术领域，尤其涉及一种事务性分布式数据同步方法、装置、系统及存储介质。The present invention relates to the technical field of communication and cloud computing, in particular to a transactional distributed data synchronization method, device, system and storage medium.

背景技术Background technique

Kafka是由Apache软件基金会开发的一个开源流处理平台，由Scala和Java编写。Kafka是一种高吞吐量的分布式发布订阅消息系统。实现从分布式发布订阅消息系统(例如Kafka)到关系数据库(例如MySQL)的数据同步是一种常见的需求。Kafka is an open source stream processing platform developed by the Apache Software Foundation, written in Scala and Java. Kafka is a high-throughput distributed publish-subscribe messaging system. It is a common requirement to realize data synchronization from a distributed publish-subscribe message system (such as Kafka) to a relational database (such as MySQL).

Flink是一个框架和分布式处理引擎，用于在无界(流处理)和有边界(批处理)数据流上进行有状态的计算。Flink是实现Kafka到关系数据库的数据同步的常用工具。Flink中通用的数据处理流程包括一个或多个输入算子任务(source)、数据处理算子任务(transform)以及输出算子任务(sink)。Flink is a framework and distributed processing engine for stateful computation on unbounded (stream processing) and bounded (batch) data streams. Flink is a common tool for data synchronization from Kafka to relational databases. The general data processing process in Flink includes one or more input operator tasks (source), data processing operator tasks (transform) and output operator tasks (sink).

在使用Flink将Kafka集群中的数据同步到关系数据库的处理流程中，由于Flink集群的分布式特性、人工开发的处理逻辑等原因通常很容易发生集群的重启、任务的重新调度等异常现象，通常集群发生异常后会带来源和目的数据一致性问题，例如输出数据库中存在重复业务数据，输出数据库中存在丢失业务数据等。In the processing flow of using Flink to synchronize the data in the Kafka cluster to the relational database, due to the distributed nature of the Flink cluster and the processing logic of manual development, abnormal phenomena such as cluster restart and task rescheduling are usually prone to occur. When an exception occurs in the cluster, there will be problems with the consistency of source and destination data, such as duplicate business data in the output database, missing business data in the output database, and so on.

通过Flink的流处理能力和内在机制，如检查点(checkpoint)快照等技术可提升Kafka到MySQL数据同步的一致性，尤其在目的端MySQL库表中有对应的主键的情况下，这样可以通过MySQL中的幂等性SQL实现同步数据的一致性。但是在目的端MySQL库表中没有定义主键，使用Flink集群从数据源Kafka中同步数据至MySQL时在集群异常情况下则会容易产生数据缺失或数据重复的问题。Through Flink's stream processing capabilities and internal mechanisms, technologies such as checkpoint snapshots can improve the consistency of data synchronization from Kafka to MySQL, especially when there is a corresponding primary key in the destination MySQL database table, which can be passed through MySQL The idempotent SQL in implements the consistency of synchronous data. However, there is no primary key defined in the MySQL database table at the destination, and when the Flink cluster is used to synchronize data from the data source Kafka to MySQL, data loss or data duplication may easily occur when the cluster is abnormal.

发明内容Contents of the invention

有鉴于此，本发明提供一种事务性分布式数据同步方法、装置、系统及存储介质，用于解决使用Flink从数据源同步数据到无主键目的数据库表的过程中，无法保证数据一致性的技术问题。In view of this, the present invention provides a transactional distributed data synchronization method, device, system and storage medium, which are used to solve the problem that data consistency cannot be guaranteed in the process of using Flink to synchronize data from a data source to a database table without a primary key. technical problem.

基于本发明实施例的一方面，本发明提供了一种事务性分布式数据同步方法，该方法应用于Flink集群，该方法包括：Based on an aspect of the embodiments of the present invention, the present invention provides a transactional distributed data synchronization method, which is applied to a Flink cluster, and the method includes:

通过Flink集群将分布式数据源中的数据同步到目的无主键关系数据库表即目的数据库表中的过程分为预提交阶段和正式提交阶段；The process of synchronizing the data in the distributed data source to the destination non-primary key relational database table through the Flink cluster, that is, the destination database table, is divided into a pre-submission stage and a formal submission stage;

在预提交阶段中，完成快照创建以及快照数据的预提交，输出算子任务sink负责将快照数据预提交到目的数据库表中并记录快照完成状态及事务预提交状态；In the pre-submission phase, the snapshot creation and the pre-submission of the snapshot data are completed, and the output operator task sink is responsible for pre-submitting the snapshot data to the destination database table and recording the completion status of the snapshot and the pre-submission status of the transaction;

在正式提交阶段中，作业管理器Jobmanager负责在快照创建成功后指令sink算子任务正式提交快照数据到目的数据库表中并记录事务提交结果。In the formal submission phase, the job manager Jobmanager is responsible for instructing the sink operator task to formally submit the snapshot data to the destination database table and record the transaction submission result after the snapshot is successfully created.

进一步地，所述预提交阶段的步骤包括：Further, the steps in the pre-submission stage include:

作业管理器Jobmanager通过注入检查点分界线标记当前同步的快照数据；The job manager Jobmanager marks the currently synchronized snapshot data by injecting checkpoint boundaries;

输入算子任务source在检测到检查点分界线后记录快照偏移；The input operator task source records the snapshot offset after detecting the checkpoint boundary;

sink算子任务在检测到检查点分界线后，负责缓存当前快照数据、预提交当前快照数据到目的数据库表以及记录事务预提交状态；After the sink operator task detects the checkpoint boundary, it is responsible for caching the current snapshot data, pre-submitting the current snapshot data to the destination database table, and recording the pre-commit status of the transaction;

各算子任务在成功完成快照数据的处理后向Jobmanager发送快照完成反馈，sink算子任务在成功完成快照数据的预提交的处理后向Jobmanager发送预提交结果反馈；Each operator task sends a snapshot completion feedback to the Jobmanager after successfully completing the processing of the snapshot data, and the sink operator task sends a pre-submission result feedback to the Jobmanager after successfully completing the pre-submission processing of the snapshot data;

当Jobmanager接收到所有算子任务的快照完成反馈和sink算子任务的预提交结果反馈后，记录快照成功状态。After the Jobmanager receives the snapshot completion feedback of all operator tasks and the pre-submission result feedback of the sink operator task, it records the snapshot success status.

进一步地，所述正式提交阶段的步骤包括：Further, the steps in the formal submission stage include:

在Jobmanager根据快照完成反馈和预提交结果反馈判定创造创建成功及预提交成功后，向所有算子任务发送快照完成消息；After the Jobmanager judges that the creation and pre-submission are successful based on the snapshot completion feedback and pre-submission result feedback, it sends a snapshot completion message to all operator tasks;

输出算子任务sink接收到快照完成消息后正式将快照数据提交到目的数据库表中并记录事务正式提交成功与否的事务提交结果。After receiving the snapshot completion message, the output operator task sink formally submits the snapshot data to the destination database table and records the transaction submission result of whether the transaction is formally submitted successfully or not.

进一步地，若Flink集群在预提交阶段产生故障且导致快照创建未成功时，所述方法还包括如下故障恢复步骤:Further, if the Flink cluster fails during the pre-submit phase and causes the snapshot to be created unsuccessfully, the method also includes the following fault recovery steps:

Jobmanager从状态存储后端获取最近一次创建成功的快照的快照信息；Jobmanager obtains the snapshot information of the last successfully created snapshot from the state storage backend;

Jobmanager协调调度Taskmanager恢复数据同步任务；Jobmanager coordinates and schedules Taskmanager to restore data synchronization tasks;

sink算子任务根据所述快照信息从状态存储后端获取最近一次创建成功的快照的事务标识ID及快照数据；The sink operator task obtains the transaction identification ID and snapshot data of the last successfully created snapshot from the state storage backend according to the snapshot information;

sink算子任务根据事务标识ID获取事务提交状态，验证所述最近一次创建成功的快照的快照数据是否成功提交到目的数据库表中，若成功提交，则source算子任务获取下一次待消费的快照数据偏移，重新开始数据同步过程。The sink operator task obtains the transaction submission status according to the transaction ID, and verifies whether the snapshot data of the latest snapshot successfully created is successfully submitted to the destination database table. If the submission is successful, the source operator task obtains the next snapshot to be consumed Data offset, restart the data synchronization process.

进一步地，若Flink集群在预提交阶段产生故障且最近一次快照创建成功时，所述方法还包括如下故障恢复步骤:Further, if the Flink cluster fails during the pre-submit phase and the latest snapshot is created successfully, the method also includes the following fault recovery steps:

Jobmanager获取最近一次创建成功的快照的快照信息；Jobmanager obtains the snapshot information of the latest snapshot successfully created;

Jobmanager协调调度Taskmanager恢复数据同步任务的执行；Jobmanager coordinates and schedules Taskmanager to restore the execution of data synchronization tasks;

sink算子任务根据所述快照信息获取最近一次创建成功的快照的事务标识ID及快照数据；The sink operator task obtains the transaction identification ID and snapshot data of the last successfully created snapshot according to the snapshot information;

sink算子任务根据事务ID获取事务提交状态，判断对应快照数据是否正式提交到目的数据库表中；The sink operator task obtains the transaction submission status according to the transaction ID, and judges whether the corresponding snapshot data is formally submitted to the destination database table;

在事务未正式提交的状态下，sink算子任务将最近一次创建成功的快照的快照数据提交到目的数据库表中；In the state that the transaction is not officially committed, the sink operator task submits the snapshot data of the last successfully created snapshot to the destination database table;

source算子任务获取下一待消费的快照数据偏移继续进行数据同步。The source operator task obtains the data offset of the next snapshot to be consumed to continue data synchronization.

进一步地，所述Jobmanager和各算子任务在状态存储后端中记录快照信息，所述状态存储后端为分布式文件系统；所述分布式数据源为分布式发布订阅消息系统；所述sink算子任务将事务状态记录在分布式内存数据库中。Further, the Jobmanager and each operator task record snapshot information in the state storage backend, the state storage backend is a distributed file system; the distributed data source is a distributed publish-subscribe message system; the sink Operator tasks record transaction status in a distributed memory database.

基于本发明实施例的另一方面，本发明还提供一种事务性分布式数据同步系统，该系统包括Flink集群、状态存储后端、分布式内存数据库；Based on another aspect of the embodiments of the present invention, the present invention also provides a transactional distributed data synchronization system, which includes a Flink cluster, a state storage backend, and a distributed memory database;

所述Flink集群包括作业管理器Jobmanager、任务管理器Taskmanager，Taskmanager中至少运行有输入算子任务source和输出算子任务sink；The Flink cluster includes a job manager Jobmanager and a task manager Taskmanager, and the Taskmanager runs at least an input operator task source and an output operator task sink;

所述Flink集群用于将分布式数据源中的数据同步到目的无主键关系数据库表即目的数据库表中，所述数据同步过程分为预提交阶段和正式提交阶段；The Flink cluster is used to synchronize the data in the distributed data source to the target non-primary key relational database table, that is, the target database table, and the data synchronization process is divided into a pre-submission stage and a formal submission stage;

在预提交阶段中，完成快照创建以及快照数据的预提交，输出算子sink算子任务负责将快照数据预提交到目的数据库表中并记录快照完成状态及事务预提交状态；In the pre-submission phase, the creation of snapshots and the pre-submission of snapshot data are completed, and the output operator sink operator task is responsible for pre-submitting the snapshot data to the destination database table and recording the completion status of the snapshot and the pre-submission status of the transaction;

在正式提交阶段中，作业管理器Jobmanager负责在快照创建成功后指令sink算子任务正式提交快照数据到目的数据库表中并记录事务提交结果；In the formal submission phase, the job manager Jobmanager is responsible for instructing the sink operator task to formally submit the snapshot data to the destination database table and record the transaction submission results after the snapshot is successfully created;

所述状态存储后端用于存储快照信息和缓存快照数据；The state storage backend is used to store snapshot information and cache snapshot data;

所述分布式内存数据库用于存储事务状态。The distributed memory database is used to store transaction status.

进一步地，在所述预提交阶段中：Further, in the pre-submit phase:

所述Jobmanager通过注入检查点分界线标记当前同步的快照数据；The Jobmanager marks the currently synchronized snapshot data by injecting checkpoint boundaries;

所述source算子任务在检测到检查点分界线后记录快照偏移；The source operator task records the snapshot offset after detecting the checkpoint boundary;

所述sink算子任务在检测到检查点分界线后，负责缓存当前快照数据、预提交当前快照数据到目的数据库表以及记录事务预提交状态；After the sink operator task detects the checkpoint boundary, it is responsible for caching the current snapshot data, pre-submitting the current snapshot data to the destination database table, and recording the transaction pre-submission status;

所述Jobmanager还用于接收到所有算子任务的快照完成反馈和sink算子任务的预提交结果反馈，记录快照成功状态。The Jobmanager is also used to receive the snapshot completion feedback of all operator tasks and the pre-submission result feedback of the sink operator task, and record the success status of the snapshot.

进一步地，在所述正式提交阶段中：Further, in the formal submission stage:

所述Jobmanager还用于在根据快照完成反馈和预提交结果反馈判定创造创建成功及预提交成功后，向所有算子任务发送快照完成消息；The Jobmanager is also used to send a snapshot completion message to all operator tasks after the snapshot completion feedback and the pre-submission result feedback determine that the creation is successful and the pre-submission is successful;

所述输出算子任务sink接收到快照完成消息后正式将快照数据提交到目的数据库表中并记录事务正式提交成功与否的事务提交结果。After receiving the snapshot completion message, the output operator task sink formally submits the snapshot data to the target database table and records the transaction submission result of whether the transaction formally submits successfully or not.

基于本发明实施例的一方面，本发明还提供一种事务性分布式数据同步装置，该装置用于通过Flink集群将分布式数据源中的数据同步到目的无主键关系数据库表即目的数据库表中，该装置包括：Based on one aspect of the embodiments of the present invention, the present invention also provides a transactional distributed data synchronization device, which is used to synchronize the data in the distributed data source to the target non-primary key relational database table, that is, the target database table through the Flink cluster , the device includes:

预提交模块，用于完成所述数据同步过程中的预提交阶段的处理步骤，完成快照创建以及快照数据的预提交；其中，通过输出算子任务sink将快照数据预提交到目的数据库表中并记录快照完成状态及事务预提交状态；The pre-submission module is used to complete the processing steps of the pre-submission stage in the data synchronization process, complete the snapshot creation and the pre-submission of the snapshot data; wherein, the snapshot data is pre-submitted to the destination database table through the output operator task sink and Record snapshot completion status and transaction pre-commit status;

正式提交模块，用于完成所述数据同步过程中的正式提交阶段的处理步骤，将快照数据以事务模式正式提交的目的数据库表中；其中，通过作业管理器Jobmanager在快照创建成功后指令sink算子任务正式提交快照数据到目的数据库表中并记录事务提交状态并记录事务提交结果。The formal submission module is used to complete the processing steps of the formal submission phase in the data synchronization process, and submit the snapshot data to the destination database table formally in the transaction mode; wherein, after the snapshot is created successfully, the job manager Jobmanager instructs the sink to calculate The subtask formally submits the snapshot data to the target database table and records the transaction submission status and transaction submission results.

本发明提供的该装置可以软件、硬件或软硬结合的方式实现。当以软件模块方式实现时，该软件模块的程序代码被加载到设备的存储介质中，由处理器读取存储介质中的程序代码并执行，从而实现该装置中各组成模块的功能。The device provided by the present invention can be implemented in the form of software, hardware or a combination of software and hardware. When implemented as a software module, the program code of the software module is loaded into the storage medium of the device, and the processor reads and executes the program code in the storage medium, thereby realizing the functions of each component module in the device.

基于本发明实施例的一方面，本发明还提供一种电子设备，包括处理器、通信接口、存储介质和通信总线，其中，处理器、通信接口、存储介质通过通信总线完成相互间的通信；Based on an aspect of the embodiments of the present invention, the present invention also provides an electronic device, including a processor, a communication interface, a storage medium, and a communication bus, wherein, the processor, the communication interface, and the storage medium communicate with each other through the communication bus;

存储介质，用于存放计算机程序；storage medium for storing computer programs;

处理器，用于执行存储介质上所存放的计算机程序时，实施如前所述的事务性分布式数据同步方法中的一个或多个步骤。The processor is configured to implement one or more steps in the foregoing transactional distributed data synchronization method when executing the computer program stored on the storage medium.

本发明提供一种事务性分布式数据同步方法、装置、系统及存储介质，用于解决使用Flink从数据源同步数据到无主键目的数据库表的过程中无法保证数据一致性的技术问题。本发明将Flink集群进行分布式数据同步的过程分成预提交阶段和正式提交阶段，在预提交阶段创建快照并以数据库事务的预提交模式将快照数据提交到目的数据库，在快照创建成功后通过正式提交阶段将快照数据正式提交到目的数据库表。本发明技术方案中不仅记录快照的处理状态还会记录事务的提交状态，并在Flink故障的情况下，根据记录的快照和事务状态进行数据同步任务的恢复。通过本发明能够实现Flink从分布式数据源到无主键关系数据库表的数据同步的一致性。The invention provides a transactional distributed data synchronization method, device, system and storage medium, which are used to solve the technical problem that data consistency cannot be guaranteed in the process of using Flink to synchronize data from a data source to a database table without a primary key. The present invention divides the process of Flink cluster distributed data synchronization into a pre-submission stage and a formal submission stage. In the pre-submission stage, a snapshot is created and the snapshot data is submitted to the target database in the pre-submission mode of the database transaction. After the snapshot is successfully created, the formal In the submit phase, the snapshot data is formally submitted to the destination database table. In the technical solution of the present invention, not only the processing state of the snapshot is recorded, but also the submission state of the transaction is recorded, and in the case of a Flink failure, the data synchronization task is restored according to the recorded snapshot and transaction state. The invention can realize the consistency of data synchronization of Flink from a distributed data source to a relational database table without a primary key.

附图说明Description of drawings

为了更加清楚地说明本发明实施例或者现有技术中的技术方案，下面将对本发明实施例或者现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明中记载的一些实施例，对于本领域普通技术人员来讲，还可以根据本发明实施例的这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the following will briefly introduce the accompanying drawings that need to be used in the description of the embodiments of the present invention or the prior art. Obviously, the accompanying drawings in the following description These are only some embodiments described in the present invention, and those skilled in the art can also obtain other drawings according to these drawings of the embodiments of the present invention.

图1为本发明一实施例中基于Flink两阶段分布式事务状态机制实现数据同步的过程示意图；Fig. 1 is a schematic diagram of the process of realizing data synchronization based on the Flink two-stage distributed transaction state mechanism in an embodiment of the present invention;

图2A为本发明一实施例中预提交阶段发起预提交及执行预提交过程的示意图；FIG. 2A is a schematic diagram of initiating pre-submission and executing a pre-submission process in the pre-submission stage in an embodiment of the present invention;

图2B为本发明一实施例中预提交阶段确认完成预提交阶段的示意图；FIG. 2B is a schematic diagram of confirming the completion of the pre-submission phase in the pre-submission phase in an embodiment of the present invention;

图2C为本发明一实施例中执行第二阶段即正式提交阶段的示意图；FIG. 2C is a schematic diagram of executing the second stage, namely the formal submission stage, in an embodiment of the present invention;

图3A为本发明一实施例中第一阶段快照创建未成功情况下的异常恢复步骤示意图；FIG. 3A is a schematic diagram of abnormal recovery steps in the case of unsuccessful snapshot creation in the first stage in an embodiment of the present invention;

图3B为本发明一实施例中第一阶段快照创建成功情况下的异常恢复步骤示意图；FIG. 3B is a schematic diagram of abnormal recovery steps in the case of successful snapshot creation in the first stage in an embodiment of the present invention;

图4为本发明一实施例提供的用于实现事务性分布式数据同步方法的电子设备结构示意图。FIG. 4 is a schematic structural diagram of an electronic device for implementing a method for synchronizing transactional distributed data according to an embodiment of the present invention.

具体实施方式Detailed ways

在本发明实施例使用的术语仅仅是出于描述特定实施例的目的，而非限制本发明实施例。本发明实施例中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式，除非上下文清楚地表示其它含义。应当理解，尽管在本发明实施例可能采用术语第一、第二、第三等来描述各种信息，但这些信息不应限于这些术语。这些术语仅用于区别类似的信息、实体或步骤，而不是用于描述特定的顺序或先后次序。例如，在不脱离本发明实施例范围的情况下，第一信息也可以被称为第二信息，类似地，第二信息也可以被称为第一信息。此外，所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。本发明中的“和/或”仅仅是一种描述关联对象的关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况，其中A，B可以是单数或者复数。并且，在本发明的描述中，除非另有说明，“多个”是指两个或多于两个。“以下至少一项(个)”或其类似表达，是指的这些项中的任意组合，包括单项(个)或复数项(个)的任意组合。例如，a，b，或c中的至少一项(个)，可以表示：a，b，c，a-b，a-c，b-c，或a-b-c，其中a，b，c可以是单个，也可以是多个。The terms used in the embodiments of the present invention are only for the purpose of describing specific embodiments, rather than limiting the embodiments of the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present invention are also intended to include plural forms unless the context clearly indicates otherwise. It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present invention to describe various information, the information should not be limited to these terms. These terms are only used to distinguish similar information, entities or steps, and are not used to describe a specific order or sequence. For example, without departing from the scope of the embodiments of the present invention, first information may also be called second information, and similarly, second information may also be called first information. Additionally, the use of the word "if" may be construed as "at" or "when" or "in response to a determination." The "and/or" in the present invention is only an association relationship describing associated objects, which means that there can be three relationships, for example, A and/or B, which can mean: A exists alone, A and B exist simultaneously, and there exists alone B these three cases, where A, B can be singular or plural. And, in the description of the present invention, unless otherwise specified, "plurality" means two or more than two. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one item (piece) of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, c can be single or multiple .

Flink是一种分布式流处理框架，可以并行和流水线方式执行任意流数据程序，Flink的流水线运行时系统可以执行批处理和流处理程序。Flink的检查点Checkpoint快照机制是其可靠性的基石。当一个任务在运行过程中出现故障时，可以根据Checkpoint快照信息恢复到故障之前的状态，然后从该状态恢复任务的运行。在Flink中，Checkpoint快照机制采用的是分布式快照算法，通过Checkpoint快照机制，保证了Flink程序内部的恰好一次(Exactly Once)的语义，保证Flink内部数据一致性。Flink is a distributed stream processing framework that can execute arbitrary stream data programs in a parallel and pipeline manner. Flink's pipeline runtime system can execute batch and stream processing programs. Flink's checkpoint Checkpoint snapshot mechanism is the cornerstone of its reliability. When a task fails during operation, it can restore to the state before the failure according to the Checkpoint snapshot information, and then resume the operation of the task from this state. In Flink, the Checkpoint snapshot mechanism adopts a distributed snapshot algorithm. Through the Checkpoint snapshot mechanism, the semantics of exactly once (Exactly Once) inside the Flink program are guaranteed, and the internal data consistency of Flink is guaranteed.

Flink框架中包括两个重要组件，分别是管理协调组件JobManager(作业管理器)和执行进程组件TaskManager(任务管理器)，Flink架构也遵循Master-Slave主从架构设计原则，JobManager相当于整个集群的Master节点，且整个集群有且只有一个活跃的JobManager，TaskManager相当于整个集群的Slave节点。The Flink framework includes two important components, namely the management and coordination component JobManager (job manager) and the execution process component TaskManager (task manager). The Flink architecture also follows the Master-Slave master-slave architecture design principle. The JobManager is equivalent to the entire cluster. Master node, and the entire cluster has one and only one active JobManager, TaskManager is equivalent to the Slave node of the entire cluster.

JobManager负责整个Flink集群任务的调度以及资源的管理，TaskManager负责具体的任务执行和对应任务在每个节点上的资源申请和管理。JobManager从客户端中获取提交的应用，然后根据集群中TaskManager上任务槽TaskSlot的使用情况，为提交的应用分配相应的TaskSlot资源并指令TaskManager启动从客户端中获取的应用。JobManager is responsible for the scheduling of the entire Flink cluster task and resource management, and TaskManager is responsible for specific task execution and resource application and management of corresponding tasks on each node. The JobManager obtains the submitted application from the client, and then allocates the corresponding TaskSlot resources for the submitted application according to the usage of the task slot TaskSlot on the TaskManager in the cluster and instructs the TaskManager to start the application obtained from the client.

客户端通过将编写好的Flink应用编译打包，提交到JobManager，然后JobManager会根据已注册在JobManager中的TaskManager的资源情况，将任务分配给有资源的TaskManager节点，然后启动并运行任务。The client compiles and packages the written Flink application, submits it to the JobManager, and then the JobManager assigns the task to the TaskManager node with resources according to the resources of the TaskManager registered in the JobManager, and then starts and runs the task.

在使用Flink实现Kafka到关系数据库的数据同步的场景中，基于Flink框架原生快照检查点Checkpoint机制同时增加“状态存储后端”用于存储Flink集群中source算子任务所处理的业务数据的快照偏移offset，快照偏移为source算子任务从kafka消费的数据起始位置，sink算子任务与数据库连接，将处理后的业务数据输出写入至数据库表中。再每次快照对应的一批业务数据处理完成时，Jobmanager会将快照偏移offset写入状态存储后端，在遇到集群或任务异常时，Flink恢复至上次快照成功时的状态，继续进行下一快照数据的处理，但是该实现过程会面临如下两个问题：In the scenario where Flink is used to synchronize data from Kafka to relational databases, based on the Flink framework's native snapshot checkpoint Checkpoint mechanism, a "state storage backend" is added to store the snapshot bias of the business data processed by the source operator task in the Flink cluster. Shift offset, the snapshot offset is the starting position of the data consumed by the source operator task from Kafka, the sink operator task is connected to the database, and writes the processed business data output into the database table. Each time the processing of a batch of business data corresponding to the snapshot is completed, Jobmanager will write the snapshot offset offset to the backend of the state storage. When a cluster or task is abnormal, Flink will restore to the state when the last snapshot was successful, and proceed to the next step A snapshot data processing, but the realization process will face the following two problems:

问题一：如果本次快照未成功，但是已经写入部分数据至数据库表中，此时发生Flink集群异常，任务重启，会再次从上次快照恢复，这时候就可能重复处理已入库的数据，致使数据库表中插入重复数据。Question 1: If this snapshot is not successful, but some data has been written into the database table, an exception occurs in the Flink cluster at this time, the task restarts, and it will be restored from the previous snapshot again. At this time, the data that has been stored in the database may be processed repeatedly , causing duplicate data to be inserted into the database table.

问题二：如果本次快照成功，但快照数据未完全写入数据库，此时发生Flink集群异常，当任务重启后，会从上次成功的快照之后继续进行下一次快照数据的处理，这种情况下，由于部分快照数据未入库，就会导致丢失一部业务数据。Question 2: If the snapshot is successful, but the snapshot data is not completely written into the database, an exception occurs in the Flink cluster. When the task is restarted, it will continue to process the next snapshot data from the last successful snapshot. In this case In this case, some business data will be lost because some snapshot data is not stored in the database.

可见，基于Flink原生的checkpoint快照机制，在上述情况下，只能保障数据源端和Flink内部的数据一致性，而对于sink输出端，在目的数据库表没有主键的情况下，则无法保证数据源端和目的数据库表中的业务数据的一致性。It can be seen that based on Flink's native checkpoint snapshot mechanism, in the above cases, only the data consistency between the data source and Flink can be guaranteed, but for the sink output, if the destination database table has no primary key, the data source cannot be guaranteed The consistency of business data in the end and destination database tables.

本发明为解决使用Flink从数据源同步数据到无主键目的数据库表的过程中，无法保证数据一致性的技术问题，提出了一种事务性分布式数据同步方案。本发明技术方案的核心思想是：将使用Flink集群进行分布式数据同步的过程分成预提交阶段和正式提交阶段，sink算子任务在预提交阶段以数据库事务的预提交模式将快照数据提交到目的数据库，JobManager在接收到所有算子任务的快照成功反馈后记录快照成功状态并通知sink算子任务正式提交快照数据到目的数据库表，sink算子任务在提交数据成功后记录事务提交状态。本发明技术方案中不仅记录快照的处理状态还会记录事务的提交状态，并在Flink故障的情况下，根据记录的快照和事务状态进行数据同步任务的恢复。通过本发明能够实现Flink从分布式数据源(例如分布式发布订阅消息系统，Kafka等)到无主键关系数据库(例如MySQL)表的数据同步的一致性。In order to solve the technical problem that data consistency cannot be guaranteed in the process of using Flink to synchronize data from a data source to a database table without a primary key, the present invention proposes a transactional distributed data synchronization scheme. The core idea of the technical solution of the present invention is to divide the process of using the Flink cluster for distributed data synchronization into a pre-submission stage and a formal submission stage. In the pre-submission stage, the sink operator task submits the snapshot data to the destination in the pre-submission mode of the database transaction. For the database, the JobManager records the snapshot success status after receiving the snapshot success feedback of all operator tasks and notifies the sink operator task to formally submit the snapshot data to the destination database table, and the sink operator task records the transaction submission status after submitting the data successfully. In the technical solution of the present invention, not only the processing state of the snapshot is recorded, but also the submission state of the transaction is recorded, and in the case of a Flink failure, the data synchronization task is restored according to the recorded snapshot and transaction state. The present invention can realize the consistency of data synchronization of Flink from a distributed data source (such as a distributed publish-subscribe message system, Kafka, etc.) to a relational database (such as MySQL) table without a primary key.

基于本发明的基本思想，以下结合附图和具体实施例来描述本发明的具体实现过程。Based on the basic idea of the present invention, the specific implementation process of the present invention will be described below in conjunction with the drawings and specific embodiments.

图1为本发明一实施例中基于Flink两阶段分布式事务状态机制实现数据同步的过程示意图。该实施例使用Flink将实现分布式数据源到无主键关系数据库表的基于分布式事务状态机制的数据同步过程分为两个阶段，第一阶段为预提交阶段，第二阶段为正式提交阶段。Flink集群中的协调管理组件JobManager(作业管理器)作为协调者在两个阶段中协调快照的创建和事务的提交，Flink集群中的执行进程组件TaskManager(任务管理器)所管理的各算子任务作为参与者负责数据同步(数据获取、数据处理、数据输出)。在完成第一阶段的预提交阶段后由JobManager发起第二阶段的处理步骤。JobManager和TaskManager协作完成基于事务机制的分布式数据同步。Fig. 1 is a schematic diagram of the process of realizing data synchronization based on Flink's two-stage distributed transaction state mechanism in an embodiment of the present invention. This embodiment uses Flink to divide the data synchronization process based on the distributed transaction state mechanism from the distributed data source to the non-primary key relational database table into two stages, the first stage is the pre-commit stage, and the second stage is the formal commit stage. The coordination management component JobManager (job manager) in the Flink cluster acts as a coordinator to coordinate the creation of snapshots and the submission of transactions in two phases, and each operator task managed by the execution process component TaskManager (task manager) in the Flink cluster As a participant is responsible for data synchronization (data acquisition, data processing, data output). After completing the pre-commit phase of the first phase, the JobManager initiates the processing steps of the second phase. JobManager and TaskManager cooperate to complete distributed data synchronization based on transaction mechanism.

本发明通过两阶段的事务性写入过程，既能够保证同步数据在Flink内部的一致性又能够保证Sink输出端的数据一致性，从而保证数据从数据源到无主键关系数据库表的一致性。Through the two-stage transactional writing process, the present invention can not only ensure the consistency of synchronous data inside Flink but also ensure the data consistency of Sink output end, thereby ensuring the consistency of data from data sources to non-primary key relational database tables.

以下分别对两个阶段的处理过程进行详细描述。The processing procedures of the two stages are described in detail below.

图2A为本发明一实施例中预提交阶段发起预提交及执行预提交过程的示意图；图2B为本发明一实施例中预提交阶段确认完成预提交阶段的示意图。图中数据同步的数据源为Kafka，本发明不限制数据源的类型和具体形式，例如也可以是其它分布式消息订阅发布组件，也可以是其它数据采集系统等。图中数据同步的目的端为无主关键字索引(无主键)的MySQL数据库表，本发明不限制目的关系数据库的类型，例如可以是MySQL、SQLServer、Oracle等。图中JobManager和TaskManager为Flink引擎自身组件，分布式文件系统HDFS作为状态存储后端用来缓存快照偏移、快照数据等，分布式内存数据库Redis为新引入组件，用于缓存事务状态信息等。FIG. 2A is a schematic diagram of initiating pre-submission and executing the pre-submission process in the pre-submission phase in an embodiment of the present invention; FIG. 2B is a schematic diagram of confirming completion of the pre-submission phase in an embodiment of the present invention. The data source for data synchronization in the figure is Kafka. The present invention does not limit the type and specific form of the data source. For example, it can also be other distributed message subscription and publishing components, or other data collection systems. The destination of the data synchronization in the figure is a MySQL database table without primary key index (no primary key). The present invention does not limit the type of the target relational database, such as MySQL, SQLServer, Oracle, etc. In the figure, JobManager and TaskManager are the components of the Flink engine itself. The distributed file system HDFS is used as the state storage backend to cache snapshot offsets and snapshot data. The distributed memory database Redis is a newly introduced component to cache transaction status information.

基于图2A和图2B执行第一阶段的处理过程包括：The first stage of processing based on Figure 2A and Figure 2B includes:

步骤200、Flink集群中作业管理器向数据输入算子Source注入检查点分界线，触发快照创建过程；Step 200, the job manager in the Flink cluster injects the checkpoint boundary into the data input operator Source, triggering the snapshot creation process;

Flink集群中的作业管理器Jobmanager负责调度执行数据同步作业，客户端向Jobmanager提交数据同步作业后，Jobmanager将数据同步作业的作业任务调度给任务管理器Taskmanager的各算子(Source、Map、Sink)去运行这些算子任务。The job manager Jobmanager in the Flink cluster is responsible for scheduling and executing data synchronization jobs. After the client submits the data synchronization job to the Jobmanager, the Jobmanager schedules the job tasks of the data synchronization job to the operators (Source, Map, Sink) of the task manager Taskmanager. To run these operator tasks.

Flink自身提供Checkpoint检查点快照功能，该快照功能通过注入检查点分界线来实现，检查点分界线可以看作一种快照标记，其主要作用是用来区分前后两个checkpoint检查点快照，一个快照对应一批能否分别被多个任务算子处理的业务数据。检查点分界线是一种特殊的数据流，由Jobmanager组件触发，Jobmanager从数据输入算子(source)任务开始注入检查点分界线，该分界线会随着整个数据处理流程和普通数据流一样进行处理流转，因此每个算子都会检测到该检查点分界线并进行相应的处理。Flink itself provides the Checkpoint checkpoint snapshot function, which is implemented by injecting the checkpoint boundary line. The checkpoint boundary line can be regarded as a snapshot mark, and its main function is to distinguish between two checkpoint checkpoint snapshots before and after. A snapshot Corresponds to a batch of business data that can be processed by multiple task operators. The checkpoint boundary is a special data flow triggered by the Jobmanager component. The Jobmanager injects the checkpoint boundary from the data input operator (source) task. The boundary will follow the entire data processing process and the normal data flow. Process flow, so each operator will detect the checkpoint boundary and process accordingly.

步骤201、Flink集群数据输入算子任务source在执行过程中，在检测到检查点分界线后，记录快照偏移；Step 201, during the execution of the Flink cluster data input operator task source, after detecting the checkpoint boundary, record the snapshot offset;

数据输入source算子任务从Kafka消费数据，在接收到Jobmanager的注入检查点分界线的指令后，对消费的数据进行快照标记，并将该检查点分界线对应的快照偏移(即当前消费的kafka数据的位置信息)记录到状态存储后端。The data input source operator task consumes data from Kafka. After receiving the instruction of Jobmanager injecting the checkpoint boundary line, it takes a snapshot of the consumed data and offsets the snapshot corresponding to the checkpoint boundary line (that is, the currently consumed The location information of kafka data) is recorded to the state storage backend.

本发明实施例中所使用的状态存储后端可以是分布式文件系统(例如HadoopDistributed File System，HDFS)，也可以是其它类型的分布式文件系统或数据库，本发明不做限制。The state storage backend used in the embodiment of the present invention may be a distributed file system (such as Hadoop Distributed File System, HDFS), or other types of distributed file systems or databases, which is not limited in the present invention.

步骤202、Flink集群中的数据输出算子任务sink在检测到检查点分界线后，开启事务，缓存快照数据及事务ID；Step 202, the data output operator task sink in the Flink cluster detects the checkpoint boundary, starts the transaction, and caches the snapshot data and transaction ID;

Flink集群中可能会有多个Sink算子任务协作完成数据同步作业，每个sink算子任务在检测到检查点分界线后即开启一个针对当前快照数据的数据库事务，对该检查点分界线标记的快照数据进行序列化，将快照数据及事务标识ID序列化缓存到状态存储后端，以备在集群故障恢复时使用。sink算子任务也可以使用其它存储位置缓存序列号的快照数据，不与快照偏移等快照状态信息一起存储。There may be multiple sink operator tasks in the Flink cluster to cooperate to complete the data synchronization job. After each sink operator task detects the checkpoint boundary, it starts a database transaction for the current snapshot data and marks the checkpoint boundary. The snapshot data is serialized, and the snapshot data and transaction ID are serialized and cached to the state storage backend for use when the cluster fails to recover. The sink operator task can also use other storage locations to cache the snapshot data of the serial number, which is not stored together with the snapshot status information such as the snapshot offset.

步骤203、sink算子任务将当前快照数据预提交到目的数据库表中并记录事务的预提交状态；Step 203, the sink operator task pre-submits the current snapshot data to the destination database table and records the pre-commit status of the transaction;

Flink集群中可能会有多个Sink算子任务协作完成数据同步作业，每个sink算子任务在检测到检查点分界线后即开启一个针对当前快照数据的事务，将检查点分界线标记的快照数据及事务标识ID序列化缓存到状态存储后端之后，将快照数据预提交到目的数据库中，在完成快照数据的预提交后，sink算子任务还需要在分布式内存数据库redis中记录快照数据的预提交状态即标记自己负责的那部分快照数据的事务状态为预提交状态(未提交模式)，需要记录事务ID和预提交的成功或失败的结果。There may be multiple sink operator tasks in the Flink cluster to cooperate to complete the data synchronization job. After each sink operator task detects the checkpoint boundary line, it will start a transaction for the current snapshot data, and the snapshot marked by the checkpoint boundary line After the data and transaction ID are serialized and cached to the state storage backend, the snapshot data is pre-submitted to the destination database. After the snapshot data pre-submission is completed, the sink operator task also needs to record the snapshot data in the distributed memory database redis The pre-commit status of the snapshot data is marked as the pre-commit state (uncommitted mode), and the transaction ID and the success or failure result of the pre-commit need to be recorded.

本发明中的事务和数据库中事务的概念一致，数据库中在开始一个事务时，在执行Commit提交指令之前，对数据库表的增/删/改操作并不会实际生效，通过Rollback回滚指令可随时将当前事务回滚到事务开始前的状态，本发明中的预提交阶段的处理过程相当于数据库事务中执行Commit提交指令之前的处理过程。The transaction in the present invention is consistent with the concept of the transaction in the database. When starting a transaction in the database, before executing the Commit command, the addition/deletion/modification of the database table will not actually take effect, and the Rollback command can Roll back the current transaction to the state before the transaction starts at any time. The processing process of the pre-commit phase in the present invention is equivalent to the processing process before executing the Commit instruction in the database transaction.

本发明中Sink算子任务以数据库事务的预写入模式将快照数据写入到目的数据库中，虽然这部分快照数据已经实际写入数据库，但是处于未正式Commit提交到目的数据库表中的状态。只有所有sink算子任务都完成了当前批次的所有快照数据的预提交并记录了事务预提交状态后，Jobmanager才可以进一步触发sink算子任务执行正式提交快照数据到目的数据库表的Commit提交操作。In the present invention, the Sink operator task writes the snapshot data into the target database in the pre-write mode of the database transaction. Although this part of the snapshot data has actually been written into the database, it is in the state of not being officially committed to the target database table. Only after all the sink operator tasks have completed the pre-submission of all the snapshot data of the current batch and recorded the transaction pre-commit status, the Jobmanager can further trigger the sink operator tasks to perform the Commit operation that formally submits the snapshot data to the destination database table .

步骤204、Flink集群中的各算子任务在完成各自负责的快照数据的预提交阶段的处理后，向管理协调组件发送快照完成反馈和预提交结果反馈；Step 204, each operator task in the Flink cluster sends snapshot completion feedback and pre-submission result feedback to the management coordination component after completing the pre-submission phase processing of the snapshot data they are responsible for;

步骤205、管理协调组件在接收到所有算子任务反馈的快照完成反馈和预提交结果反馈后，记录所注入的检查点分界线标记的快照成功状态；Step 205. After receiving the snapshot completion feedback and pre-submission result feedback from all operator task feedbacks, the management coordination component records the snapshot success status of the injected checkpoint boundary mark;

该步骤对应图2B的示例，在各算子任务完成当前快照相关的预提交阶段的处理后，都会向管理协调组件Jobmanager发送快照完成反馈和预提交结果反馈。This step corresponds to the example in Figure 2B. After each operator task completes the processing of the pre-submission phase related to the current snapshot, it will send snapshot completion feedback and pre-submission result feedback to the management coordination component Jobmanager.

例如，source算子任务在完成向状态存储后端的快照偏移的记录后，会向Jobmanager发送快照完成反馈；sink算子任务在完成向状态存储后端的快照数据的缓存后也会向Jobmanager发送快照完成反馈，在sink算子任务完成快照数据的预提交并记录预提交状态后也会向Jobmanager发送预提交结果反馈。For example, after the source operator task finishes recording the snapshot offset to the state storage backend, it will send a snapshot completion feedback to the Jobmanager; after the sink operator task completes the snapshot data cache to the state storage backend, it will also send a snapshot to the Jobmanager Completion of the feedback, after the sink operator task completes the pre-submission of the snapshot data and records the pre-submission status, it will also send the pre-submission result feedback to the Jobmanager.

管理协调组件Jobmanager在接收到所有算子有关当前快照及预提交状态的反馈，根据反馈结果判定所有算子任务都成功完成各自的处理步骤后，将快照成功状态记录到状态存储后端，快照成功状态信息可包括检查点分界线标记、快照标识、快照时间等。只有当各算子任务成功完成快照创建以及Sink算子任务成功完成同步数据的预提交并记录事务状态后，Jobmanager才可确认第一阶段完成。若有算子任务未成功完成快照创建或快照数据的预提交，则说明当前快照的第一阶段即预提交阶段处理失败，进而导致当前快照的两阶段事务状态的数据同步失败，此时需要执行相应的故障处理步骤。The management and coordination component Jobmanager receives feedback from all operators about the current snapshot and pre-submission status, and after judging that all operator tasks have successfully completed their respective processing steps according to the feedback results, records the successful status of the snapshot to the state storage backend, and the snapshot is successful. The state information may include a checkpoint boundary mark, a snapshot ID, a snapshot time, and the like. Jobmanager can confirm the completion of the first phase only after each operator task successfully completes the snapshot creation and the Sink operator task successfully completes the pre-submission of synchronous data and records the transaction status. If any operator task fails to successfully complete the snapshot creation or the pre-commit of the snapshot data, it means that the processing of the first phase of the current snapshot, that is, the pre-commit phase, fails, which leads to the failure of the data synchronization of the two-phase transaction status of the current snapshot. At this time, you need to execute Corresponding troubleshooting steps.

在本发明一些实施例中，source和sink算子任务一般都会有，正常情况下，一般在source和sink之间可能会有多个Map数据处理算子用于处理数据同步。Map算子任务一般属于无状态的算子，可不记录状态信息。In some embodiments of the present invention, there are generally source and sink operator tasks. Normally, there may be multiple Map data processing operators between the source and the sink for processing data synchronization. Map operator tasks are generally stateless operators that do not need to record state information.

图2C为本发明一实施例中执行第二阶段即正式提交阶段的示意图，基于图2C执行第二阶段的处理过程包括：FIG. 2C is a schematic diagram of executing the second stage, that is, the formal submission stage, in an embodiment of the present invention. The process of executing the second stage based on FIG. 2C includes:

步骤206、作业管理器向所有算子任务发送快照完成消息；Step 206, the job manager sends a snapshot completion message to all operator tasks;

当作业管理器Jobmanager将快照创建成功的状态写入状态存储后端后，向所有算子(Source、Map、Sink)发送快照完成消息，指示各算子任务预提交阶段完成。After the job manager Jobmanager writes the status of successful snapshot creation to the state storage backend, it sends a snapshot completion message to all operators (Source, Map, Sink) to indicate the completion of the task pre-submission phase of each operator.

步骤207、sink算子任务在接收到快照完成消息后，将当前快照数据正式提交到目的数据库表中并记录事务提交状态为提交完成状态；Step 207, after receiving the snapshot completion message, the sink operator task officially submits the current snapshot data to the destination database table and records the transaction submission status as the submission completion status;

当sink算子任务接收到预提交阶段完成通知消息后，正式将自己负责的快照数据Commit提交的目的数据库表中，提交成功后向分布式内存数据库redis记录事务提交成功的状态信息，至此，与注入的检查点分界线对应的快照数据就成功被同步到目的关系数据库表中了，基于分布式事务状态机制的第二阶段过程就成功完成了。When the sink operator task receives the notification message of the completion of the pre-commit phase, it officially submits the snapshot data Commit it is responsible for to the target database table, and after the commit is successful, it records the status information of the successful commit of the transaction to the distributed memory database redis. The snapshot data corresponding to the injected checkpoint boundary line is successfully synchronized to the target relational database table, and the second-stage process based on the distributed transaction state mechanism is successfully completed.

本发明采用基于事务机制的二阶段分布式数据同步的一个重要的技术效果是能够实现在Flink集群故障情况下，从Kafka到目的数据库的数据一致性，若Flink集群发生故障，包括Jobmanager在预设时间内未收到所有算子有关当前快照的正常反馈、有算子反馈创建快照失败、sink算子任务反馈预提交失败等情况。检查点分界线标记的快照数据同步失败的情况可能发生在预提交阶段，也可能发生在正式提交阶段，以下针对两种情况说明故障恢复的方法。An important technical effect of the two-stage distributed data synchronization based on the transaction mechanism in the present invention is that it can achieve data consistency from Kafka to the destination database in the event of a Flink cluster failure. The normal feedback about the current snapshot from all operators was not received within a certain period of time, some operators reported that snapshot creation failed, sink operator task feedback failed to pre-submit, etc. The snapshot data synchronization failure marked by the checkpoint boundary line may occur in the pre-commit phase or in the formal commit phase. The following describes the fault recovery methods for the two cases.

异常情况一：数据同步作业过程故障中断时处于第一阶段预提交阶段，并且本次Checkpoint快照未成功Abnormal situation 1: When the data synchronization job process is interrupted due to a failure, it is in the first stage of pre-submission stage, and the checkpoint snapshot is not successful.

创建快照是否成功是以Jobmanager是否接收到所有算子任务发送的表示快照创建成功的快照完成反馈消息来判断，若接收到所有算子任务的快照完成反馈则Jobmanager判定为快照创建成功，此时Jobmanager会向状态存储后端记录快照创建成功状态记录信息。有些情况下Map算子任务是无状态的，可不向Jobmanager发送快照完成反馈消息，Jobmanager主要判断是否接收到所有source和sink算子任务发送的快照完成反馈即可判定快照创建是否成功。Whether the snapshot creation is successful is judged by whether the Jobmanager receives the snapshot completion feedback message sent by all operator tasks indicating that the snapshot is created successfully. If the snapshot completion feedback of all operator tasks is received, the Jobmanager determines that the snapshot creation is successful. The snapshot creation success status record information will be recorded to the state storage backend. In some cases, the Map operator task is stateless and does not need to send a snapshot completion feedback message to the Jobmanager. The Jobmanager mainly determines whether the snapshot completion feedback sent by all source and sink operator tasks is received to determine whether the snapshot creation is successful.

图3A为本发明一实施例中第一阶段快照创建未成功情况下的异常恢复步骤示意图，若在Jobmanager记录快照创建成功的状态记录信息之前，Flink集群发生故障，则在集群重新启动后执行如下步骤：Figure 3A is a schematic diagram of the abnormal recovery steps in the case of unsuccessful creation of the snapshot in the first stage in an embodiment of the present invention. If the Flink cluster fails before the Jobmanager records the status record information of the successful snapshot creation, the execution is as follows after the cluster is restarted step:

步骤310、Jobmanager从状态存储后端获取最近一次创建成功的快照(最近失败快照的前一个成功的快照)的快照信息；Step 310, the Jobmanager obtains the snapshot information of the last successfully created snapshot (the previous successful snapshot of the latest failed snapshot) from the state storage backend;

最近一次成功创建的快照的快照信息可包括对应的检查点分界线信息(快照标识)、快照偏移、快照创建时间等。The snapshot information of the last successfully created snapshot may include corresponding checkpoint boundary information (snapshot identifier), snapshot offset, snapshot creation time, and the like.

步骤311、Jobmanager协调调度Taskmanager恢复数据同步任务的执行；Step 311, Jobmanager coordinates and schedules Taskmanager to restore the execution of the data synchronization task;

步骤312、sink算子任务根据所述快照信息从状态存储后端获取最近一次创建成功的快照的事务ID及快照数据；Step 312, the sink operator task obtains the transaction ID and the snapshot data of the last successfully created snapshot from the state storage backend according to the snapshot information;

步骤313、sink算子任务根据事务ID获取事务提交状态，验证事务提交状态为已提交；Step 313, the sink operator task obtains the transaction submission status according to the transaction ID, and verifies that the transaction submission status is submitted;

该步骤可包括sink算子任务从状态存储后端获取缓存的序列化的快照数据，通过反序列化处理恢复出事务ID及快照数据，然后根据事务ID从redis中判断最近一次创建成功的快照对应的事务提交是否成功，若成功则继续执行后续步骤，若不成功则需要将对应的快照数据提交到目的数据库表中，然后执行后续步骤。This step may include the sink operator task obtaining the cached serialized snapshot data from the state storage backend, recovering the transaction ID and snapshot data through deserialization processing, and then judging from redis the latest successfully created snapshot corresponding to the transaction ID Whether the transaction submission is successful, if successful, continue to execute the next steps, if not, you need to submit the corresponding snapshot data to the destination database table, and then execute the next steps.

步骤314、source算子任务从状态存储后端获取下一次待消费的快照数据(最近失败快照数据)偏移，重新开始创建快照等数据同步过程。Step 314, the source operator task obtains the offset of the next snapshot data to be consumed (the latest failed snapshot data) from the state storage backend, and restarts the data synchronization process such as creating a snapshot.

图3B为本发明一实施例中第一阶段快照创建成功情况下的异常恢复步骤示意图，若在Jobmanager记录最近一次快照创建成功后，Flink集群发生故障，此种情况下可能有部分sink算子任务提交事务成功，但第二阶段还未成功完成，则在集群重新启动后执行如下步骤：Figure 3B is a schematic diagram of the abnormal recovery steps when the first-stage snapshot is successfully created in an embodiment of the present invention. If the Flink cluster fails after the Jobmanager records that the latest snapshot is successfully created, there may be some sink operator tasks in this case If the committed transaction is successful, but the second phase has not been successfully completed, perform the following steps after the cluster restarts:

步骤320、Jobmanager从状态存储后端获取最近一次创建成功的快照的快照信息；Step 320, the Jobmanager obtains the snapshot information of the last successfully created snapshot from the state storage backend;

步骤321、Jobmanager协调调度Taskmanager恢复数据同步任务的执行；Step 321, Jobmanager coordinates and schedules the execution of Taskmanager recovery data synchronization task;

步骤322、sink算子任务根据所述快照信息从状态存储后端获取最近一次创建成功的快照的事务ID及快照数据；Step 322, the sink operator task obtains the transaction ID and the snapshot data of the last successfully created snapshot from the state storage backend according to the snapshot information;

步骤323、sink算子任务根据事务ID获取事务提交状态；Step 323, the sink operator task obtains the transaction submission status according to the transaction ID;

步骤324、在事务未提交的状态下(快照数据未被正式提交的目的数据库表中)，sink算子任务将本次恢复后的未提交事务提交到目的数据库表中；Step 324, in the state of transaction uncommitted (snapshot data has not been formally submitted in the destination database table), the sink operator task submits the recovered uncommitted transaction to the destination database table;

步骤325、source算子任务从状态存储后端(通过反序列化操作)获取下一待消费的起点offset(同一快照不同数据块的偏移)继续进行数据同步。Step 325, the source operator task obtains the next starting point offset (offset of different data blocks of the same snapshot) to be consumed from the state storage backend (through deserialization operation) to continue data synchronization.

一个检查点分界线标识的快照对应Kafka中的一批数据，多个source可并行消费Kafka中的这一批数据，一个source算子任务可能会处理属于同一快照的这一批数据中的多个数据块，各算子任务可根据检查点分界线来识别一个快照，通过数据块标识来识别所处理的是哪个数据块。A snapshot marked by a checkpoint boundary corresponds to a batch of data in Kafka. Multiple sources can consume this batch of data in Kafka in parallel. A source operator task may process multiple data in this batch of data belonging to the same snapshot. For data blocks, each operator task can identify a snapshot according to the checkpoint boundary, and identify which data block is being processed through the data block identifier.

若Flink集群在快照数据同步过程的第二阶段产生故障，则说明快照创建已经成功，因此处理的过程与图3B的过程相似，sink算子任务根据事务ID判断事务提交是否成功，若未成功则正式提交事务，若成功则继续消费Kafka中的同一批次的快照数据执行数据同步。If the Flink cluster fails in the second stage of the snapshot data synchronization process, it means that the snapshot creation has been successful. Therefore, the processing process is similar to the process in Figure 3B. The sink operator task judges whether the transaction submission is successful based on the transaction ID. If not, then Formally submit the transaction, and if successful, continue to consume the same batch of snapshot data in Kafka to perform data synchronization.

图4为本发明一实施例提供的用于实现本发明提供的事务性分布式数据同步方法的电子设备结构示意图，该设备400包括：诸如中央处理单元(CPU)的处理器410、通信总线420、通信接口440以及存储器430。其中，处理器410与存储器430可以通过通信总线420相互通信。存储器430内存储有计算机程序，当该计算机程序被处理器410执行时即可实现本发明提供的事务性分布式数据同步方法中的一个或多个步骤的功能。4 is a schematic structural diagram of an electronic device for implementing the transactional distributed data synchronization method provided by the present invention according to an embodiment of the present invention. The device 400 includes: a processor 410 such as a central processing unit (CPU), a communication bus 420 , a communication interface 440 and a memory 430. Wherein, the processor 410 and the memory 430 may communicate with each other through the communication bus 420 . A computer program is stored in the memory 430, and when the computer program is executed by the processor 410, the functions of one or more steps in the transactional distributed data synchronization method provided by the present invention can be realized.

存储器是指基于某种存储介质用于存储计算机程序和/或数据的装置，它可以是易失性存储器(Volatile Memory，VM，常称为内存)，也可以是非易失性存储器(Non-Volatile Memory，NVM)。内存是指与处理器直接交换数据的内部存储器，它可以随时读写数据，而且速度很快，作为操作系统和其它运行程序的临时数据的存储介质。内存可以是同步动态随机存取内存(Synchronous Dynamic Random Access Memory，SDRAM)、动态随机存取内存(Dynamic Random Access Memory，DRAM)等。非易失性存储器是指采用持久化存储介质的存储器，具有容量大和可持久保存数据的特性，可以是存储级存储器(StorageClass Memory，SCM)、固态硬盘(Solid State Disk，SSD)、NAND闪存、磁盘等。SCM是业界对介于内存与闪存之间的新存储介质的统称，是一种同时结合持久化存储特性与内存特性的复合型储存技术，存取速度慢于DRAM快于SSD硬盘。Memory refers to a device based on a certain storage medium for storing computer programs and/or data. It can be a volatile memory (Volatile Memory, VM, often called memory) or a non-volatile memory (Non-Volatile Memory, NVM). Memory refers to the internal memory that directly exchanges data with the processor. It can read and write data at any time, and the speed is very fast. It is used as a storage medium for temporary data of the operating system and other running programs. The memory may be Synchronous Dynamic Random Access Memory (Synchronous Dynamic Random Access Memory, SDRAM), Dynamic Random Access Memory (Dynamic Random Access Memory, DRAM), etc. Non-volatile memory refers to a memory using a persistent storage medium, which has the characteristics of large capacity and persistent data storage. It can be storage class memory (StorageClass Memory, SCM), solid state disk (Solid State Disk, SSD), NAND flash memory, disk etc. SCM is the industry's general term for new storage media between memory and flash memory. It is a composite storage technology that combines persistent storage features and memory features. The access speed is slower than DRAM and faster than SSD hard drives.

处理器可以是通用处理器，包括中央处理器(Central Processing Unit，CPU)、网络处理器(Network Processor，NP)等；还可以是数字信号处理器(Digital SignalProcessing，DSP)、专用集成电路(Application Specific Integrated Circuit，ASIC)、现场可编程门阵列(Field-Programmable Gate Array，FPGA)或其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。The processor can be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it can also be a digital signal processor (Digital Signal Processing, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.

应当认识到，本发明的实施例可以由计算机硬件、硬件和软件的组合、或者通过存储在非暂时性(或称为非持久性)存储器中的计算机指令来实现或实施。所述方法可以使用标准编程技术，包括配置有计算机程序的非暂时性存储介质在计算机程序中实现，其中如此配置的存储介质使得计算机以特定和预定义的方式操作。每个程序可以以高级过程或面向对象的编程语言来实现以与计算机系统通信。然而，若需要，该程序可以以汇编或机器语言实现。在任何情况下，该语言可以是编译或解释的语言。此外，为此目的该程序能够在编程的专用集成电路上运行。此外，可按任何合适的顺序来执行本发明描述的过程的操作，除非本发明另外指示或以其他方式明显地与上下文矛盾。本发明描述的过程(或变型和/或其组合)可在配置有可执行指令的一个或多个计算机系统的控制下执行，并且可作为共同地在一个或多个处理器上执行的代码(例如，可执行指令、一个或多个计算机程序或一个或多个应用)、由硬件或其组合来实现。所述计算机程序包括可由一个或多个处理器执行的多个指令。It should be appreciated that embodiments of the present invention may be realized or implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in non-transitory (or referred to as non-persistent) memory. The methods can be implemented in a computer program using standard programming techniques, including a non-transitory storage medium configured with a computer program, wherein the storage medium so configured causes the computer to operate in a specific and predefined manner. Each program can be implemented in a high-level procedural or object-oriented programming language to communicate with the computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on an application specific integrated circuit programmed for this purpose. Furthermore, operations of processes described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) can be performed under the control of one or more computer systems configured with executable instructions and as code that is commonly executed on one or more processors ( For example, executable instructions, one or more computer programs or one or more applications), by hardware or a combination thereof. The computer program comprises a plurality of instructions executable by one or more processors.

进一步，所述方法可以在可操作地连接至合适的任何类型的计算平台中实现，包括但不限于个人电脑、迷你计算机、主框架、工作站、网络或分布式计算环境、单独的或集成的计算机平台、或者与带电粒子工具或其它成像装置通信等等。本发明的各方面可以以存储在非暂时性存储介质或设备上的机器可读代码来实现，无论是可移动的还是集成至计算平台，如硬盘、光学读取和/或写入存储介质、RAM、ROM等，使得其可由可编程计算机读取，当存储介质或设备由计算机读取时可用于配置和操作计算机以执行在此所描述的过程。此外，机器可读代码，或其部分可以通过有线或无线网络传输。当此类媒体包括结合微处理器或其他数据处理器实现上文所述步骤的指令或程序时，本发明所述的发明包括这些和其他不同类型的非暂时性计算机可读存储介质。当根据本发明所述的方法和技术编程时，本发明还包括计算机本身。Further, the method can be implemented in any type of computing platform operably connected to a suitable one, including but not limited to personal computer, minicomputer, main frame, workstation, network or distributed computing environment, stand-alone or integrated computer platform, or communicate with charged particle tools or other imaging devices, etc. Aspects of the invention can be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or written storage medium, RAM, ROM, etc., such that they are readable by a programmable computer, when the storage medium or device is read by the computer, can be used to configure and operate the computer to perform the processes described herein. Additionally, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other various types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein.

以上所述仅为本发明的实施例而已，并不用于限制本发明。对于本领域技术人员来说，本发明可以有各种更改和变化。凡在本发明的精神和原理之内所作的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。The above descriptions are only examples of the present invention, and are not intended to limit the present invention. Various modifications and variations of the present invention will occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims

1. A method for transactional distributed data synchronization, the method being applied to a Flink cluster, the method comprising:

the method comprises the steps that a flank cluster is used for synchronizing data in a distributed data source to a target keyless relational database table, namely a target database table, and the process is divided into a pre-submitting stage and a formal submitting stage;

in the pre-commit stage, snapshot creation and pre-commit of snapshot data are completed, and an output operator task sink is responsible for pre-commit of the snapshot data to a target database table and records a snapshot completion state and a transaction pre-commit state;

in the formal commit phase, the job manager Jobmanager is responsible for instructing sink operator tasks to formally commit snapshot data to the destination database table after the snapshot is created successfully and recording transaction commit results.

2. The method of claim 1, wherein the pre-commit phase comprises:

The job manager Jobmanager marks the currently synchronized snapshot data by injecting checkpoint boundaries;

the input operator task source records snapshot offsets after detecting checkpoint boundaries;

after detecting a check point boundary, the sink operator task is responsible for caching current snapshot data and pre-submitting the current snapshot data to a target database table and recording a transaction pre-submitting state;

each operator task sends snapshot completion feedback to the Jobmanager after the snapshot data is successfully processed, and the sink operator task sends pre-submission result feedback to the Jobmanager after the snapshot data is successfully submitted;

and after the Jobmanager receives snapshot completion feedback of all operator tasks and pre-submitting result feedback of sink operator tasks, recording a snapshot success state.

3. The method of claim 2, wherein the step of formally submitting the phase comprises:

after Jobmanager judges that creation is successful and pre-submission is successful according to snapshot completion feedback and pre-submission result feedback, snapshot completion information is sent to all operator tasks;

and after receiving the snapshot completion message, the output operator task sink formally submits snapshot data to a target database table and records a transaction submitting result whether the transaction formally submits successfully or not.

4. The method of claim 2, wherein if the Flink cluster fails during the pre-commit phase and results in unsuccessful snapshot creation, the method further comprises the step of failure recovery:

jobmanager obtains snapshot information of a snapshot which is successfully created last time from a state storage back end;

jobmanager coordinates the task manager to resume the data synchronization task;

acquiring a transaction identification ID and snapshot data of a snapshot which is successfully created last time from a state storage back end according to the snapshot information by a sink operator task;

and the sink operator task acquires a transaction submitting state according to the transaction identification ID, verifies whether the snapshot data of the snapshot which is successfully created in the last time is successfully submitted to a target database table, and if so, the source operator task acquires the snapshot data offset to be consumed next time, and resumes the data synchronization process.

5. The method of claim 3, wherein if the flank cluster failed during the pre-commit phase and the last snapshot creation was successful, the method further comprises the step of recovering from the failure:

jobmanager obtains snapshot information of a snapshot which is successfully created last time;

jobmanager coordinates the task manager to resume execution of the data synchronization task;

Acquiring a transaction identification ID and snapshot data of a snapshot which is successfully created last time according to the snapshot information by a sink operator task;

the sink operator task acquires the transaction submitting state according to the transaction ID and judges whether the corresponding snapshot data is formally submitted to the target database table;

under the condition that the transaction is not formally submitted, the sink operator task submits snapshot data of the snapshot which is successfully created last time to a target database table;

the source operator task acquires the next snapshot data offset to be consumed and continues to perform data synchronization.

6. The method of claim 1, wherein the step of determining the position of the substrate comprises,

the Jobmanager and each operator task record snapshot information in a state storage back end, wherein the state storage back end is a distributed file system;

the distributed data source is a distributed publish-subscribe messaging system.

7. The method of claim 1, wherein the step of determining the position of the substrate comprises,

the sink operator task records the transaction state in a distributed memory database.

8. The transactional distributed data synchronization system is characterized by comprising a Flink cluster, a state storage back end and a distributed memory database;

the Flink cluster comprises a job manager Jobmanager and a task manager Taskanager, wherein at least an input operator task source and an output operator task sink are operated in the task manager Taskanager;

The Flink cluster is used for synchronizing data in the distributed data source into a target database table which is a target database table without a main key relation, and the data synchronization process is divided into a pre-submission stage and a formal submission stage;

in the pre-submitting stage, snapshot creation and pre-submitting of snapshot data are completed, and an output operator sink operator task is responsible for pre-submitting the snapshot data to a target database table and recording a snapshot completion state and a transaction pre-submitting state;

in the formal submitting stage, the job manager Jobmanager is responsible for instructing a sink operator task to formally submit snapshot data to a target database table after the snapshot is successfully created and recording a transaction submitting result;

the state storage back end is used for storing snapshot information and caching snapshot data;

the distributed memory database is used for storing transaction states.

9. The system of claim 8, wherein in the pre-commit phase:

the Jobmanager marks currently synchronized snapshot data through injecting check point boundaries;

the source operator task records snapshot offsets after detecting a checkpoint boundary;

the Jobmanager is also used for receiving snapshot completion feedback of all operator tasks and pre-submitting result feedback of sink operator tasks and recording snapshot success states.

10. The system of claim 9, wherein in the formal commit phase:

the Jobmanager is also used for sending snapshot completion information to all operator tasks after the creation success and the pre-submission success are judged according to snapshot completion feedback and pre-submission result feedback;

and after receiving the snapshot completion message, the output operator task sink formally submits snapshot data to a target database table and records a transaction submission result of whether the transaction formally submits successfully or not.

11. A transactional distributed data synchronization apparatus for synchronizing data in a distributed data source to a destination keyless relational database table, i.e., a destination database table, via a flank cluster, the apparatus comprising:

the pre-submitting module is used for completing the processing steps of the pre-submitting stage in the data synchronization process and completing snapshot creation and pre-submitting of snapshot data; the method comprises the steps of pre-submitting snapshot data to a target database table through an output operator task sink and recording a snapshot completion state and a transaction pre-submitting state;

The formal submitting module is used for completing the processing steps of the formal submitting stage in the data synchronization process and formally submitting the snapshot data in a target database table in a transaction mode; and after the snapshot is successfully created, the job manager Jobmanager instructs the sink operator task to formally submit snapshot data to the target database table, record the transaction submission state and record the transaction submission result.

12. An electronic device is characterized by comprising a processor, a communication interface, a storage medium and a communication bus, wherein the processor, the communication interface and the storage medium are communicated with each other through the communication bus;

a storage medium storing a computer program;

a processor for carrying out the method steps of any one of claims 1-7 when executing a computer program stored on a storage medium.

13. A storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1 to 7.