WO2024098696A1

WO2024098696A1 - Data recovery method, apparatus and device, and readable storage medium

Info

Publication number: WO2024098696A1
Application number: PCT/CN2023/093083
Authority: WO
Inventors: 李飞龙; 许永良; 孙明刚
Original assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Current assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Priority date: 2022-11-11
Filing date: 2023-05-09
Publication date: 2024-05-16
Anticipated expiration: 2025-05-11
Also published as: CN115454727B; CN115454727A

Abstract

The present application discloses a data recovery method, apparatus and device, and a readable storage medium. The method comprises: backing up data of a master node in a secondary node of a multi-control node so as to use data in the secondary node as mirror image data; monitoring whether the number of faulty disks of a RAID array in the master node exceeds a fault tolerance of the RAID array; if the number of faulty disks of the RAID array exceeds the fault tolerance of the RAID array, sending information of lost data in the faulty disks to the secondary node, and the secondary node acquiring, from the mirror image data, data corresponding to the information and sending the data; receiving the data corresponding to the information, and writing the data corresponding to the information into corresponding partitions of hot spares corresponding to the faulty disks; and replacing the corresponding faulty disk with the hot spares so as to reconstruct the RAID array with normal disks. Mirror image data is added to a secondary node, and the mirror image data in the secondary node is used for data recovery after the number of faulty discs exceeds a fault tolerance, thereby improving the data reliability of a storage system.

Description

Data recovery method, device, equipment and readable storage medium

相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS

本申请要求于2022年11月11日提交中国专利局，申请号为202211409858.4，申请名称为“一种数据恢复方法、装置、设备及可读存储介质”的中国专利申请的优先权，其全部内容通过引用结合在本申请中。This application claims priority to a Chinese patent application filed with the China Patent Office on November 11, 2022, with application number 202211409858.4 and application name “A data recovery method, device, equipment and readable storage medium”, all contents of which are incorporated by reference in this application.

Technical Field

本申请涉及数据存储技术领域，更具体地说，涉及一种数据恢复方法、装置、设备及可读存储介质。The present application relates to the technical field of data storage, and more specifically, to a data recovery method, device, equipment and readable storage medium.

Background technique

社会现已进入大数据时代，海量的数据需要安全可靠地存储下来，很多行业不仅对需要存储的数据量呈指数级增长，而且对存储数据的可靠性要求已达到极致，往往会由于丢失一个小小的数据而导致致命的业务灾难，例如银行业和军工领域等等。所以，存储系统一直在增加数据可靠性和提高I/O(Input/Output，输入/输出)性能两个方面寻找突破。Society has entered the era of big data, where massive amounts of data need to be stored securely and reliably. In many industries, not only is the amount of data that needs to be stored growing exponentially, but the reliability of stored data has reached an extreme. Losing a small amount of data can often lead to fatal business disasters, such as in the banking and military fields. Therefore, storage systems have been looking for breakthroughs in increasing data reliability and improving I/O (Input/Output) performance.

在增加数据可靠性方面，目前业界采用RAID(Redundant Arrays of Independent Disks，磁盘阵列)技术提高数据可靠性，利用RAID阵列中的冗余磁盘来恢复故障盘的数据。在提高I/O性能方面，业界已利用多控节点组成集群，具体地，存储系统中为了保证系统的高可用性，会使用至少两个节点组成一个IOGROUP(输入输出组)，至少两个节点分别连接双端口硬盘的一个端口，IOGROUP中的至少两个节点互为对端节点，一个或多个IOGROUP组成集群，集群中节点可相互通信。其中，主节点负责处理主机的I/O请求，辅助节点负责存储系统的后台任务(例如RAID阵列初始化、巡检和重构任务等等)，以此来提供存储的I/O性能。但是，在主节点采用RAID技术进行I/O数据存储时，若存储系统中故障盘数量超过RAID阵列所能恢复的最大故障盘数量，则无法通过RAID阵列的内部机制进行故障盘数据的恢复，从而导致存储系统的数据可靠性比较低。In terms of increasing data reliability, the industry currently uses RAID (Redundant Arrays of Independent Disks) technology to improve data reliability, and uses redundant disks in the RAID array to recover data from failed disks. In terms of improving I/O performance, the industry has used multi-control nodes to form a cluster. Specifically, in order to ensure the high availability of the system, at least two nodes will be used to form an IOGROUP (input and output group), at least two nodes are connected to one port of a dual-port hard disk respectively, at least two nodes in the IOGROUP are each other's peer nodes, and one or more IOGROUPs form a cluster, and the nodes in the cluster can communicate with each other. Among them, the master node is responsible for processing the I/O requests of the host, and the auxiliary node is responsible for the background tasks of the storage system (such as RAID array initialization, inspection and reconstruction tasks, etc.), so as to provide storage I/O performance. However, when the master node uses RAID technology for I/O data storage, if the number of failed disks in the storage system exceeds the maximum number of failed disks that the RAID array can recover, the failed disk data cannot be recovered through the internal mechanism of the RAID array, resulting in low data reliability of the storage system.

发明内容Summary of the invention

有鉴于此，本申请的目的是提供一种数据恢复方法、装置、设备及可读存储介质，用于恢复故障盘数据，以提高数据可靠性。In view of this, the purpose of the present application is to provide a data recovery method, device, equipment and readable storage medium for recovering failed disk data to improve data reliability.

为了实现上述目的，本申请提供如下技术方案：In order to achieve the above objectives, this application provides the following technical solutions:

一种数据恢复方法，包括：A data recovery method, comprising:

在多控节点中的辅助节点中备份主控节点中的数据，以将辅助节点中的数据作为主控节点的镜像数据； Backing up data in the main control node in the auxiliary node among the multiple control nodes, so that the data in the auxiliary node is used as mirror data of the main control node;

监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量；Monitor whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array;

若RAID阵列的故障盘个数超过RAID阵列的容错量，则将故障盘中丢失数据的信息发送至辅助节点，由辅助节点从镜像数据中获取信息对应的数据并进行发送；If the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, the information about the lost data in the failed disks is sent to the auxiliary node, which obtains the data corresponding to the information from the mirror data and sends it;

接收信息对应的数据，将信息对应的数据写入故障盘对应的热备盘的相应分区中；Receive data corresponding to the information, and write the data corresponding to the information into a corresponding partition of a hot spare disk corresponding to the failed disk;

利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列。Use the hot spare disk to replace the corresponding failed disk to re-form the RAID array with the normal disk.

本申请一些实施例中，在多控节点中的辅助节点中备份主控节点中的数据，包括：In some embodiments of the present application, backing up data in a master control node in a secondary node among multiple control nodes includes:

在辅助节点中备份主控节点中的逻辑卷，以得到对应的镜像逻辑卷；逻辑卷及镜像逻辑卷中包含多个块，块与磁盘中的分块存在映射关系，主控节点及辅助节点中均存储有映射关系；Back up the logical volume in the master node in the auxiliary node to obtain the corresponding mirror logical volume; the logical volume and the mirror logical volume contain multiple blocks, and there is a mapping relationship between the blocks and the blocks in the disk. The mapping relationship is stored in both the master node and the auxiliary node;

将故障盘中丢失数据的信息发送至辅助节点，包括：Send information about lost data in the failed disk to the secondary node, including:

根据故障盘包含的分块及映射关系，形成逻辑卷对应的数据丢失元数据，将数据丢失元数据发送至辅助节点；数据丢失元数据中包含逻辑卷包含的块是否丢失的信息；According to the blocks and mapping relationship contained in the failed disk, data loss metadata corresponding to the logical volume is formed, and the data loss metadata is sent to the auxiliary node; the data loss metadata includes information on whether the blocks contained in the logical volume are lost;

辅助节点从镜像数据中获取信息对应的数据并进行发送，包括：The auxiliary node obtains the data corresponding to the information from the mirror data and sends it, including:

辅助节点根据映射关系及数据丢失元数据获取丢失数据的分块编号，并将镜像逻辑卷中与分块编号对应的数据发送至主控节点。The auxiliary node obtains the block number of the lost data according to the mapping relationship and the data loss metadata, and sends the data corresponding to the block number in the mirrored logical volume to the master node.

本申请一些实施例中，根据故障盘包含的分块及映射关系，形成逻辑卷对应的数据丢失元数据，包括：In some embodiments of the present application, data loss metadata corresponding to a logical volume is formed according to the blocks and mapping relationships contained in the failed disk, including:

根据故障盘包含的分块及映射关系，形成逻辑卷对应且以位图为数据组织方式的数据丢失元数据；其中，位图中的每个bit位表示逻辑卷包含的相应块是否丢失。According to the blocks and mapping relationship contained in the failed disk, data loss metadata corresponding to the logical volume is formed with bitmap as the data organization method; wherein each bit in the bitmap indicates whether the corresponding block contained in the logical volume is lost.

本申请一些实施例中，在利用热备盘替换对应的故障盘之前，还包括：In some embodiments of the present application, before using the hot spare disk to replace the corresponding failed disk, the method further includes:

判断RAID阵列中的各条带是否需要重新计算校验分块；Determine whether each stripe in the RAID array needs to recalculate the check blocks;

若存在需要重新计算校验分块的条带，则根据条带中的各分块计算校验分块，并将计算得到的校验分块写入故障盘对应的热备盘的相应分区中。If there is a stripe for which the check blocks need to be recalculated, the check blocks are calculated according to the blocks in the stripe, and the calculated check blocks are written to the corresponding partition of the hot spare disk corresponding to the failed disk.

本申请一些实施例中，接收信息对应的数据，将信息对应的数据写入故障盘对应的热备盘的相应分区中，包括：In some embodiments of the present application, receiving data corresponding to information, and writing the data corresponding to the information into a corresponding partition of a hot spare disk corresponding to a failed disk, includes:

逐分块接收信息对应的数据，并逐分块将信息对应的数据写入故障盘对应的热备盘的相应分区中。The data corresponding to the information is received block by block, and the data corresponding to the information is written block by block into a corresponding partition of the hot spare disk corresponding to the failed disk.

本申请一些实施例中，若RAID阵列的故障盘个数超过RAID阵列的容错量，则还包括：In some embodiments of the present application, if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, the method further includes:

将主机的I/O请求重定向到辅助节点；Redirect the host's I/O requests to the secondary node;

在利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列之后，还包括：After the hot spare disk is used to replace the corresponding failed disk to re-form the RAID array with the normal disk, the following steps are also included:

将主机的I/O请求重定向到主控节点。Redirect the host's I/O requests to the master node.

本申请一些实施例中，在将主机的I/O请求重定向到辅助节点时，还包括：In some embodiments of the present application, when redirecting the I/O request of the host to the auxiliary node, it also includes:

将主机新发送的数据存储主控节点中，以将主控节点中存储的主机新发送的数据作为辅助节点的镜像数据。The data newly sent by the host is stored in the master control node, so that the data newly sent by the host stored in the master control node is used as the mirror data of the auxiliary node.

本申请一些实施例中，若RAID阵列的故障盘个数未超过RAID阵列的容错量，则还包括：In some embodiments of the present application, if the number of failed disks in the RAID array does not exceed the fault tolerance of the RAID array, the method further includes:

利用RAID阵列中各条带包含的正常磁盘的分块进行计算，得到故障盘对应的热备盘的分块；The blocks of the normal disks contained in each stripe in the RAID array are used for calculation to obtain the blocks of the hot spare disk corresponding to the failed disk;

当恢复故障盘丢失的所有数据后，利用热备盘替换故障盘，以与正常磁盘重新组成 RAID阵列。After all data lost on the failed disk is recovered, the hot spare disk is used to replace the failed disk to re-form the system with the normal disk. RAID array.

本申请一些实施例中，监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量，包括：In some embodiments of the present application, monitoring whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array includes:

定时监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量。Regularly monitor whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array.

本申请一些实施例中，还包括：In some embodiments of the present application, it also includes:

若RAID阵列的故障盘个数未超过RAID阵列的容错量，则返回执行定时监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量的步骤。If the number of failed disks in the RAID array does not exceed the fault tolerance of the RAID array, the process returns to the step of periodically monitoring whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array.

本申请一些实施例中，若辅助节点的个数大于1，则将故障盘中丢失数据的信息发送至辅助节点，包括：In some embodiments of the present application, if the number of auxiliary nodes is greater than 1, information about lost data in the failed disk is sent to the auxiliary node, including:

将故障盘中丢失数据的信息发送至根据预设选择策略从多个辅助节点中选择出的一个辅助节点中。The information about lost data in the failed disk is sent to an auxiliary node selected from a plurality of auxiliary nodes according to a preset selection strategy.

本申请一些实施例中，在利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列之后，还包括：In some embodiments of the present application, after the hot spare disk is used to replace the corresponding failed disk to re-form a RAID array with the normal disk, the following is further included:

定时清理辅助节点中的镜像数据。Regularly clean up the mirror data in the secondary node.

一种数据恢复装置，包括：A data recovery device, comprising:

备份模块，用于在多控节点中的辅助节点中备份主控节点中的数据，以将辅助节点中的数据作为主控节点的镜像数据；A backup module, used to back up the data in the main control node in the auxiliary node among the multiple control nodes, so as to use the data in the auxiliary node as the mirror data of the main control node;

监控模块，用于监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量；A monitoring module is used to monitor whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array;

发送模块，用于若RAID阵列的故障盘个数超过RAID阵列的容错量，则将故障盘中丢失数据的信息发送至辅助节点，由辅助节点从镜像数据中获取信息对应的数据并进行发送；A sending module is used to send information about lost data in the failed disks to the auxiliary node if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, and the auxiliary node obtains data corresponding to the information from the mirror data and sends it;

写入模块，用于接收信息对应的数据，将信息对应的数据写入故障盘对应的热备盘的相应分区中；A writing module, used for receiving data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of the hot spare disk corresponding to the failed disk;

第一替换模块，用于利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列。The first replacement module is used to replace the corresponding failed disk with the hot spare disk to re-form a RAID array with the normal disk.

一种数据恢复设备，包括：A data recovery device, comprising:

存储器，用于存储计算机程序；Memory for storing computer programs;

处理器，用于执行计算机程序时实现如上述任一项的数据恢复方法的步骤。A processor is used to implement the steps of any of the above data recovery methods when executing a computer program.

一种非易失性可读存储介质，非易失性可读存储介质中存储有计算机程序，计算机程序被处理器执行时实现如上述任一项的数据恢复方法的步骤。A non-volatile readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of any of the above data recovery methods are implemented.

本申请提供了一种数据恢复方法、装置、设备及可读存储介质，其中，该方法包括：在多控节点中的辅助节点中备份主控节点中的数据，以将辅助节点中的数据作为主控节点的镜像数据；监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量；若RAID阵列的故障盘个数超过RAID阵列的容错量，则将故障盘中丢失数据的信息发送至辅助节点，由辅助节点从镜像数据中获取信息对应的数据并进行发送；接收信息对应的数据，将信息对应的数据写入故障盘对应的热备盘的相应分区中；利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列。The present application provides a data recovery method, device, equipment and readable storage medium, wherein the method comprises: backing up data in a main control node in a secondary node in a multi-control node, so as to use the data in the secondary node as mirror data of the main control node; monitoring whether the number of failed disks in a RAID array in the main control node exceeds the fault tolerance of the RAID array; if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, sending information about lost data in the failed disk to the secondary node, and the secondary node obtaining data corresponding to the information from the mirror data and sending the information; receiving data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of a hot spare disk corresponding to the failed disk; replacing the corresponding failed disk with the hot spare disk, so as to re-form a RAID array with normal disks.

本申请公开的上述技术方案，在多控节点中的辅助节点中备份主控节点中的数据，以将辅助节点中备份的数据作为主控节点的镜像数据。当主控节点中RAID阵列的故障盘个数超过RAID阵列的容错量之后，则将故障盘中丢失数据的信息发送至辅助节点，由辅助节点从镜像数据中获取丢失数据的信息对应的数据并将该数据发送出去。在接收到辅助节点发送的数据后，将该数据写入故障盘对应的热备盘的相应分区中，然后，利用热备盘替换故障盘而与正常盘重新组成RAID阵列。由此可知，本申请通过在辅助节点中增加镜像数据，并在故障盘数量超过容错量之后利用辅助节点中的镜像数据进行数据恢复，从而可以提高存储系统的数据可靠性。The above technical solution disclosed in the present application backs up the data in the main control node in the auxiliary node in the multi-control node, so that the data backed up in the auxiliary node is used as the mirror data of the main control node. When the number of failed disks in the RAID array in the main control node exceeds the fault tolerance of the RAID array, the information of the lost data in the failed disk is sent to the auxiliary node, and the auxiliary node obtains the data corresponding to the information of the lost data from the mirror data and sends the data out. After the data is recovered, the data is written to the corresponding partition of the hot spare disk corresponding to the failed disk, and then the hot spare disk is used to replace the failed disk and re-compose the RAID array with the normal disk. It can be seen that the application can improve the data reliability of the storage system by adding mirror data in the auxiliary node and using the mirror data in the auxiliary node for data recovery after the number of failed disks exceeds the fault tolerance.

BRIEF DESCRIPTION OF THE DRAWINGS

为了更清楚地说明本申请实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本申请的实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据提供的附图获得其他的附图。In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for use in the embodiments or the description of the prior art will be briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application. For ordinary technicians in this field, other drawings can be obtained based on the provided drawings without paying any creative work.

图1为本申请实施例提供的一种数据恢复方法的流程图；FIG1 is a flow chart of a data recovery method provided by an embodiment of the present application;

图2为本申请实施例提供的另一种数据恢复方法的流程图；FIG2 is a flow chart of another data recovery method provided in an embodiment of the present application;

图3为本申请实施例提供的主控节点的结构示意图；FIG3 is a schematic diagram of the structure of a master control node provided in an embodiment of the present application;

图4为本申请实施例提供的故障盘个数超过容错量时重构过程的示意图；FIG4 is a schematic diagram of a reconstruction process when the number of failed disks exceeds the fault tolerance provided by an embodiment of the present application;

图5为本申请实施例提供的主控节点的另一种结构示意图；FIG5 is another schematic diagram of the structure of a master control node provided in an embodiment of the present application;

图6为本申请实施例提供的辅助节点示意图；FIG6 is a schematic diagram of an auxiliary node provided in an embodiment of the present application;

图7为本申请实施例提供的在故障盘个数未超过容错量时重构恢复故障盘数据的示意图；7 is a schematic diagram of reconstructing and restoring data of a failed disk when the number of failed disks does not exceed the fault tolerance provided by an embodiment of the present application;

图8为本申请实施例提供的一种数据恢复装置的结构示意图；FIG8 is a schematic diagram of the structure of a data recovery device provided in an embodiment of the present application;

图9为本申请实施例提供的一种数据恢复设备的结构示意图。FIG. 9 is a schematic diagram of the structure of a data recovery device provided in an embodiment of the present application.

Detailed ways

本申请的核心是提供一种数据恢复方法、装置、设备及可读存储介质，用于恢复故障盘数据，以提高数据可靠性。The core of the present application is to provide a data recovery method, device, equipment and readable storage medium for recovering failed disk data to improve data reliability.

下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。The following will be combined with the drawings in the embodiments of the present application to clearly and completely describe the technical solutions in the embodiments of the present application. Obviously, the described embodiments are only part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of this application.

参见图1和图2，其中，图1示出了本申请实施例提供的一种数据恢复方法的流程图，图2示出了本申请实施例提供的另一种数据恢复方法的流程图。本申请实施例提供的一种数据恢复方法，可以包括：Referring to FIG. 1 and FIG. 2 , FIG. 1 shows a flow chart of a data recovery method provided in an embodiment of the present application, and FIG. 2 shows a flow chart of another data recovery method provided in an embodiment of the present application. A data recovery method provided in an embodiment of the present application may include:

S11：在多控节点中的辅助节点中备份主控节点中的数据，以将辅助节点中的数据作为主控节点的镜像数据。S11: backing up the data in the main control node in the auxiliary node among the multiple control nodes, so as to use the data in the auxiliary node as the mirror data of the main control node.

需要说明的是，本申请的执行主体可以为多控节点(包含主控节点和辅助节点)中的主控节点，也可以为存储系统，本申请以执行主体为多控节点为例进行说明。It should be noted that the execution subject of the present application can be a master control node among multiple control nodes (including a master control node and auxiliary nodes), or it can be a storage system. The present application is explained using the execution subject as a multiple control node as an example.

首先，主控节点可以处理主机下发的I/O请求，并对I/O请求对应的数据(具体即为I/O数据)进行存储，同时会在多控节点中的辅助节点中备份主控节点中的数据，也即在辅助节点中备份主控节点所存储的与主机下发的I/O请求对应的数据。其中，备份在辅助节点中的数据具体可以存储在辅助节点的内存中，以便于快速进行数据获取和传输。 First, the master control node can process the I/O request sent by the host and store the data corresponding to the I/O request (specifically, I/O data). At the same time, the data in the master control node can be backed up in the auxiliary node in the multi-control node, that is, the data stored in the master control node corresponding to the I/O request sent by the host can be backed up in the auxiliary node. The data backed up in the auxiliary node can be specifically stored in the memory of the auxiliary node to facilitate rapid data acquisition and transmission.

通过在辅助节点中备份主控节点中的数据可以实现将辅助节点中所备份的数据作为主控节点的镜像数据，以便于后续可以利用辅助节点中的径向数据来重构恢复出主控节点中故障盘中的数据，从而有效提高存储系统的数据可靠性。By backing up the data in the master node in the auxiliary node, the data backed up in the auxiliary node can be used as the mirror data of the master node, so that the radial data in the auxiliary node can be used to reconstruct and restore the data in the failed disk in the master node, thereby effectively improving the data reliability of the storage system.

S12：监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量；若RAID阵列的故障盘个数超过RAID阵列的容错量，则执行步骤S13；若RAID阵列的故障盘个数未超过RAID阵列的容错量，则返回步骤S12。S12: Monitor whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array; if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, execute step S13; if the number of failed disks in the RAID array does not exceed the fault tolerance of the RAID array, return to step S12.

主控节点在处理主机下发的I/O请求的同时，或者在处理主机下发的I/O请求之后，主控节点可以监控获取主控节点中RAID阵列的故障盘的个数，并判断RAID阵列中故障盘个数是否超过RAID阵列的容错量(即RAID阵列所能恢复的最大故障盘数量)。While processing the I/O request sent by the host, or after processing the I/O request sent by the host, the master control node can monitor the number of failed disks in the RAID array in the master control node, and determine whether the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array (that is, the maximum number of failed disks that the RAID array can recover).

其中，RAID阵列级别有它本身的内部数据冗余，例如RAID5阵列提供一个P校验分块的数据冗余，RAID5阵列利用该P校验分块(条带中P校验分块，由数据分块异或运算得到，数据分块为主机下发的有效数据)的单冗余恢复一个故障盘的数据，即RAID5阵列的容错量为1，也即在故障盘个数为1时，可以采用内部数据冗余进行故障盘数据恢复；RAID6阵列提供P校验分块和Q校验分块(RAID6阵列中Q校验分块和P校验分块配合运用，可以恢复RAID6阵列中的两个故障盘)的数据双冗余，RAID6阵列利用P校验和Q校验的双冗余恢复两个故障盘的数据，即RAID6阵列的容错量为2，也即在故障盘的个数在不超过2时，可以采用内部数据冗余进行故障盘数据恢复。需要说明的是，本申请提及的RAID阵列具体可以为RAID5阵列或者RAID6阵列，当然，也可以为其他的RAID阵列，本申请对此不做限定。Among them, the RAID array level has its own internal data redundancy. For example, the RAID5 array provides a P check block data redundancy. The RAID5 array uses the single redundancy of the P check block (the P check block in the stripe is obtained by the XOR operation of the data block, and the data block is the valid data sent by the host) to restore the data of a faulty disk, that is, the fault tolerance of the RAID5 array is 1, that is, when the number of faulty disks is 1, the internal data redundancy can be used to restore the data of the faulty disk; the RAID6 array provides P check blocks and Q check blocks (the Q check blocks and P check blocks in the RAID6 array are used together to restore the two faulty disks in the RAID6 array) data dual redundancy. The RAID6 array uses the dual redundancy of P check and Q check to restore the data of two faulty disks, that is, the fault tolerance of the RAID6 array is 2, that is, when the number of faulty disks does not exceed 2, the internal data redundancy can be used to restore the data of the faulty disk. It should be noted that the RAID array mentioned in this application can be a RAID5 array or a RAID6 array, of course, it can also be other RAID arrays, and this application does not limit this.

若RAID阵列的故障盘个数超过RAID阵列的容错量，则无法采用RAID阵列内部的数据冗余机制进行故障盘数据的恢复，这是因为同时发生故障的故障盘个数超过了其容错量。在这种情况下，主控节点需要利用辅助节点的镜像数据来重构故障盘中丢失的数据。若RAID阵列的故障盘个数未超过RAID阵列的容错量，则可以返回步骤S12，即继续监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量。If the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, the data redundancy mechanism inside the RAID array cannot be used to recover the data of the failed disks, because the number of failed disks that fail at the same time exceeds its fault tolerance. In this case, the master control node needs to use the mirror data of the auxiliary node to reconstruct the lost data in the failed disk. If the number of failed disks in the RAID array does not exceed the fault tolerance of the RAID array, it can return to step S12, that is, continue to monitor whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array.

S13：将故障盘中丢失数据的信息发送至辅助节点，由辅助节点从镜像数据中获取信息对应的数据并进行发送。S13: Send the information of the lost data in the failed disk to the auxiliary node, and the auxiliary node obtains the data corresponding to the information from the mirror data and sends it.

在步骤S12中，若RAID阵列的故障盘个数超过RAID阵列的容错量，则可以采用辅助节点提供的外部冗余进行故障盘数据的恢复。具体地，主控节点可以获取故障盘中丢失数据的信息，以便于基于丢失数据的信息来从辅助节点中检索主控节点丢失的数据。例如丢失数据的信息可以为故障盘中哪些分块(strip)的数据发生丢失等，其中，分块是磁盘上的物理存储介质的分区，用于RAID阵列进行数据重构的粒度大小。In step S12, if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, the external redundancy provided by the auxiliary node can be used to recover the data of the failed disk. Specifically, the master node can obtain information about the lost data in the failed disk, so as to retrieve the lost data of the master node from the auxiliary node based on the information about the lost data. For example, the information about the lost data can be which strips of data in the failed disk are lost, etc., wherein a strip is a partition of the physical storage medium on the disk, and is used for the granularity of data reconstruction of the RAID array.

主控节点在获取故障盘中丢失数据的信息后，可以将故障盘中丢失数据的信息发送至辅助节点。辅助节点在接收到丢失数据的信息后，可以从自身所存储的镜像数据中获取丢失数据的信息对应的数据，并将获取到的丢失数据的信息发送至主控节点，以便于主控节点根据这些数据进行数据重构恢复。After obtaining the information about the lost data in the failed disk, the master node can send the information about the lost data in the failed disk to the auxiliary node. After receiving the information about the lost data, the auxiliary node can obtain the data corresponding to the information about the lost data from the mirror data stored in itself, and send the obtained information about the lost data to the master node, so that the master node can reconstruct and recover the data based on the data.

另外，辅助节点只检索丢失数据的信息对应的数据并将其数据传输到主控节点进行数据重构可以最小化需要检索和传输的数据量，也最小化了数据重构的时间窗口。In addition, the auxiliary node only retrieves the data corresponding to the information of the lost data and transmits the data to the master node for data reconstruction, which can minimize the amount of data that needs to be retrieved and transmitted, and also minimize the time window for data reconstruction.

S14：接收信息对应的数据，将信息对应的数据写入故障盘对应的热备盘的相应分区中。S14: receiving data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of the hot spare disk corresponding to the failed disk.

在步骤S13的基础上，主控节点可以接收辅助节点发送的与丢失数据的信息对应的数据，并可以将与丢失数据的信息对应的数据写入故障盘对应的热备盘的相应分区中。其中，热备盘又称为spare，备用存储驱动器。Based on step S13, the master node may receive data corresponding to the information about lost data sent by the auxiliary node. And the data corresponding to the information of the lost data can be written into the corresponding partition of the hot spare disk corresponding to the failed disk. Among them, the hot spare disk is also called spare, a spare storage drive.

S15：利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列。S15: Use the hot spare disk to replace the corresponding failed disk to re-form a RAID array with normal disks.

待故障盘中所有丢失的数据都恢复完毕并写入对应热备盘的相应分区中后，则可以利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列，从而完成故障盘数据的恢复。After all lost data in the failed disk is recovered and written to the corresponding partition of the corresponding hot spare disk, the hot spare disk can be used to replace the corresponding failed disk to re-form the RAID array with the normal disk, thereby completing the recovery of the failed disk data.

例如如图3和图4所示，其中，图3示出了本申请实施例提供的主控节点的结构示意图，图4示出了本申请实施例提供的故障盘个数超过容错量时重构过程的示意图，在图3和图4中，主控节点具有由五块硬盘组成的RAID5阵列，图4给出了图3主控节点的两个磁盘同时发生故障时的数据重构过程。在图3中，磁盘1和磁盘2同时发生故障(超过了RAID5的容错量)，因此，strip1-2、5、9、13-14和17-18的数据丢失，并超过了RAID5阵列内部机制进行数据恢复，因此，则采用本申请中的上述方案进行数据恢复，得到与磁盘1对应的热备盘1，与磁盘2对应的热备盘2，并利用热备盘1替换对应的磁盘1，利用热备盘2替换对应的磁盘2，以与正常磁盘(磁盘3、磁盘4、磁盘5)重新组成RAID阵列。For example, as shown in Figures 3 and 4, Figure 3 shows a schematic diagram of the structure of the master node provided by the embodiment of the present application, and Figure 4 shows a schematic diagram of the reconstruction process when the number of failed disks exceeds the fault tolerance provided by the embodiment of the present application. In Figures 3 and 4, the master node has a RAID5 array composed of five hard disks, and Figure 4 shows the data reconstruction process when two disks of the master node of Figure 3 fail at the same time. In Figure 3, disk 1 and disk 2 fail at the same time (exceeding the fault tolerance of RAID5), so the data of strips 1-2, 5, 9, 13-14 and 17-18 are lost, and the internal mechanism of the RAID5 array is exceeded for data recovery. Therefore, the above scheme in the present application is used for data recovery, and a hot spare disk 1 corresponding to disk 1 and a hot spare disk 2 corresponding to disk 2 are obtained, and the corresponding disk 1 is replaced by the hot spare disk 1, and the corresponding disk 2 is replaced by the hot spare disk 2, so as to re-form the RAID array with the normal disks (disk 3, disk 4, disk 5).

通过上述过程可知，本申请针对存储系统出现概率较高的数据丢失事件，在集群多控存储系统下，在集群中的辅助节点中增加镜像数据，当故障盘的个数超过RAID阵列所能恢复的最大故障盘数量后，则利用辅助节点的镜像数据来重构恢复出故障盘中的数据，具体即主控节点使用正常磁盘的数据和从辅助节点传输来的数据(数据本需要写入故障盘，但磁盘发生故障导致没有写入成功，所以，将该数据对应的辅助节点的镜像数据传输到主控节点)进行离线重构，以有效提高存储系统的数据可靠性。Through the above process, it can be known that, in response to data loss events with a high probability in the storage system, the present application adds mirror data in the auxiliary node in the cluster under the cluster multi-controller storage system. When the number of failed disks exceeds the maximum number of failed disks that can be recovered by the RAID array, the mirror data of the auxiliary node is used to reconstruct and restore the data in the failed disk. Specifically, the main control node uses the data of the normal disk and the data transmitted from the auxiliary node (the data originally needs to be written to the failed disk, but the disk fails and the writing is not successful, so the mirror data of the auxiliary node corresponding to the data is transmitted to the main control node) for offline reconstruction, so as to effectively improve the data reliability of the storage system.

本申请公开的上述技术方案，在多控节点中的辅助节点中备份主控节点中的数据，以将辅助节点中备份的数据作为主控节点的镜像数据。当主控节点中RAID阵列的故障盘个数超过RAID阵列的容错量之后，则将故障盘中丢失数据的信息发送至辅助节点，由辅助节点从镜像数据中获取丢失数据的信息对应的数据并将该数据发送出去。在接收到辅助节点发送的数据后，将该数据写入故障盘对应的热备盘的相应分区中，然后，利用热备盘替换故障盘而与正常盘重新组成RAID阵列。由此可知，本申请通过在辅助节点中增加镜像数据，并在故障盘数量超过容错量之后利用辅助节点中的镜像数据进行数据恢复，从而可以提高存储系统的数据可靠性。The above technical solution disclosed in the present application backs up the data in the main control node in the auxiliary node in the multi-control node, so that the backed-up data in the auxiliary node is used as the mirror data of the main control node. When the number of failed disks in the RAID array in the main control node exceeds the fault tolerance of the RAID array, the information of the lost data in the failed disk is sent to the auxiliary node, and the auxiliary node obtains the data corresponding to the information of the lost data from the mirror data and sends the data out. After receiving the data sent by the auxiliary node, the data is written to the corresponding partition of the hot spare disk corresponding to the failed disk, and then the hot spare disk is used to replace the failed disk and re-compose the RAID array with the normal disk. It can be seen from this that the present application can improve the data reliability of the storage system by adding mirror data in the auxiliary node and using the mirror data in the auxiliary node for data recovery after the number of failed disks exceeds the fault tolerance.

本申请一些实施例中，在多控节点中的辅助节点中备份主控节点中的数据，可以包括：In some embodiments of the present application, backing up data in a master control node in a secondary node among multiple control nodes may include:

将故障盘中丢失数据的信息发送至辅助节点，可以包括：Send information about lost data in the failed disk to the secondary node, which may include:

辅助节点从镜像数据中获取信息对应的数据并进行发送，可以包括：The auxiliary node obtains the data corresponding to the information from the mirror data and sends it, which may include:

在本申请中，考虑到主机是通过逻辑卷(volume)而不是分块来进行I/O访问，主控节点对I/O请求对应的数据进行存储时会先存储在逻辑卷中，因此，当在辅助节点中备份主控节点中的数据时，可以在辅助节点中备份主控节点中的逻辑卷，以得到对应的镜像逻辑卷。其中，主控节点中的逻辑卷以及镜像逻辑卷中均包含有多个块(block，是主机I/O访问数据的粒度大小，是一个逻辑单元，block和strip存在某种映射关系，通过这种映射关系可以定位到物理磁盘中的具体分块，多个block组成volume)，块与磁盘中的分块存在映射关系，In this application, considering that the host performs I/O access through logical volumes rather than blocks, the master control node When storing the data corresponding to the I/O request, it will be stored in the logical volume first. Therefore, when backing up the data in the master node in the auxiliary node, the logical volume in the master node can be backed up in the auxiliary node to obtain the corresponding mirror logical volume. Among them, the logical volume in the master node and the mirror logical volume both contain multiple blocks (block is the granularity of the host I/O access data, which is a logical unit. There is a certain mapping relationship between block and strip. Through this mapping relationship, it can be located to the specific block in the physical disk. Multiple blocks constitute a volume). There is a mapping relationship between the block and the block in the disk.

且主控节点及辅助节点中均存储有映射关系，也即主控节点与辅助节点中都维护有块和分块之间的映射关系。需要说明的是，块与分块可以是一个分块对应多个块，这样一个分块就会被等分切分成几个块，也可以是一个分块对应一个块。The mapping relationship is stored in both the master node and the auxiliary node, that is, the mapping relationship between blocks and sub-blocks is maintained in both the master node and the auxiliary node. It should be noted that blocks and sub-blocks can be one sub-block corresponding to multiple blocks, so that a sub-block will be equally divided into several blocks, or one sub-block corresponding to one block.

具体可以参见图5和图6，图5示出了本申请实施例提供的主控节点的另一种结构示意图，图6示出了本申请实施例提供的辅助节点示意图，图5给出了配置RAID5阵列的主控节点框图，图6中辅助节点中的镜像volume A对应图5中主控节点中的volume A，图6中辅助节点中的镜像volume B对应图5中主控节点中的volume B。主控节点作为主服务器处理主机的I/O请求，I/O请求的数据分布在由五块硬盘组成的RAID5阵列中。图5中，主控节点具有由五块硬盘组成的RAID5阵列，strip是用于RAID阵列进行数据重构的粒度大小，block是用于主机I/O访问数据的粒度大小，该block也是用于在主控节点和辅助节点之间互为数据镜像的粒度大小。从另一个角度来看，strip是磁盘的物理数据单元，block是逻辑数据单元。一个volume由多个block组成。主控节点和辅助节点都维护block和strip之间的映射关系，其中，该映射关系可以是一个strip对应多个block，这样一个strip就会被等分切分成几个block，也可以是一个strip对应一个block。为了更容易理解，图5及图6中设计一个strip对应一个block，如图5中block 0-99对应strip 1A。图5左侧提供了按volume组织的映射关系的示例，volume A中的数据分布在五个磁盘上，如包含字母“A”的矩形框所示，volume B中的数据同样分布在五个磁盘上，如包含字母“B”的矩形框所示。具体来说，volume A中的数据分布在strip1-2、4-5、9、11-12、16、19-20中。volume B中的数据分布在在strip3、6-8、10、13-15、17-18中。For details, please refer to Figures 5 and 6. Figure 5 shows another structural schematic diagram of the master control node provided in the embodiment of the present application, and Figure 6 shows a schematic diagram of the auxiliary node provided in the embodiment of the present application. Figure 5 shows a block diagram of the master control node configured with a RAID5 array. The mirror volume A in the auxiliary node in Figure 6 corresponds to the volume A in the master control node in Figure 5, and the mirror volume B in the auxiliary node in Figure 6 corresponds to the volume B in the master control node in Figure 5. The master control node processes the I/O request of the host as the main server, and the data of the I/O request is distributed in a RAID5 array composed of five hard disks. In Figure 5, the master control node has a RAID5 array composed of five hard disks. Strip is the granularity size used for data reconstruction of the RAID array, and block is the granularity size used for host I/O access to data. The block is also the granularity size used for data mirroring between the master control node and the auxiliary node. From another perspective, a strip is a physical data unit of a disk, and a block is a logical data unit. A volume consists of multiple blocks. Both the master node and the slave node maintain a mapping relationship between blocks and strips, where the mapping relationship can be one strip corresponding to multiple blocks, so that a strip is equally divided into several blocks, or one strip corresponding to one block. For easier understanding, in Figures 5 and 6, one strip is designed to correspond to one block, such as block 0-99 corresponds to strip 1A in Figure 5. The left side of Figure 5 provides an example of a mapping relationship organized by volume. The data in volume A is distributed on five disks, as shown in the rectangular box containing the letter "A", and the data in volume B is also distributed on five disks, as shown in the rectangular box containing the letter "B". Specifically, the data in volume A is distributed in strips 1-2, 4-5, 9, 11-12, 16, and 19-20. The data in volume B is distributed in strips 3, 6-8, 10, 13-15, and 17-18.

在上述基础上，主控节点在将故障盘中丢失数据的信息发送至辅助节点时，具体可以先根据故障盘包含的分块以及存储的块与分块的映射关系来形成主控节点中逻辑卷对应的数据丢失元数据。其中，该数据丢失元数据也是元数据(管理条带的数据结构，可以是位图也可以是哈希表等等)，只是数据丢失元数据不仅管理条带而且还标识了故障盘中丢失数据的数据单元(该数据单元是块，也即数据丢失元数据是针对逻辑卷中的块来说的)。具体地，数据丢失元数据中包含主控节点中逻辑卷包含的块是否丢失的信息(逻辑卷包含的所有块是否丢失的信息均包含在数据丢失元数据中)。需要说明的是，本申请所涉及的数据丢失元数据是用于实现指定逻辑功能的一种全局变量，具体可以由C语言或者C++实现。On the basis of the above, when the master control node sends the information about lost data in the failed disk to the auxiliary node, it can first form the data loss metadata corresponding to the logical volume in the master control node according to the blocks contained in the failed disk and the mapping relationship between the stored blocks and the blocks. Among them, the data loss metadata is also metadata (a data structure for managing stripes, which can be a bitmap or a hash table, etc.), but the data loss metadata not only manages stripes but also identifies the data unit of lost data in the failed disk (the data unit is a block, that is, the data loss metadata is for the blocks in the logical volume). Specifically, the data loss metadata contains information on whether the blocks contained in the logical volume in the master control node are lost (the information on whether all blocks contained in the logical volume are lost is included in the data loss metadata). It should be noted that the data loss metadata involved in this application is a global variable used to implement a specified logical function, which can be implemented in C language or C++.

然后，主控节点可以将所形成的数据丢失元数据发送至辅助节点。相应地，辅助节点在接收到数据丢失元数据之后，可以对数据丢失元数据进行扫描，根据存储的块与分块的映射关系以及接收到的数据丢失元数据来获取丢失数据的分块编号，例如图6中辅助节点最终求得的丢失数据的分块编号为1-2、5、9、13-14和17-18，然后，辅助节点可以将镜像逻辑卷中与分块编号对应的数据发送至主控节点，以图6为例，则将分块编号为1-2、5、9、13-14和17-18对应的数据发送至主控节点。Then, the master node can send the formed data loss metadata to the auxiliary node. Accordingly, after receiving the data loss metadata, the auxiliary node can scan the data loss metadata, and obtain the block number of the lost data according to the mapping relationship between the stored blocks and the blocks and the received data loss metadata. For example, in FIG6 , the block numbers of the lost data finally obtained by the auxiliary node are 1-2, 5, 9, 13-14 and 17-18. Then, the auxiliary node can send the data corresponding to the block numbers in the mirrored logical volume to the master node. Taking FIG6 as an example, the data corresponding to the block numbers 1-2, 5, 9, 13-14 and 17-18 are sent to the master node.

通过上述不仅实现在辅助节点中形成镜像逻辑卷，以使得在主控节点和辅助节点之间互为数据镜像的为块粒度大小，从而更好地面向主机和用户，而且主控节点和辅助节点之间传输的是数据丢失元数据，并使得辅助节点基于数据丢失元数据只检索丢失数据分块(具体指的是故障盘中的分块对应的数据)并将其传输到主控节点以进行数据重构，这最小化了需要检索和传输的数据量，并且也最小化了数据重构的时间窗口。Through the above, not only the mirror logical volume is formed in the auxiliary node, but also the mutual The data mirroring is block-sized, which is better for the host and the user. What is transmitted between the master node and the auxiliary node is the data loss metadata, and the auxiliary node only retrieves the lost data blocks (specifically the data corresponding to the blocks in the failed disk) based on the data loss metadata and transmits it to the master node for data reconstruction. This minimizes the amount of data that needs to be retrieved and transmitted, and also minimizes the time window for data reconstruction.

本申请一些实施例中，根据故障盘包含的分块及映射关系，形成逻辑卷对应的数据丢失元数据，可以包括：In some embodiments of the present application, data loss metadata corresponding to a logical volume is formed according to the blocks and mapping relationships contained in the failed disk, which may include:

在本申请中，具体可以根据故障盘包含的分块及映射关系，形成逻辑卷对应且以位图为数据组织方式的数据丢失元数据，且位图中的每个bit位表示逻辑卷包含的相应块是否丢失，例如设计bit位为0时表示相应的块是丢失的块，bit位为1时表示相应的块是没有丢失的块(当然，也可以根据需要而采用其他设计)。In the present application, data loss metadata corresponding to the logical volume and organized in a bitmap format can be formed based on the blocks and mapping relationships contained in the failed disk, and each bit in the bitmap indicates whether the corresponding block contained in the logical volume is lost. For example, when the bit is designed to be 0, it indicates that the corresponding block is a lost block, and when the bit is 1, it indicates that the corresponding block is a non-lost block (of course, other designs can also be adopted as needed).

具体地，以图6为例，磁盘1和磁盘2都发生故障，所以，strip1-2、5、9、13-14和17-18是数据丢失的分块，这些分块一一对应的块的bit位被设置为0。根据strip与block的一对一关系，主控节点中的volume A的数据丢失元数据和volume B中的数据丢失元数据表示为：第一行位图(表示volume A)：{0(分块1)、0(分块2)、1(分块4)、0(分块5)、0(分块9)、1(分块11)、1(分块12)、1(分块16)、1(分块19)、1(分块20)}，第二行位图(表示volume B)：{1(分块3)、1(分块6)、1(分块7)、1(分块8)、1(分块10)、0(分块13)、0(分块14)、1(分块15)、0(分块17)、0(分块18)}。需要指出的是，本申请设计的位图元数据组织方式是两维位图元数据组织方式，每一行代表一个逻辑卷中的各个块的bit位，不同逻辑卷中的块的bit位在每一列上，多个逻辑卷就有多行，这样形成二维位图元数据组织方式，也即数据丢失元数据为二维位图元数据组织方式。综上，对于图5中所示出的主控节点的逻辑卷，则可以得到最终的数据丢失元数据(也即位图元数据)第一行为：0 0 1 0 0 1 1 1 1 1，第二行为：1 1 1 1 1 0 0 1 0 0。当然，也可以根据实际需要而对数据丢失元数据中位图元数据组织方式进行调整，本申请对此不做限定。Specifically, taking FIG. 6 as an example, both disk 1 and disk 2 fail, so strips 1-2, 5, 9, 13-14, and 17-18 are blocks where data is lost, and the bits of the blocks corresponding to each other are set to 0. According to the one-to-one relationship between strips and blocks, the data loss metadata of volume A in the master node and the data loss metadata in volume B are represented as follows: the first row of bitmap (representing volume A): {0(block 1), 0(block 2), 1(block 4), 0(block 5), 0(block 9), 1(block 11), 1(block 12), 1(block 16), 1(block 19), 1(block 20)}, the second row of bitmap (representing volume B): {1(block 3), 1(block 6), 1(block 7), 1(block 8), 1(block 10), 0(block 13), 0(block 14), 1(block 15), 0(block 17), 0(block 18)}. It should be pointed out that the bitmap metadata organization method designed by the present application is a two-dimensional bitmap metadata organization method, in which each row represents the bit position of each block in a logical volume, and the bit positions of the blocks in different logical volumes are on each column. There are multiple rows for multiple logical volumes, thus forming a two-dimensional bitmap metadata organization method, that is, the data loss metadata is a two-dimensional bitmap metadata organization method. In summary, for the logical volume of the master control node shown in FIG5 , the final data loss metadata (that is, bitmap metadata) can be obtained. The first line is: 0 0 1 0 0 1 1 1 1 1, and the second line is: 1 1 1 1 1 1 0 0 1 0 0. Of course, the bitmap metadata organization method in the data loss metadata can also be adjusted according to actual needs, and the present application does not limit this.

通过以位图为数据组织方式的数据丢失元数据不仅简单，明了，而且便于辅助节点基于此快速进行丢失数据的分块编号的确定。The data loss metadata using bitmap as the data organization method is not only simple and clear, but also convenient for the auxiliary node to quickly determine the block number of the lost data based on it.

当然，数据丢失元数据也可以是哈希表等其他数据组织方式，只要能够表示逻辑卷包含的块是否丢失的信息以及能够使得辅助节点基于此进行丢失数据的分块编号的确定即可。另外，需要说明的是，本申请设计的二维位图元数据组织方式不仅可以应用在本业务场景下，也可以应用在其他业务场景下。Of course, the data loss metadata can also be other data organization methods such as hash tables, as long as it can indicate whether the blocks contained in the logical volume are lost and enable the auxiliary node to determine the block number of the lost data based on this information. In addition, it should be noted that the two-dimensional bitmap metadata organization method designed in this application can be applied not only in this business scenario, but also in other business scenarios.

本申请一些实施例中，在利用热备盘替换对应的故障盘之前，还可以包括：In some embodiments of the present application, before using the hot spare disk to replace the corresponding failed disk, the following steps may also be included:

在本申请中，在利用热备盘替换对应的故障盘之前，主控节点还可以判断RAID阵列中的各条带是否需要重新计算校验分块，其中，主控节点可以对各条带依次进行判断。需要说明的是，条带是阵列的不同磁盘上的位置相关的块的集合，是组织不同磁盘上分块的单位。具体可以参见图5，图5中用虚线矩形框101表示了条带。以条带为单位，利用条带内的分块进行异或运算，以此重构数据和计算P校验分块，因此，在RAID阵列中，以条带为单元来保持RAID阵列的冗余性。如图5所示，条带101由strip1A、strip2A、strip3B、strip4A和P校验分块(即Parity1)组成。strip1A-4A可以是主机下发的数据块，而P校验分块是条带101的strip1A-4A异或运算求得的冗余分块，存储的是冗余数据。In the present application, before using the hot spare disk to replace the corresponding failed disk, the master control node can also determine whether each stripe in the RAID array needs to recalculate the check blocks, wherein the master control node can make judgments on each stripe in turn. It should be noted that a stripe is a collection of position-related blocks on different disks of the array, and is a unit for organizing blocks on different disks. For details, see FIG. 5 , in which a dotted rectangular box 101 represents a stripe. Taking the stripe as a unit, using the blocks in the stripe The blocks are XORed to reconstruct the data and calculate the P check block. Therefore, in the RAID array, the redundancy of the RAID array is maintained with stripes as units. As shown in FIG5 , stripe 101 is composed of strip1A, strip2A, strip3B, strip4A and P check block (i.e., Parity1). Strip1A-4A can be the data blocks sent by the host, and the P check block is the redundant block obtained by the XOR operation of strip1A-4A of stripe 101, which stores redundant data.

主控节点在判断RAID阵列中的条带是否需要重新计算校验分块时，具体可以是判断该条带中的校验分块是否位于故障盘上，或者说，是判断该条带中是否已经缺失校验分块。如果条带中的校验分块位于故障盘上，或者说，该条带中已经缺失校验分块，则确定该条带需要重新计算校验分块；如果条带中的校验分块不位于故障盘上，或者说，该条带中不缺失校验分块，则确定该条带不需要重新计算校验分块。When the master control node determines whether the stripe in the RAID array needs to recalculate the check blocks, it can specifically determine whether the check blocks in the stripe are located on the faulty disk, or in other words, whether the check blocks are missing in the stripe. If the check blocks in the stripe are located on the faulty disk, or in other words, the check blocks are missing in the stripe, it is determined that the check blocks in the stripe need to be recalculated; if the check blocks in the stripe are not located on the faulty disk, or in other words, the check blocks are not missing in the stripe, it is determined that the check blocks in the stripe do not need to be recalculated.

如果确定条带需要重新计算校验分块，则根据条带中的各分块(这里提及的各分块具体指的是该条带中正常磁盘中的分块、该条带中故障盘对应的热备盘中的分块)计算校验分块，具体是利用条带中的数据分块进行异或运算，以计算得到校验分块，例如图4和图6中的条带102可以基于strip6A、strip7B、strip8B以及恢复的strip5A(也即故障盘2中对应的热备盘2中的strip5A)进行异或运算求得P校验分块2(Parity2)。在计算得到该条带的校验分块之后，可以将计算得到的校验分块写入故障盘对应的热备盘的相应分区中，对应图6，则是将求得的Parity2写入热备盘1的相应分区中。通过前述过程即可实现对相应条带的完全恢复(也即不仅恢复了丢失的分块数据，而且还恢复了丢失的校验分块)，以提高数据恢复的可靠性，并提高RAID阵列重组的可靠性。If it is determined that the stripe needs to recalculate the parity block, the parity block is calculated based on each block in the stripe (the blocks mentioned here specifically refer to the blocks in the normal disks in the stripe and the blocks in the hot spare disk corresponding to the faulty disk in the stripe), specifically, the data blocks in the stripe are used to perform an XOR operation to calculate the parity block. For example, the stripe 102 in Figures 4 and 6 can be based on strip6A, strip7B, strip8B and the restored strip5A (that is, strip5A in the hot spare disk 2 corresponding to the faulty disk 2) to perform an XOR operation to obtain P parity block 2 (Parity2). After the parity block of the stripe is calculated, the calculated parity block can be written into the corresponding partition of the hot spare disk corresponding to the faulty disk. Corresponding to Figure 6, the obtained Parity2 is written into the corresponding partition of the hot spare disk 1. Through the above process, the corresponding stripe can be completely recovered (that is, not only the lost block data is recovered, but also the lost parity block is recovered), so as to improve the reliability of data recovery and the reliability of RAID array reorganization.

需要说明的是，图2示出了一种进行校验分块判断和数据写入热备盘相应分区的方式，也即主控节点在将从辅助节点接收到的分块的数据写入热备盘相应分区后，即判断RAID阵列中相应条带(具体即为分块的数据写入所对应的条带)是否需要重新计算校验分块，在需要重新计算时则进行校验分块的计算，并将重新计算求得的校验分块写入热备盘的对应分区，以使得相应条带被完全恢复；若确定不需要重新计算校验分块，或者将重新计算求得的校验分块写入热备盘的对应分区后，则判断故障盘的所有数据是否都已经恢复重构完毕，如果故障盘的所有数据都已经恢复重构完毕，则所有丢失的数据都恢复完毕并写入热备盘的对应分区后，用热备盘替换故障盘，并和其他没有故障的磁盘重新组成RAID阵列，如果故障盘的所有数据未恢复重构完毕，则移动到下一个条带继续将接收到的分块数据写入热备盘的对应分区上，以执行下一个条带中的分块重构，以此循环直到所有丢失的数据都恢复完毕。当然，也可以在将所有丢失数据的信息对应的数据均写入到故障盘对应的热备盘的相应分区中之后，再逐条带判断是否需要重新计算校验分块。It should be noted that FIG. 2 shows a method for performing check block judgment and writing data to the corresponding partition of the hot spare disk, that is, after the master control node writes the block data received from the auxiliary node to the corresponding partition of the hot spare disk, it judges whether the corresponding stripe in the RAID array (specifically, the stripe corresponding to the block data writing) needs to recalculate the check block. If recalculation is required, the check block is calculated, and the recalculated check block is written to the corresponding partition of the hot spare disk, so that the corresponding stripe is completely restored; if it is determined that the check block does not need to be recalculated, or the recalculated check block is written After the verification block is written to the corresponding partition of the hot spare disk, it is determined whether all the data of the failed disk has been restored and reconstructed. If all the data of the failed disk has been restored and reconstructed, all the lost data is restored and written to the corresponding partition of the hot spare disk, and the failed disk is replaced with the hot spare disk, and the RAID array is reconstructed with other non-faulty disks. If all the data of the failed disk has not been restored and reconstructed, it is moved to the next stripe to continue writing the received block data to the corresponding partition of the hot spare disk to perform the block reconstruction in the next stripe, and this cycle is repeated until all the lost data is restored. Of course, it is also possible to write the data corresponding to the information of all the lost data to the corresponding partition of the hot spare disk corresponding to the failed disk, and then determine whether the verification block needs to be recalculated stripe by stripe.

本申请一些实施例中，接收信息对应的数据，将信息对应的数据写入故障盘对应的热备盘的相应分区中，可以包括：In some embodiments of the present application, receiving data corresponding to information and writing the data corresponding to the information into a corresponding partition of a hot spare disk corresponding to the failed disk may include:

在本申请中，主控节点在接收辅助节点发送的信息对应的数据时，可以逐分块接收信息对应的数据，也即按照串行的方式接收故障盘中各分块对应的数据，以实现有序地进行数据接收。并且，在将信息对应的数据写入故障盘对应的热备盘的相应分区中，也可以逐分块将信息对应的数据写入故障盘对应的热备盘的相应分区中，也即按照串行的方式将故障盘中各分块对应的数据依次写入故障盘对应的热备盘的相应分区中，以实现有序地进行数据写入和恢复。In the present application, when the master control node receives the data corresponding to the information sent by the auxiliary node, it can receive the data corresponding to the information block by block, that is, receive the data corresponding to each block in the faulty disk in a serial manner, so as to achieve orderly data reception. In addition, when writing the data corresponding to the information into the corresponding partition of the hot spare disk corresponding to the faulty disk, it can also write the data corresponding to the information block by block into the corresponding partition of the hot spare disk corresponding to the faulty disk, that is, write the data corresponding to each block in the faulty disk into the corresponding partition of the hot spare disk corresponding to the faulty disk in a serial manner, so as to achieve orderly data writing and recover.

需要说明的是，主控节点对于分块对应数据的接收和分块对应数据的写入可以同时进行，也即在接收一个分块对应的数据的同时，可以将之前接收的分块对应的数据写入对应热备盘的相应分区中。当然，主控节点也可以在接收一个分块对应的数据，并在将该数据写入对应热备盘的相应分区中之后，再接收另一个分块对应的数据。本申请对此不做限定。It should be noted that the master control node can receive and write data corresponding to the blocks at the same time, that is, while receiving data corresponding to a block, the previously received data corresponding to the block can be written into the corresponding partition of the corresponding hot spare disk. Of course, the master control node can also receive data corresponding to a block, and after writing the data into the corresponding partition of the corresponding hot spare disk, receive data corresponding to another block. This application does not limit this.

本申请一些实施例中，若RAID阵列的故障盘个数超过RAID阵列的容错量，则还可以包括：In some embodiments of the present application, if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, the following may also be included:

在利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列之后，还可以包括：After the hot spare disk is used to replace the corresponding failed disk to re-form the RAID array with the normal disk, the following steps may also be performed:

在本申请中，若RAID阵列的故障盘个数超过RAID阵列的容错量，则可以将主机的I/O请求重定向到辅助节点，也即由辅助节点进行主机的I/O请求的接收和处理，并由辅助节点对主机的I/O请求对应的数据进行存储，以保证主机I/O请求能够得到正常处理。且，将主机的I/O请求重定向到主控节点可以使得辅助节点作为主控节点的数据镜像而有效地发挥作用。另外，在将主机的I/O请求重定向的主控节点的同时，主控节点可以进行下线(即不再进行主机I/O请求的接收和处理)，在此情况下主控节点可以离线进行数据的重构和恢复。In the present application, if the number of failed disks of the RAID array exceeds the fault tolerance of the RAID array, the I/O request of the host can be redirected to the auxiliary node, that is, the auxiliary node receives and processes the I/O request of the host, and the auxiliary node stores the data corresponding to the I/O request of the host to ensure that the I/O request of the host can be processed normally. Moreover, redirecting the I/O request of the host to the master control node can make the auxiliary node effectively play a role as the data mirror of the master control node. In addition, while redirecting the I/O request of the host to the master control node, the master control node can go offline (that is, no longer receive and process the I/O request of the host), in which case the master control node can reconstruct and recover the data offline.

另外，在利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列之后，主控节点可以将主机的I/O请求重定向到主控节点，也即继续有主控节点接收、处理执行来自主机的I/O请求，以使得存储系统恢复正常状态。In addition, after replacing the corresponding failed disk with the hot spare disk to re-form the RAID array with the normal disk, the master control node can redirect the host's I/O requests to the master control node, that is, the master control node continues to receive, process and execute I/O requests from the host, so that the storage system returns to normal.

本申请一些实施例中，在将主机的I/O请求重定向到辅助节点时，还可以包括：In some embodiments of the present application, when redirecting the I/O request of the host to the auxiliary node, the following steps may also be included:

在本申请中，在将主机的I/O请求重定向到辅助节点时，还可以将主机新发送的数据存储在主控节点中(也即主机新发送的与I/O请求对应的数据不仅存储在辅助节点中，还存储在主控节点中)，以将主控节点中存储的主机新发送的数据作为辅助节点的镜像数据，从而若辅助节点进行数据落盘并出现故障盘数量超过容错量后，可以采用主控节点中的镜像数据对辅助节点中故障盘的数据进行恢复，也即实现与主控节点进行故障盘数据恢复类似的过程，以提高辅助节点中数据存储的可靠性。In the present application, when redirecting the I/O request of the host to the auxiliary node, the data newly sent by the host can also be stored in the main control node (that is, the data newly sent by the host corresponding to the I/O request is stored not only in the auxiliary node, but also in the main control node), so that the data newly sent by the host stored in the main control node can be used as the mirror data of the auxiliary node, so that if the auxiliary node writes data to the disk and the number of failed disks exceeds the fault tolerance, the mirror data in the main control node can be used to recover the data of the failed disk in the auxiliary node, that is, to implement a process similar to the failed disk data recovery of the main control node, so as to improve the reliability of data storage in the auxiliary node.

本申请一些实施例中，若RAID阵列的故障盘个数未超过RAID阵列的容错量，则还可以包括：In some embodiments of the present application, if the number of failed disks in the RAID array does not exceed the fault tolerance of the RAID array, the method may further include:

当恢复故障盘丢失的所有数据后，利用热备盘替换故障盘，以与正常磁盘重新组成RAID阵列。After all the data lost on the failed disk is recovered, the hot spare disk is used to replace the failed disk to re-form the RAID array with normal disks.

在本申请中，在步骤S12中，若监控到主控节点中RAID阵列的故障盘个数未超过RAID阵列的容错量，则可以采用RAID阵列内部的数据冗余机制进行数据恢复。具体地，利用RAID阵列中各条带包含的正常磁盘的分块进行异或计算，以得到故障盘对应的热备盘的分块。然后，当恢复故障盘丢失的所有数据后，也即RAID阵列中所有条带的故障盘的数据均恢复之后，则利用故障盘对应的热备盘替换故障盘，以与正常磁盘重新组成RAID阵列，从而在故障盘个数未超过容错量时也可以进行故障盘数据的恢复，以提高存储系统数据可靠性。In the present application, in step S12, if the number of failed disks in the RAID array in the master control node is monitored to be less than the fault tolerance of the RAID array, the data redundancy mechanism within the RAID array can be used to recover data. Specifically, the blocks of the normal disks contained in each stripe in the RAID array are used to perform XOR calculations to obtain the blocks of the hot spare disk corresponding to the failed disk. Then, after all the data lost on the failed disk is recovered, that is, the number of failed disks in all stripes in the RAID array is restored. After the data is recovered, the hot spare disk corresponding to the failed disk is used to replace the failed disk to re-form the RAID array with the normal disk. In this way, the data of the failed disk can be recovered when the number of failed disks does not exceed the fault tolerance, thereby improving the data reliability of the storage system.

具体可以参见图7，其示出了本申请实施例提供的在故障盘个数未超过容错量时重构恢复故障盘数据的示意图，在图7中，主控节点具有五个磁盘组成的RAID5阵列，如果主控节点只有一个磁盘发生故障，则能够通过RAID5阵列内部机制进行数据恢复。其中，图7描述了主控节点中的磁盘1发生故障，使用spare热备盘重构磁盘1丢失的数据，例如热备盘中的strip1由磁盘2中的strip2A、磁盘3中的strip3B、磁盘4中的strip4A和磁盘5中的P校验分块(parity1)异或运算求得，同理其他条带的磁盘1的数据都可以恢复。当热备盘恢复了磁盘1丢失的所有数据后，热备盘将替换磁盘1，并与磁盘2-5组成一个新的RAID5阵列来处理主机I/O请求。Specifically, please refer to Figure 7, which shows a schematic diagram of reconstructing and recovering the data of a failed disk when the number of failed disks does not exceed the fault tolerance provided by an embodiment of the present application. In Figure 7, the master control node has a RAID5 array consisting of five disks. If only one disk of the master control node fails, data recovery can be performed through the internal mechanism of the RAID5 array. Among them, Figure 7 describes that disk 1 in the master control node fails, and the spare hot spare disk is used to reconstruct the lost data of disk 1. For example, strip1 in the hot spare disk is obtained by XOR operation of strip2A in disk 2, strip3B in disk 3, strip4A in disk 4, and P check block (parity1) in disk 5. Similarly, the data of disk 1 in other stripes can be recovered. When the hot spare disk recovers all the lost data of disk 1, the hot spare disk will replace disk 1 and form a new RAID5 array with disks 2-5 to process host I/O requests.

本申请一些实施例中，监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量，可以包括：In some embodiments of the present application, monitoring whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array may include:

在本申请中，主控节点可以定时监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量，具体地，可以采用定时器，以定时进行监控，从而便于及时发现故障盘个数与容错量之间的关系，进而便于及时采取不同的数据重构恢复方式进行数据恢复，以提高存储系统数据可靠性。其中，定时的时间间隔可以根据实际经验进行设置，本申请对此不做限定。In the present application, the master control node can monitor whether the number of faulty disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array. Specifically, a timer can be used to monitor regularly, so as to facilitate timely discovery of the relationship between the number of faulty disks and the fault tolerance, and then facilitate timely adoption of different data reconstruction and recovery methods for data recovery, so as to improve the data reliability of the storage system. Among them, the time interval of the timing can be set according to actual experience, and this application does not limit this.

当然，主控节点也可以实时监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量。Of course, the master control node can also monitor in real time whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array.

本申请一些实施例中，还可以包括：In some embodiments of the present application, the following may also be included:

在本申请中，主控节点在进行定时监控时，若监控到RAID阵列的故障盘个数未超过RAID阵列的容错量，则可以返回执行定时监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量的步骤，也即进行定时循环监控，以便于及时发现故障盘个数与容错量之间的关系，进而便于及时采取不同的数据重构恢复方式进行数据恢复，以提高存储系统数据可靠性。In the present application, when the master control node is performing timed monitoring, if the number of failed disks in the monitored RAID array does not exceed the fault tolerance of the RAID array, the master control node can return to the step of performing timed monitoring to determine whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array, that is, performing timed cyclic monitoring, so as to timely discover the relationship between the number of failed disks and the fault tolerance, and then facilitate timely adoption of different data reconstruction and recovery methods for data recovery, so as to improve the data reliability of the storage system.

需要说明的是，在返回执行定时监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量的步骤的同时，主控节点还可以利用RAID阵列内部的数据冗余机制进行数据恢复，以保证存储系统的数据可靠性。It should be noted that while returning to execute the step of periodically monitoring whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array, the master control node can also use the data redundancy mechanism inside the RAID array to recover data to ensure the data reliability of the storage system.

本申请一些实施例中，若辅助节点的个数大于1，则将故障盘中丢失数据的信息发送至辅助节点，可以包括：In some embodiments of the present application, if the number of auxiliary nodes is greater than 1, sending information about lost data in the failed disk to the auxiliary node may include:

在本申请中，若辅助节点的个数大于1，则主控节点可以根据预设选择策略(例如可以为工作负载最小选择策略或工作性能最好(故障率最低等)选择策略等)从多个辅助节点中选择出一个辅助节点，然后，主控节点可以将丢失数据的信息发送至所选择出的一个辅助节点中，以利用该辅助节点中的镜像数据来重构恢复出故障盘中的数据。In the present application, if the number of auxiliary nodes is greater than 1, the master node may select an auxiliary node from the plurality of auxiliary nodes according to a preset selection strategy (for example, a selection strategy with the smallest workload or a selection strategy with the best working performance (lowest failure rate, etc.), etc.), and then the master node may send the information about the lost data to the selected auxiliary node. Click to use the mirror data in the auxiliary node to reconstruct and restore the data in the failed disk.

通过上述方式可以实现将多个辅助节点中的一个辅助节点参与到数据恢复中，以使得能够进行稳定、有序的数据恢复，提高数据可靠性。Through the above method, one of the multiple auxiliary nodes can be involved in data recovery, so that stable and orderly data recovery can be performed, thereby improving data reliability.

若辅助节点的个数为1，则主控节点直接将故障盘中丢失数据的信息发送至该辅助节点，以使得该辅助节点参与到故障盘数据的恢复中。If the number of auxiliary nodes is 1, the master control node directly sends the information of lost data in the failed disk to the auxiliary node, so that the auxiliary node participates in the recovery of the failed disk data.

本申请一些实施例中，在利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列之后，还可以包括：In some embodiments of the present application, after the hot spare disk is used to replace the corresponding failed disk to re-form a RAID array with the normal disk, the following method may also be included:

在本申请中，在利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列之后，还可以定时清理辅助节点中的镜像数据(具体可以采用覆盖写的方式来清理辅助节点中的镜像数据)，以减少对内存的占用，并使得辅助节点可以将主机下发的更多数据作为镜像数据而参与到故障盘数据的恢复中。In the present application, after the hot spare disk is used to replace the corresponding failed disk to re-form the RAID array with the normal disk, the mirror data in the auxiliary node can also be cleaned up regularly (specifically, the mirror data in the auxiliary node can be cleaned up by overwriting) to reduce memory usage and enable the auxiliary node to use more data sent by the host as mirror data to participate in the recovery of the failed disk data.

本申请实施例还提供了一种数据恢复装置，参见图8，其示出了本申请实施例提供的一种数据恢复装置的结构示意图，可以包括：The embodiment of the present application further provides a data recovery device. Referring to FIG. 8 , a schematic diagram of the structure of a data recovery device provided by the embodiment of the present application is shown, which may include:

备份模块81，用于在多控节点中的辅助节点中备份主控节点中的数据，以将辅助节点中的数据作为主控节点的镜像数据；A backup module 81, used to back up the data in the main control node in the auxiliary node among the multiple control nodes, so as to use the data in the auxiliary node as the mirror data of the main control node;

监控模块82，用于监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量；A monitoring module 82, used to monitor whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array;

发送模块83，用于若RAID阵列的故障盘个数超过RAID阵列的容错量，则将故障盘中丢失数据的信息发送至辅助节点，由辅助节点从镜像数据中获取信息对应的数据并进行发送；A sending module 83 is used to send information about lost data in the failed disk to the auxiliary node if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, and the auxiliary node obtains data corresponding to the information from the mirror data and sends it;

写入模块84，用于接收信息对应的数据，将信息对应的数据写入故障盘对应的热备盘的相应分区中；A writing module 84, used for receiving data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of the hot spare disk corresponding to the failed disk;

第一替换模块85，用于利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列。The first replacement module 85 is used to replace the corresponding failed disk with a hot spare disk to re-form a RAID array with normal disks.

本申请一些实施例中，备份模块81可以包括：In some embodiments of the present application, the backup module 81 may include:

备份单元，用于在辅助节点中备份主控节点中的逻辑卷，以得到对应的镜像逻辑卷；逻辑卷及镜像逻辑卷中包含多个块，块与磁盘中的分块存在映射关系，主控节点及辅助节点中均存储有映射关系；The backup unit is used to back up the logical volume in the master node in the auxiliary node to obtain the corresponding mirror logical volume; the logical volume and the mirror logical volume contain multiple blocks, and there is a mapping relationship between the blocks and the blocks in the disk, and the mapping relationship is stored in the master node and the auxiliary node;

发送模块83可以包括：The sending module 83 may include:

形成单元，用于根据故障盘包含的分块及映射关系，形成逻辑卷对应的数据丢失元数据，将数据丢失元数据发送至辅助节点；数据丢失元数据中包含逻辑卷包含的块是否丢失的信息；A forming unit, used to form data loss metadata corresponding to the logical volume according to the blocks and mapping relationship contained in the failed disk, and send the data loss metadata to the auxiliary node; the data loss metadata includes information on whether the blocks contained in the logical volume are lost;

辅助节点具体用于根据映射关系及数据丢失元数据获取丢失数据的分块编号，并将镜像逻辑卷中与分块编号对应的数据发送至主控节点。The auxiliary node is specifically used to obtain the block number of the lost data according to the mapping relationship and the data loss metadata, and send the data corresponding to the block number in the mirrored logical volume to the main control node.

本申请一些实施例中，形成单元可以包括：In some embodiments of the present application, the forming unit may include:

形成子单元，用于根据故障盘包含的分块及映射关系，形成逻辑卷对应且以位图为数据组织方式的数据丢失元数据；其中，位图中的每个bit位表示逻辑卷包含的相应块是否丢失。A subunit is formed, which is used to form data loss metadata corresponding to the logical volume and organized in a bitmap according to the blocks and mapping relationships contained in the failed disk; wherein each bit in the bitmap indicates whether the corresponding block contained in the logical volume is lost.

判断模块，用于在利用热备盘替换对应的故障盘之前，判断RAID阵列中的各条带是否需要重新计算校验分块；The judgment module is used to judge whether each stripe in the RAID array is The check blocks need to be recalculated;

第一计算模块，用于若存在需要重新计算校验分块的条带，则根据条带中的各分块计算校验分块，并将计算得到的校验分块写入故障盘对应的热备盘的相应分区中。The first calculation module is used to calculate the check block according to each block in the stripe if there is a stripe for which the check block needs to be recalculated, and write the calculated check block into the corresponding partition of the hot spare disk corresponding to the failed disk.

本申请一些实施例中，写入模块84可以包括：In some embodiments of the present application, the writing module 84 may include:

写入单元，用于逐分块接收信息对应的数据，并逐分块将信息对应的数据写入故障盘对应的热备盘的相应分区中。The writing unit is used to receive data corresponding to the information block by block, and write the data corresponding to the information block by block into the corresponding partition of the hot spare disk corresponding to the failed disk.

第一重定向模块，用于若RAID阵列的故障盘个数超过RAID阵列的容错量，则将主机的I/O请求重定向到辅助节点；A first redirection module, configured to redirect the host's I/O request to the auxiliary node if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array;

第二重定向模块，用于在利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列之后，将主机的I/O请求重定向到主控节点。The second redirection module is used to redirect the I/O request of the host to the master control node after the hot spare disk replaces the corresponding failed disk to re-compose the RAID array with the normal disk.

存储模块，用于在将主机的I/O请求重定向到辅助节点时，将主机新发送的数据存储主控节点中，以将主控节点中存储的主机新发送的数据作为辅助节点的镜像数据。The storage module is used to store the data newly sent by the host in the main control node when redirecting the I/O request of the host to the auxiliary node, so as to use the data newly sent by the host stored in the main control node as the mirror data of the auxiliary node.

第二计算模块，用于若RAID阵列的故障盘个数未超过RAID阵列的容错量，则利用RAID阵列中各条带包含的正常磁盘的分块进行计算，得到故障盘对应的热备盘的分块；A second calculation module is used to calculate the blocks of the hot spare disk corresponding to the failed disk by using the blocks of the normal disk contained in each stripe in the RAID array if the number of failed disks in the RAID array does not exceed the fault tolerance of the RAID array;

第二替换模块，用于当恢复故障盘丢失的所有数据后，利用热备盘替换故障盘，以与正常磁盘重新组成RAID阵列。The second replacement module is used to replace the failed disk with a hot spare disk after all data lost on the failed disk is recovered, so as to re-form a RAID array with normal disks.

本申请一些实施例中，监控模块82可以包括：In some embodiments of the present application, the monitoring module 82 may include:

定时监控单元，用于定时监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量。The timing monitoring unit is used to periodically monitor whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array.

本申请一些实施例中，监控模块82还可以包括：In some embodiments of the present application, the monitoring module 82 may also include:

返回执行单元，用于若RAID阵列的故障盘个数未超过RAID阵列的容错量，则返回执行定时监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量的步骤。The return execution unit is used to return to the step of periodically monitoring whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array if the number of failed disks in the RAID array does not exceed the fault tolerance of the RAID array.

本申请一些实施例中，若辅助节点的个数大于1，则发送模块83可以包括：In some embodiments of the present application, if the number of auxiliary nodes is greater than 1, the sending module 83 may include:

发送单元，用于将故障盘中丢失数据的信息发送至根据预设选择策略从多个辅助节点中选择出的一个辅助节点中。The sending unit is used to send the information of lost data in the failed disk to an auxiliary node selected from multiple auxiliary nodes according to a preset selection strategy.

定时清理模块，用于在利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列之后，定时清理辅助节点中的镜像数据。The scheduled cleaning module is used to regularly clean up the mirror data in the auxiliary node after the hot spare disk replaces the corresponding failed disk to re-compose the RAID array with the normal disk.

本申请实施例还提供了一种数据恢复设备，参见图9，其示出了本申请实施例提供的一种数据恢复设备的结构示意图，可以包括：The embodiment of the present application further provides a data recovery device. Referring to FIG. 9 , a schematic diagram of the structure of a data recovery device provided by the embodiment of the present application is shown, which may include:

存储器91，用于存储计算机程序；A memory 91, used for storing computer programs;

处理器92，用于执行存储器91存储的计算机程序时可实现如下步骤：The processor 92, when used to execute the computer program stored in the memory 91, can implement the following steps:

在多控节点中的辅助节点中备份主控节点中的数据，以将辅助节点中的数据作为主控节点的镜像数据；监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量；若RAID阵列的故障盘个数超过RAID阵列的容错量，则将故障盘中丢失数据的信息发送至辅助节点，由辅助节点从镜像数据中获取信息对应的数据并进行发送；接收信息对应的数据，将信息对应的数据写入故障盘对应的热备盘的相应分区中；利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列。Back up the data in the master control node in the auxiliary node in the multi-control node so that the data in the auxiliary node can be used as the mirror data of the master control node; monitor whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array; if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, send the information of the lost data in the failed disk to the auxiliary node; The auxiliary node obtains the data corresponding to the information from the mirror data and sends it; receives the data corresponding to the information, and writes the data corresponding to the information into the corresponding partition of the hot spare disk corresponding to the failed disk; uses the hot spare disk to replace the corresponding failed disk to re-form the RAID array with the normal disk.

本申请实施例还提供了一种非易失性可读存储介质，非易失性可读存储介质中存储有计算机程序，计算机程序被处理器执行时可实现如下步骤：The embodiment of the present application further provides a non-volatile readable storage medium, in which a computer program is stored. When the computer program is executed by a processor, the following steps can be implemented:

在多控节点中的辅助节点中备份主控节点中的数据，以将辅助节点中的数据作为主控节点的镜像数据；监控主控节点中RAID阵列的故障盘个数是否超过RAID阵列的容错量；若RAID阵列的故障盘个数超过RAID阵列的容错量，则将故障盘中丢失数据的信息发送至辅助节点，由辅助节点从镜像数据中获取信息对应的数据并进行发送；接收信息对应的数据，将信息对应的数据写入故障盘对应的热备盘的相应分区中；利用热备盘替换对应的故障盘，以与正常磁盘重新组成RAID阵列。Back up the data in the main control node in the auxiliary node in the multi-control node to use the data in the auxiliary node as the mirror data of the main control node; monitor whether the number of failed disks in the RAID array in the main control node exceeds the fault tolerance of the RAID array; if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, send the information of the lost data in the failed disk to the auxiliary node, and the auxiliary node obtains the data corresponding to the information from the mirror data and sends it; receive the data corresponding to the information, and write the data corresponding to the information into the corresponding partition of the hot spare disk corresponding to the failed disk; use the hot spare disk to replace the corresponding failed disk to re-form the RAID array with the normal disk.

该非易失性可读存储介质可以包括：U盘、移动硬盘、只读存储器(Read-Only Memory，ROM)、随机存取存储器(Random Access Memory，RAM)、磁碟或者光盘等各种可以存储程序代码的介质。The non-volatile readable storage medium may include: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and other media that can store program codes.

本申请提供的一种数据恢复装置、设备及可读存储介质中相关部分的说明可以参见本申请实施例提供的一种数据恢复方法中对应部分的详细说明，在此不再赘述。The description of the relevant parts of a data recovery device, equipment and readable storage medium provided in the present application can refer to the detailed description of the corresponding parts of a data recovery method provided in an embodiment of the present application, and will not be repeated here.

需要说明的是，在本文中，诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来，而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。另外，本申请实施例提供的上述技术方案中与现有技术中对应技术方案实现原理一致的部分并未详细说明，以免过多赘述。It should be noted that, in this article, relational terms such as first and second, etc. are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply that there is any such actual relationship or order between these entities or operations. Moreover, the term "include", "comprise" or any other variant thereof is intended to cover non-exclusive inclusion, so that the process, method, article or equipment that includes a series of elements are inherent to the elements. In the absence of more restrictions, the elements limited by the sentence "comprise one..." do not exclude the presence of other identical elements in the process, method, article or equipment that includes the elements. In addition, the above-mentioned technical solution provided in the embodiment of the present application is consistent with the corresponding technical solution in the prior art in principle, and the part is not described in detail, so as not to repeat too much.

对所公开的实施例的上述说明，使本领域技术人员能够实现或使用本申请。对这些实施例的多种修改对本领域技术人员来说将是显而易见的，本文中所定义的一般原理可以在不脱离本申请的精神或范围的情况下，在其它实施例中实现。因此，本申请将不会被限制于本文所示的这些实施例，而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。 The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application will not be limited to the embodiments shown herein, but will conform to the widest scope consistent with the principles and novel features disclosed herein.

Claims

A data recovery method, comprising:

Backing up data in the main control node in an auxiliary node among the multiple control nodes, so as to use the data in the auxiliary node as mirror data of the main control node;

Monitoring whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array;

If the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, information about lost data in the failed disks is sent to the auxiliary node, and the auxiliary node obtains data corresponding to the information from the mirror data and sends the data;

receiving data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of a hot spare disk corresponding to the failed disk;

The hot spare disk is used to replace the corresponding failed disk to re-form a RAID array with normal disks.

The data recovery method according to claim 1, characterized in that backing up the data in the main control node in the auxiliary node among the multiple control nodes comprises:

The logical volume in the master node is backed up in the auxiliary node to obtain a corresponding mirrored logical volume; the logical volume and the mirrored logical volume contain a plurality of blocks, and there is a mapping relationship between the blocks and the blocks in the disk, and the mapping relationship is stored in both the master node and the auxiliary node;

Sending information about lost data in the failed disk to the auxiliary node includes:

According to the blocks contained in the failed disk and the mapping relationship, data loss metadata corresponding to the logical volume is formed, and the data loss metadata is sent to the auxiliary node; the data loss metadata includes information on whether the blocks contained in the logical volume are lost;

The auxiliary node obtains data corresponding to the information from the mirror data and sends the data, including:

The auxiliary node obtains the block number of the lost data according to the mapping relationship and the data loss metadata, and sends the data corresponding to the block number in the mirrored logical volume to the main control node.

The data recovery method according to claim 2 is characterized in that, according to the blocks contained in the failed disk and the mapping relationship, forming the data loss metadata corresponding to the logical volume comprises:

According to the blocks contained in the failed disk and the mapping relationship, the data loss metadata corresponding to the logical volume and organized in a bitmap is formed; wherein each bit in the bitmap indicates whether the corresponding block contained in the logical volume is lost.

The data recovery method according to claim 1, characterized in that before using the hot spare disk to replace the corresponding failed disk, it also includes:

Determine whether each stripe in the RAID array needs to recalculate the check blocks;

If there is a stripe for which the check blocks need to be recalculated, the check blocks are calculated according to the blocks in the stripe, and the calculated check blocks are written into the corresponding partition of the hot spare disk corresponding to the failed disk.

The data recovery method according to claim 4, characterized in that determining whether each stripe in the RAID array needs to recalculate the check block comprises:

Determine whether the check block in the stripe is located on the failed disk; or,

Determine whether the parity block is missing in the stripe.

The data recovery method according to claim 5, further comprising:

If the parity block in the stripe is located on the failed disk, or the parity block is missing in the stripe, it is determined that the stripe needs Recalculate the checksum block;

If the parity block in the stripe is not located on the failed disk, or the parity block is not missing in the stripe, it is determined that the stripe does not need to recalculate the parity block.

The data recovery method according to claim 4, characterized in that calculating the check block according to each block in the stripe comprises:

The blocks in the stripe are used to perform an XOR operation to calculate the check block.

The data recovery method according to claim 1, characterized in that receiving the data corresponding to the information and writing the data corresponding to the information into a corresponding partition of the hot spare disk corresponding to the failed disk comprises:

The data corresponding to the information is received block by block, and the data corresponding to the information is written block by block into a corresponding partition of the hot spare disk corresponding to the failed disk.

The data recovery method according to claim 1, characterized in that if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, further comprising:

Redirecting the I/O request of the host to the auxiliary node;

After the hot spare disk is used to replace the corresponding failed disk to re-form a RAID array with a normal disk, the method further includes:

Redirecting the I/O request of the host to the master control node.

The data recovery method according to claim 9, characterized in that when redirecting the I/O request of the host to the master control node, it also includes:

The I/O request of the host is no longer received and processed.

The data recovery method according to claim 9, characterized in that when redirecting the I/O request of the host to the auxiliary node, it also includes:

The data newly sent by the host is stored in the master control node, so that the data newly sent by the host stored in the master control node is used as the mirror data of the auxiliary node.

The data recovery method according to claim 1, characterized in that if the number of failed disks in the RAID array does not exceed the fault tolerance of the RAID array, it further comprises:

Using the blocks of the normal disks contained in each stripe in the RAID array to perform calculations, obtain the blocks of the hot spare disk corresponding to the failed disk;

After all the lost data of the failed disk is recovered, the hot spare disk is used to replace the failed disk to re-form a RAID array with normal disks.

The data recovery method according to claim 1, characterized in that monitoring whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array comprises:

Regularly monitor whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array.

Real-time monitoring is performed to determine whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array.

The data recovery method according to claim 13 or 14, characterized in that it also includes:

If the number of failed disks in the RAID array does not exceed the fault tolerance of the RAID array, the process of periodically monitoring whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array is returned. step.

The data recovery method according to claim 1, characterized in that if the number of the auxiliary nodes is greater than 1, sending the information of lost data in the failed disk to the auxiliary node comprises:

The information about lost data in the failed disk is sent to an auxiliary node selected from the plurality of auxiliary nodes according to a preset selection strategy.

The data recovery method according to claim 1, characterized in that after using the hot spare disk to replace the corresponding failed disk to re-form a RAID array with a normal disk, it also includes:

The mirror data in the auxiliary node is cleaned up regularly.

A data recovery device, comprising:

A backup module, used to back up data in the main control node in the auxiliary node among the multiple control nodes, so as to use the data in the auxiliary node as the mirror data of the main control node;

A monitoring module, used to monitor whether the number of failed disks in the RAID array in the master control node exceeds the fault tolerance of the RAID array;

A sending module, configured to send information about lost data in the failed disks to the auxiliary node if the number of failed disks in the RAID array exceeds the fault tolerance of the RAID array, and the auxiliary node obtains data corresponding to the information from the mirror data and sends the data;

A writing module, used for receiving data corresponding to the information, and writing the data corresponding to the information into a corresponding partition of the hot spare disk corresponding to the failed disk;

The first replacement module is used to use the hot spare disk to replace the corresponding failed disk, so as to re-form a RAID array with normal disks.

A data recovery device, comprising:

Memory for storing computer programs;

A processor, configured to implement the steps of the data recovery method according to any one of claims 1 to 17 when executing the computer program.

A non-volatile readable storage medium, characterized in that a computer program is stored in the non-volatile readable storage medium, and when the computer program is executed by a processor, the steps of the data recovery method according to any one of claims 1 to 17 are implemented.