[go: up one dir, main page]

WO2018001110A1 - Procédé et dispositif de restauration de données stockées sur la base d'un codage à effacement, et noeud de stockage - Google Patents

Procédé et dispositif de restauration de données stockées sur la base d'un codage à effacement, et noeud de stockage Download PDF

Info

Publication number
WO2018001110A1
WO2018001110A1 PCT/CN2017/088477 CN2017088477W WO2018001110A1 WO 2018001110 A1 WO2018001110 A1 WO 2018001110A1 CN 2017088477 W CN2017088477 W CN 2017088477W WO 2018001110 A1 WO2018001110 A1 WO 2018001110A1
Authority
WO
WIPO (PCT)
Prior art keywords
reconstructed
queue
data
stripe
recovery threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/088477
Other languages
English (en)
Chinese (zh)
Inventor
江滢
王志坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Publication of WO2018001110A1 publication Critical patent/WO2018001110A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's

Definitions

  • the present application relates to the field of communications, for example, to a method and apparatus for storing data based on erasure code, and a storage node.
  • Cloud storage systems In recent years, with the explosive growth of information resources and data, distributed storage systems have become the foundation and core of cloud storage and big data with high performance, high scalability, high availability, and easy management. However, due to hardware damage and software failure, data corruption and loss may occur during data storage. Cloud storage systems generally use erasure code technology to improve fault tolerance and improve data resource utilization efficiency and system performance. The erasure code does not increase the excess storage space, and usually ensures high reliability and availability of data through reasonable redundancy coding.
  • the use of erasure code technology to save data compared to the full replication technology, to a large extent reduce the system space overhead, but at the same time due to data reconstruction will bring huge network overhead, the adoption of this
  • the type of erasure code technology may cause the network of the entire system or the network of some nodes to be congested and unable to provide services, which affects the performance of the system.
  • the number of nodes deployed in the current storage system is increasing, and the number of nodes that fail every day is also increasing.
  • the proportion of data recovery traffic will continue to increase in total network traffic. Access that greatly affects daily business data. Therefore, how to reduce the bandwidth consumption in the erasure code technology and ensure the service performance are currently worthy of consideration.
  • the embodiments of the present disclosure provide a storage data reconstruction method and apparatus based on erasure code, and a storage node, so as to solve the problem of high bandwidth consumption in the storage data reconstruction in the related art, the system is unstable, and the service performance is not good. The problem.
  • an embodiment of the present disclosure provides a method for reconstructing a stored data based on an erasure code, including:
  • the startup failure recovery threshold is less than or equal to the number of striped storage blocks. Reconstructing the difference between the minimum number of data blocks and the erasure code, and greater than or equal to 1;
  • Data reconstruction is performed using the non-faulty data block of the strip.
  • the embodiment of the present disclosure further provides a storage data reconstruction apparatus based on an erasure code, including:
  • the startup fault recovery threshold determining module is configured to determine a startup fault recovery threshold, and the startup fault recovery threshold is less than or equal to a difference between the number of stripe storage data blocks and the minimum number of data blocks reconstructed by the erasure code, and is greater than or equal to 1;
  • the fault recovery startup module is configured to start a fault recovery of the stripe for a stripe whose number of fault data blocks reaches a startup fault recovery threshold;
  • a data reconstruction module configured to utilize the non-faulty data blocks of the stripe for data reconstruction.
  • the embodiment of the present disclosure further provides a storage node based on an erasure code, including a physical storage medium and a processor, where the processor is configured to:
  • the startup failure recovery threshold is less than or equal to the difference between the number of striped storage data blocks and the minimum number of data blocks reconstructed by the erasure code, and is greater than or equal to 1;
  • the stripe non-faulty data block is extracted from the physical storage medium of the storage node and the physical storage medium of the other storage node for data reconstruction.
  • the embodiment of the present disclosure further provides a computer storage medium, where the computer storage medium stores computer executable instructions, and the computer executable instructions are used to execute the erasure code based storage data reconstruction method according to any one of the foregoing.
  • the computer storage medium may be a transitory computer readable storage medium or a non-transitory computer readable storage medium.
  • the startup failure recovery threshold is less than or equal to the difference between the number of stripe storage data blocks and the minimum number of data blocks reconstructed by the erasure code, and is greater than or equal to 1;
  • the stripe of the fault recovery threshold starts the fault recovery of the strip; the non-faulty data block of the strip is used for data reconstruction.
  • the number of times of failure recovery is reduced, thereby reducing the bandwidth consumption of the system, making the system more stable and improving the service performance of the system.
  • FIG. 1 is a schematic diagram of a principle of an erasure code according to any embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of distributed data storage of erasure codes according to any embodiment of the present disclosure
  • FIG. 3 is a flowchart of a method for reconstructing stored data based on erasure code according to Embodiment 1 of the present disclosure
  • FIG. 4 is a schematic diagram of a storage data reconstruction apparatus based on erasure code according to Embodiment 2 of the present disclosure
  • FIG. 5 is a schematic diagram of a storage node based on an erasure code according to Embodiment 3 of the present disclosure
  • FIG. 6 is a schematic diagram of a storage cluster based on an erasure code according to Embodiment 4 of the present disclosure
  • FIG. 7 is a flowchart of a method for reconstructing stored data based on erasure code according to Embodiment 4 of the present disclosure.
  • the data is stored by using the erasure code technique.
  • the method includes: dicing the original file to obtain k source data blocks, and then encoding the k source data blocks to generate n coded data blocks, that is, one.
  • (n, k) erasure code is that k source data blocks are encoded to obtain n data blocks; then, when performing data reconstruction, any k data blocks in the n data blocks can be restored by decoding.
  • the k source data blocks, the k source data blocks are combined to reconstruct the original file.
  • the distributed data storage model based on erasure code can be seen in Figure 2.
  • the k data nodes store the original data blocks, labeled D 0 , D 1 , . . . , D k-1 , respectively ;
  • m coding nodes store the coded data blocks, labeled C 0 , C 1 , . . . , C m-1 .
  • the erasure code algorithm needs to cut the original file into k equal parts and store it in k data nodes in turn, that is, switch the original file to obtain k source data blocks, and put the m pieces of data generated by the encoding into m coding nodes. .
  • the original file When storing a large file, the original file needs to be double-cut, that is, each time the data of the specified size is read from the file for encoding, we refer to the original data and the encoded data involved in the encoding process as a stripe. band).
  • a stripe independently constitutes a coded set of information, and different stripes are independent of each other.
  • the data reconstruction is triggered.
  • the new node needs to first download all the data from the k nodes to recover the original file. Re-encoding to generate invalid data, the amount of data transmitted in this process is k times the invalid data.
  • the related technologies limit the available network bandwidth for data recovery, which will inevitably lead to a slower node reconstruction process.
  • the node reconstruction rate directly affects system reliability. If the reconfiguration rate is too slow, even if the speed at which the node fails, the system will not be able to maintain its reliability. And limit the data recovery bandwidth, but reduce the network bandwidth consumption in a short period of time, and in the long run, the bandwidth occupied by data recovery is not substantially reduced. Therefore, a more reasonable and reliable data reconstruction method is needed to reduce system bandwidth usage and ensure system stability.
  • Embodiment 1 is a diagrammatic representation of Embodiment 1:
  • Figure 3 including:
  • Step S301 Determine a startup failure recovery threshold, and the startup failure recovery threshold is less than or equal to a difference between the number of stripe storage data blocks and the minimum number of data blocks reconstructed by the erasure code, and is greater than or equal to 1.
  • the concept of starting the fault recovery threshold is introduced, that is, the startup failure recovery is performed, in order to reduce the number of times of the storage data recovery.
  • the startup failure recovery threshold for data recovery is set for each strip according to the resource condition of the system, and the fault recovery of the strip is started for the stripe whose number of faulty data blocks reaches the startup fault recovery threshold.
  • the start failure recovery threshold is defined as r. For each strip, when the number of fault data blocks reaches r, the fault recovery is started immediately.
  • r can be max nk and the minimum is 1, which is the number of stripe storage blocks, corresponding to n storage nodes in the system.
  • k reconstructs the least data block for the erasure code The number corresponds to k data nodes.
  • the startup failure recovery threshold is set, and the fault recovery of the strip is started for the strip whose fault data block reaches the startup fault recovery threshold. Compared with the related technology, as long as one storage node fails, the data recovery and reconstruction are triggered, which is effective. The frequency of data recovery is reduced, and the bandwidth occupation is reduced, thereby ensuring service performance and improving system stability.
  • the method for reconstructing the data based on the erasure code provided in this embodiment may further include setting the startup failure recovery threshold to an initial value; dynamically adjusting the startup failure recovery threshold according to the system load condition, and the system load is heavier, the more the failure recovery threshold is. Big.
  • the data recovery frequency can be reduced more reasonably, the bandwidth occupied by recovery can be reduced, and services can be guaranteed as much as possible.
  • Performance when the storage system is initialized, the determined startup failure recovery threshold is set to an initial value in the storage system, and then the startup failure recovery threshold is dynamically adjusted according to the real-time resource state of the system, wherein the system load is heavier.
  • the startup failure recovery threshold is larger; the real-time dynamic adjustment of the startup failure recovery threshold includes setting an adjustment period, and adjusting the startup failure recovery threshold at intervals.
  • the initial value of the startup failure recovery threshold r can be set to 1.
  • dynamically adjusting the startup fault recovery threshold according to the system load condition includes: periodically calculating the load information of the system, and determining that the system load is a heavy load or a light load according to a preset rule; and recovering the startup fault of the next cycle when the heavy load is performed;
  • the threshold is increased by a preset step value, and when the light load is performed, the startup failure recovery threshold of the next cycle is subtracted from the preset step value;
  • the preset step value includes greater than or equal to 1, and is less than or equal to the number of stripe storage data blocks and
  • the erasure code reconstructs a positive integer of the minimum number of data block differences.
  • the startup failure recovery threshold of the next cycle is incremented by 1, and is not greater than the difference between the number of stripe storage data blocks and the minimum number of data blocks reconstructed by the erasure code; when the heavy load is lightly loaded, the next period is The startup failure recovery threshold is decremented by 1, and is not less than one. That is, when it is judged that the load of the system is light and the system bandwidth does not constitute a bottleneck, the failure recovery threshold is started to be continuously 1 to ensure rapid recovery of system data.
  • the startup fault recovery threshold is n-k, which ensures that the faulty strip recovers quickly, thus effectively ensuring system reliability and improving system service performance.
  • the above-mentioned periodic calculation system load information, and determining the system load as a heavy load or a light load according to a preset rule includes: recording Num i as the number of user I/O requests completed in the time period P i , Latency i (k) P i is the period of the k-th user I / O service hours; maximum delay period P i is provided limit i, and delay requirements for each protocol user I / O, are satisfied latency i (k) ⁇ Limit i ; defines the ratio of Violate i to user I/O that violates the delay protocol:
  • is called relaxation factor, then the load is judged to be overloaded; if system congestion does not occur in period P i and Violate i > ⁇ is not satisfied, it is judged as light
  • the ⁇ can be set as needed, which is not limited in this embodiment.
  • Step S102 Start a fault recovery of the stripe for a stripe whose number of faulty data blocks reaches a fault recovery threshold.
  • the number of faulty data blocks in the system is detected, and the number of fault data blocks corresponding to each stripe is counted.
  • the faulty data block of the stripe reaches the startup fault recovery threshold, the stripe is recovered.
  • Step S103 performing data reconstruction by using the non-faulty data block of the strip.
  • At least one queue to be reconstructed is constructed for each stripe of the faulty data block.
  • the stripe identification information is recorded in the queue to be reconstructed, and each strip corresponding to each queue has the same number of fault data blocks; for the queue to be reconstructed that reaches the startup fault recovery threshold, according to the fault data of each strip corresponding to the queue.
  • the number of blocks is selected from the largest to the smallest, and the queues to be reconstructed are sequentially selected, and the fault recovery is started for the strips in the selected queue to be reconstructed.
  • the fault data block and the stripe statistics can be performed through the queue to be reconstructed, and then the fault recovery is performed according to the statistical situation.
  • the data is reconstructed for the stripe whose number of faulty data blocks reaches the threshold of the fault recovery threshold.
  • the fault is recovered by sequentially selecting the strips with more faulty data blocks, and the k normal storage data corresponding to the stripe are read from the system.
  • the method for reconstructing the stored data based on the erasure code provided by the embodiment by determining the startup failure recovery threshold, starts the difference that the failure recovery threshold is less than or equal to the number of stripe storage data blocks and the minimum number of data blocks reconstructed by the erasure code, and If the number of faulty data blocks reaches the start fault recovery threshold, the fault recovery of the strip is started; and the non-faulty data block of the strip is used for data reconstruction.
  • the number of times of failure recovery is reduced, thereby reducing the bandwidth consumption of the system, making the system more stable and improving the system service performance.
  • Embodiment 2 is a diagrammatic representation of Embodiment 1:
  • the embodiment provides a storage data reconstruction apparatus based on the erasure code.
  • the method includes: a startup failure recovery threshold determination module 41, a failure recovery startup module 42 and a data reconstruction module 43, wherein the startup failure recovery threshold is determined.
  • the module 41 is configured to determine a startup failure recovery threshold, the startup failure recovery threshold being less than or equal to a difference between the number of stripe storage data blocks and the number of erasure code reconstruction minimum data blocks, and greater than or equal to 1;
  • the fault recovery initiation module 42 is configured to For the stripe whose number of failed data blocks reaches the startup failure recovery threshold, the fault recovery of the strip is initiated;
  • the data reconstruction module 43 is configured to perform data reconstruction using the non-faulty data blocks of the strip.
  • the startup failure recovery threshold determination module 41 performs the startup failure recovery threshold setting, in order to reduce the number of times of the storage data recovery.
  • the startup failure recovery threshold for data recovery is set for each stripe. For the stripe whose number of faulty data blocks reaches the startup fault recovery threshold, the fault recovery of the strip is started.
  • the start failure recovery threshold is defined as r. For each strip, when the number of fault data blocks reaches r, the fault recovery is started immediately.
  • r may be n-k at the maximum and 1 at least.
  • the startup failure recovery threshold is set, and the fault recovery of the strip is started for the stripe whose number of faulty data blocks reaches the startup fault recovery threshold.
  • the data recovery is triggered. Reconstruction effectively reduces the frequency of data recovery and reduces bandwidth usage, thereby ensuring service performance and improving system stability.
  • the apparatus for reconstructing data based on the erasure code may further include a startup failure recovery threshold adjustment module 44 configured to set the startup failure recovery threshold to an initial value;
  • the load condition is dynamically adjusted to start the fault recovery threshold. The heavier the system load is, the larger the fault recovery threshold is.
  • the data recovery frequency can be reduced more reasonably, the bandwidth occupied by recovery can be reduced, and services can be guaranteed as much as possible.
  • Performance when the storage system is initialized, the determined startup failure recovery threshold is set to an initial value in the storage system, and then the startup failure recovery threshold is dynamically adjusted according to the real-time resource state of the system, wherein the system load is heavier. The startup failure recovery threshold is larger.
  • the initial value of the startup failure recovery threshold r may be set to 1 by the startup failure recovery threshold adjustment module.
  • dynamically adjusting the startup fault recovery threshold according to the system load condition includes: periodically calculating the load information of the system, and determining that the system load is a heavy load or a light load according to a preset rule; and recovering the startup fault of the next cycle when the heavy load is performed;
  • the threshold is incremented by one, and is not greater than the difference between the number of stripe storage data blocks and the minimum number of data blocks reconstructed by the erasure code; when the light load is heavy, the start failure recovery threshold of the next period is decremented by 1, and not less than one. That is, when it is judged that the load of the system is light and the system bandwidth does not constitute a bottleneck, the failure recovery threshold is started to be continuously 1 to ensure rapid recovery of system data.
  • the startup fault recovery threshold is nk, which ensures that the faulty fault is quickly recovered, thus effectively ensuring system reliability and improving system service performance.
  • the above-mentioned cyclic calculation system load information, and determining whether the system load is heavy or light load according to a preset rule can be performed by whether system congestion occurs in the period P i or whether the Violate i is greater than ⁇ , ⁇ is called a relaxation factor. Judge.
  • the fault recovery startup module 42 determines, for the number of fault data blocks, a strip that initiates a fault recovery threshold, including: detecting the number of data blocks that have failed in the system, and performing statistics on the data of the fault data blocks for each stripe, when the strips are When the fault data block is reached, the strip is recovered and the fault recovery of the strip is started.
  • the apparatus for reconstructing the data based on the erasure code further includes a reconstruction queue processing module 45 configured to: construct at least one to be reconstructed for each stripe of the faulty data block In the queue, the stripe identification information is recorded in the queue to be reconstructed, and each strip corresponding to each queue has the same number of fault data blocks; for the queue to be reconstructed that reaches the startup fault recovery threshold, according to each strip corresponding to the queue The number of fault data blocks is selected from the largest to the smallest, and the queues to be reconstructed are selected, and the fault recovery is started for the strips in the selected queue to be reconstructed.
  • the data reconstruction module 43 performs data reconstruction using the non-faulty data blocks of the stripe. Select a strip with more faulty data blocks to recover the fault, read the k normal storage data corresponding to the strip from the network, and obtain the original file; then calculate the placement bar according to the strip id and the current node and network availability. a new set of n nodes; and encodes n data blocks according to an erasure code algorithm, respectively sends strip information and data blocks to the new node through the network; each new node updates local information according to the situation; writes the data Enter the node and complete the data reconstruction.
  • the apparatus for reconstructing data based on the erasure code provided by the embodiment, by determining the startup failure recovery threshold, starts the difference that the failure recovery threshold is less than or equal to the number of stripe storage blocks and the minimum number of blocks of the erasure code reconstruction, and If the number of faulty data blocks reaches the start fault recovery threshold, the fault recovery of the strip is started; and the non-faulty data block of the strip is used for data reconstruction.
  • the number of times of failure recovery is effectively reduced, thereby reducing the bandwidth consumption of the system, making the system more stable and improving the service performance of the system.
  • Embodiment 3 is a diagrammatic representation of Embodiment 3
  • the embodiment provides a storage node based on an erasure code.
  • the processor 51 and the physical storage medium 52 are included, wherein the processor 51 is configured to: determine a startup failure recovery threshold, and distribute the threshold to other storage. Node; the startup failure recovery threshold is less than or equal to the difference between the number of stripe storage data blocks and the minimum number of data blocks reconstructed by the erasure code, and is greater than or equal to 1; scanning the fault condition of each strip responsible for the storage node, for the fault data block The number of the fault recovery threshold is reached, and the fault recovery of the strip is started; the non-faulty data block of the strip is extracted from the physical storage medium of the storage node and the physical storage medium of the other storage node for data reconstruction.
  • the processor 51 sets a startup failure recovery threshold in the system when the system is initialized, and sets the startup failure recovery threshold to an initial value; then dynamically adjusts the startup failure recovery threshold according to the system load condition when the system performs file read and write operations. The heavier the system load, the larger the recovery threshold is. Through the setting of the startup failure recovery threshold, the frequency of data reconstruction in the system is dynamically adjusted according to the system load, thereby effectively reducing the bandwidth consumption of the system.
  • the processor 51 may be configured to: construct, for each strip of the faulty data block, at least one queue to be reconstructed, and record the queue to be reconstructed, in order to conveniently collect the faulty data block and the stripe information.
  • Stripe identification information each strip corresponding to the queue to be reconstructed has the same number of fault data blocks; for the queue to be reconstructed to reach the fault recovery threshold, the fault data according to each strip corresponding to the queue to be reconstructed The number of blocks is selected from the largest to the smallest, and the queues to be reconstructed are sequentially selected, and the fault recovery is started for the strips in the selected queue to be reconstructed.
  • the server 51 may obtain the non-faulty data block of the stripe from the physical storage medium of the storage node where the server is located, or obtain the non-faulty data block of the stripe from other stored physical storage media.
  • the physical storage medium in this embodiment may be a storage unit configured to store data.
  • the processor 51 in this embodiment may be a processor 51, which is provided with Different functional modules are used to perform the different processes described above; the processor 51 may also be a plurality of processors 51 having different processing functions, each of which performs one of the above processes or several processes.
  • the storage node based on the erasure code provided in this embodiment determines that the fault recovery threshold is started, and the fault recovery threshold is less than or equal to the difference between the number of stripe data blocks and the minimum number of data blocks reconstructed by the erasure code, and is greater than or equal to 1
  • the fault recovery threshold is less than or equal to the difference between the number of stripe data blocks and the minimum number of data blocks reconstructed by the erasure code, and is greater than or equal to 1
  • the fault recovery of the strip is initiated; the non-faulty data block of the stripe is used for data reconstruction.
  • the number of times of failure recovery is effectively reduced, thereby reducing the bandwidth consumption of the system, making the system more stable and improving service performance.
  • Embodiment 4 is a diagrammatic representation of Embodiment 4:
  • the storage node is usually a storage node provided by the foregoing embodiment 3.
  • the storage node usually includes a management center 61 and a management agent 62.
  • the management center 61 is configured to maintain the members and status of the cluster, as well as data distribution rules, data recovery rules, etc., to provide strong consistency decisions. It is usually deployed on three storage nodes by default to form a management center cluster; the management center 61 can also be deployed on a separate server for cluster management.
  • the management center cluster is designed based on paxos algorithm to implement a more suitable consistency election algorithm, so that the change of node state is unified on all nodes of the whole system.
  • the management agent 62 is configured to implement communication between the nodes and the management center 61, periodically provide node health information to the management center 61, and receive control instructions from the management center 61.
  • a management center 61 can be deployed on each storage node.
  • the distributed file storage client 63 is configured to provide a distributed cluster access point service, and can also be regarded as an agent for an application access storage system, providing a file operation interface common to applications such as C API, IAVA API, NFS ( Network File System (Network File System), CIFS (Common Internet File System), etc.; perform data interaction with the client 60, and the client 60 may be a user client corresponding to the storage cluster.
  • Data routing 64 is responsible for file access control, data file distribution and management of various data, and metadata storage. The data location function communicates with the local data storage service process, responds to read and write requests from the distributed file storage client, and routes the request to the local data storage service process on the node to implement data access, copy distribution, and the like.
  • Each data positioning module can share memory data and zero failover time, which can be easily expanded to provide massive metadata capacity.
  • Maintain routing data queues Q i to be reconstructed local data storage service manager responsible for the actual physical media resource management and maintenance space, and can be responsible for storing lookup local objects, perform I / O operations.
  • the local data storage service is a process that actually processes data read and write, interacts with physical storage devices, and implements data read and write functions.
  • the storage cluster may be a NAS storage cluster or any other storage cluster configured as data storage.
  • the embodiment provides a method for reconstructing stored data based on the erasure code.
  • the method includes:
  • Step S701 initializing the storage system.
  • the initial setting of the system includes: setting the use of the erasure code (n, k) through the management center 61, the maximum number of faulty data blocks can be tolerated as nk, and when the nk data blocks are faulty, another k normal data can be passed.
  • the block restores the original file and regenerates n data blocks to maintain system redundancy.
  • the system must maintain the necessary data reliability by writing additional redundant data to the new node.
  • the initial startup failure recovery threshold is initialized, and the maximum reliability is the highest, and the reliability is the highest.
  • the initialization startup failure recovery threshold r can be 1.
  • each storage node initializes a stripe list of the node, and each item in the stripe list includes a stripe id, a stripe master id information, and the stripe relates to all nodes and disk id information.
  • the number of failures of all stripe data blocks in Q 1 is 1.
  • the number of failures of all stripe data blocks in Q nk is nk.
  • Each record in the queue is stripped with an id. Therefore, when the storage system is initialized, there are nk queues to be reconstructed on each storage node, and each queue is empty.
  • step S702 a file writing operation is performed.
  • the file write operation is performed, including: the distributed file storage client 63 dynamically selects a data node response write request of a storage node according to the load balancing principle; the data route is searched or calculated according to the current storage system rule. Obtaining n nodes and disk ids that the file file should be written, and determining one of the (node id, disk id) tuples as the primary id; then encoding n data blocks according to the (n, k) erasure code; The stripe information and data blocks are sent to n nodes respectively. On the aforementioned n nodes, the data route records the stripe information into the stripe list, and the local data storage service writes the data to the local disk.
  • Step S703 detecting status information of the system.
  • the status information of the detection system includes: the management center 61 periodically reads the system load information and the system availability status information in the current time period P i from each node management agent 62, wherein the system availability status information includes each node, a disk, and a network. Link status, etc.
  • the management center 61 needs to process the collected information, including filtering out the dirty data obtained from the faulty node, etc.; the management center 61 confirms according to the processed system available information, the decision node and the network fault condition, and the active heartbeat. The fault condition of the system.
  • the startup failure recovery threshold r of the next period P i+1 is determined. Then, the determined startup failure recovery threshold is sent to the data route 64 of each storage node, and the global fault condition is also sent to each storage node.
  • step S704 data reconstruction is performed.
  • the data routing 64 scans each stripe responsible for the node (the strip main data block is at the local node), and refreshes the queue to be reconstructed, including: if the node or disk of all the data blocks of the strip S is located If the queue is not in any of the queues to be reconstructed, the strip is skipped and the next strip is scanned. If all the data blocks of the strip S are in the normal state, the strips are normal.
  • the strip S is deleted from the Q i and the queue information of the strip S is updated; if the fault node and the number of the disk blocks involved in the strip S are i (stripe If the node or disk where the partial data block of S is located is abnormal, and the previous period of the strip S is in the queue Q i to be reconstructed, the strip S is deleted from the Q i and then inserted into the queue Q i and updated at the same time.
  • the queue information of the strip S if the faulty node involved in the strip S, the number of disk blocks is i (the node where the partial data block of the strip S is located or the disk is abnormal), and the strip S is not in any one cycle configuration queue, then the strip S is inserted into the tail queue Q i, queue information and update the strip S by the above-mentioned more Such a process the same number of faults to be reconstructed slice queue, during reconstruction, and more preferred number of faults to be reconstructed with a strip queue for data reconstruction.
  • the queue Q i may be adjusted by the queuing module during the reconstruction process.
  • the blocks are sent to all reachable nodes in the set Set ⁇ Set'respectively; each new node updates the local information according to the situation.
  • each new node updates the local information according to the situation.
  • the data route of node n records the stripe information into the stripe list;
  • the data route of the node N records the stripe information into the stripe list, and the local data storage service module writes the data to the node to complete the data reconstruction;
  • the node n in the set if The space reclamation module deletes the data corresponding to the strip S and reclaims the space. At the same time, the data routing deletes the corresponding strip information record from the strip list.
  • Embodiments of the present disclosure also provide a computer readable storage medium storing computer executable instructions arranged to perform the method of any of the above embodiments.
  • the method for reconstructing stored data based on the erasure code provided in this embodiment recovers and merges multiple data blocks of the same strip into one completion according to the availability of the system and the load condition of the system, thereby effectively reducing the data recovery bandwidth occupation. Compared to one block failure in the related art, it takes up to k times bandwidth recovery, and recovering multiple data blocks (assuming f) requires f*k times bandwidth.
  • the method for reconstructing stored data based on the erasure code provided in this embodiment requires k times bandwidth to recover f data blocks, and converts to recover one data block, only needs k/f times bandwidth, thereby avoiding unnecessary data recovery.
  • the bandwidth consumption is greatly reduced; and the bandwidth consumption caused by data recovery is reduced, the network communication cost is effectively reduced, and the service performance is improved; the startup failure recovery threshold is dynamically adjusted according to the load, and the system data is quickly restored when the load is light.
  • the load is heavy, the strips with severe faults are quickly restored, thereby effectively ensuring system reliability and achieving a good balance between system reliability and system service performance.
  • the method for reconstructing the stored data based on the erasure code provided by the embodiment is simple to implement, and does not need to modify the underlying kernel, and is applicable to various operating systems such as windows and Linux; and is independent of the platform, that is, it is used for various architectures.
  • the distributed storage system is applicable.
  • modules or steps of the above embodiments of the present disclosure may be implemented by a general computing device, which may be concentrated on a single computing device or distributed among multiple computing devices. On the network, optionally, they may be implemented by program code executable by the computing device, such that they may be stored in a computer storage medium (ROM/RAM, disk, optical disk) by a computing device, and at some In some cases, it can be performed in a different order than here.
  • the steps shown or described are either made separately into individual integrated circuit modules, or a plurality of modules or steps are fabricated as a single integrated circuit module. Therefore, the present disclosure is not limited to any specific combination of hardware and software.
  • the method and device for storing data based on erasure code provided by the present application and the storage node reduce the number of times of failure recovery, thereby reducing the bandwidth consumption of the system, making the system more stable and improving the service performance of the system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Retry When Errors Occur (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

L'invention porte sur un procédé et un dispositif de restauration de données stockées sur la base d'un codage à effacement, et sur un noeud de stockage. Le procédé consiste à déterminer un seuil d'amorçage de rétablissement après défaillance, lequel seuil étant inférieur ou égal à la différence entre le nombre de blocs de données stockées d'une bande et le nombre minimum de blocs pour la restauration par codage à effacement et étant égal ou supérieur à 1 (S301); pour une bande, dont le nombre de blocs de données défaillants atteint le seuil d'amorçage de rétablissement après défaillance, amorcer le rétablissement après défaillance de la bande (S302); utiliser des blocs de données non défaillants de la bande pour effectuer une restauration de données (S303). Par rapport à l'état de la technique, l'invention réduit le nombre de fois où il faut effectuer la rétablissement après défaillance, ce qui réduit la consommation de largeur de bande du système, de sorte que le système est plus stable et présente de meilleures performances de service.
PCT/CN2017/088477 2016-06-29 2017-06-15 Procédé et dispositif de restauration de données stockées sur la base d'un codage à effacement, et noeud de stockage Ceased WO2018001110A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610495313.8 2016-06-29
CN201610495313.8A CN107544862B (zh) 2016-06-29 2016-06-29 一种基于纠删码的存储数据重构方法和装置、存储节点

Publications (1)

Publication Number Publication Date
WO2018001110A1 true WO2018001110A1 (fr) 2018-01-04

Family

ID=60786768

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/088477 Ceased WO2018001110A1 (fr) 2016-06-29 2017-06-15 Procédé et dispositif de restauration de données stockées sur la base d'un codage à effacement, et noeud de stockage

Country Status (2)

Country Link
CN (1) CN107544862B (fr)
WO (1) WO2018001110A1 (fr)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109213637A (zh) * 2018-11-09 2019-01-15 浪潮电子信息产业股份有限公司 分布式文件系统集群节点的数据恢复方法、装置及介质
CN110568993A (zh) * 2019-08-06 2019-12-13 新华三技术有限公司成都分公司 一种数据更新方法及相关装置
CN110597655A (zh) * 2019-06-26 2019-12-20 中大编码有限公司 一种迁移与基于纠删码的重构相耦合的快速预知修复方法和实现
CN110874284A (zh) * 2018-09-03 2020-03-10 阿里巴巴集团控股有限公司 数据处理的方法和装置
CN111400083A (zh) * 2020-03-17 2020-07-10 上海七牛信息技术有限公司 数据存储方法及系统、存储介质
CN111581020A (zh) * 2020-04-22 2020-08-25 上海天玑科技股份有限公司 一种分布式块存储系统中数据恢复的方法和装置
CN111625394A (zh) * 2020-05-27 2020-09-04 成都信息工程大学 基于纠删码的数据恢复方法、装置、设备及存储介质
CN113190384A (zh) * 2021-05-21 2021-07-30 重庆紫光华山智安科技有限公司 基于纠删码的数据恢复控制方法、装置、设备及介质
US11182249B1 (en) 2020-06-24 2021-11-23 International Business Machines Corporation Block ID encoding in an erasure coded storage system
CN114415970A (zh) * 2022-03-25 2022-04-29 北京金山云网络技术有限公司 分布式存储系统的磁盘故障处理方法、装置及服务器
CN115480709A (zh) * 2022-10-14 2022-12-16 济南浪潮数据技术有限公司 存储池的重构速度控制方法及相关组件
CN115858250A (zh) * 2022-12-30 2023-03-28 浙江大华技术股份有限公司 数据恢复方法、装置、存储介质及电子装置
CN118821212A (zh) * 2024-07-09 2024-10-22 上海飞斯信息科技有限公司 一种基于分布式存储的加密方法及系统
CN120045144A (zh) * 2025-04-27 2025-05-27 湖北科惠通科技有限公司 一种超大规模科创数据的分布式存储与管理方法、系统及设备

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804039B (zh) * 2018-06-04 2021-01-29 平安科技(深圳)有限公司 自适应的数据恢复流控方法、装置、电子设备及存储介质
CN108763107B (zh) * 2018-06-04 2022-03-01 平安科技(深圳)有限公司 后台写盘流控方法、装置、电子设备及存储介质
CN108959399B (zh) * 2018-06-04 2022-07-15 平安科技(深圳)有限公司 分布式数据删除流控方法、装置、电子设备及存储介质
CN110865901B (zh) * 2018-08-28 2021-05-04 华为技术有限公司 组建ec条带的方法和装置
CN111506450B (zh) * 2019-01-31 2024-01-02 伊姆西Ip控股有限责任公司 用于数据处理的方法、设备和计算机程序产品
CN111176900A (zh) * 2019-12-30 2020-05-19 浪潮电子信息产业股份有限公司 一种分布式存储系统及其数据恢复方法、装置和介质
CN111475329B (zh) * 2020-02-25 2023-07-18 成都信息工程大学 一种大数据应用平台下降低预测式纠删码修复的方法及装置
CN111614720B (zh) * 2020-04-13 2022-02-18 厦门大学 针对集群存储系统单点失效修复的跨集群流量优化方法
CN111679793B (zh) * 2020-06-16 2023-03-14 成都信息工程大学 一种基于star码的单盘故障快速恢复方法
CN111917823B (zh) * 2020-06-17 2022-02-18 烽火通信科技股份有限公司 一种基于分布式存储Ceph的数据重构方法与装置
CN112799882A (zh) * 2021-02-08 2021-05-14 上海交通大学 一种基于图算法的文件感知恢复方法及装置
CN112783688B (zh) * 2021-02-10 2022-06-03 上海交通大学 一种基于可用分区级的纠删码数据恢复方法及装置
CN113205836A (zh) * 2021-03-26 2021-08-03 重庆冷存科技有限公司 基于纠删码的冷数据重构系统及其方法
CN113504875B (zh) * 2021-06-24 2023-08-01 中国科学院计算技术研究所 一种基于多级调度的纠删码系统恢复方法及系统
CN115657965B (zh) * 2022-11-16 2023-04-07 苏州浪潮智能科技有限公司 一种元数据的配置方法、装置及介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130173996A1 (en) * 2011-12-30 2013-07-04 Michael H. Anderson Accelerated erasure coding system and method
CN103207761A (zh) * 2013-04-17 2013-07-17 浪潮(北京)电子信息产业有限公司 一种raid5系统热备盘的数据备份方法和重构方法
CN103577274A (zh) * 2012-07-31 2014-02-12 国际商业机器公司 管理存储器阵列的方法和装置
US20140208022A1 (en) * 2013-01-21 2014-07-24 Kaminario Technologies Ltd. Raid erasure code applied to partitioned stripe
CN103955343A (zh) * 2014-04-16 2014-07-30 华中科技大学 一种基于i/o流水线的失效节点数据重构优化方法

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6993701B2 (en) * 2001-12-28 2006-01-31 Network Appliance, Inc. Row-diagonal parity technique for enabling efficient recovery from double failures in a storage array
JP2006171957A (ja) * 2004-12-14 2006-06-29 Fujitsu Ltd ストレージ制御装置および方法
EP2663920B1 (fr) * 2011-01-11 2020-05-27 Hewlett-Packard Development Company, L.P. Planification de demandes simultanées
US9582355B2 (en) * 2014-07-09 2017-02-28 Qualcomm Incorporated Systems and methods for reliably storing data using liquid distributed storage
CN104391759B (zh) * 2014-11-11 2017-06-13 华中科技大学 一种纠删码存储中负载感知的数据归档方法
WO2016093797A1 (fr) * 2014-12-09 2016-06-16 Hitachi Data Systems Corporation Système et procédé permettant de fournir un stockage par bloc à attribution à la demande avec plusieurs classes de protection de données
CN104881370B (zh) * 2015-05-11 2018-01-12 中国人民解放军国防科学技术大学 协同使用纠删码和纠错码的可靠闪存存储系统构建方法
CN104935481B (zh) * 2015-06-24 2018-03-09 华中科技大学 一种分布式存储下基于冗余机制的数据恢复方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130173996A1 (en) * 2011-12-30 2013-07-04 Michael H. Anderson Accelerated erasure coding system and method
CN103577274A (zh) * 2012-07-31 2014-02-12 国际商业机器公司 管理存储器阵列的方法和装置
US20140208022A1 (en) * 2013-01-21 2014-07-24 Kaminario Technologies Ltd. Raid erasure code applied to partitioned stripe
CN103207761A (zh) * 2013-04-17 2013-07-17 浪潮(北京)电子信息产业有限公司 一种raid5系统热备盘的数据备份方法和重构方法
CN103955343A (zh) * 2014-04-16 2014-07-30 华中科技大学 一种基于i/o流水线的失效节点数据重构优化方法

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874284B (zh) * 2018-09-03 2024-03-22 阿里巴巴集团控股有限公司 数据处理的方法和装置
CN110874284A (zh) * 2018-09-03 2020-03-10 阿里巴巴集团控股有限公司 数据处理的方法和装置
CN109213637A (zh) * 2018-11-09 2019-01-15 浪潮电子信息产业股份有限公司 分布式文件系统集群节点的数据恢复方法、装置及介质
CN110597655A (zh) * 2019-06-26 2019-12-20 中大编码有限公司 一种迁移与基于纠删码的重构相耦合的快速预知修复方法和实现
CN110597655B (zh) * 2019-06-26 2023-04-28 云链网科技(广东)有限公司 迁移和基于纠删码的重构相耦合快速预知修复方法及装置
CN110568993B (zh) * 2019-08-06 2022-04-12 新华三技术有限公司成都分公司 一种数据更新方法及相关装置
CN110568993A (zh) * 2019-08-06 2019-12-13 新华三技术有限公司成都分公司 一种数据更新方法及相关装置
CN111400083A (zh) * 2020-03-17 2020-07-10 上海七牛信息技术有限公司 数据存储方法及系统、存储介质
CN111400083B (zh) * 2020-03-17 2024-02-23 上海七牛信息技术有限公司 数据存储方法及系统、存储介质
CN111581020B (zh) * 2020-04-22 2024-03-19 上海天玑科技股份有限公司 一种分布式块存储系统中数据恢复的方法和装置
CN111581020A (zh) * 2020-04-22 2020-08-25 上海天玑科技股份有限公司 一种分布式块存储系统中数据恢复的方法和装置
CN111625394A (zh) * 2020-05-27 2020-09-04 成都信息工程大学 基于纠删码的数据恢复方法、装置、设备及存储介质
WO2021260538A1 (fr) * 2020-06-24 2021-12-30 International Business Machines Corporation Codage d'id de bloc dans un système de stockage à code d'effacement
US11182249B1 (en) 2020-06-24 2021-11-23 International Business Machines Corporation Block ID encoding in an erasure coded storage system
CN113190384B (zh) * 2021-05-21 2022-07-22 重庆紫光华山智安科技有限公司 基于纠删码的数据恢复控制方法、装置、设备及介质
CN113190384A (zh) * 2021-05-21 2021-07-30 重庆紫光华山智安科技有限公司 基于纠删码的数据恢复控制方法、装置、设备及介质
CN114415970A (zh) * 2022-03-25 2022-04-29 北京金山云网络技术有限公司 分布式存储系统的磁盘故障处理方法、装置及服务器
CN115480709A (zh) * 2022-10-14 2022-12-16 济南浪潮数据技术有限公司 存储池的重构速度控制方法及相关组件
CN115858250A (zh) * 2022-12-30 2023-03-28 浙江大华技术股份有限公司 数据恢复方法、装置、存储介质及电子装置
CN118821212A (zh) * 2024-07-09 2024-10-22 上海飞斯信息科技有限公司 一种基于分布式存储的加密方法及系统
CN120045144A (zh) * 2025-04-27 2025-05-27 湖北科惠通科技有限公司 一种超大规模科创数据的分布式存储与管理方法、系统及设备

Also Published As

Publication number Publication date
CN107544862A (zh) 2018-01-05
CN107544862B (zh) 2022-03-25

Similar Documents

Publication Publication Date Title
WO2018001110A1 (fr) Procédé et dispositif de restauration de données stockées sur la base d'un codage à effacement, et noeud de stockage
JP6538780B2 (ja) 分散型データベースシステムのシステム全体のチェックポイント回避
US10536167B2 (en) Matrix-based error correction and erasure code methods and system and applications thereof
US10127233B2 (en) Data processing method and device in distributed file storage system
US9971823B2 (en) Dynamic replica failure detection and healing
US9846540B1 (en) Data durability using un-encoded copies and encoded combinations
JP6404907B2 (ja) 効率的な読み取り用レプリカ
US9053166B2 (en) Dynamically varying the number of database replicas
US20170270004A1 (en) Transmission time refinement in a storage system
US20170206140A1 (en) System and method for building a point-in-time snapshot of an eventually-consistent data store
CN109407977B (zh) 一种大数据分布式存储管理方法及系统
US20150201036A1 (en) Gateway device, file server system, and file distribution method
US9185188B1 (en) Method and system for determining optimal time period for data movement from source storage to target storage
US10372504B2 (en) Global usage tracking and quota enforcement in a distributed computing system
US20140136571A1 (en) System and Method for Optimizing Data Storage in a Distributed Data Storage Environment
US10740198B2 (en) Parallel partial repair of storage
US10223184B1 (en) Individual write quorums for a log-structured distributed storage system
US11106541B2 (en) System and method for replicating data in distributed database systems
KR101983208B1 (ko) 데이터 관리 방법, 노드, 그리고 데이터베이스 클러스터를 위한 시스템
CN107346270B (zh) 基于实时计算的基数估计的方法和系统
CN110825698A (zh) 元数据管理方法及相关装置
US20160139996A1 (en) Methods for providing unified storage for backup and disaster recovery and devices thereof
US20170315869A1 (en) Fault-tolerant Enterprise Object Storage System for Small Objects
JP6671708B2 (ja) バックアップリストアシステム及びバックアップリストア方法
US10083121B2 (en) Storage system and storage method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17819107

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17819107

Country of ref document: EP

Kind code of ref document: A1