WO2018019119A1

WO2018019119A1 - Method and device for dynamic partial-parallel data layout for continuous data storage

Info

Publication number: WO2018019119A1
Application number: PCT/CN2017/092403
Authority: WO
Inventors: 谭毓安; 孙志卓; 李元章; 于潇; 薛源; 张全新
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2016-07-26
Filing date: 2017-07-10
Publication date: 2018-02-01
Anticipated expiration: 2019-01-26
Also published as: CN106293511A; CN106293511B

Abstract

A method and device for a dynamic partial-parallel data layout (DPPDL) for continuous data storage being applicable in the storage of continuous data. The method is specifically realized by the technical solutions of stripe dividing, storage space dynamic mapping, access competition avoidance and performance requirement sensing, and comprises: firstly acquiring a performance requirement of a storage application via performance requirement sensing, subsequently executing a storage space dynamic mapping mechanism, dynamically allocating a storage space having an appropriate level of hard disk parallelism according to the performance requirement, and lastly optimizing the storage space dynamic mapping mechanism via access competition avoidance, wherein, the storage space dynamic mapping is a core component, and the performance requirement sensing is the basis of the storage space dynamic mapping, stripe dividing is a prerequisite for the storage space dynamic mapping, and access competition avoidance optimizes and completes the storage space dynamic mapping. The DPPDL utilizes a dynamic partial-parallel strategy, and dynamically allocates storage space having an appropriate level of parallelism according to the performance requirements of different loads, thus guaranteeing power-saving when the majority of the hard disks are on standby for a long time, and dynamically providing appropriate partial parallelism, thereby providing better usability and higher power-saving efficiency.

Description

Dynamic local parallel data layout method and device for continuous data storage

Technical field

本发明涉及独立硬盘冗余阵列技术领域，尤其涉及一种面向连续数据存储的动态局部并行数据布局方法及装置。The present invention relates to the field of redundant array technology for independent hard disks, and in particular, to a dynamic local parallel data layout method and apparatus for continuous data storage.

Background technique

近年来，视频监控、备份、归档等相关技术应用十分广泛，以视频监控为例，由于视频监控在取证与识别方面具有不可替代的作用，已成为现代社会无处不在的安防设施。该类应用需要海量存储空间，主要执行写操作，以顺序访问为主，对随机性能要求不高，称该类存储系统为连续数据存储系统。In recent years, video surveillance, backup, archiving and other related technologies have been widely used. Taking video surveillance as an example, video surveillance has become an ubiquitous security facility in modern society because it plays an irreplaceable role in forensics and identification. This kind of application requires a large amount of storage space, mainly performs write operations, mainly sequential access, and has low requirements on random performance. The storage system is called a continuous data storage system.

在海量数据存储中，为了满足存储系统性能、容量和可靠性需求，人们提出了各种类型的独立硬盘冗余阵列(Redundant Arrays of Independent Disks，RAID)。RAID把多个物理存储设备，如磁盘、固态盘(Solid State Disk，SSD)等联合起来，形成统一的逻辑存储设备，可提供更大的容量、更高的性能、更可靠的数据安全性。In the massive data storage, in order to meet the performance, capacity and reliability requirements of the storage system, various types of Redundant Arrays of Independent Disks (RAID) have been proposed. RAID combines multiple physical storage devices, such as disks and Solid State Disks (SSDs), to form a unified logical storage device that provides greater capacity, higher performance, and more reliable data security.

RAID中常用的技术术语如下：The technical terms commonly used in RAID are as follows:

条带化：把一段连续数据分割成相同大小的数据块，把每块数据分别写入RAID中不同盘上的方法。Striping: A method of dividing a piece of continuous data into blocks of the same size and writing each piece of data to different disks in RAID.

容错：利用某种运算，如异或运算，生成冗余的校验数据并保存。当硬盘出现故障丢失数据时，可利用校验数据进行数据恢复。异或运算通常用“”表示。Fault Tolerance: Generates redundant verification data and saves it using some kind of operation, such as XOR operation. When the hard disk fails and data is lost, the verification data can be used for data recovery. XOR operations are usually indicated by "".

分布校验：校验数据按一定规律分布在构成RAID的各个盘上。Distribution check: The check data is distributed on each disk constituting the RAID according to a certain rule.

局部并行：阵列中仅部分硬盘并行，而不是全部硬盘并行，能够提供合适的性能，并且便于调度其余硬盘待机节能。Partial Parallelism: Only some of the hard disks in the array are parallel, rather than all hard disks in parallel, providing the right performance and facilitating the scheduling of the remaining hard drives.

常用的RAID有RAID0、RAID5等。RAID0只进行条带化，不具有冗余校验能力。RAID5以条带的方式向阵列中的硬盘写数据，校验数据分布存储在阵列中的各个盘上，通过全局并行提高访问速度，具有单盘容错能力。Commonly used RAIDs are RAID0, RAID5, and so on. RAID0 is only striped, and does not have redundancy check capability. RAID 5 writes data to the hard disks in the array in a stripe manner. The verification data is distributed and stored on each disk in the array. The access speed is improved by global parallelism, and the single disk is fault tolerant.

连续数据存储系统以顺序访问为主，对随机性能要求不高，一般不需要全局并行提供的高性能。为此，发明专利ZL201010256899.5、ZL201010256665.0、ZL201010256711.7、ZL201010256908.0、ZL201010256679.2、ZL201010256699.X、ZL201010575578.1、ZL201010575625.2、ZL201010575611.0等提出多种局部并行数据布局，把采用该类局部并行数据布局的节能RAID统称为S-RAID。 Continuous data storage systems are based on sequential access, which does not require random performance and generally does not require high performance provided by global parallelism. To this end, the invention patents ZL201010256899.5, ZL201010256665.0, ZL201010256711.7, ZL201010256908.0, ZL201010256679.2, ZL201010256699.X, ZL201010575578.1, ZL201010575625.2, ZL201010575611.0, etc. propose a variety of local parallel data layout, Energy-efficient RAIDs using this type of locally parallel data layout are collectively referred to as S-RAID.

S-RAID基本思想是：①把阵列中的存储区分成若干组，组内并行提供合适的性能，分组便于调度部分硬盘运行而其余硬盘待机节能；②采用贪婪编址法，在顺序访问模式下，保证读写操作在较长时间内分布在部分确定的硬盘上，其它硬盘可以长时间待机节能。The basic idea of S-RAID is: 1 to divide the storage in the array into several groups, and provide appropriate performance in parallel within the group. The grouping is convenient for scheduling some hard disks to run while the remaining hard disks are standby energy-saving; 2 adopting greedy addressing method in sequential access mode To ensure that the read and write operations are distributed over a certain period of time on a certain hard disk, other hard disks can be used for a long time to save energy.

S-RAID的数据布局采用存储空间静态映射机制，即：在创建时，根据磁盘块数、S-RAID类型、Strip大小等参数，建立逻辑块地址(Logical Block Address，LBA)与硬盘物理块地址(Physical Block Address，PBA)的映射关系；此映射关系在S-RAID整个生命周期内保持不变。The data layout of the S-RAID adopts the static mapping mechanism of the storage space, that is, when creating, the logical block address (LBA) and the physical block address of the hard disk are established according to parameters such as the number of disk blocks, the S-RAID type, and the strip size. (Physical Block Address, PBA) mapping; this mapping remains unchanged throughout the life of the S-RAID.

然而，S-RAID的静态数据布局适合比较平稳的工作负载，不能根据波动负载、突发负载的性能需求动态调整局部并行度；对于波动负载、突发负载，S-RAID需要根据峰值负载的性能需求确定局部并行度，但该并行度对于谷值负载显然是过剩的。这种性能过剩会导致额外能耗，并且随着波动负载、突发负载强度的增大而显著增加。However, the static data layout of S-RAID is suitable for a relatively stable workload, and the local parallelism cannot be dynamically adjusted according to the performance requirements of fluctuating load and burst load. For fluctuating load and burst load, S-RAID needs performance according to peak load. The demand determines the degree of local parallelism, but the degree of parallelism is clearly excessive for the valley load. This excess performance leads to additional energy consumption and increases significantly with fluctuating loads and burst load strength.

在连续数据存储中，较强的波动负载、突发负载普遍存在。例如在视频监控中，视频内容的动态特性会产生严重的波动负载。视频数据一般要压缩后进行传输和存储，现有视频压缩标准，如H.264/MPEG-4等，都是基于视频内容的时间、空间冗余性进行视频压缩的，视频压缩比会在很大范围内变化。白天运动物体较多，视频压缩比较小，产生的视频数据量大；夜间运动物体较少，视频压缩比较高，产生的视频数据量小。此外，视频监控中各摄像机的工作时间、分辨率不同时，也会产生较高强度的波动负载。In continuous data storage, strong fluctuation loads and burst loads are common. For example, in video surveillance, the dynamic nature of video content can create severe fluctuating loads. Video data is generally compressed and transmitted and stored. Existing video compression standards, such as H.264/MPEG-4, are based on the temporal and spatial redundancy of video content for video compression. The video compression ratio will be very high. Change within a wide range. There are more moving objects during the day, the video compression is relatively small, and the amount of video data generated is large; there are fewer moving objects at night, the video compression is relatively high, and the amount of generated video data is small. In addition, when the working time and resolution of each camera in video surveillance are different, a high-intensity fluctuation load is also generated.

对于较强的波动、突发负载，采用缓存措施是不可行的。例如在视频监控中，负载不仅波动幅度大，而且波动周期足够长，需要大容量缓存设备。磁盘缓存不仅增加硬件成本，还会引入额外功耗；SSD缓存虽然功耗低，但大量使用会显著增加成本。深度缓存还会极大增加数据丢失的概率，缓存设备通常没有容错机制，而为缓存设备增加容错机制，又将进一步增加硬件成本和功耗。For strong fluctuations and sudden load, it is not feasible to adopt caching measures. For example, in video surveillance, the load not only has a large fluctuation range, but also the fluctuation period is long enough to require a large-capacity cache device. Disk caching not only increases hardware costs, but also introduces additional power consumption; while SSD caching is low in power consumption, heavy use can add significant cost. Deep cache also greatly increases the probability of data loss. Cache devices usually have no fault tolerance mechanism, and add fault tolerance to cache devices, which will further increase hardware cost and power consumption.

为此，提出一种面向连续数据存储的动态局部并行数据布局(Dynamic Partial-Parallel Data Layout，DPPDL)，DPPDL采用动态局部并行策略，根据不同负载的性能需求，动态分配具有合适并行度的存储空间。DPPDL既可保证多数硬盘长时间待机节能，又能动态提供合适的局部并行度，具有更高的可用性，以及更高的节能效率。To this end, a Dynamic Partial-Parallel Data Layout (DPPDL) for continuous data storage is proposed. DPPDL adopts dynamic local parallel strategy to dynamically allocate storage space with appropriate parallelism according to the performance requirements of different loads. . DPPDL not only guarantees long-term standby energy saving for most hard disks, but also dynamically provides appropriate local parallelism, higher availability, and higher energy efficiency.

发明内容Summary of the invention

本发明的目的是针对已有静态局部并行数据布局不能更好适应波动负载、突发负载的不足，为了提高存储系统的节能效率，提出一种面向连续数据存储的动态局部并行数据布局方法及装置。The object of the present invention is to solve the problem that the existing static partial parallel data layout can not be better adapted to the fluctuating load and the sudden load. In order to improve the energy-saving efficiency of the storage system, a dynamic localization oriented to continuous data storage is proposed. Row data layout method and device.

一种面向连续数据存储的动态局部并行数据布局方法(DPPDL)，具体通过下述技术方案实现：A dynamic local parallel data layout method (DPPDL) for continuous data storage is realized by the following technical solutions:

包括条带划分、存储空间动态映射、访问竞争避让以及性能需求感知；首先通过性能需求感知获得存储应用的性能需求，然后执行存储空间动态映射机制，根据性能需求动态分配具有合适硬盘并行度的存储空间，最后通过访问竞争避让优化该存储空间动态映射机制；其中，存储空间动态映射是核心，性能需求感知是存储空间动态映射的依据，条带划分是存储空间动态映射的前提，而访问竞争避让是存储空间动态映射的优化与完备；Including stripe partitioning, storage space dynamic mapping, access competition avoidance, and performance demand awareness; firstly, the performance requirements of the storage application are obtained through performance requirement perception, and then the storage space dynamic mapping mechanism is executed, and the storage with the appropriate hard disk parallelism is dynamically allocated according to the performance requirement. Space, finally optimize the storage space dynamic mapping mechanism by accessing competition avoidance; among them, storage space dynamic mapping is the core, performance requirement perception is the basis of storage space dynamic mapping, stripe partitioning is the premise of storage space dynamic mapping, and access competition avoidance It is the optimization and completeness of the dynamic mapping of storage space;

所述条带划分，具体步骤为：The strip is divided, and the specific steps are as follows:

步骤1.1将N块硬盘中的每块硬盘平均分成1×N个存储块；其中，1大于等于1，N大于等于3；Step 1.1 divides each hard disk in the N hard disks into 1×N storage blocks; wherein, 1 is greater than or equal to 1, and N is greater than or equal to 3;

步骤1.1中每块硬盘内的起始地址相同的N个存储块组成一个条带，共组成1×N个条带；每个条带包含1个校验存储块，N-1个数据存储块，校验存储块简称校验块，数据存储块简称数据块；In step 1.1, the N storage blocks with the same starting address in each hard disk form a stripe, which constitutes 1×N strips; each strip contains 1 parity storage block, and N-1 data storage blocks. , the check storage block is referred to as a check block, and the data storage block is referred to as a data block;

其中，条带i中的校验块位于硬盘N-1-j中；若j+v＜N-1，则第v个数据块位于硬盘v，否则位于硬盘v+1，其中，0≤i＜(1×N)，j＝i MOD N(MOD为模运算)，0≤v＜N-1；Wherein, the check block in the strip i is located in the hard disk N-1-j; if j+v<N-1, the vth data block is located on the hard disk v, otherwise it is located on the hard disk v+1, wherein 0≤i <(1×N), j=i MOD N (MOD is a modulo operation), 0≤v<N-1;

步骤1.2划分步骤1.1中的每个数据块和校验块为M个大小相等的子块，每个子块包含若干个地址连续的存储区，分别称为数据子块Strip和校验子块PStrip；Step 1.2: Each data block and the check block in step 1.1 are M equal-sized sub-blocks, and each sub-block includes a plurality of consecutive storage areas, which are respectively called a data sub-block strip and a parity sub-block PStrip;

步骤1.3步骤1.1中每个条带中的盘内起始地址相同的子块组成一个子条带Stripe，再将该子条带内的Strip进行异或运算，生成该子条带内的PStrip；Step 1.3 in step 1.1, each sub-block with the same starting address in the strip is formed into a sub-strip stripe, and then the strip in the sub-strip is XORed to generate a PStrip in the sub-strip;

其中，每个条带中包含M个大小相同的子条带；子条带Stripe m的校验子块PStrip m由其N-1个数据子块Strip m异或生成，见式(1)，；Wherein each strip includes M sub-strips of the same size; the sub-band PStrip m of the strip stripe stripe is generated by X X of its N-1 data sub-blocks, see equation (1), ;

所述存储空间动态映射，具体为：The storage space dynamic mapping is specifically:

采用逻辑区块地址与物理区块地址的动态映射机制对磁盘阵列存储空间进行分配管理，磁盘阵列层收到的写数据可以被动态映射到不同数量的硬盘上；即：根据负载的性能需求参数k，动态分配具有k个硬盘并行度的存储空间，即k为需要并发写数据的硬盘数量，不包括所写数据的校验数据所在的硬盘；当负载最小时，仅映射到1块硬盘上，仅向该硬盘写数据；负载最大时映射到不包括其校验数据所在硬盘的其余硬盘上，向不包括其校验数据所在硬盘的其余硬盘并发写数据；The disk array storage space is allocated and managed by the dynamic mapping mechanism of the logical block address and the physical block address, and the write data received by the disk array layer can be dynamically mapped to different numbers of hard disks; that is, according to the performance requirement parameters of the load k, dynamically allocate storage space with k hard disk parallelism, that is, k is the number of hard disks that need to write data concurrently, and does not include the hard disk where the verification data of the written data is located; when the load is minimum, only maps to one hard disk Write data only to the hard disk; when the load is maximum, it maps to the remaining hard disks of the hard disk that does not include the check data, and writes data to the remaining hard disks of the hard disk that does not include the check data;

存储空间动态映射，具体步骤为： Dynamic mapping of storage space, the specific steps are:

(1)在CurBank中选择自由Strip最多的Stripe作为CurStripe；(1) Select the Stripe with the most free strips in CurBank as CurStripe;

其中，所述的自由Strip为未进行映射的Strip；条带链表是由所有条带组成的一个单向循环链表，CurBank为当前进行映射的条带，初始值为条带0；CurStripe为CurBank中可进行映射的Stripe；The free strip is an unmapped strip; the strip list is a one-way circular list consisting of all strips, CurBank is the currently mapped strip, the initial value is strip 0; CurStripe is CurBank a stripe that can be mapped;

(2)如果CurStripe中自由Strip数为0，表明CurStripe无自由Strip可映射，等价于CurBank无自由Strip可映射，转到(3)，否则转到(5)；(2) If the number of free strips in CurStripe is 0, it means that CurStripe has no free strip mapable, equivalent to CurBank without free strip mapable, go to (3), otherwise go to (5);

(3)判断NextBank是否有自由Strip可映射，如果没有自由Strip可映射，则删除NextBank上的存储数据，进行存储空间回收；其中，NextBank为下一个进行映射的条带，与CurBank编号相邻，初始值为条带1；(3) Determine whether NextBank has free strip mapability. If there is no free strip mappable, delete the storage data on NextBank and perform storage space recovery. Among them, NextBank is the next stripe to be mapped, adjacent to the CurBank number. The initial value is strip 1;

(4)将NextBank作为CurBank并重新获取CurStripe，然后将NextBank顺序后移；(4) Take NextBank as CurBank and re-acquire CurStripe, then move the NextBank back in order;

(5)如果CurStripe中自由Strip数不小于k，则从CurStripe中顺序取出k个Strip，转到(7)，否则转到(6)；(5) If the number of free strips in CurStripe is not less than k, then sequentially extract k strips from CurStripe, go to (7), otherwise go to (6);

(6)先从CurStripe中取出所有自由Strip，再从NextStripe取出余下所需的自由Strip，一起组成k个自由Strip，如果NextStripe没有足够的自由Strip，就删除NextBank上的存储数据，回收存储空间，并重新获取NextStripe；其中，NextStripe为NextBank中可进行映射的Stripe；(6) First take all the free strips from CurStripe, and then take the remaining free strips from NextStripe to form k free strips. If NextStripe does not have enough free strips, delete the stored data on NextBank and recycle the storage space. And re-acquire NextStripe; where, NextStripe is the Stripe that can be mapped in NextBank;

(7)获得k个自由Strip后，进行存储空间映射，把逻辑地址空间映射到具有k个硬盘并行度的物理地址空间；并将映射关系记录在映射表中；(7) after obtaining k free strips, performing storage space mapping, mapping the logical address space to a physical address space having k hard disk parallelism; and recording the mapping relationship in the mapping table;

根据Strip所在的条带、子条带、以及子条带内的编号，确定该Strip所在磁盘及盘内偏移量，并将其记录到映射表中；读操作时根据映射表获得数据在磁盘上的位置；映射表作为元数据的重要组成部分，保存在每个正在工作的磁盘的尾部，内带一个版本编号，版本编号按时间先后由小到大，磁盘阵列在断电恢复时装入最大编号的版本；Determine the offset of the disk and disk in the strip according to the strip, the sub-strip, and the number in the sub-strip where the strip is located, and record it in the mapping table. When reading, obtain data on the disk according to the mapping table. The location; the mapping table is an important part of the metadata, stored at the end of each working disk, with a version number, the version number is from small to large, and the disk array is loaded the most when the power is restored. Numbered version

所述访问竞争避让，具体步骤为：The access competition avoids, the specific steps are:

1)DPPDL选择跨越2个条带的Strip进行存储空间映射时，首先从CurStripe中取出所有自由Strip，再从NextStripe取出余下所需的自由Strip，一起组成k+1个自由Strip，如果NextStripe没有足够的自由Strip，就删除NextBank上的存储数据，回收存储空间，并重新获取NextStripe；1) When DPPDL selects a strip across 2 stripes for storage space mapping, first take all free strips from CurStripe, and then take the remaining free strips from NextStripe to form k+1 free strips together, if NextStripe is not enough Free Strip, delete the stored data on NextBank, reclaim storage space, and regain NextStripe;

2)DPPDL进行访问竞争检查，若没有访问竞争或末尾Strip引起访问竞争，则删除末尾Strip；否则，删除引起访问竞争的Strip。最后获得不会引发访问竞争且可并发访问的k个Strip；2) DPPDL performs the access competition check. If there is no access competition or the end strip causes the access competition, the last strip is deleted; otherwise, the strip causing the access competition is deleted. Finally, obtain k strips that do not cause access competition and can be accessed concurrently;

其中，所述的访问竞争检查，指同一硬盘是否有2个子块被并发访问； The access competition check refers to whether two sub-blocks of the same hard disk are concurrently accessed;

访问竞争避让可用于替换存储空间动态映射中的步骤(6)；The access competition avoidance can be used to replace the step (6) in the storage space dynamic mapping;

性能需求感知，具体为：Performance demand awareness, specifically:

连续数据存储应用，需要稳定的数据传输率，因此DPPDL把数据传输率作为性能需求指标。Continuous data storage applications require a stable data transfer rate, so DPPDL uses the data transfer rate as a performance requirement indicator.

为了在线感知负载的性能需求，即数据传输率需求，In order to sense the performance requirements of the load online, that is, the data transmission rate requirement,

步骤A.DPPDL统计磁盘阵列层I/O请求队列的历史信息；Step A. DPPDL statistics history information of the disk array layer I/O request queue;

步骤B.DPPDL进行分析预测；Step B. DPPDL performs analysis and prediction;

对于连续数据存储应用，负载的波动周期或突发时间一般较大，可根据时间窗口T内的平均数据传输率来感知负载需求的数据传输率，用rn＝(ta，pos，len)记录时间窗口T内的第n个I/O请求；其中，ta、pos、len分别为请求rn的到来时间、起始逻辑地址和请求长度，用rn.len表示rn的请求长度len；For continuous data storage applications, the fluctuation period or burst time of the load is generally large, and the data transmission rate of the load demand can be perceived according to the average data transmission rate in the time window T, and the time is recorded by rn=(ta, pos, len). The nth I/O request in the window T; wherein, ta, pos, and len are the arrival time, the starting logical address, and the request length of the request rn, respectively, and the request length len of rn is represented by rn.len;

感知负载需求的数据传输率：Data transfer rate for perceived load demand:

其中，num为时间窗口T内到来的I/O数；β为性能系数，可在1.2～1.5之间取值；Where num is the number of I/Os coming in the time window T; β is the coefficient of performance, which can be between 1.2 and 1.5;

5s＜T＜15s；5s<T<15s;

表示对时间窗口T内的num个I/O请求进行长度求和；Representing the length summation of num I/O requests in the time window T;

其中，公式(2)中的I/O请求rn来自RAID层的请求队列Where the I/O request rn in equation (2) comes from the request queue of the RAID layer.

DPPDL感知到负载需求的数据传输率之后，根据实际应用场景中不同硬盘并行度能够提供的数据传输率，确定需要并发写数据的硬盘数量kAfter DPPDL senses the data transmission rate of the load demand, it determines the number of hard disks that need to concurrently write data according to the data transmission rate that can be provided by different hard disk parallelism in the actual application scenario.

Beneficial effect

本发明提出的一种面向连续数据存储的动态局部并行数据布局，与已有技术比较，具有以下优点：The dynamic local parallel data layout oriented by continuous data storage proposed by the invention has the following advantages compared with the prior art:

1.可提供大弹性并行度，具有更高的节能效率，具体为：1. It can provide large elastic parallelism and has higher energy saving efficiency, specifically:

DPPDL采用动态局部并行策略，根据不同负载的性能需求，动态分配具有合适并行度的存储空间。DPPDL既可保证多数磁盘长时间待机节能，又能动态提供合适的局部并行度，具有更高的可用性，以及更高的节能效率；DPPDL uses a dynamic local parallel strategy to dynamically allocate storage space with appropriate parallelism according to the performance requirements of different workloads. DPPDL not only guarantees long-term standby energy saving for most disks, but also dynamically provides appropriate local parallelism, higher availability, and higher energy efficiency;

2.解决了动态局部并行与顺序删除特性之间的矛盾，具体为：2. Solved the contradiction between dynamic local parallelism and sequential deletion features, specifically:

对于连续数据存储系统，当存储空间写满后，一般按时间删除最早的存储数据、然后写入新数据，称为顺序删除特性。动态局部并行与顺序删除特性之间存在矛盾，例如当存储空间写满后，如果当前负载需要5块磁盘并行，但最早数据都存储在2块局部并行的磁盘上，此时无法在顺序删除数据的条件下，再增加3块磁盘并行；For a continuous data storage system, when the storage space is full, the oldest stored data is generally deleted by time, and then new data is written, which is called a sequential deletion feature. There is a contradiction between the dynamic local parallelism and the sequential deletion feature. For example, when the storage space is full, if the current load requires 5 disks in parallel, but the oldest data is stored on 2 partially parallel disks, the data cannot be deleted sequentially. Under the conditions, add 3 more disks in parallel;

DPPDL在宏观上按条带依次进行存储空间的分配与回收，当条带数较大时(对于大容量磁盘是可行的)，可保证基本按时间顺序删除数据。微观上以Strip为映射单元，根据性能需求在Stripe上选择数量合适的Strip并行，动态提供合适的并行度，因此解决了动态局部并行与顺序删除特性之间的矛盾。DPPDL sequentially allocates and reclaims storage space in strips, when the number of stripes is large (for Large-capacity disks are possible) to ensure that data is deleted in a substantially chronological order. Microscopically, Strip is used as the mapping unit, and the appropriate amount of Strip parallel is selected on Stripe according to performance requirements, and the appropriate parallelism is dynamically provided, thus solving the contradiction between dynamic local parallelism and sequential deletion characteristics.

DRAWINGS

被结合在说明书中并构成说明书的一部分的附图示出了本发明的实施例，并且连同其说明一起用于解释本发明的原理。The accompanying drawings, which are incorporated in FIG

图1为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法的步骤流程图；1 is a flow chart showing the steps of a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention;

图2为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法中的条带划分的总体示意图；2 is a schematic diagram of strip partitioning in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention;

图3为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法中的条带划分的细分示意图；3 is a schematic diagram of subdivision of strip partitioning in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention;

图4为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法中的存储空间动态映射示意图；4 is a schematic diagram of dynamic mapping of a storage space in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention;

图5为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法中的访问竞争产生示意图；FIG. 5 is a schematic diagram of access competition generation in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention; FIG.

图6为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法中的访问竞争避让示意图；6 is a schematic diagram of access competition avoidance in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention;

图7为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法中的访问竞争避让后的存储空间动态映射示意图；FIG. 7 is a schematic diagram of dynamic mapping of storage space after access competition avoidance in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention; FIG.

图8为本发明实施例一种面向连续数据存储的动态局部并行数据布局装置的结构示意图。FIG. 8 is a schematic structural diagram of a dynamic local parallel data layout apparatus for continuous data storage according to an embodiment of the present invention.

detailed description

为使本发明的上述目的、特征和优点能够更加明显易懂，下面结合附图和具体实施例对本发明进行详细说明。The present invention will be described in detail below with reference to the drawings and specific embodiments.

如图1所示，本发明实施例的面向连续数据存储的动态局部并行数据布局方法，具体可以包括如下步骤：As shown in FIG. 1 , the dynamic local parallel data layout method for continuous data storage in the embodiment of the present invention may specifically include the following steps:

步骤101：条带划分，将N块硬盘中的每块硬盘平均分成1×N个存储块；Step 101: Striping, dividing each hard disk in the N hard disks into 1×N storage blocks equally;

步骤102：性能需求感知，获得存储应用的性能需求；Step 102: Perceive performance requirements and obtain performance requirements of the storage application;

步骤103：判断是否存在访问竞争，如果是，则执行步骤104，否则执行步骤105；Step 103: Determine whether there is access competition, if yes, proceed to step 104, otherwise perform step 105;

步骤104：访问竞争避让，优化该存储空间动态映射机制； Step 104: Accessing the competition avoidance and optimizing the dynamic mapping mechanism of the storage space;

步骤105：采用动态映射机制对存储空间进行分配管理。Step 105: Perform allocation management on the storage space by using a dynamic mapping mechanism.

具体通过下述技术方案实现：Specifically achieved through the following technical solutions:

包括条带划分、存储空间动态映射、访问竞争避让以及性能需求感知；首先通过性能需求感知获得存储应用的性能需求，然后执行存储空间动态映射机制，根据性能需求动态分配具有合适硬盘并行度的存储空间，最后通过访问竞争避让优化该存储空间动态映射机制；Including stripe partitioning, storage space dynamic mapping, access competition avoidance, and performance demand awareness; firstly, the performance requirements of the storage application are obtained through performance requirement perception, and then the storage space dynamic mapping mechanism is executed, and the storage with the appropriate hard disk parallelism is dynamically allocated according to the performance requirement. Space, and finally optimize the storage space dynamic mapping mechanism by accessing the competition avoidance;

其中，存储空间动态映射是核心，性能需求感知是存储空间动态映射的依据，条带划分是存储空间动态映射的前提，而访问竞争避让是存储空间动态映射的优化与完备。Among them, storage space dynamic mapping is the core, performance requirement perception is the basis of storage space dynamic mapping, stripe partitioning is the premise of storage space dynamic mapping, and access competition avoidance is the optimization and completeness of storage space dynamic mapping.

本发明实施例是在5块磁盘上构建动态局部并行数据布局DPPDL的基本过程，包括条带划分、存储空间动态映射、访问竞争避让、性能需求感知四个方面的内容；其中，每个磁盘的磁盘容量为4TB(1TB＝103GB＝106MB＝109KB，1KB＝1024B)。The embodiment of the present invention is a basic process for constructing a dynamic local parallel data layout DPPDL on five disks, including strip partitioning, storage space dynamic mapping, access competition avoidance, and performance requirement sensing; wherein each disk is The disk capacity is 4TB (1TB=103GB=106MB=109KB, 1KB=1024B).

优选地，所述条带划分，具体步骤为：Preferably, the strip is divided, and the specific steps are:

步骤1.2划分步骤1.1中的每个数据块和校验块为M个大小相等的子块，每个子块包含若干个地址连续的存储区(如磁盘扇区)，分别称为数据子块(记作Strip)和校验子块(记作PStrip)；Step 1.2: Each data block and check block in step 1.1 is M equal-sized sub-blocks, and each sub-block includes a plurality of consecutive storage areas (such as disk sectors), which are respectively called data sub-blocks. Strip and check sub-blocks (denoted as PStrip);

步骤1.3步骤1.1中每个条带中的盘内起始地址相同的子块组成一个子条带(记作Stripe)，再将该子条带内的Strip进行异或运算，生成该子条带内的PStrip。In step 1.3, the sub-blocks in the strip with the same starting address in each strip form a sub-strip (recorded as stripe), and then the strip in the sub-band is XORed to generate the sub-strip. Inside the PStrip.

优选地，上述步骤1.1中每块硬盘内的起始地址相同的N个存储块组成一个条带，共组成1×N个条带；每个条带包含1个校验存储块，N-1个数据存储块，校验存储块简称校验块，数据存储块简称数据块；Preferably, in the above step 1.1, the N storage blocks having the same starting address in each hard disk form a stripe, which constitutes 1×N strips; each strip includes one parity storage block, N-1 Data storage block, check storage block referred to as check block, data storage block referred to as data block;

其中，条带i中的校验块位于硬盘N-1-j中；若j+v＜N-1，则第v个数据块位于硬盘v，否则位于硬盘v+1，其中，0≤i＜(1×N)，j＝i MOD N(MOD为模运算)，0≤v＜N-1。Wherein, the check block in the strip i is located in the hard disk N-1-j; if j+v<N-1, the vth data block is located on the hard disk v, otherwise it is located on the hard disk v+1, wherein 0≤i <(1×N), j=i MOD N (MOD is a modulo operation), 0≤v<N-1.

优选地，在步骤1.3中，每个条带中包含M个大小相同的子条带；子条带Stripe m的校验子块PStrip m由其N-1个数据子块Strip m异或生成，；Preferably, in step 1.3, each strip includes M sub-bands of the same size; the parity sub-block PStrip m of the strip stripe stripe is generated by X X of its N-1 data sub-blocks, ;

则生成方式为：Then the generation method is:

优选地，所述存储空间动态映射的操作思路为：Preferably, the operation idea of the dynamic mapping of the storage space is:

采用逻辑区块地址与物理区块地址的动态映射机制对磁盘阵列存储空间进行分配管理，磁盘阵列层收到的写数据可以被动态映射到不同数量的硬盘上；即：根据负载的性能需求参数k，动态分配具有k个硬盘并行度的存储空间，即k为需要并发写数据的硬盘数量，不包括所写数据的校验数据所在的硬盘；当负载最小时，仅映射到某1块硬盘上，仅向该硬盘写数据；负载最大时映射到除某一块1块硬盘之外的其余硬盘上，向除某一块1块硬盘之外的其余硬盘并发写数据；The disk array storage space is allocated and managed by the dynamic mapping mechanism of the logical block address and the physical block address, and the write data received by the disk array layer can be dynamically mapped to different numbers of hard disks; that is, according to the load The performance requirement parameter k dynamically allocates the storage space with k hard disk parallelism, that is, k is the number of hard disks that need to concurrently write data, and does not include the hard disk where the verification data of the written data is located; when the load is minimum, only maps to On a certain hard disk, only data is written to the hard disk; when the load is maximum, it is mapped to the remaining hard disks except one block of one hard disk, and the data is written to the remaining hard disks except one block of one hard disk;

本发明实施例中，存储空间动态映射，涉及的基本术语定义如下：In the embodiment of the present invention, the storage space dynamic mapping, the basic terms involved are defined as follows:

条带链表：由所有条带组成的一个单向循环链表；Strip linked list: a one-way circular linked list consisting of all strips;

CurBank：当前进行映射的条带，称为当前映射条带，初始值为条带0；CurBank: the current stripe to be mapped, called the current map strip, the initial value is stripe 0;

NextBank：下一个进行映射的条带，称为相邻映射条带，与CurBank编号相邻，初始值为条带1；NextBank: the next stripe to be mapped, called the adjacent map strip, adjacent to the CurBank number, the initial value is strip 1;

CurStripe：CurBank中可进行映射的Stripe；CurStripe: a route that can be mapped in CurBank;

NextStripe：NextBank中可进行映射的Stripe；NextStripe: Stripe that can be mapped in NextBank;

所述存储空间动态映射，具体步骤为：The storage space is dynamically mapped, and the specific steps are:

步骤(1)在CurBank中选择自由Strip最多的Stripe作为CurStripe；Step (1) Selecting the stripe with the largest free strip as the CurStripe in CurBank;

其中，所述自由Strip为未进行映射的Strip；Wherein the free strip is a strip that is not mapped;

步骤(2)如果CurStripe中自由Strip数为0，表明CurStripe无自由Strip可映射，则转到步骤(3)，否则转到步骤(5)；Step (2) If the number of free strips in CurStripe is 0, indicating that CurStripe has no free strip mappable, go to step (3), otherwise go to step (5);

步骤(3)判断NextBank是否有自由Strip可映射，如果没有自由Strip可映射，则删除NextBank上的存储数据，进行存储空间回收；Step (3) determines whether NextBank has a free strip mappable, and if there is no free strip mappable, delete the stored data on the NextBank to perform storage space recovery;

步骤(4)将NextBank作为CurBank并重新获取CurStripe，然后将NextBank顺序后移；Step (4) takes NextBank as CurBank and reacquires CurStripe, and then moves NextBank back;

步骤(5)如果CurStripe中自由Strip数不小于k，则从CurStripe中顺序取出k个Strip，转到步骤(7)，否则转到步骤(6)；Step (5) If the number of free strips in CurStripe is not less than k, then sequentially extract k strips from CurStripe, go to step (7), otherwise go to step (6);

步骤(6)先从CurStripe中取出所有自由Strip，再从NextStripe取出余下所需的自由Strip，一起组成k个自由Strip。如果NextStripe没有足够的自由Strip，就删除NextBank上的存储数据，回收存储空间，并重新获取NextStripe；Step (6) First take all free strips from CurStripe, and then take the remaining free strips from NextStripe to form k free strips. If NextStripe does not have enough free strips, delete the stored data on NextBank, reclaim the storage space, and re-acquire NextStripe;

步骤(7)获得k个自由Strip后，进行存储空间映射，把逻辑地址空间映射到具有k个硬盘并行度的物理地址空间，并将映射关系记录在映射表中；Step (7) obtains k free strips, performs storage space mapping, maps the logical address space to a physical address space having k hard disks parallelism, and records the mapping relationship in the mapping table;

根据Strip所在的条带、子条带、以及子条带内的编号，确定该Strip所在磁盘及盘内偏移量，并将其记录到映射表中；读操作时根据映射表获得数据在磁盘上的位置；映射表作为元数据的重要组成部分，保存在每个正在工作的磁盘的尾部，内带一个版本编号，版本编号按时间先后由小到大，磁盘阵列在断电恢复时装入最大编号的版本。Determine the offset of the disk and disk in the strip according to the strip, the sub-strip, and the number in the sub-strip where the strip is located, and record it in the mapping table. When reading, obtain data on the disk according to the mapping table. The location; the mapping table is an important part of the metadata, stored at the end of each working disk, with a version number, the version number is from small to large, and the disk array is loaded the most when the power is restored. Numbered version.

优选地，所述访问竞争避让，具体步骤为： Preferably, the access competition avoids, and the specific steps are:

步骤1)DPPDL选择跨越2个条带的Strip进行存储空间映射时，首先从CurStripe中取出所有自由Strip，再从NextStripe取出余下所需的自由Strip，一起组成k+1个自由Strip，如果NextStripe没有足够的自由Strip，就删除NextBank上的存储数据，回收存储空间，并重新获取NextStripe；Step 1) When DPPDL selects a strip across 2 strips for storage space mapping, first take all free strips from CurStripe, and then take the remaining free strips from NextStripe and form k+1 free strips together. If NextStripe does not. With enough free strips, delete the stored data on NextBank, reclaim the storage space, and re-acquire NextStripe;

步骤2)DPPDL进行访问竞争检查，若没有访问竞争或末尾Strip引起访问竞争，则删除末尾Strip；否则，删除引起访问竞争的Strip；最后获得不会引发访问竞争且可并发访问的k个Strip；Step 2) DPPDL performs an access competition check. If there is no access competition or the end strip causes the access competition, the last strip is deleted; otherwise, the strip causing the access competition is deleted; finally, k strips that do not cause the access competition and can be concurrently accessed are obtained;

其中，所述访问竞争检查是指同一硬盘是否有2个子块被并发访问。The access competition check refers to whether two sub-blocks of the same hard disk are concurrently accessed.

优选地，所述性能需求感知，具体为：Preferably, the performance requirement is perceived, specifically:

连续数据存储应用，对响应时间不十分敏感，需要稳定的数据传输率，因此将数据传输率作为性能需求指标；Continuous data storage applications are not very sensitive to response time and require a stable data transfer rate, so the data transfer rate is used as a performance requirement indicator;

所述数据传输率需求，具体为：The data transmission rate requirement is specifically:

对于连续数据存储应用，负载的波动周期或突发时间较大，可根据时间窗口T内的平均数据传输率来感知负载需求的数据传输率，用rn＝(ta，pos，len)记录时间窗口T内的第n个I/O请求；其中，ta、pos、len分别为请求rn的到来时间、起始逻辑地址和请求长度，rn.len表示rn的请求长度len；For continuous data storage applications, the fluctuation period or burst time of the load is large, and the data transmission rate of the load demand can be perceived according to the average data transmission rate in the time window T, and the time window is recorded with rn=(ta, pos, len). The nth I/O request in T; where ta, pos, and len are the arrival time, the starting logical address, and the request length of the request rn, respectively, and rn.len represents the request length len of rn;

感知负载需求的数据传输率表示为：The data transfer rate for perceived load demand is expressed as:

5s＜T＜15s；5s<T<15s;

其中，公式(2)中的I/O请求rn来自RAID层的请求队列；Wherein, the I/O request rn in formula (2) comes from the request queue of the RAID layer;

DPPDL感知到负载需求的数据传输率之后，根据实际应用场景中不同硬盘并行度能够提供的数据传输率，确定需要并发写数据的硬盘数量。After the DPPDL senses the data transmission rate of the load demand, it determines the number of hard disks that need to concurrently write data according to the data transmission rate that can be provided by different hard disk parallelism in the actual application scenario.

下面将结合说明书附图对所述条带划分、所述存储空间动态映射、所述访问竞争避让以及所述性能需求感知进行详细说明，具体地：The strip partitioning, the storage space dynamic mapping, the access competition avoidance, and the performance requirement perception are described in detail below with reference to the accompanying drawings, specifically:

1、条带划分1, strip division

图2为本实施例中的条带划分总体示意图。从图2可以看出每块磁盘平均分成5个存储块(其中，l＝1)，由于每个磁盘容量为4TB，所以每个存储块大小为4TB /5＝800GB；每个磁盘中盘内起始地址相同的5个存储块组成一个条带，共组成5个条带；从图2还可以看出，每个条带包含1个校验块和4个数据块，条带0的校验块位于磁盘4、条带1的校验块位于磁盘3、……、条带4的校验块位于磁盘0。FIG. 2 is a schematic overall view of the strip division in the embodiment. It can be seen from Figure 2 that each disk is equally divided into 5 memory blocks (where l=1). Since each disk has a capacity of 4TB, each memory block size is 4TB. /5=800GB; each of the five memory blocks in the disk with the same starting address in the disk form a stripe, which forms a total of five strips; as can be seen from Figure 2, each strip contains one check block. And 4 data blocks, the check block of stripe 0 is located on the disk 4, the check block of the stripe 1 is located on the disk 3, ..., and the check block of the stripe 4 is located on the disk 0.

进一步地，可以对上述条带划分进行细分，如图3所示。从图3中可以看出，把每个数据块、校验块划分为若干个大小相等的子块；本实施例中取子块大小为100KB(即：包含200个地址连续的扇区，扇区大小为512字节)，则每个数据块、校验块分成M＝800,0000个相等的子块，分别称为数据子块、校验子块。子条带的校验子块由该子条带的4个数据子块异或运算生成；如条带0内子条带1的校验子块，由该子条带的4个数据子块异或运算生成。Further, the above-described strip division may be subdivided as shown in FIG. As can be seen from FIG. 3, each data block and the check block are divided into a plurality of equal-sized sub-blocks; in this embodiment, the sub-block size is 100 KB (ie, a sector containing 200 consecutive addresses, a fan) The size of the area is 512 bytes, and each data block and check block is divided into M=800,0000 equal sub-blocks, which are respectively called data sub-blocks and parity sub-blocks. The parity sub-block of the sub-strip is generated by the X-OR operation of the four data sub-blocks of the sub-strip; for example, the parity sub-block of the sub-strip 1 in the strip 0, and the four data sub-blocks of the sub-strip Or the operation is generated.

2、存储空间动态映射2, storage space dynamic mapping

在进行条带划分后，DPPDL采用动态映射机制对存储空间进行分配管理。图4给出了5种写负载(A～E)在存储空间内的地址映射过程，选取其中的条带0和条带1进行说明，假设每个条带包含6个子条带(此处仅为阐述方便，本实施例中实际子条带数为M＝800,0000)，映射粒度为子块大小。After striping, DPPDL uses dynamic mapping mechanism to allocate and manage storage space. Figure 4 shows the address mapping process of five kinds of write loads (A ~ E) in the storage space, select strip 0 and strip 1 to illustrate, assuming each strip contains 6 sub strips (here only For convenience of description, the actual number of sub-bands in this embodiment is M=800, 0000), and the mapping granularity is the sub-block size.

假设负载A～E分别需要2、4、3、1、2块磁盘并行，其后数字为该负载持续的时间段编号(1～16)，如图4中的A2表示负载A在时段2内运行；负载持续的时间可各不相同。It is assumed that the loads A to E respectively need 2, 4, 3, 1, 2 disks in parallel, and the numbers thereafter are the time period number (1 to 16) of the load duration, as shown in Figure 4, A2 indicates that the load A is in the period 2 Run; the duration of the load can vary.

负载A需要2块磁盘(不包括其校验数据所在的磁盘)并行，即并发向磁盘0、磁盘1写数据，持续了3个时间段，即时间段1、时间段2、时间段3；Load A requires two disks (excluding the disk where the check data is located) in parallel, that is, concurrently writes data to disk 0 and disk 1, for three time periods, namely, time period 1, time period 2, and time period 3;

负载B需要4块磁盘(不包括其校验数据所在的磁盘)并行，即并发向磁盘0、磁盘1、磁盘2、磁盘3写数据，持续了3个时间段，即时间段4、时间段5、时间段6；Load B requires 4 disks (excluding the disk where the check data is located) in parallel, that is, concurrently writes data to disk 0, disk 1, disk 2, and disk 3 for 3 time periods, that is, time period 4, time period 5, time period 6;

负载C需要3块磁盘(不包括其校验数据所在的磁盘)并行，即并发向磁盘2、磁盘3、磁盘0写数据，持续了3个时间段，即时间段7、时间段8、时间段9；Load C requires 3 disks (excluding the disk where the check data is located) in parallel, that is, concurrently writes data to disk 2, disk 3, and disk 0 for 3 time periods, that is, time period 7, time period 8, time Paragraph 9;

负载D需要1块磁盘(不包括其校验数据所在的磁盘)并行，持续了5个时间段。在时间10、时间段11、时间段12内，向磁盘0写数据；在时间段13、时间段14内向磁盘1写数据。Load D requires 1 disk (not including the disk where the parity data is located) in parallel for 5 periods. In time 10, time period 11, time period 12, data is written to disk 0; data is written to disk 1 during time period 13, time period 14.

负载E需要2块磁盘(不包括其校验数据所在的磁盘)并行，即并发向磁盘1、磁盘2写数据，持续了2个时间段，即时间段15、时间段16。The load E needs two disks (excluding the disk where the check data is located) in parallel, that is, concurrently writes data to the disk 1, the disk 2, and lasts for two time periods, that is, the time period 15 and the time period 16.

数据优先写入当前工作磁盘，当前工作磁盘能够满足性能需求时，不需访问待机磁盘，既具有良好的局部并行性，又可动态分配具有合适并行度的存储空间。写负载最小时(如负载D)，仅向1个磁盘写数据；写负载最大时(如负载B)，向4块磁盘并发写数据。DPPDL可提供大弹性并行度，满足不同负载的性能需求，同时具有很高的节能效率。The data is preferentially written to the current working disk. When the current working disk can meet the performance requirements, it does not need to access the standby disk, and has good local parallelism and dynamically allocates storage space with appropriate parallelism. When the write load is minimum (such as load D), only write data to 1 disk; when the write load is maximum (such as load B), write data to 4 disks concurrently. DPPDL provides large elastic parallelism to meet the performance needs of different loads, and has very High energy efficiency.

3、访问竞争避让3, access to competition to avoid

DPPDL在采用上述动态映射机制对存储空间进行分配管理时，存在访问竞争问题。当并发访问来自2个条带的数据子块时，可能需要并发访问某一块磁盘，从而引发访问竞争，产生性能瓶颈。DPPDL has access competition problems when using the above dynamic mapping mechanism to allocate and manage storage space. When concurrently accessing data sub-blocks from two stripes, it may be necessary to access a certain disk concurrently, causing access competition and performance bottlenecks.

图5为本实施例中的访问竞争产生示意图。FIG. 5 is a schematic diagram of the access competition generated in the embodiment.

访问竞争避让的前提条件为：The prerequisites for accessing competition avoidance are:

当从2个条带取出k个自由Strip进行映射时，由于校验子块PStrip的存在，可能会并发访问相同的磁盘，从而引发访问竞争，产生性能瓶颈；访问竞争会严重影响存储性能，需要采取有效措施予以消除。When k free strips are taken from two strips for mapping, the same disk may be accessed concurrently due to the existence of the parity strip PStrip, causing access competition and performance bottlenecks. Access competition may seriously affect storage performance. Take effective measures to eliminate it.

从图5可以看出，负载C在时间段8(C8)内运行时，并发访问来自条带0、条带1内的数据子块。由于需要生产校验数据，在条带0中需要并发访问磁盘2、磁盘3、磁盘4；在条带1中需要并发访问磁盘0、磁盘3。此时，磁盘3需要被并发访问，其负载大约是磁盘2、磁盘4、磁盘0的2倍，因此成为性能瓶颈。时间段7(C7)、时间段9(C9)也存在访问竞争问题。As can be seen from FIG. 5, when the load C is running in the time period 8 (C8), the data sub-blocks from the strip 0 and the strip 1 are concurrently accessed. Due to the need to produce verification data, it is necessary to access disk 2, disk 3, and disk 4 concurrently in strip 0; in strip 1, concurrent access to disk 0 and disk 3 is required. At this time, the disk 3 needs to be accessed concurrently, and its load is about twice that of the disk 2, the disk 4, and the disk 0, thus becoming a performance bottleneck. There are also access competition issues in time period 7 (C7) and time period 9 (C9).

进一步的，为了消除访问竞争的问题则需采用访问竞争避让策略。如图6所示，负载C7需要并发访问跨越2个条带的3个数据子块，①先选择4个数据子块(图6中虚线方框处)；②进行访问竞争检查，发现磁盘3上的数据子块(图6中带×的虚线方框处)、条带1的校验子块都位于磁盘3，会引起磁盘3的访问竞争，因此删除该数据子块；③把负载C并行写入剩余的3个数据子块(图6中标有C7的虚线方框处)。最后获得不会引发访问竞争且可并发访问的3个数据子块。Further, in order to eliminate the problem of access competition, an access competition avoidance strategy is needed. As shown in FIG. 6, the load C7 needs to concurrently access three data sub-blocks spanning two strips, one selects four data sub-blocks first (at the dotted line in FIG. 6); 2 performs an access competition check, and finds the disk 3 The upper data sub-block (at the dashed box with × in Figure 6) and the check sub-block of strip 1 are all located on the disk 3, which will cause the access of the disk 3 to compete, thus deleting the data sub-block; 3 load C The remaining 3 data sub-blocks are written in parallel (at the dashed box labeled C7 in Figure 6). Finally, three data sub-blocks that do not cause access competition and can be accessed concurrently are obtained.

C8、C9采用相同的方法进行访问竞争避让。最后，消除访问竞争的存储空间映射情况，如图7所示。从图7中可以看出，在任何一块磁盘上，都没有2个子块(包括数据子块和校验子块)被并发访问；C8 and C9 use the same method to perform access avoidance. Finally, the storage space mapping for accessing the competition is eliminated, as shown in Figure 7. As can be seen from Figure 7, no sub-blocks (including data sub-blocks and parity sub-blocks) are concurrently accessed on any one disk;

4、性能需求感知4, performance demand perception

DPPDL需要感知负载的性能需求，然后动态调整并行的磁盘数，以提供合适的性能，获得更高的节能效率。本实例采用公式(2)来感知负载需求的数据传输率，其中β取值为1.2，窗口时间T取值为5秒。DPPDL needs to sense the performance requirements of the load and then dynamically adjust the number of parallel disks to provide the right performance for higher energy efficiency. This example uses equation (2) to sense the data transmission rate of the load demand, where β is 1.2 and window time T is 5 seconds.

本公开的实施例可被实现为使用任意适当的硬件，固件，软件，或及其任意组合进行想要的配置的装置。图8示意性地示出了可被用于实现本申请中所述的各个实施例的示例性装置1300。 Embodiments of the present disclosure can be implemented as a device that performs the desired configuration using any suitable hardware, firmware, software, or any combination thereof. FIG. 8 schematically illustrates an exemplary apparatus 1300 that can be used to implement various embodiments described in this application.

对于一个实施例，图8示出了示例性装置1300，该装置具有一个或多个处理器1302、被耦合到(一个或多个)处理器1302中的至少一个的控制模块(芯片组)1304、被耦合到控制模块1304的存储器1306、被耦合到控制模块1304的非易失性存储器(NVM)/存储设备1308、被耦合到控制模块1304的一个或多个输入/输出设备1310，以及被耦合到控制模块1306的网络接口1312。For one embodiment, FIG. 8 illustrates an exemplary apparatus 1300 having one or more processors 1302, a control module (chipset) 1304 coupled to at least one of the processor(s) 1302. a memory 1306 coupled to the control module 1304, a non-volatile memory (NVM)/storage device 1308 coupled to the control module 1304, one or more input/output devices 1310 coupled to the control module 1304, and The network interface 1312 is coupled to the control module 1306.

处理器1302可包括一个或多个单核或多核处理器，处理器1302可包括通用处理器或专用处理器(例如图形处理器、应用处理器、基频处理器等)的任意组合。在一些实施例中，装置1300能够作为本申请实施例中所述的转码端的服务器等设备。Processor 1302 can include one or more single or multi-core processors, and processor 1302 can comprise any combination of general purpose or special purpose processors (eg, graphics processors, application processors, baseband processors, etc.). In some embodiments, the device 1300 can be used as a server or the like of the transcoding terminal described in the embodiment of the present application.

在一些实施例中，装置1300可包括具有指令1314的一个或多个计算机可读介质(例如，存储器1306或NVM/存储设备1308)以及与该一个或多个计算机可读介质相合并被配置为执行指令1314以实现模块从而执行本公开中所述的动作的一个或多个处理器1302。In some embodiments, apparatus 1300 can include one or more computer readable media (eg, memory 1306 or NVM/storage device 1308) having instructions 1314 and, in conjunction with the one or more computer readable media, configured to The one or more processors 1302 that execute the instructions 1314 to implement the modules to perform the actions described in this disclosure.

对于一个实施例，控制模块1304可包括任意适当的接口控制器，以向(一个或多个)处理器1302中的至少一个和/或与控制模块1304通信的任意适当的设备或组件提供任意适当的接口。For one embodiment, control module 1304 can include any suitable interface controller to provide any suitable device or component to at least one of processor(s) 1302 and/or any suitable device or component in communication with control module 1304. Interface.

控制模块1304可包括存储器控制器模块，以向存储器1306提供接口。存储器控制器模块可以是硬件模块、软件模块和/或固件模块。Control module 1304 can include a memory controller module to provide an interface to memory 1306. The memory controller module can be a hardware module, a software module, and/or a firmware module.

存储器1306可被用于例如为装置1300加载和存储数据和/或指令1314。对于一个实施例，存储器1306可包括任意适当的易失性存储器，例如，适当的DRAM。在一些实施例中，存储器1306可包括双倍数据速率类型四同步动态随机存取存储器(DDR4SDRAM)。Memory 1306 can be used, for example, to load and store data and/or instructions 1314 for device 1300. For one embodiment, memory 1306 can include any suitable volatile memory, such as a suitable DRAM. In some embodiments, memory 1306 can include double data rate type quad synchronous dynamic random access memory (DDR4 SDRAM).

对于一个实施例，控制模块1304可包括一个或多个输入/输出控制器，以向NVM/存储设备1308及(一个或多个)输入/输出设备1310提供接口。For one embodiment, control module 1304 can include one or more input/output controllers to provide an interface to NVM/storage device 1308 and input/output device(s) 1310.

例如，NVM/存储设备1308可被用于存储数据和/或指令1314。NVM/存储设备1308可包括任意适当的非易失性存储器(例如，闪存)和/或可包括任意适当的(一个或多个)非易失性存储设备(例如，一个或多个硬盘驱动器(HDD)、一个或多个光盘(CD)驱动器和/或一个或多个数字通用光盘(DVD)驱动器)。For example, NVM/storage device 1308 can be used to store data and/or instructions 1314. NVM/storage device 1308 may comprise any suitable non-volatile memory (eg, flash memory) and/or may include any suitable non-volatile storage device(s) (eg, one or more hard disk drives ( HDD), one or more compact disc (CD) drives and/or one or more digital versatile disc (DVD) drives).

NVM/存储设备1308可包括在物理上作为装置1300被安装在其上的设备的一部分的存储资源，或者其可被该设备访问可不必作为该设备的一部分。例如，NVM/存储设备1308可通过网络经由(一个或多个)输入/输出设备1310进行访问。NVM/storage device 1308 can include a storage resource physically part of a device on which device 1300 is installed, or it can be accessed by the device without necessarily being part of the device. For example, the NVM/storage device 1308 can be accessed via the network via the input/output device(s) 1310.

(一个或多个)输入/输出设备1310可为装置1300提供接口以与任意其他适当的设备通信，输入/输出设备1310可以包括通信组件、音频组件、传感器组件等。网络接口1312可为装置1300提供接口以通过一个或多个网络通信，装置1300可根据一个或多个无线网络标准和/或协议中的任意标准和/或协议来与无线网络的一个或多个组件进行无线通信，例如接入基于通信标准的无线网络，如WiFi，2G或3G，或它们的组合进行无线通信。The input/output device(s) 1310 can provide an interface to the device 1300 to communicate with any other suitable device, and the input/output device 1310 can include a communication component, an audio component, a sensor component, and the like. Network connection Port 1312 can provide an interface for device 1300 to communicate over one or more networks, and device 1300 can interact with one or more components of the wireless network in accordance with any of one or more wireless network standards and/or protocols. Wireless communication is performed, such as accessing a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof for wireless communication.

对于一个实施例，(一个或多个)处理器1302中的至少一个可与控制模块1304的一个或多个控制器(例如，存储器控制器模块)的逻辑封装在一起。对于一个实施例，(一个或多个)处理器1302中的至少一个可与控制模块1304的一个或多个控制器的逻辑封装在一起以形成系统级封装(SiP)。对于一个实施例，(一个或多个)处理器1302中的至少一个可与控制模块1304的一个或多个控制器的逻辑集成在同一模具上。对于一个实施例，(一个或多个)处理器1302中的至少一个可与控制模块1304的一个或多个控制器的逻辑集成在同一模具上以形成片上系统(SoC)。For one embodiment, at least one of the processor(s) 1302 can be packaged with the logic of one or more controllers (eg, memory controller modules) of the control module 1304. For one embodiment, at least one of the processor(s) 1302 can be packaged with the logic of one or more controllers of the control module 1304 to form a system in package (SiP). For one embodiment, at least one of the processor(s) 1302 can be integrated on the same mold as the logic of one or more controllers of the control module 1304. For one embodiment, at least one of the processor(s) 1302 can be integrated with the logic of one or more controllers of the control module 1304 on the same mold to form a system on a chip (SoC).

在各个实施例中，装置1300可以但不限于是：服务器、台式计算设备或移动计算设备(例如，膝上型计算设备、手持计算设备、平板电脑、上网本等)等终端设备。在各个实施例中，装置1300可具有更多或更少的组件和/或不同的架构。例如，在一些实施例中，装置1300包括一个或多个摄像机、键盘、液晶显示器(LCD)屏幕(包括触屏显示器)、非易失性存储器端口、多个天线、图形芯片、专用集成电路(ASIC)和扬声器。In various embodiments, device 1300 can be, but is not limited to, a terminal device such as a server, desktop computing device, or mobile computing device (eg, a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, device 1300 can have more or fewer components and/or different architectures. For example, in some embodiments, device 1300 includes one or more cameras, a keyboard, a liquid crystal display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an application specific integrated circuit ( ASIC) and speakers.

以上所述仅是本发明的优选实施方式，应当指出，对于本技术领域的普通技术人员来说，在不脱离本发明原理的前提下，还可以做出若干改进，或者对其中部分技术特征进行等同替换，这些改进和替换也应视为本发明的保护范围。 The above is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make some improvements or carry out some of the technical features without departing from the principles of the present invention. These modifications and substitutions are also considered to be within the scope of the invention.

Claims

A dynamic local parallel data layout method for continuous data storage, which is specifically implemented by the following technical solutions:

Including stripe partitioning, storage space dynamic mapping, access competition avoidance, and performance demand awareness; firstly, the performance requirements of the storage application are obtained through performance requirement perception, and then the storage space dynamic mapping mechanism is executed, and the storage with the appropriate hard disk parallelism is dynamically allocated according to the performance requirement. Space, and finally optimize the storage space dynamic mapping mechanism by accessing the competition avoidance;

Among them, storage space dynamic mapping is the core, performance requirement perception is the basis of storage space dynamic mapping, stripe partitioning is the premise of storage space dynamic mapping, and access competition avoidance is the optimization and completeness of storage space dynamic mapping.

The method of claim 1 further characterized in that said striping is divided by: step 1.1: dividing each hard disk in each of the N hard disks into 1×N storage blocks; wherein, 1 is greater than or equal to 1 , N is greater than or equal to 3;

Step 1.2: Each data block and the check block in step 1.1 are M equal-sized sub-blocks, and each sub-block includes a plurality of consecutive storage areas, which are respectively called a data sub-block strip and a parity sub-block PStrip;

In step 1.3, the sub-blocks in the strip with the same start address in each strip form a sub-strip stripe, and then the strip in the sub-strip is XORed to generate a PStrip in the sub-strip.

The method of claim 2 further characterized in that

In step 1.1, the N storage blocks with the same starting address in each hard disk form a stripe, which constitutes 1×N strips; each stripe includes 1 check block and N-1 data blocks;

Wherein, the check block in the strip i is located in the hard disk N-1-j; if j+v<N-1, the vth data block is located on the hard disk v, otherwise it is located on the hard disk v+1, wherein 0≤i <(1×N), j=i MOD N, 0≤v<N-1.

The method of claim 2 further characterized in that

In step 1.3, each strip includes M sub-bands of the same size; the parity sub-block PStrip m of the strip stripe stripe is generated by X X of its N-1 data sub-blocks, m m;

Then the generation method is:

The method of claim 1 further characterized in that: the operation idea of the dynamic mapping of the storage space is:

The disk array storage space is allocated and managed by the dynamic mapping mechanism of the logical block address and the physical block address, and the write data received by the disk array layer can be dynamically mapped to different numbers of hard disks; that is, according to the performance requirement parameters of the load k, dynamically allocate storage space with k hard disk parallelism, that is, k needs to write data concurrently The number of hard disks does not include the hard disk where the verification data of the written data is located; when the load is minimum, only the data is mapped to a certain hard disk, and only the hard disk is written; when the load is maximum, the data is not included in the verification data. On the remaining hard disks of the hard disk, write data to the remaining hard disks of the hard disk that does not include the check data.

The storage space is dynamically mapped, and the specific steps are:

Step (1) Selecting the stripe with the largest free strip as the CurStripe in CurBank;

The free strip is an unmapped strip; the strip list is a one-way circular list consisting of all strips, CurBank is the currently mapped strip, and the initial value is strip 0; CurStripe is in CurBank. The route to be mapped;

Step (2) If the number of free strips in CurStripe is 0, indicating that CurStripe has no free strip mappable, go to step (3), otherwise go to step (5);

Step (3) determines whether NextBank has a free strip mappable. If there is no free strip mappable, delete the stored data on NextBank and perform storage space recovery; wherein NextBank is the next stripe to be mapped, and the CurBank number is Neighbor, the initial value is strip 1;

Step (4) takes NextBank as CurBank and reacquires CurStripe, and then moves NextBank back;

Step (5) If the number of free strips in CurStripe is not less than k, then sequentially extract k strips from CurStripe, go to step (7), otherwise go to step (6);

Step (6) First take all free strips from CurStripe, and then take the remaining free strips from NextStripe to form k free strips. If NextStripe does not have enough free strips, delete the stored data on NextBank and recycle the storage space. And re-acquire NextStripe; where NextStripe is the Stripe that can be mapped in NextBank;

Step (7) obtains k free strips, performs storage space mapping, maps the logical address space to a physical address space having k hard disks parallelism, and records the mapping relationship in the mapping table;

Determine the offset of the disk and disk in the strip according to the strip, the sub-strip, and the number in the sub-strip where the strip is located, and record it in the mapping table. When reading, obtain data on the disk according to the mapping table. The location; the mapping table is an important part of the metadata, stored at the end of each working disk, with a version number, the version number is from small to large, and the disk array is loaded the most when the power is restored. Numbered version.

The method of claim 1 further characterized in that said accessing competition avoidance comprises the following steps:

Step 1) When selecting a strip spanning 2 strips for storage space mapping, first take all free strips from CurStripe, and then take the remaining free strips from NextStripe to form k+1 free strips together, if NextStripe is not enough. Free Strip, delete the stored data on NextBank, reclaim storage space, and regain NextStripe;

Step 2) Perform an access competition check. If there is no access competition or the end strip causes the access competition, the last strip is deleted; otherwise, the strip causing the access competition is deleted; finally, k strips that do not cause the access competition and can be concurrently accessed are obtained;

The access competition check refers to whether two sub-blocks of the same hard disk are concurrently accessed.

The method of claim 1 further characterized in that

The performance requirement is perceived as:

Continuous data storage applications require a stable data transfer rate, so the data transfer rate is used as a performance requirement indicator;

The data transmission rate requirement is specifically:

Step A. Counting the history information of the disk array layer I/O request queue;

Step B. Perform analysis and prediction;

For continuous data storage applications, the fluctuation period or burst time of the load is large, and the data transmission rate of the load demand can be perceived according to the average data transmission rate in the time window T, and the time window is recorded with rn=(ta, pos, len). The nth I/O request in T; where ta, pos, and len are the arrival time, the starting logical address, and the request length of the request rn, respectively, and rn.len represents the request length len of rn;

The data transfer rate for perceived load demand is expressed as:

Where num is the number of I/Os coming in the time window T; β is the coefficient of performance, which can be between 1.2 and 1.5;

5s<T<15s;

Representing the length summation of num I/O requests in the time window T;

Wherein, the I/O request rn in formula (2) comes from the request queue of the RAID layer;

After the data transmission rate of the load demand is sensed, the number of hard disks that need to be concurrently written data is determined according to the data transmission rate that can be provided by different hard disk parallelism in the actual application scenario.

A dynamic local parallel data layout device for continuous data storage, comprising:

One or more processors; and

One or more machine-readable media having instructions stored thereon, when executed by the one or more processors, cause the apparatus to perform the method of one or more of claims 1-7.

One or more machine-readable medium having stored thereon instructions that, when executed by one or more processors, cause the apparatus to perform the method of one or more of claims 1-7.