WO2018019119A1 - Method and device for dynamic partial-parallel data layout for continuous data storage - Google Patents
Method and device for dynamic partial-parallel data layout for continuous data storage Download PDFInfo
- Publication number
- WO2018019119A1 WO2018019119A1 PCT/CN2017/092403 CN2017092403W WO2018019119A1 WO 2018019119 A1 WO2018019119 A1 WO 2018019119A1 CN 2017092403 W CN2017092403 W CN 2017092403W WO 2018019119 A1 WO2018019119 A1 WO 2018019119A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- strip
- data
- storage space
- storage
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0689—Disk arrays, e.g. RAID, JBOD
Definitions
- the present invention relates to the field of redundant array technology for independent hard disks, and in particular, to a dynamic local parallel data layout method and apparatus for continuous data storage.
- video surveillance has become an ubiquitous security facility in modern society because it plays an irreplaceable role in forensics and identification.
- This kind of application requires a large amount of storage space, mainly performs write operations, mainly sequential access, and has low requirements on random performance.
- the storage system is called a continuous data storage system.
- RAID Redundant Arrays of Independent Disks
- SSDs Solid State Disks
- Striping A method of dividing a piece of continuous data into blocks of the same size and writing each piece of data to different disks in RAID.
- Fault Tolerance Generates redundant verification data and saves it using some kind of operation, such as XOR operation. When the hard disk fails and data is lost, the verification data can be used for data recovery. XOR operations are usually indicated by "".
- Distribution check The check data is distributed on each disk constituting the RAID according to a certain rule.
- Partial Parallelism Only some of the hard disks in the array are parallel, rather than all hard disks in parallel, providing the right performance and facilitating the scheduling of the remaining hard drives.
- RAIDs are RAID0, RAID5, and so on.
- RAID0 is only striped, and does not have redundancy check capability.
- RAID 5 writes data to the hard disks in the array in a stripe manner. The verification data is distributed and stored on each disk in the array. The access speed is improved by global parallelism, and the single disk is fault tolerant.
- Continuous data storage systems are based on sequential access, which does not require random performance and generally does not require high performance provided by global parallelism.
- the invention patents ZL201010256899.5, ZL201010256665.0, ZL201010256711.7, ZL201010256908.0, ZL201010256679.2, ZL201010256699.X, ZL201010575578.1, ZL201010575625.2, ZL201010575611.0, etc. propose a variety of local parallel data layout, Energy-efficient RAIDs using this type of locally parallel data layout are collectively referred to as S-RAID.
- S-RAID The basic idea of S-RAID is: 1 to divide the storage in the array into several groups, and provide appropriate performance in parallel within the group. The grouping is convenient for scheduling some hard disks to run while the remaining hard disks are standby energy-saving; 2 adopting greedy addressing method in sequential access mode To ensure that the read and write operations are distributed over a certain period of time on a certain hard disk, other hard disks can be used for a long time to save energy.
- the data layout of the S-RAID adopts the static mapping mechanism of the storage space, that is, when creating, the logical block address (LBA) and the physical block address of the hard disk are established according to parameters such as the number of disk blocks, the S-RAID type, and the strip size. (Physical Block Address, PBA) mapping; this mapping remains unchanged throughout the life of the S-RAID.
- LBA logical block address
- PBA Physical Block Address
- S-RAID static data layout of S-RAID is suitable for a relatively stable workload, and the local parallelism cannot be dynamically adjusted according to the performance requirements of fluctuating load and burst load.
- S-RAID needs performance according to peak load. The demand determines the degree of local parallelism, but the degree of parallelism is clearly excessive for the valley load. This excess performance leads to additional energy consumption and increases significantly with fluctuating loads and burst load strength.
- Video data is generally compressed and transmitted and stored.
- Existing video compression standards such as H.264/MPEG-4, are based on the temporal and spatial redundancy of video content for video compression. The video compression ratio will be very high. Change within a wide range. There are more moving objects during the day, the video compression is relatively small, and the amount of video data generated is large; there are fewer moving objects at night, the video compression is relatively high, and the amount of generated video data is small.
- a high-intensity fluctuation load is also generated.
- Cache devices usually have no fault tolerance mechanism, and add fault tolerance to cache devices, which will further increase hardware cost and power consumption.
- DPPDL Dynamic Partial-Parallel Data Layout
- the object of the present invention is to solve the problem that the existing static partial parallel data layout can not be better adapted to the fluctuating load and the sudden load.
- a dynamic localization oriented to continuous data storage is proposed. Row data layout method and device.
- DPPDL dynamic local parallel data layout method
- Step 1.1 divides each hard disk in the N hard disks into 1 ⁇ N storage blocks; wherein, 1 is greater than or equal to 1, and N is greater than or equal to 3;
- step 1.1 the N storage blocks with the same starting address in each hard disk form a stripe, which constitutes 1 ⁇ N strips; each strip contains 1 parity storage block, and N-1 data storage blocks.
- the check storage block is referred to as a check block
- the data storage block is referred to as a data block;
- Step 1.2 Each data block and the check block in step 1.1 are M equal-sized sub-blocks, and each sub-block includes a plurality of consecutive storage areas, which are respectively called a data sub-block strip and a parity sub-block PStrip;
- Step 1.3 in step 1.1 each sub-block with the same starting address in the strip is formed into a sub-strip stripe, and then the strip in the sub-strip is XORed to generate a PStrip in the sub-strip;
- each strip includes M sub-strips of the same size; the sub-band PStrip m of the strip stripe stripe is generated by X X of its N-1 data sub-blocks, see equation (1), ;
- the storage space dynamic mapping is specifically:
- the disk array storage space is allocated and managed by the dynamic mapping mechanism of the logical block address and the physical block address, and the write data received by the disk array layer can be dynamically mapped to different numbers of hard disks; that is, according to the performance requirement parameters of the load k, dynamically allocate storage space with k hard disk parallelism, that is, k is the number of hard disks that need to write data concurrently, and does not include the hard disk where the verification data of the written data is located; when the load is minimum, only maps to one hard disk Write data only to the hard disk; when the load is maximum, it maps to the remaining hard disks of the hard disk that does not include the check data, and writes data to the remaining hard disks of the hard disk that does not include the check data;
- the free strip is an unmapped strip;
- the strip list is a one-way circular list consisting of all strips, CurBank is the currently mapped strip, the initial value is strip 0;
- CurStripe is CurBank a stripe that can be mapped;
- CurStripe If the number of free strips in CurStripe is 0, it means that CurStripe has no free strip mapable, equivalent to CurBank without free strip mapable, go to (3), otherwise go to (5);
- NextBank is the next stripe to be mapped, adjacent to the CurBank number.
- the initial value is strip 1;
- mapping the logical address space to a physical address space having k hard disk parallelism after obtaining k free strips, performing storage space mapping, mapping the logical address space to a physical address space having k hard disk parallelism; and recording the mapping relationship in the mapping table;
- mapping table Determine the offset of the disk and disk in the strip according to the strip, the sub-strip, and the number in the sub-strip where the strip is located, and record it in the mapping table.
- the mapping table is an important part of the metadata, stored at the end of each working disk, with a version number, the version number is from small to large, and the disk array is loaded the most when the power is restored. Numbered version
- DPPDL performs the access competition check. If there is no access competition or the end strip causes the access competition, the last strip is deleted; otherwise, the strip causing the access competition is deleted. Finally, obtain k strips that do not cause access competition and can be accessed concurrently;
- the access competition check refers to whether two sub-blocks of the same hard disk are concurrently accessed
- the access competition avoidance can be used to replace the step (6) in the storage space dynamic mapping
- DPPDL uses the data transfer rate as a performance requirement indicator.
- Step A DPPDL statistics history information of the disk array layer I/O request queue
- Step B DPPDL performs analysis and prediction
- the nth I/O request in the window T wherein, ta, pos, and len are the arrival time, the starting logical address, and the request length of the request rn, respectively, and the request length len of rn is represented by rn.len;
- num is the number of I/Os coming in the time window T
- ⁇ is the coefficient of performance, which can be between 1.2 and 1.5;
- DPPDL senses the data transmission rate of the load demand, it determines the number of hard disks that need to concurrently write data according to the data transmission rate that can be provided by different hard disk parallelism in the actual application scenario.
- DPPDL uses a dynamic local parallel strategy to dynamically allocate storage space with appropriate parallelism according to the performance requirements of different workloads. DPPDL not only guarantees long-term standby energy saving for most disks, but also dynamically provides appropriate local parallelism, higher availability, and higher energy efficiency;
- DPPDL sequentially allocates and reclaims storage space in strips, when the number of stripes is large (for Large-capacity disks are possible) to ensure that data is deleted in a substantially chronological order.
- Strip is used as the mapping unit, and the appropriate amount of Strip parallel is selected on Stripe according to performance requirements, and the appropriate parallelism is dynamically provided, thus solving the contradiction between dynamic local parallelism and sequential deletion characteristics.
- FIG. 1 is a flow chart showing the steps of a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention
- FIG. 2 is a schematic diagram of strip partitioning in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention
- FIG. 3 is a schematic diagram of subdivision of strip partitioning in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention
- FIG. 4 is a schematic diagram of dynamic mapping of a storage space in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention
- FIG. 5 is a schematic diagram of access competition generation in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention
- FIG. 6 is a schematic diagram of access competition avoidance in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention
- FIG. 7 is a schematic diagram of dynamic mapping of storage space after access competition avoidance in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention.
- FIG. 8 is a schematic structural diagram of a dynamic local parallel data layout apparatus for continuous data storage according to an embodiment of the present invention.
- the dynamic local parallel data layout method for continuous data storage in the embodiment of the present invention may specifically include the following steps:
- Step 101 Striping, dividing each hard disk in the N hard disks into 1 ⁇ N storage blocks equally;
- Step 102 Perceive performance requirements and obtain performance requirements of the storage application
- Step 103 Determine whether there is access competition, if yes, proceed to step 104, otherwise perform step 105;
- Step 104 Accessing the competition avoidance and optimizing the dynamic mapping mechanism of the storage space
- Step 105 Perform allocation management on the storage space by using a dynamic mapping mechanism.
- storage space dynamic mapping is the core
- performance requirement perception is the basis of storage space dynamic mapping
- stripe partitioning is the premise of storage space dynamic mapping
- access competition avoidance is the optimization and completeness of storage space dynamic mapping.
- the strip is divided, and the specific steps are:
- Step 1.1 divides each hard disk in the N hard disks into 1 ⁇ N storage blocks; wherein, 1 is greater than or equal to 1, and N is greater than or equal to 3;
- Step 1.2 Each data block and check block in step 1.1 is M equal-sized sub-blocks, and each sub-block includes a plurality of consecutive storage areas (such as disk sectors), which are respectively called data sub-blocks. Strip and check sub-blocks (denoted as PStrip);
- step 1.3 the sub-blocks in the strip with the same starting address in each strip form a sub-strip (recorded as stripe), and then the strip in the sub-band is XORed to generate the sub-strip. Inside the PStrip.
- the N storage blocks having the same starting address in each hard disk form a stripe, which constitutes 1 ⁇ N strips; each strip includes one parity storage block, N-1 Data storage block, check storage block referred to as check block, data storage block referred to as data block;
- each strip includes M sub-bands of the same size; the parity sub-block PStrip m of the strip stripe stripe is generated by X X of its N-1 data sub-blocks, ;
- the operation idea of the dynamic mapping of the storage space is:
- the disk array storage space is allocated and managed by the dynamic mapping mechanism of the logical block address and the physical block address, and the write data received by the disk array layer can be dynamically mapped to different numbers of hard disks; that is, according to the load
- the performance requirement parameter k dynamically allocates the storage space with k hard disk parallelism, that is, k is the number of hard disks that need to concurrently write data, and does not include the hard disk where the verification data of the written data is located; when the load is minimum, only maps to On a certain hard disk, only data is written to the hard disk; when the load is maximum, it is mapped to the remaining hard disks except one block of one hard disk, and the data is written to the remaining hard disks except one block of one hard disk;
- Strip linked list a one-way circular linked list consisting of all strips
- CurBank the current stripe to be mapped, called the current map strip, the initial value is stripe 0;
- NextBank the next stripe to be mapped, called the adjacent map strip, adjacent to the CurBank number, the initial value is strip 1;
- CurStripe a route that can be mapped in CurBank
- NextStripe Stripe that can be mapped in NextBank
- the storage space is dynamically mapped, and the specific steps are:
- Step (1) Selecting the stripe with the largest free strip as the CurStripe in CurBank;
- the free strip is a strip that is not mapped
- Step (2) If the number of free strips in CurStripe is 0, indicating that CurStripe has no free strip mappable, go to step (3), otherwise go to step (5);
- Step (3) determines whether NextBank has a free strip mappable, and if there is no free strip mappable, delete the stored data on the NextBank to perform storage space recovery;
- Step (4) takes NextBank as CurBank and reacquires CurStripe, and then moves NextBank back;
- Step (5) If the number of free strips in CurStripe is not less than k, then sequentially extract k strips from CurStripe, go to step (7), otherwise go to step (6);
- Step (6) First take all free strips from CurStripe, and then take the remaining free strips from NextStripe to form k free strips. If NextStripe does not have enough free strips, delete the stored data on NextBank, reclaim the storage space, and re-acquire NextStripe;
- Step (7) obtains k free strips, performs storage space mapping, maps the logical address space to a physical address space having k hard disks parallelism, and records the mapping relationship in the mapping table;
- mapping table Determine the offset of the disk and disk in the strip according to the strip, the sub-strip, and the number in the sub-strip where the strip is located, and record it in the mapping table.
- the mapping table is an important part of the metadata, stored at the end of each working disk, with a version number, the version number is from small to large, and the disk array is loaded the most when the power is restored. Numbered version.
- the access competition avoids, and the specific steps are:
- Step 1) When DPPDL selects a strip across 2 strips for storage space mapping, first take all free strips from CurStripe, and then take the remaining free strips from NextStripe and form k+1 free strips together. If NextStripe does not. With enough free strips, delete the stored data on NextBank, reclaim the storage space, and re-acquire NextStripe;
- Step 2 DPPDL performs an access competition check. If there is no access competition or the end strip causes the access competition, the last strip is deleted; otherwise, the strip causing the access competition is deleted; finally, k strips that do not cause the access competition and can be concurrently accessed are obtained;
- the access competition check refers to whether two sub-blocks of the same hard disk are concurrently accessed.
- the performance requirement is perceived, specifically:
- Continuous data storage applications are not very sensitive to response time and require a stable data transfer rate, so the data transfer rate is used as a performance requirement indicator;
- the data transmission rate requirement is specifically:
- Step A DPPDL statistics history information of the disk array layer I/O request queue
- Step B DPPDL performs analysis and prediction
- the data transfer rate for perceived load demand is expressed as:
- num is the number of I/Os coming in the time window T
- ⁇ is the coefficient of performance, which can be between 1.2 and 1.5;
- the I/O request rn in formula (2) comes from the request queue of the RAID layer
- the DPPDL After the DPPDL senses the data transmission rate of the load demand, it determines the number of hard disks that need to concurrently write data according to the data transmission rate that can be provided by different hard disk parallelism in the actual application scenario.
- the parity sub-block of the sub-strip is generated by the X-OR operation of the four data sub-blocks of the sub-strip; for example, the parity sub-block of the sub-strip 1 in the strip 0, and the four data sub-blocks of the sub-strip Or the operation is generated.
- DPPDL uses dynamic mapping mechanism to allocate and manage storage space.
- Load A requires two disks (excluding the disk where the check data is located) in parallel, that is, concurrently writes data to disk 0 and disk 1, for three time periods, namely, time period 1, time period 2, and time period 3;
- Load C requires 3 disks (excluding the disk where the check data is located) in parallel, that is, concurrently writes data to disk 2, disk 3, and disk 0 for 3 time periods, that is, time period 7, time period 8, time Paragraph 9;
- Load D requires 1 disk (not including the disk where the parity data is located) in parallel for 5 periods.
- time period 11 time period 12
- data is written to disk 0
- data is written to disk 1 during time period 13, time period 14.
- the load E needs two disks (excluding the disk where the check data is located) in parallel, that is, concurrently writes data to the disk 1, the disk 2, and lasts for two time periods, that is, the time period 15 and the time period 16.
- the data is preferentially written to the current working disk.
- the current working disk can meet the performance requirements, it does not need to access the standby disk, and has good local parallelism and dynamically allocates storage space with appropriate parallelism.
- the write load is minimum (such as load D), only write data to 1 disk; when the write load is maximum (such as load B), write data to 4 disks concurrently.
- DPPDL provides large elastic parallelism to meet the performance needs of different loads, and has very High energy efficiency.
- DPPDL has access competition problems when using the above dynamic mapping mechanism to allocate and manage storage space.
- FIG. 5 is a schematic diagram of the access competition generated in the embodiment.
- the prerequisites for accessing competition avoidance are:
- the load C7 needs to concurrently access three data sub-blocks spanning two strips, one selects four data sub-blocks first (at the dotted line in FIG. 6); 2 performs an access competition check, and finds the disk 3
- the upper data sub-block (at the dashed box with ⁇ in Figure 6) and the check sub-block of strip 1 are all located on the disk 3, which will cause the access of the disk 3 to compete, thus deleting the data sub-block;
- 3 load C The remaining 3 data sub-blocks are written in parallel (at the dashed box labeled C7 in Figure 6). Finally, three data sub-blocks that do not cause access competition and can be accessed concurrently are obtained.
- DPPDL needs to sense the performance requirements of the load and then dynamically adjust the number of parallel disks to provide the right performance for higher energy efficiency.
- This example uses equation (2) to sense the data transmission rate of the load demand, where ⁇ is 1.2 and window time T is 5 seconds.
- Embodiments of the present disclosure can be implemented as a device that performs the desired configuration using any suitable hardware, firmware, software, or any combination thereof.
- FIG. 8 schematically illustrates an exemplary apparatus 1300 that can be used to implement various embodiments described in this application.
- FIG. 8 illustrates an exemplary apparatus 1300 having one or more processors 1302, a control module (chipset) 1304 coupled to at least one of the processor(s) 1302. a memory 1306 coupled to the control module 1304, a non-volatile memory (NVM)/storage device 1308 coupled to the control module 1304, one or more input/output devices 1310 coupled to the control module 1304, and
- the network interface 1312 is coupled to the control module 1306.
- Processor 1302 can include one or more single or multi-core processors, and processor 1302 can comprise any combination of general purpose or special purpose processors (eg, graphics processors, application processors, baseband processors, etc.).
- the device 1300 can be used as a server or the like of the transcoding terminal described in the embodiment of the present application.
- apparatus 1300 can include one or more computer readable media (eg, memory 1306 or NVM/storage device 1308) having instructions 1314 and, in conjunction with the one or more computer readable media, configured to The one or more processors 1302 that execute the instructions 1314 to implement the modules to perform the actions described in this disclosure.
- computer readable media eg, memory 1306 or NVM/storage device 1308
- the one or more processors 1302 that execute the instructions 1314 to implement the modules to perform the actions described in this disclosure.
- control module 1304 can include any suitable interface controller to provide any suitable device or component to at least one of processor(s) 1302 and/or any suitable device or component in communication with control module 1304. Interface.
- Control module 1304 can include a memory controller module to provide an interface to memory 1306.
- the memory controller module can be a hardware module, a software module, and/or a firmware module.
- Memory 1306 can be used, for example, to load and store data and/or instructions 1314 for device 1300.
- memory 1306 can include any suitable volatile memory, such as a suitable DRAM.
- memory 1306 can include double data rate type quad synchronous dynamic random access memory (DDR4 SDRAM).
- control module 1304 can include one or more input/output controllers to provide an interface to NVM/storage device 1308 and input/output device(s) 1310.
- NVM/storage device 1308 can be used to store data and/or instructions 1314.
- NVM/storage device 1308 may comprise any suitable non-volatile memory (eg, flash memory) and/or may include any suitable non-volatile storage device(s) (eg, one or more hard disk drives (HDD), one or more compact disc (CD) drives and/or one or more digital versatile disc (DVD) drives).
- HDD hard disk drives
- CD compact disc
- DVD digital versatile disc
- NVM/storage device 1308 can include a storage resource physically part of a device on which device 1300 is installed, or it can be accessed by the device without necessarily being part of the device.
- the NVM/storage device 1308 can be accessed via the network via the input/output device(s) 1310.
- the input/output device(s) 1310 can provide an interface to the device 1300 to communicate with any other suitable device, and the input/output device 1310 can include a communication component, an audio component, a sensor component, and the like.
- Network connection Port 1312 can provide an interface for device 1300 to communicate over one or more networks, and device 1300 can interact with one or more components of the wireless network in accordance with any of one or more wireless network standards and/or protocols.
- Wireless communication is performed, such as accessing a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof for wireless communication.
- At least one of the processor(s) 1302 can be packaged with the logic of one or more controllers (eg, memory controller modules) of the control module 1304.
- at least one of the processor(s) 1302 can be packaged with the logic of one or more controllers of the control module 1304 to form a system in package (SiP).
- at least one of the processor(s) 1302 can be integrated on the same mold as the logic of one or more controllers of the control module 1304.
- at least one of the processor(s) 1302 can be integrated with the logic of one or more controllers of the control module 1304 on the same mold to form a system on a chip (SoC).
- SoC system on a chip
- device 1300 can be, but is not limited to, a terminal device such as a server, desktop computing device, or mobile computing device (eg, a laptop computing device, a handheld computing device, a tablet, a netbook, etc.).
- device 1300 can have more or fewer components and/or different architectures.
- device 1300 includes one or more cameras, a keyboard, a liquid crystal display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an application specific integrated circuit ( ASIC) and speakers.
- LCD liquid crystal display
- ASIC application specific integrated circuit
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
Abstract
Description
本发明涉及独立硬盘冗余阵列技术领域,尤其涉及一种面向连续数据存储的动态局部并行数据布局方法及装置。The present invention relates to the field of redundant array technology for independent hard disks, and in particular, to a dynamic local parallel data layout method and apparatus for continuous data storage.
近年来,视频监控、备份、归档等相关技术应用十分广泛,以视频监控为例,由于视频监控在取证与识别方面具有不可替代的作用,已成为现代社会无处不在的安防设施。该类应用需要海量存储空间,主要执行写操作,以顺序访问为主,对随机性能要求不高,称该类存储系统为连续数据存储系统。In recent years, video surveillance, backup, archiving and other related technologies have been widely used. Taking video surveillance as an example, video surveillance has become an ubiquitous security facility in modern society because it plays an irreplaceable role in forensics and identification. This kind of application requires a large amount of storage space, mainly performs write operations, mainly sequential access, and has low requirements on random performance. The storage system is called a continuous data storage system.
在海量数据存储中,为了满足存储系统性能、容量和可靠性需求,人们提出了各种类型的独立硬盘冗余阵列(Redundant Arrays of Independent Disks,RAID)。RAID把多个物理存储设备,如磁盘、固态盘(Solid State Disk,SSD)等联合起来,形成统一的逻辑存储设备,可提供更大的容量、更高的性能、更可靠的数据安全性。In the massive data storage, in order to meet the performance, capacity and reliability requirements of the storage system, various types of Redundant Arrays of Independent Disks (RAID) have been proposed. RAID combines multiple physical storage devices, such as disks and Solid State Disks (SSDs), to form a unified logical storage device that provides greater capacity, higher performance, and more reliable data security.
RAID中常用的技术术语如下:The technical terms commonly used in RAID are as follows:
条带化:把一段连续数据分割成相同大小的数据块,把每块数据分别写入RAID中不同盘上的方法。Striping: A method of dividing a piece of continuous data into blocks of the same size and writing each piece of data to different disks in RAID.
容错:利用某种运算,如异或运算,生成冗余的校验数据并保存。当硬盘出现故障丢失数据时,可利用校验数据进行数据恢复。异或运算通常用“”表示。Fault Tolerance: Generates redundant verification data and saves it using some kind of operation, such as XOR operation. When the hard disk fails and data is lost, the verification data can be used for data recovery. XOR operations are usually indicated by "".
分布校验:校验数据按一定规律分布在构成RAID的各个盘上。Distribution check: The check data is distributed on each disk constituting the RAID according to a certain rule.
局部并行:阵列中仅部分硬盘并行,而不是全部硬盘并行,能够提供合适的性能,并且便于调度其余硬盘待机节能。Partial Parallelism: Only some of the hard disks in the array are parallel, rather than all hard disks in parallel, providing the right performance and facilitating the scheduling of the remaining hard drives.
常用的RAID有RAID0、RAID5等。RAID0只进行条带化,不具有冗余校验能力。RAID5以条带的方式向阵列中的硬盘写数据,校验数据分布存储在阵列中的各个盘上,通过全局并行提高访问速度,具有单盘容错能力。Commonly used RAIDs are RAID0, RAID5, and so on. RAID0 is only striped, and does not have redundancy check capability. RAID 5 writes data to the hard disks in the array in a stripe manner. The verification data is distributed and stored on each disk in the array. The access speed is improved by global parallelism, and the single disk is fault tolerant.
连续数据存储系统以顺序访问为主,对随机性能要求不高,一般不需要全局并行提供的高性能。为此,发明专利ZL201010256899.5、ZL201010256665.0、ZL201010256711.7、ZL201010256908.0、ZL201010256679.2、ZL201010256699.X、ZL201010575578.1、ZL201010575625.2、ZL201010575611.0等提出多种局部并行数据布局,把采用该类局部并行数据布局的节能RAID统称为S-RAID。 Continuous data storage systems are based on sequential access, which does not require random performance and generally does not require high performance provided by global parallelism. To this end, the invention patents ZL201010256899.5, ZL201010256665.0, ZL201010256711.7, ZL201010256908.0, ZL201010256679.2, ZL201010256699.X, ZL201010575578.1, ZL201010575625.2, ZL201010575611.0, etc. propose a variety of local parallel data layout, Energy-efficient RAIDs using this type of locally parallel data layout are collectively referred to as S-RAID.
S-RAID基本思想是:①把阵列中的存储区分成若干组,组内并行提供合适的性能,分组便于调度部分硬盘运行而其余硬盘待机节能;②采用贪婪编址法,在顺序访问模式下,保证读写操作在较长时间内分布在部分确定的硬盘上,其它硬盘可以长时间待机节能。The basic idea of S-RAID is: 1 to divide the storage in the array into several groups, and provide appropriate performance in parallel within the group. The grouping is convenient for scheduling some hard disks to run while the remaining hard disks are standby energy-saving; 2 adopting greedy addressing method in sequential access mode To ensure that the read and write operations are distributed over a certain period of time on a certain hard disk, other hard disks can be used for a long time to save energy.
S-RAID的数据布局采用存储空间静态映射机制,即:在创建时,根据磁盘块数、S-RAID类型、Strip大小等参数,建立逻辑块地址(Logical Block Address,LBA)与硬盘物理块地址(Physical Block Address,PBA)的映射关系;此映射关系在S-RAID整个生命周期内保持不变。The data layout of the S-RAID adopts the static mapping mechanism of the storage space, that is, when creating, the logical block address (LBA) and the physical block address of the hard disk are established according to parameters such as the number of disk blocks, the S-RAID type, and the strip size. (Physical Block Address, PBA) mapping; this mapping remains unchanged throughout the life of the S-RAID.
然而,S-RAID的静态数据布局适合比较平稳的工作负载,不能根据波动负载、突发负载的性能需求动态调整局部并行度;对于波动负载、突发负载,S-RAID需要根据峰值负载的性能需求确定局部并行度,但该并行度对于谷值负载显然是过剩的。这种性能过剩会导致额外能耗,并且随着波动负载、突发负载强度的增大而显著增加。However, the static data layout of S-RAID is suitable for a relatively stable workload, and the local parallelism cannot be dynamically adjusted according to the performance requirements of fluctuating load and burst load. For fluctuating load and burst load, S-RAID needs performance according to peak load. The demand determines the degree of local parallelism, but the degree of parallelism is clearly excessive for the valley load. This excess performance leads to additional energy consumption and increases significantly with fluctuating loads and burst load strength.
在连续数据存储中,较强的波动负载、突发负载普遍存在。例如在视频监控中,视频内容的动态特性会产生严重的波动负载。视频数据一般要压缩后进行传输和存储,现有视频压缩标准,如H.264/MPEG-4等,都是基于视频内容的时间、空间冗余性进行视频压缩的,视频压缩比会在很大范围内变化。白天运动物体较多,视频压缩比较小,产生的视频数据量大;夜间运动物体较少,视频压缩比较高,产生的视频数据量小。此外,视频监控中各摄像机的工作时间、分辨率不同时,也会产生较高强度的波动负载。In continuous data storage, strong fluctuation loads and burst loads are common. For example, in video surveillance, the dynamic nature of video content can create severe fluctuating loads. Video data is generally compressed and transmitted and stored. Existing video compression standards, such as H.264/MPEG-4, are based on the temporal and spatial redundancy of video content for video compression. The video compression ratio will be very high. Change within a wide range. There are more moving objects during the day, the video compression is relatively small, and the amount of video data generated is large; there are fewer moving objects at night, the video compression is relatively high, and the amount of generated video data is small. In addition, when the working time and resolution of each camera in video surveillance are different, a high-intensity fluctuation load is also generated.
对于较强的波动、突发负载,采用缓存措施是不可行的。例如在视频监控中,负载不仅波动幅度大,而且波动周期足够长,需要大容量缓存设备。磁盘缓存不仅增加硬件成本,还会引入额外功耗;SSD缓存虽然功耗低,但大量使用会显著增加成本。深度缓存还会极大增加数据丢失的概率,缓存设备通常没有容错机制,而为缓存设备增加容错机制,又将进一步增加硬件成本和功耗。For strong fluctuations and sudden load, it is not feasible to adopt caching measures. For example, in video surveillance, the load not only has a large fluctuation range, but also the fluctuation period is long enough to require a large-capacity cache device. Disk caching not only increases hardware costs, but also introduces additional power consumption; while SSD caching is low in power consumption, heavy use can add significant cost. Deep cache also greatly increases the probability of data loss. Cache devices usually have no fault tolerance mechanism, and add fault tolerance to cache devices, which will further increase hardware cost and power consumption.
为此,提出一种面向连续数据存储的动态局部并行数据布局(Dynamic Partial-Parallel Data Layout,DPPDL),DPPDL采用动态局部并行策略,根据不同负载的性能需求,动态分配具有合适并行度的存储空间。DPPDL既可保证多数硬盘长时间待机节能,又能动态提供合适的局部并行度,具有更高的可用性,以及更高的节能效率。To this end, a Dynamic Partial-Parallel Data Layout (DPPDL) for continuous data storage is proposed. DPPDL adopts dynamic local parallel strategy to dynamically allocate storage space with appropriate parallelism according to the performance requirements of different loads. . DPPDL not only guarantees long-term standby energy saving for most hard disks, but also dynamically provides appropriate local parallelism, higher availability, and higher energy efficiency.
发明内容Summary of the invention
本发明的目的是针对已有静态局部并行数据布局不能更好适应波动负载、突发负载的不足,为了提高存储系统的节能效率,提出一种面向连续数据存储的动态局部并 行数据布局方法及装置。The object of the present invention is to solve the problem that the existing static partial parallel data layout can not be better adapted to the fluctuating load and the sudden load. In order to improve the energy-saving efficiency of the storage system, a dynamic localization oriented to continuous data storage is proposed. Row data layout method and device.
一种面向连续数据存储的动态局部并行数据布局方法(DPPDL),具体通过下述技术方案实现:A dynamic local parallel data layout method (DPPDL) for continuous data storage is realized by the following technical solutions:
包括条带划分、存储空间动态映射、访问竞争避让以及性能需求感知;首先通过性能需求感知获得存储应用的性能需求,然后执行存储空间动态映射机制,根据性能需求动态分配具有合适硬盘并行度的存储空间,最后通过访问竞争避让优化该存储空间动态映射机制;其中,存储空间动态映射是核心,性能需求感知是存储空间动态映射的依据,条带划分是存储空间动态映射的前提,而访问竞争避让是存储空间动态映射的优化与完备;Including stripe partitioning, storage space dynamic mapping, access competition avoidance, and performance demand awareness; firstly, the performance requirements of the storage application are obtained through performance requirement perception, and then the storage space dynamic mapping mechanism is executed, and the storage with the appropriate hard disk parallelism is dynamically allocated according to the performance requirement. Space, finally optimize the storage space dynamic mapping mechanism by accessing competition avoidance; among them, storage space dynamic mapping is the core, performance requirement perception is the basis of storage space dynamic mapping, stripe partitioning is the premise of storage space dynamic mapping, and access competition avoidance It is the optimization and completeness of the dynamic mapping of storage space;
所述条带划分,具体步骤为:The strip is divided, and the specific steps are as follows:
步骤1.1将N块硬盘中的每块硬盘平均分成1×N个存储块;其中,1大于等于1,N大于等于3;Step 1.1 divides each hard disk in the N hard disks into 1×N storage blocks; wherein, 1 is greater than or equal to 1, and N is greater than or equal to 3;
步骤1.1中每块硬盘内的起始地址相同的N个存储块组成一个条带,共组成1×N个条带;每个条带包含1个校验存储块,N-1个数据存储块,校验存储块简称校验块,数据存储块简称数据块;In step 1.1, the N storage blocks with the same starting address in each hard disk form a stripe, which constitutes 1×N strips; each strip contains 1 parity storage block, and N-1 data storage blocks. , the check storage block is referred to as a check block, and the data storage block is referred to as a data block;
其中,条带i中的校验块位于硬盘N-1-j中;若j+v<N-1,则第v个数据块位于硬盘v,否则位于硬盘v+1,其中,0≤i<(1×N),j=i MOD N(MOD为模运算),0≤v<N-1;Wherein, the check block in the strip i is located in the hard disk N-1-j; if j+v<N-1, the vth data block is located on the hard disk v, otherwise it is located on the hard disk v+1, wherein 0≤i <(1×N), j=i MOD N (MOD is a modulo operation), 0≤v<N-1;
步骤1.2划分步骤1.1中的每个数据块和校验块为M个大小相等的子块,每个子块包含若干个地址连续的存储区,分别称为数据子块Strip和校验子块PStrip;Step 1.2: Each data block and the check block in step 1.1 are M equal-sized sub-blocks, and each sub-block includes a plurality of consecutive storage areas, which are respectively called a data sub-block strip and a parity sub-block PStrip;
步骤1.3步骤1.1中每个条带中的盘内起始地址相同的子块组成一个子条带Stripe,再将该子条带内的Strip进行异或运算,生成该子条带内的PStrip;Step 1.3 in step 1.1, each sub-block with the same starting address in the strip is formed into a sub-strip stripe, and then the strip in the sub-strip is XORed to generate a PStrip in the sub-strip;
其中,每个条带中包含M个大小相同的子条带;子条带Stripe m的校验子块PStrip m由其N-1个数据子块Strip m异或生成,见式(1),;Wherein each strip includes M sub-strips of the same size; the sub-band PStrip m of the strip stripe stripe is generated by X X of its N-1 data sub-blocks, see equation (1), ;
所述存储空间动态映射,具体为:The storage space dynamic mapping is specifically:
采用逻辑区块地址与物理区块地址的动态映射机制对磁盘阵列存储空间进行分配管理,磁盘阵列层收到的写数据可以被动态映射到不同数量的硬盘上;即:根据负载的性能需求参数k,动态分配具有k个硬盘并行度的存储空间,即k为需要并发写数据的硬盘数量,不包括所写数据的校验数据所在的硬盘;当负载最小时,仅映射到1块硬盘上,仅向该硬盘写数据;负载最大时映射到不包括其校验数据所在硬盘的其余硬盘上,向不包括其校验数据所在硬盘的其余硬盘并发写数据;The disk array storage space is allocated and managed by the dynamic mapping mechanism of the logical block address and the physical block address, and the write data received by the disk array layer can be dynamically mapped to different numbers of hard disks; that is, according to the performance requirement parameters of the load k, dynamically allocate storage space with k hard disk parallelism, that is, k is the number of hard disks that need to write data concurrently, and does not include the hard disk where the verification data of the written data is located; when the load is minimum, only maps to one hard disk Write data only to the hard disk; when the load is maximum, it maps to the remaining hard disks of the hard disk that does not include the check data, and writes data to the remaining hard disks of the hard disk that does not include the check data;
存储空间动态映射,具体步骤为: Dynamic mapping of storage space, the specific steps are:
(1)在CurBank中选择自由Strip最多的Stripe作为CurStripe;(1) Select the Stripe with the most free strips in CurBank as CurStripe;
其中,所述的自由Strip为未进行映射的Strip;条带链表是由所有条带组成的一个单向循环链表,CurBank为当前进行映射的条带,初始值为条带0;CurStripe为CurBank中可进行映射的Stripe;The free strip is an unmapped strip; the strip list is a one-way circular list consisting of all strips, CurBank is the currently mapped strip, the initial value is strip 0; CurStripe is CurBank a stripe that can be mapped;
(2)如果CurStripe中自由Strip数为0,表明CurStripe无自由Strip可映射,等价于CurBank无自由Strip可映射,转到(3),否则转到(5);(2) If the number of free strips in CurStripe is 0, it means that CurStripe has no free strip mapable, equivalent to CurBank without free strip mapable, go to (3), otherwise go to (5);
(3)判断NextBank是否有自由Strip可映射,如果没有自由Strip可映射,则删除NextBank上的存储数据,进行存储空间回收;其中,NextBank为下一个进行映射的条带,与CurBank编号相邻,初始值为条带1;(3) Determine whether NextBank has free strip mapability. If there is no free strip mappable, delete the storage data on NextBank and perform storage space recovery. Among them, NextBank is the next stripe to be mapped, adjacent to the CurBank number. The initial value is strip 1;
(4)将NextBank作为CurBank并重新获取CurStripe,然后将NextBank顺序后移;(4) Take NextBank as CurBank and re-acquire CurStripe, then move the NextBank back in order;
(5)如果CurStripe中自由Strip数不小于k,则从CurStripe中顺序取出k个Strip,转到(7),否则转到(6);(5) If the number of free strips in CurStripe is not less than k, then sequentially extract k strips from CurStripe, go to (7), otherwise go to (6);
(6)先从CurStripe中取出所有自由Strip,再从NextStripe取出余下所需的自由Strip,一起组成k个自由Strip,如果NextStripe没有足够的自由Strip,就删除NextBank上的存储数据,回收存储空间,并重新获取NextStripe;其中,NextStripe为NextBank中可进行映射的Stripe;(6) First take all the free strips from CurStripe, and then take the remaining free strips from NextStripe to form k free strips. If NextStripe does not have enough free strips, delete the stored data on NextBank and recycle the storage space. And re-acquire NextStripe; where, NextStripe is the Stripe that can be mapped in NextBank;
(7)获得k个自由Strip后,进行存储空间映射,把逻辑地址空间映射到具有k个硬盘并行度的物理地址空间;并将映射关系记录在映射表中;(7) after obtaining k free strips, performing storage space mapping, mapping the logical address space to a physical address space having k hard disk parallelism; and recording the mapping relationship in the mapping table;
根据Strip所在的条带、子条带、以及子条带内的编号,确定该Strip所在磁盘及盘内偏移量,并将其记录到映射表中;读操作时根据映射表获得数据在磁盘上的位置;映射表作为元数据的重要组成部分,保存在每个正在工作的磁盘的尾部,内带一个版本编号,版本编号按时间先后由小到大,磁盘阵列在断电恢复时装入最大编号的版本;Determine the offset of the disk and disk in the strip according to the strip, the sub-strip, and the number in the sub-strip where the strip is located, and record it in the mapping table. When reading, obtain data on the disk according to the mapping table. The location; the mapping table is an important part of the metadata, stored at the end of each working disk, with a version number, the version number is from small to large, and the disk array is loaded the most when the power is restored. Numbered version
所述访问竞争避让,具体步骤为:The access competition avoids, the specific steps are:
1)DPPDL选择跨越2个条带的Strip进行存储空间映射时,首先从CurStripe中取出所有自由Strip,再从NextStripe取出余下所需的自由Strip,一起组成k+1个自由Strip,如果NextStripe没有足够的自由Strip,就删除NextBank上的存储数据,回收存储空间,并重新获取NextStripe;1) When DPPDL selects a strip across 2 stripes for storage space mapping, first take all free strips from CurStripe, and then take the remaining free strips from NextStripe to form k+1 free strips together, if NextStripe is not enough Free Strip, delete the stored data on NextBank, reclaim storage space, and regain NextStripe;
2)DPPDL进行访问竞争检查,若没有访问竞争或末尾Strip引起访问竞争,则删除末尾Strip;否则,删除引起访问竞争的Strip。最后获得不会引发访问竞争且可并发访问的k个Strip;2) DPPDL performs the access competition check. If there is no access competition or the end strip causes the access competition, the last strip is deleted; otherwise, the strip causing the access competition is deleted. Finally, obtain k strips that do not cause access competition and can be accessed concurrently;
其中,所述的访问竞争检查,指同一硬盘是否有2个子块被并发访问; The access competition check refers to whether two sub-blocks of the same hard disk are concurrently accessed;
访问竞争避让可用于替换存储空间动态映射中的步骤(6);The access competition avoidance can be used to replace the step (6) in the storage space dynamic mapping;
性能需求感知,具体为:Performance demand awareness, specifically:
连续数据存储应用,需要稳定的数据传输率,因此DPPDL把数据传输率作为性能需求指标。Continuous data storage applications require a stable data transfer rate, so DPPDL uses the data transfer rate as a performance requirement indicator.
为了在线感知负载的性能需求,即数据传输率需求,In order to sense the performance requirements of the load online, that is, the data transmission rate requirement,
步骤A.DPPDL统计磁盘阵列层I/O请求队列的历史信息;Step A. DPPDL statistics history information of the disk array layer I/O request queue;
步骤B.DPPDL进行分析预测;Step B. DPPDL performs analysis and prediction;
对于连续数据存储应用,负载的波动周期或突发时间一般较大,可根据时间窗口T内的平均数据传输率来感知负载需求的数据传输率,用rn=(ta,pos,len)记录时间窗口T内的第n个I/O请求;其中,ta、pos、len分别为请求rn的到来时间、起始逻辑地址和请求长度,用rn.len表示rn的请求长度len;For continuous data storage applications, the fluctuation period or burst time of the load is generally large, and the data transmission rate of the load demand can be perceived according to the average data transmission rate in the time window T, and the time is recorded by rn=(ta, pos, len). The nth I/O request in the window T; wherein, ta, pos, and len are the arrival time, the starting logical address, and the request length of the request rn, respectively, and the request length len of rn is represented by rn.len;
感知负载需求的数据传输率:Data transfer rate for perceived load demand:
其中,num为时间窗口T内到来的I/O数;β为性能系数,可在1.2~1.5之间取值;Where num is the number of I/Os coming in the time window T; β is the coefficient of performance, which can be between 1.2 and 1.5;
5s<T<15s;5s<T<15s;
表示对时间窗口T内的num个I/O请求进行长度求和;Representing the length summation of num I/O requests in the time window T;
其中,公式(2)中的I/O请求rn来自RAID层的请求队列Where the I/O request rn in equation (2) comes from the request queue of the RAID layer.
DPPDL感知到负载需求的数据传输率之后,根据实际应用场景中不同硬盘并行度能够提供的数据传输率,确定需要并发写数据的硬盘数量kAfter DPPDL senses the data transmission rate of the load demand, it determines the number of hard disks that need to concurrently write data according to the data transmission rate that can be provided by different hard disk parallelism in the actual application scenario.
本发明提出的一种面向连续数据存储的动态局部并行数据布局,与已有技术比较,具有以下优点:The dynamic local parallel data layout oriented by continuous data storage proposed by the invention has the following advantages compared with the prior art:
1.可提供大弹性并行度,具有更高的节能效率,具体为:1. It can provide large elastic parallelism and has higher energy saving efficiency, specifically:
DPPDL采用动态局部并行策略,根据不同负载的性能需求,动态分配具有合适并行度的存储空间。DPPDL既可保证多数磁盘长时间待机节能,又能动态提供合适的局部并行度,具有更高的可用性,以及更高的节能效率;DPPDL uses a dynamic local parallel strategy to dynamically allocate storage space with appropriate parallelism according to the performance requirements of different workloads. DPPDL not only guarantees long-term standby energy saving for most disks, but also dynamically provides appropriate local parallelism, higher availability, and higher energy efficiency;
2.解决了动态局部并行与顺序删除特性之间的矛盾,具体为:2. Solved the contradiction between dynamic local parallelism and sequential deletion features, specifically:
对于连续数据存储系统,当存储空间写满后,一般按时间删除最早的存储数据、然后写入新数据,称为顺序删除特性。动态局部并行与顺序删除特性之间存在矛盾,例如当存储空间写满后,如果当前负载需要5块磁盘并行,但最早数据都存储在2块局部并行的磁盘上,此时无法在顺序删除数据的条件下,再增加3块磁盘并行;For a continuous data storage system, when the storage space is full, the oldest stored data is generally deleted by time, and then new data is written, which is called a sequential deletion feature. There is a contradiction between the dynamic local parallelism and the sequential deletion feature. For example, when the storage space is full, if the current load requires 5 disks in parallel, but the oldest data is stored on 2 partially parallel disks, the data cannot be deleted sequentially. Under the conditions, add 3 more disks in parallel;
DPPDL在宏观上按条带依次进行存储空间的分配与回收,当条带数较大时(对于 大容量磁盘是可行的),可保证基本按时间顺序删除数据。微观上以Strip为映射单元,根据性能需求在Stripe上选择数量合适的Strip并行,动态提供合适的并行度,因此解决了动态局部并行与顺序删除特性之间的矛盾。DPPDL sequentially allocates and reclaims storage space in strips, when the number of stripes is large (for Large-capacity disks are possible) to ensure that data is deleted in a substantially chronological order. Microscopically, Strip is used as the mapping unit, and the appropriate amount of Strip parallel is selected on Stripe according to performance requirements, and the appropriate parallelism is dynamically provided, thus solving the contradiction between dynamic local parallelism and sequential deletion characteristics.
被结合在说明书中并构成说明书的一部分的附图示出了本发明的实施例,并且连同其说明一起用于解释本发明的原理。The accompanying drawings, which are incorporated in FIG
图1为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法的步骤流程图;1 is a flow chart showing the steps of a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention;
图2为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法中的条带划分的总体示意图;2 is a schematic diagram of strip partitioning in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention;
图3为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法中的条带划分的细分示意图;3 is a schematic diagram of subdivision of strip partitioning in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention;
图4为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法中的存储空间动态映射示意图;4 is a schematic diagram of dynamic mapping of a storage space in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention;
图5为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法中的访问竞争产生示意图;FIG. 5 is a schematic diagram of access competition generation in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention; FIG.
图6为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法中的访问竞争避让示意图;6 is a schematic diagram of access competition avoidance in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention;
图7为本发明实施例一种面向连续数据存储的动态局部并行数据布局方法中的访问竞争避让后的存储空间动态映射示意图;FIG. 7 is a schematic diagram of dynamic mapping of storage space after access competition avoidance in a dynamic local parallel data layout method for continuous data storage according to an embodiment of the present invention; FIG.
图8为本发明实施例一种面向连续数据存储的动态局部并行数据布局装置的结构示意图。FIG. 8 is a schematic structural diagram of a dynamic local parallel data layout apparatus for continuous data storage according to an embodiment of the present invention.
为使本发明的上述目的、特征和优点能够更加明显易懂,下面结合附图和具体实施例对本发明进行详细说明。The present invention will be described in detail below with reference to the drawings and specific embodiments.
如图1所示,本发明实施例的面向连续数据存储的动态局部并行数据布局方法,具体可以包括如下步骤:As shown in FIG. 1 , the dynamic local parallel data layout method for continuous data storage in the embodiment of the present invention may specifically include the following steps:
步骤101:条带划分,将N块硬盘中的每块硬盘平均分成1×N个存储块;Step 101: Striping, dividing each hard disk in the N hard disks into 1×N storage blocks equally;
步骤102:性能需求感知,获得存储应用的性能需求;Step 102: Perceive performance requirements and obtain performance requirements of the storage application;
步骤103:判断是否存在访问竞争,如果是,则执行步骤104,否则执行步骤105;Step 103: Determine whether there is access competition, if yes, proceed to step 104, otherwise perform
步骤104:访问竞争避让,优化该存储空间动态映射机制; Step 104: Accessing the competition avoidance and optimizing the dynamic mapping mechanism of the storage space;
步骤105:采用动态映射机制对存储空间进行分配管理。Step 105: Perform allocation management on the storage space by using a dynamic mapping mechanism.
具体通过下述技术方案实现:Specifically achieved through the following technical solutions:
包括条带划分、存储空间动态映射、访问竞争避让以及性能需求感知;首先通过性能需求感知获得存储应用的性能需求,然后执行存储空间动态映射机制,根据性能需求动态分配具有合适硬盘并行度的存储空间,最后通过访问竞争避让优化该存储空间动态映射机制;Including stripe partitioning, storage space dynamic mapping, access competition avoidance, and performance demand awareness; firstly, the performance requirements of the storage application are obtained through performance requirement perception, and then the storage space dynamic mapping mechanism is executed, and the storage with the appropriate hard disk parallelism is dynamically allocated according to the performance requirement. Space, and finally optimize the storage space dynamic mapping mechanism by accessing the competition avoidance;
其中,存储空间动态映射是核心,性能需求感知是存储空间动态映射的依据,条带划分是存储空间动态映射的前提,而访问竞争避让是存储空间动态映射的优化与完备。Among them, storage space dynamic mapping is the core, performance requirement perception is the basis of storage space dynamic mapping, stripe partitioning is the premise of storage space dynamic mapping, and access competition avoidance is the optimization and completeness of storage space dynamic mapping.
本发明实施例是在5块磁盘上构建动态局部并行数据布局DPPDL的基本过程,包括条带划分、存储空间动态映射、访问竞争避让、性能需求感知四个方面的内容;其中,每个磁盘的磁盘容量为4TB(1TB=103GB=106MB=109KB,1KB=1024B)。The embodiment of the present invention is a basic process for constructing a dynamic local parallel data layout DPPDL on five disks, including strip partitioning, storage space dynamic mapping, access competition avoidance, and performance requirement sensing; wherein each disk is The disk capacity is 4TB (1TB=103GB=106MB=109KB, 1KB=1024B).
优选地,所述条带划分,具体步骤为:Preferably, the strip is divided, and the specific steps are:
步骤1.1将N块硬盘中的每块硬盘平均分成1×N个存储块;其中,1大于等于1,N大于等于3;Step 1.1 divides each hard disk in the N hard disks into 1×N storage blocks; wherein, 1 is greater than or equal to 1, and N is greater than or equal to 3;
步骤1.2划分步骤1.1中的每个数据块和校验块为M个大小相等的子块,每个子块包含若干个地址连续的存储区(如磁盘扇区),分别称为数据子块(记作Strip)和校验子块(记作PStrip);Step 1.2: Each data block and check block in step 1.1 is M equal-sized sub-blocks, and each sub-block includes a plurality of consecutive storage areas (such as disk sectors), which are respectively called data sub-blocks. Strip and check sub-blocks (denoted as PStrip);
步骤1.3步骤1.1中每个条带中的盘内起始地址相同的子块组成一个子条带(记作Stripe),再将该子条带内的Strip进行异或运算,生成该子条带内的PStrip。In step 1.3, the sub-blocks in the strip with the same starting address in each strip form a sub-strip (recorded as stripe), and then the strip in the sub-band is XORed to generate the sub-strip. Inside the PStrip.
优选地,上述步骤1.1中每块硬盘内的起始地址相同的N个存储块组成一个条带,共组成1×N个条带;每个条带包含1个校验存储块,N-1个数据存储块,校验存储块简称校验块,数据存储块简称数据块;Preferably, in the above step 1.1, the N storage blocks having the same starting address in each hard disk form a stripe, which constitutes 1×N strips; each strip includes one parity storage block, N-1 Data storage block, check storage block referred to as check block, data storage block referred to as data block;
其中,条带i中的校验块位于硬盘N-1-j中;若j+v<N-1,则第v个数据块位于硬盘v,否则位于硬盘v+1,其中,0≤i<(1×N),j=i MOD N(MOD为模运算),0≤v<N-1。Wherein, the check block in the strip i is located in the hard disk N-1-j; if j+v<N-1, the vth data block is located on the hard disk v, otherwise it is located on the hard disk v+1, wherein 0≤i <(1×N), j=i MOD N (MOD is a modulo operation), 0≤v<N-1.
优选地,在步骤1.3中,每个条带中包含M个大小相同的子条带;子条带Stripe m的校验子块PStrip m由其N-1个数据子块Strip m异或生成,;Preferably, in step 1.3, each strip includes M sub-bands of the same size; the parity sub-block PStrip m of the strip stripe stripe is generated by X X of its N-1 data sub-blocks, ;
则生成方式为:Then the generation method is:
优选地,所述存储空间动态映射的操作思路为:Preferably, the operation idea of the dynamic mapping of the storage space is:
采用逻辑区块地址与物理区块地址的动态映射机制对磁盘阵列存储空间进行分配管理,磁盘阵列层收到的写数据可以被动态映射到不同数量的硬盘上;即:根据负载 的性能需求参数k,动态分配具有k个硬盘并行度的存储空间,即k为需要并发写数据的硬盘数量,不包括所写数据的校验数据所在的硬盘;当负载最小时,仅映射到某1块硬盘上,仅向该硬盘写数据;负载最大时映射到除某一块1块硬盘之外的其余硬盘上,向除某一块1块硬盘之外的其余硬盘并发写数据;The disk array storage space is allocated and managed by the dynamic mapping mechanism of the logical block address and the physical block address, and the write data received by the disk array layer can be dynamically mapped to different numbers of hard disks; that is, according to the load The performance requirement parameter k dynamically allocates the storage space with k hard disk parallelism, that is, k is the number of hard disks that need to concurrently write data, and does not include the hard disk where the verification data of the written data is located; when the load is minimum, only maps to On a certain hard disk, only data is written to the hard disk; when the load is maximum, it is mapped to the remaining hard disks except one block of one hard disk, and the data is written to the remaining hard disks except one block of one hard disk;
本发明实施例中,存储空间动态映射,涉及的基本术语定义如下:In the embodiment of the present invention, the storage space dynamic mapping, the basic terms involved are defined as follows:
条带链表:由所有条带组成的一个单向循环链表;Strip linked list: a one-way circular linked list consisting of all strips;
CurBank:当前进行映射的条带,称为当前映射条带,初始值为条带0;CurBank: the current stripe to be mapped, called the current map strip, the initial value is stripe 0;
NextBank:下一个进行映射的条带,称为相邻映射条带,与CurBank编号相邻,初始值为条带1;NextBank: the next stripe to be mapped, called the adjacent map strip, adjacent to the CurBank number, the initial value is strip 1;
CurStripe:CurBank中可进行映射的Stripe;CurStripe: a route that can be mapped in CurBank;
NextStripe:NextBank中可进行映射的Stripe;NextStripe: Stripe that can be mapped in NextBank;
所述存储空间动态映射,具体步骤为:The storage space is dynamically mapped, and the specific steps are:
步骤(1)在CurBank中选择自由Strip最多的Stripe作为CurStripe;Step (1) Selecting the stripe with the largest free strip as the CurStripe in CurBank;
其中,所述自由Strip为未进行映射的Strip;Wherein the free strip is a strip that is not mapped;
步骤(2)如果CurStripe中自由Strip数为0,表明CurStripe无自由Strip可映射,则转到步骤(3),否则转到步骤(5);Step (2) If the number of free strips in CurStripe is 0, indicating that CurStripe has no free strip mappable, go to step (3), otherwise go to step (5);
步骤(3)判断NextBank是否有自由Strip可映射,如果没有自由Strip可映射,则删除NextBank上的存储数据,进行存储空间回收;Step (3) determines whether NextBank has a free strip mappable, and if there is no free strip mappable, delete the stored data on the NextBank to perform storage space recovery;
步骤(4)将NextBank作为CurBank并重新获取CurStripe,然后将NextBank顺序后移;Step (4) takes NextBank as CurBank and reacquires CurStripe, and then moves NextBank back;
步骤(5)如果CurStripe中自由Strip数不小于k,则从CurStripe中顺序取出k个Strip,转到步骤(7),否则转到步骤(6);Step (5) If the number of free strips in CurStripe is not less than k, then sequentially extract k strips from CurStripe, go to step (7), otherwise go to step (6);
步骤(6)先从CurStripe中取出所有自由Strip,再从NextStripe取出余下所需的自由Strip,一起组成k个自由Strip。如果NextStripe没有足够的自由Strip,就删除NextBank上的存储数据,回收存储空间,并重新获取NextStripe;Step (6) First take all free strips from CurStripe, and then take the remaining free strips from NextStripe to form k free strips. If NextStripe does not have enough free strips, delete the stored data on NextBank, reclaim the storage space, and re-acquire NextStripe;
步骤(7)获得k个自由Strip后,进行存储空间映射,把逻辑地址空间映射到具有k个硬盘并行度的物理地址空间,并将映射关系记录在映射表中;Step (7) obtains k free strips, performs storage space mapping, maps the logical address space to a physical address space having k hard disks parallelism, and records the mapping relationship in the mapping table;
根据Strip所在的条带、子条带、以及子条带内的编号,确定该Strip所在磁盘及盘内偏移量,并将其记录到映射表中;读操作时根据映射表获得数据在磁盘上的位置;映射表作为元数据的重要组成部分,保存在每个正在工作的磁盘的尾部,内带一个版本编号,版本编号按时间先后由小到大,磁盘阵列在断电恢复时装入最大编号的版本。Determine the offset of the disk and disk in the strip according to the strip, the sub-strip, and the number in the sub-strip where the strip is located, and record it in the mapping table. When reading, obtain data on the disk according to the mapping table. The location; the mapping table is an important part of the metadata, stored at the end of each working disk, with a version number, the version number is from small to large, and the disk array is loaded the most when the power is restored. Numbered version.
优选地,所述访问竞争避让,具体步骤为: Preferably, the access competition avoids, and the specific steps are:
步骤1)DPPDL选择跨越2个条带的Strip进行存储空间映射时,首先从CurStripe中取出所有自由Strip,再从NextStripe取出余下所需的自由Strip,一起组成k+1个自由Strip,如果NextStripe没有足够的自由Strip,就删除NextBank上的存储数据,回收存储空间,并重新获取NextStripe;Step 1) When DPPDL selects a strip across 2 strips for storage space mapping, first take all free strips from CurStripe, and then take the remaining free strips from NextStripe and form k+1 free strips together. If NextStripe does not. With enough free strips, delete the stored data on NextBank, reclaim the storage space, and re-acquire NextStripe;
步骤2)DPPDL进行访问竞争检查,若没有访问竞争或末尾Strip引起访问竞争,则删除末尾Strip;否则,删除引起访问竞争的Strip;最后获得不会引发访问竞争且可并发访问的k个Strip;Step 2) DPPDL performs an access competition check. If there is no access competition or the end strip causes the access competition, the last strip is deleted; otherwise, the strip causing the access competition is deleted; finally, k strips that do not cause the access competition and can be concurrently accessed are obtained;
其中,所述访问竞争检查是指同一硬盘是否有2个子块被并发访问。The access competition check refers to whether two sub-blocks of the same hard disk are concurrently accessed.
优选地,所述性能需求感知,具体为:Preferably, the performance requirement is perceived, specifically:
连续数据存储应用,对响应时间不十分敏感,需要稳定的数据传输率,因此将数据传输率作为性能需求指标;Continuous data storage applications are not very sensitive to response time and require a stable data transfer rate, so the data transfer rate is used as a performance requirement indicator;
所述数据传输率需求,具体为:The data transmission rate requirement is specifically:
步骤A.DPPDL统计磁盘阵列层I/O请求队列的历史信息;Step A. DPPDL statistics history information of the disk array layer I/O request queue;
步骤B.DPPDL进行分析预测;Step B. DPPDL performs analysis and prediction;
对于连续数据存储应用,负载的波动周期或突发时间较大,可根据时间窗口T内的平均数据传输率来感知负载需求的数据传输率,用rn=(ta,pos,len)记录时间窗口T内的第n个I/O请求;其中,ta、pos、len分别为请求rn的到来时间、起始逻辑地址和请求长度,rn.len表示rn的请求长度len;For continuous data storage applications, the fluctuation period or burst time of the load is large, and the data transmission rate of the load demand can be perceived according to the average data transmission rate in the time window T, and the time window is recorded with rn=(ta, pos, len). The nth I/O request in T; where ta, pos, and len are the arrival time, the starting logical address, and the request length of the request rn, respectively, and rn.len represents the request length len of rn;
感知负载需求的数据传输率表示为:The data transfer rate for perceived load demand is expressed as:
其中,num为时间窗口T内到来的I/O数;β为性能系数,可在1.2~1.5之间取值;Where num is the number of I/Os coming in the time window T; β is the coefficient of performance, which can be between 1.2 and 1.5;
5s<T<15s;5s<T<15s;
表示对时间窗口T内的num个I/O请求进行长度求和;Representing the length summation of num I/O requests in the time window T;
其中,公式(2)中的I/O请求rn来自RAID层的请求队列;Wherein, the I/O request rn in formula (2) comes from the request queue of the RAID layer;
DPPDL感知到负载需求的数据传输率之后,根据实际应用场景中不同硬盘并行度能够提供的数据传输率,确定需要并发写数据的硬盘数量。After the DPPDL senses the data transmission rate of the load demand, it determines the number of hard disks that need to concurrently write data according to the data transmission rate that can be provided by different hard disk parallelism in the actual application scenario.
下面将结合说明书附图对所述条带划分、所述存储空间动态映射、所述访问竞争避让以及所述性能需求感知进行详细说明,具体地:The strip partitioning, the storage space dynamic mapping, the access competition avoidance, and the performance requirement perception are described in detail below with reference to the accompanying drawings, specifically:
1、条带划分1, strip division
图2为本实施例中的条带划分总体示意图。从图2可以看出每块磁盘平均分成5个存储块(其中,l=1),由于每个磁盘容量为4TB,所以每个存储块大小为4TB /5=800GB;每个磁盘中盘内起始地址相同的5个存储块组成一个条带,共组成5个条带;从图2还可以看出,每个条带包含1个校验块和4个数据块,条带0的校验块位于磁盘4、条带1的校验块位于磁盘3、……、条带4的校验块位于磁盘0。FIG. 2 is a schematic overall view of the strip division in the embodiment. It can be seen from Figure 2 that each disk is equally divided into 5 memory blocks (where l=1). Since each disk has a capacity of 4TB, each memory block size is 4TB. /5=800GB; each of the five memory blocks in the disk with the same starting address in the disk form a stripe, which forms a total of five strips; as can be seen from Figure 2, each strip contains one check block. And 4 data blocks, the check block of stripe 0 is located on the disk 4, the check block of the stripe 1 is located on the disk 3, ..., and the check block of the stripe 4 is located on the disk 0.
进一步地,可以对上述条带划分进行细分,如图3所示。从图3中可以看出,把每个数据块、校验块划分为若干个大小相等的子块;本实施例中取子块大小为100KB(即:包含200个地址连续的扇区,扇区大小为512字节),则每个数据块、校验块分成M=800,0000个相等的子块,分别称为数据子块、校验子块。子条带的校验子块由该子条带的4个数据子块异或运算生成;如条带0内子条带1的校验子块,由该子条带的4个数据子块异或运算生成。Further, the above-described strip division may be subdivided as shown in FIG. As can be seen from FIG. 3, each data block and the check block are divided into a plurality of equal-sized sub-blocks; in this embodiment, the sub-block size is 100 KB (ie, a sector containing 200 consecutive addresses, a fan) The size of the area is 512 bytes, and each data block and check block is divided into M=800,0000 equal sub-blocks, which are respectively called data sub-blocks and parity sub-blocks. The parity sub-block of the sub-strip is generated by the X-OR operation of the four data sub-blocks of the sub-strip; for example, the parity sub-block of the sub-strip 1 in the strip 0, and the four data sub-blocks of the sub-strip Or the operation is generated.
2、存储空间动态映射2, storage space dynamic mapping
在进行条带划分后,DPPDL采用动态映射机制对存储空间进行分配管理。图4给出了5种写负载(A~E)在存储空间内的地址映射过程,选取其中的条带0和条带1进行说明,假设每个条带包含6个子条带(此处仅为阐述方便,本实施例中实际子条带数为M=800,0000),映射粒度为子块大小。After striping, DPPDL uses dynamic mapping mechanism to allocate and manage storage space. Figure 4 shows the address mapping process of five kinds of write loads (A ~ E) in the storage space, select strip 0 and strip 1 to illustrate, assuming each strip contains 6 sub strips (here only For convenience of description, the actual number of sub-bands in this embodiment is M=800, 0000), and the mapping granularity is the sub-block size.
假设负载A~E分别需要2、4、3、1、2块磁盘并行,其后数字为该负载持续的时间段编号(1~16),如图4中的A2表示负载A在时段2内运行;负载持续的时间可各不相同。It is assumed that the loads A to E respectively need 2, 4, 3, 1, 2 disks in parallel, and the numbers thereafter are the time period number (1 to 16) of the load duration, as shown in Figure 4, A2 indicates that the load A is in the period 2 Run; the duration of the load can vary.
负载A需要2块磁盘(不包括其校验数据所在的磁盘)并行,即并发向磁盘0、磁盘1写数据,持续了3个时间段,即时间段1、时间段2、时间段3;Load A requires two disks (excluding the disk where the check data is located) in parallel, that is, concurrently writes data to disk 0 and disk 1, for three time periods, namely, time period 1, time period 2, and time period 3;
负载B需要4块磁盘(不包括其校验数据所在的磁盘)并行,即并发向磁盘0、磁盘1、磁盘2、磁盘3写数据,持续了3个时间段,即时间段4、时间段5、时间段6;Load B requires 4 disks (excluding the disk where the check data is located) in parallel, that is, concurrently writes data to disk 0, disk 1, disk 2, and disk 3 for 3 time periods, that is, time period 4, time period 5, time period 6;
负载C需要3块磁盘(不包括其校验数据所在的磁盘)并行,即并发向磁盘2、磁盘3、磁盘0写数据,持续了3个时间段,即时间段7、时间段8、时间段9;Load C requires 3 disks (excluding the disk where the check data is located) in parallel, that is, concurrently writes data to disk 2, disk 3, and disk 0 for 3 time periods, that is, time period 7, time period 8, time Paragraph 9;
负载D需要1块磁盘(不包括其校验数据所在的磁盘)并行,持续了5个时间段。在时间10、时间段11、时间段12内,向磁盘0写数据;在时间段13、时间段14内向磁盘1写数据。Load D requires 1 disk (not including the disk where the parity data is located) in parallel for 5 periods. In time 10, time period 11, time period 12, data is written to disk 0; data is written to disk 1 during time period 13, time period 14.
负载E需要2块磁盘(不包括其校验数据所在的磁盘)并行,即并发向磁盘1、磁盘2写数据,持续了2个时间段,即时间段15、时间段16。The load E needs two disks (excluding the disk where the check data is located) in parallel, that is, concurrently writes data to the disk 1, the disk 2, and lasts for two time periods, that is, the time period 15 and the time period 16.
数据优先写入当前工作磁盘,当前工作磁盘能够满足性能需求时,不需访问待机磁盘,既具有良好的局部并行性,又可动态分配具有合适并行度的存储空间。写负载最小时(如负载D),仅向1个磁盘写数据;写负载最大时(如负载B),向4块磁盘并发写数据。DPPDL可提供大弹性并行度,满足不同负载的性能需求,同时具有很 高的节能效率。The data is preferentially written to the current working disk. When the current working disk can meet the performance requirements, it does not need to access the standby disk, and has good local parallelism and dynamically allocates storage space with appropriate parallelism. When the write load is minimum (such as load D), only write data to 1 disk; when the write load is maximum (such as load B), write data to 4 disks concurrently. DPPDL provides large elastic parallelism to meet the performance needs of different loads, and has very High energy efficiency.
3、访问竞争避让3, access to competition to avoid
DPPDL在采用上述动态映射机制对存储空间进行分配管理时,存在访问竞争问题。当并发访问来自2个条带的数据子块时,可能需要并发访问某一块磁盘,从而引发访问竞争,产生性能瓶颈。DPPDL has access competition problems when using the above dynamic mapping mechanism to allocate and manage storage space. When concurrently accessing data sub-blocks from two stripes, it may be necessary to access a certain disk concurrently, causing access competition and performance bottlenecks.
图5为本实施例中的访问竞争产生示意图。FIG. 5 is a schematic diagram of the access competition generated in the embodiment.
访问竞争避让的前提条件为:The prerequisites for accessing competition avoidance are:
当从2个条带取出k个自由Strip进行映射时,由于校验子块PStrip的存在,可能会并发访问相同的磁盘,从而引发访问竞争,产生性能瓶颈;访问竞争会严重影响存储性能,需要采取有效措施予以消除。When k free strips are taken from two strips for mapping, the same disk may be accessed concurrently due to the existence of the parity strip PStrip, causing access competition and performance bottlenecks. Access competition may seriously affect storage performance. Take effective measures to eliminate it.
从图5可以看出,负载C在时间段8(C8)内运行时,并发访问来自条带0、条带1内的数据子块。由于需要生产校验数据,在条带0中需要并发访问磁盘2、磁盘3、磁盘4;在条带1中需要并发访问磁盘0、磁盘3。此时,磁盘3需要被并发访问,其负载大约是磁盘2、磁盘4、磁盘0的2倍,因此成为性能瓶颈。时间段7(C7)、时间段9(C9)也存在访问竞争问题。As can be seen from FIG. 5, when the load C is running in the time period 8 (C8), the data sub-blocks from the strip 0 and the strip 1 are concurrently accessed. Due to the need to produce verification data, it is necessary to access disk 2, disk 3, and disk 4 concurrently in strip 0; in strip 1, concurrent access to disk 0 and disk 3 is required. At this time, the disk 3 needs to be accessed concurrently, and its load is about twice that of the disk 2, the disk 4, and the disk 0, thus becoming a performance bottleneck. There are also access competition issues in time period 7 (C7) and time period 9 (C9).
进一步的,为了消除访问竞争的问题则需采用访问竞争避让策略。如图6所示,负载C7需要并发访问跨越2个条带的3个数据子块,①先选择4个数据子块(图6中虚线方框处);②进行访问竞争检查,发现磁盘3上的数据子块(图6中带×的虚线方框处)、条带1的校验子块都位于磁盘3,会引起磁盘3的访问竞争,因此删除该数据子块;③把负载C并行写入剩余的3个数据子块(图6中标有C7的虚线方框处)。最后获得不会引发访问竞争且可并发访问的3个数据子块。Further, in order to eliminate the problem of access competition, an access competition avoidance strategy is needed. As shown in FIG. 6, the load C7 needs to concurrently access three data sub-blocks spanning two strips, one selects four data sub-blocks first (at the dotted line in FIG. 6); 2 performs an access competition check, and finds the disk 3 The upper data sub-block (at the dashed box with × in Figure 6) and the check sub-block of strip 1 are all located on the disk 3, which will cause the access of the disk 3 to compete, thus deleting the data sub-block; 3 load C The remaining 3 data sub-blocks are written in parallel (at the dashed box labeled C7 in Figure 6). Finally, three data sub-blocks that do not cause access competition and can be accessed concurrently are obtained.
C8、C9采用相同的方法进行访问竞争避让。最后,消除访问竞争的存储空间映射情况,如图7所示。从图7中可以看出,在任何一块磁盘上,都没有2个子块(包括数据子块和校验子块)被并发访问;C8 and C9 use the same method to perform access avoidance. Finally, the storage space mapping for accessing the competition is eliminated, as shown in Figure 7. As can be seen from Figure 7, no sub-blocks (including data sub-blocks and parity sub-blocks) are concurrently accessed on any one disk;
4、性能需求感知4, performance demand perception
DPPDL需要感知负载的性能需求,然后动态调整并行的磁盘数,以提供合适的性能,获得更高的节能效率。本实例采用公式(2)来感知负载需求的数据传输率,其中β取值为1.2,窗口时间T取值为5秒。DPPDL needs to sense the performance requirements of the load and then dynamically adjust the number of parallel disks to provide the right performance for higher energy efficiency. This example uses equation (2) to sense the data transmission rate of the load demand, where β is 1.2 and window time T is 5 seconds.
本公开的实施例可被实现为使用任意适当的硬件,固件,软件,或及其任意组合进行想要的配置的装置。图8示意性地示出了可被用于实现本申请中所述的各个实施例的示例性装置1300。
Embodiments of the present disclosure can be implemented as a device that performs the desired configuration using any suitable hardware, firmware, software, or any combination thereof. FIG. 8 schematically illustrates an
对于一个实施例,图8示出了示例性装置1300,该装置具有一个或多个处理器1302、被耦合到(一个或多个)处理器1302中的至少一个的控制模块(芯片组)1304、被耦合到控制模块1304的存储器1306、被耦合到控制模块1304的非易失性存储器(NVM)/存储设备1308、被耦合到控制模块1304的一个或多个输入/输出设备1310,以及被耦合到控制模块1306的网络接口1312。For one embodiment, FIG. 8 illustrates an
处理器1302可包括一个或多个单核或多核处理器,处理器1302可包括通用处理器或专用处理器(例如图形处理器、应用处理器、基频处理器等)的任意组合。在一些实施例中,装置1300能够作为本申请实施例中所述的转码端的服务器等设备。
在一些实施例中,装置1300可包括具有指令1314的一个或多个计算机可读介质(例如,存储器1306或NVM/存储设备1308)以及与该一个或多个计算机可读介质相合并被配置为执行指令1314以实现模块从而执行本公开中所述的动作的一个或多个处理器1302。In some embodiments,
对于一个实施例,控制模块1304可包括任意适当的接口控制器,以向(一个或多个)处理器1302中的至少一个和/或与控制模块1304通信的任意适当的设备或组件提供任意适当的接口。For one embodiment,
控制模块1304可包括存储器控制器模块,以向存储器1306提供接口。存储器控制器模块可以是硬件模块、软件模块和/或固件模块。
存储器1306可被用于例如为装置1300加载和存储数据和/或指令1314。对于一个实施例,存储器1306可包括任意适当的易失性存储器,例如,适当的DRAM。在一些实施例中,存储器1306可包括双倍数据速率类型四同步动态随机存取存储器(DDR4SDRAM)。Memory 1306 can be used, for example, to load and store data and/or instructions 1314 for
对于一个实施例,控制模块1304可包括一个或多个输入/输出控制器,以向NVM/存储设备1308及(一个或多个)输入/输出设备1310提供接口。For one embodiment,
例如,NVM/存储设备1308可被用于存储数据和/或指令1314。NVM/存储设备1308可包括任意适当的非易失性存储器(例如,闪存)和/或可包括任意适当的(一个或多个)非易失性存储设备(例如,一个或多个硬盘驱动器(HDD)、一个或多个光盘(CD)驱动器和/或一个或多个数字通用光盘(DVD)驱动器)。For example, NVM/
NVM/存储设备1308可包括在物理上作为装置1300被安装在其上的设备的一部分的存储资源,或者其可被该设备访问可不必作为该设备的一部分。例如,NVM/存储设备1308可通过网络经由(一个或多个)输入/输出设备1310进行访问。NVM/
(一个或多个)输入/输出设备1310可为装置1300提供接口以与任意其他适当的设备通信,输入/输出设备1310可以包括通信组件、音频组件、传感器组件等。网络接
口1312可为装置1300提供接口以通过一个或多个网络通信,装置1300可根据一个或多个无线网络标准和/或协议中的任意标准和/或协议来与无线网络的一个或多个组件进行无线通信,例如接入基于通信标准的无线网络,如WiFi,2G或3G,或它们的组合进行无线通信。The input/output device(s) 1310 can provide an interface to the
对于一个实施例,(一个或多个)处理器1302中的至少一个可与控制模块1304的一个或多个控制器(例如,存储器控制器模块)的逻辑封装在一起。对于一个实施例,(一个或多个)处理器1302中的至少一个可与控制模块1304的一个或多个控制器的逻辑封装在一起以形成系统级封装(SiP)。对于一个实施例,(一个或多个)处理器1302中的至少一个可与控制模块1304的一个或多个控制器的逻辑集成在同一模具上。对于一个实施例,(一个或多个)处理器1302中的至少一个可与控制模块1304的一个或多个控制器的逻辑集成在同一模具上以形成片上系统(SoC)。For one embodiment, at least one of the processor(s) 1302 can be packaged with the logic of one or more controllers (eg, memory controller modules) of the
在各个实施例中,装置1300可以但不限于是:服务器、台式计算设备或移动计算设备(例如,膝上型计算设备、手持计算设备、平板电脑、上网本等)等终端设备。在各个实施例中,装置1300可具有更多或更少的组件和/或不同的架构。例如,在一些实施例中,装置1300包括一个或多个摄像机、键盘、液晶显示器(LCD)屏幕(包括触屏显示器)、非易失性存储器端口、多个天线、图形芯片、专用集成电路(ASIC)和扬声器。In various embodiments,
以上所述仅是本发明的优选实施方式,应当指出,对于本技术领域的普通技术人员来说,在不脱离本发明原理的前提下,还可以做出若干改进,或者对其中部分技术特征进行等同替换,这些改进和替换也应视为本发明的保护范围。 The above is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make some improvements or carry out some of the technical features without departing from the principles of the present invention. These modifications and substitutions are also considered to be within the scope of the invention.
Claims (9)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610594843.8A CN106293511B (en) | 2016-07-26 | 2016-07-26 | A kind of dynamic local parallel data layout method towards continuous data storage |
| CN201610594843.8 | 2016-07-26 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018019119A1 true WO2018019119A1 (en) | 2018-02-01 |
Family
ID=57652864
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2017/092403 Ceased WO2018019119A1 (en) | 2016-07-26 | 2017-07-10 | Method and device for dynamic partial-parallel data layout for continuous data storage |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN106293511B (en) |
| WO (1) | WO2018019119A1 (en) |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110858122A (en) * | 2018-08-23 | 2020-03-03 | 杭州海康威视系统技术有限公司 | Method and device for storing data |
| CN111124296A (en) * | 2019-12-12 | 2020-05-08 | 北京浪潮数据技术有限公司 | Method, device, equipment and storage medium for writing data to solid state disk |
| CN111338782A (en) * | 2020-03-06 | 2020-06-26 | 中国科学技术大学 | A Contention-Aware Node Allocation Method for Shared Burst Data Cache |
| CN115599315A (en) * | 2022-12-14 | 2023-01-13 | 阿里巴巴(中国)有限公司(Cn) | Data processing method, device, system, equipment and medium |
| CN116027990A (en) * | 2023-03-29 | 2023-04-28 | 苏州浪潮智能科技有限公司 | A kind of RAID card and its data access method, system and storage medium |
| CN116301662A (en) * | 2023-05-12 | 2023-06-23 | 合肥联宝信息技术有限公司 | Method for managing power consumption of solid-state hard disk and solid-state hard disk |
| CN117499442A (en) * | 2023-12-27 | 2024-02-02 | 天津数智物联科技有限公司 | Data efficient processing method for Internet of things energy monitoring device |
| CN120215842A (en) * | 2025-05-28 | 2025-06-27 | 苏州元脑智能科技有限公司 | A method for determining block coding of a storage system and a storage system |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109725826B (en) * | 2017-10-27 | 2022-05-24 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer readable medium for managing storage system |
| CN108073363B (en) * | 2017-12-28 | 2021-10-01 | 深圳市得一微电子有限责任公司 | Data storage method, storage device and computer readable storage medium |
| CN108519926B (en) * | 2018-03-31 | 2020-12-29 | 深圳忆联信息系统有限公司 | Self-adaptive RAID (redundant array of independent disks) group calculation method and device |
| WO2020010604A1 (en) * | 2018-07-13 | 2020-01-16 | 华为技术有限公司 | Ssd data reading method and device |
| CN109933570B (en) * | 2019-03-15 | 2020-02-07 | 中山大学 | Metadata management method, system and medium |
| CN110308875B (en) * | 2019-06-27 | 2023-07-14 | 深信服科技股份有限公司 | Data read-write method, device, equipment and computer readable storage medium |
| CN115202576B (en) * | 2022-06-22 | 2025-09-09 | 成都飞机工业(集团)有限责任公司 | Data read-write method based on layered shelf type storage structure |
| CN117075821B (en) * | 2023-10-13 | 2024-01-16 | 杭州优云科技有限公司 | Distributed storage method and device, electronic equipment and storage medium |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104461914A (en) * | 2014-11-10 | 2015-03-25 | 浪潮电子信息产业股份有限公司 | Self-adaptive optimization method for automatic thin provisioning |
| CN105204785A (en) * | 2015-10-15 | 2015-12-30 | 中国科学技术大学 | Disk array writemode selecting method based on I/O queue of disk |
| CN105426427A (en) * | 2015-11-04 | 2016-03-23 | 国家计算机网络与信息安全管理中心 | MPP database cluster replica realization method based on RAID 0 storage |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080276041A1 (en) * | 2007-05-01 | 2008-11-06 | International Business Machines Corporation | Data storage array scaling method and system with minimal data movement |
| CN101976174B (en) * | 2010-08-19 | 2012-01-25 | 北京同有飞骥科技股份有限公司 | Method for constructing energy-saving disk array of vertical configuration distribution check |
-
2016
- 2016-07-26 CN CN201610594843.8A patent/CN106293511B/en not_active Expired - Fee Related
-
2017
- 2017-07-10 WO PCT/CN2017/092403 patent/WO2018019119A1/en not_active Ceased
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104461914A (en) * | 2014-11-10 | 2015-03-25 | 浪潮电子信息产业股份有限公司 | Self-adaptive optimization method for automatic thin provisioning |
| CN105204785A (en) * | 2015-10-15 | 2015-12-30 | 中国科学技术大学 | Disk array writemode selecting method based on I/O queue of disk |
| CN105426427A (en) * | 2015-11-04 | 2016-03-23 | 国家计算机网络与信息安全管理中心 | MPP database cluster replica realization method based on RAID 0 storage |
Non-Patent Citations (1)
| Title |
|---|
| LI, YUANZHANG ET AL.: "S-RAID 5: An Energy-Saving RAID for Sequential Access Based Applications", CHINESE JOURNAL OF COMPUTERS, vol. 36, no. 6, 30 June 2013 (2013-06-30) * |
Cited By (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110858122A (en) * | 2018-08-23 | 2020-03-03 | 杭州海康威视系统技术有限公司 | Method and device for storing data |
| CN110858122B (en) * | 2018-08-23 | 2023-10-20 | 杭州海康威视系统技术有限公司 | Method and device for storing data |
| CN111124296A (en) * | 2019-12-12 | 2020-05-08 | 北京浪潮数据技术有限公司 | Method, device, equipment and storage medium for writing data to solid state disk |
| CN111338782A (en) * | 2020-03-06 | 2020-06-26 | 中国科学技术大学 | A Contention-Aware Node Allocation Method for Shared Burst Data Cache |
| CN115599315A (en) * | 2022-12-14 | 2023-01-13 | 阿里巴巴(中国)有限公司(Cn) | Data processing method, device, system, equipment and medium |
| CN116027990A (en) * | 2023-03-29 | 2023-04-28 | 苏州浪潮智能科技有限公司 | A kind of RAID card and its data access method, system and storage medium |
| CN116301662A (en) * | 2023-05-12 | 2023-06-23 | 合肥联宝信息技术有限公司 | Method for managing power consumption of solid-state hard disk and solid-state hard disk |
| CN116301662B (en) * | 2023-05-12 | 2023-08-01 | 合肥联宝信息技术有限公司 | Method for managing power consumption of solid-state hard disk and solid-state hard disk |
| CN117499442A (en) * | 2023-12-27 | 2024-02-02 | 天津数智物联科技有限公司 | Data efficient processing method for Internet of things energy monitoring device |
| CN117499442B (en) * | 2023-12-27 | 2024-05-10 | 天津数智物联科技有限公司 | Data efficient processing method for Internet of things energy monitoring device |
| CN120215842A (en) * | 2025-05-28 | 2025-06-27 | 苏州元脑智能科技有限公司 | A method for determining block coding of a storage system and a storage system |
| CN120215842B (en) * | 2025-05-28 | 2025-08-08 | 苏州元脑智能科技有限公司 | A method for determining block coding of a storage system and a storage system |
Also Published As
| Publication number | Publication date |
|---|---|
| CN106293511A (en) | 2017-01-04 |
| CN106293511B (en) | 2018-12-04 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2018019119A1 (en) | Method and device for dynamic partial-parallel data layout for continuous data storage | |
| US12216929B2 (en) | Storage system, memory management method, and management node | |
| US11880579B2 (en) | Data migration method and apparatus | |
| US9940261B2 (en) | Zoning of logical to physical data address translation tables with parallelized log list replay | |
| US10019352B2 (en) | Systems and methods for adaptive reserve storage | |
| JP6007329B2 (en) | Storage controller, storage device, storage system | |
| KR102170539B1 (en) | Method for storing data by storage device and storage device | |
| JP5944587B2 (en) | Computer system and control method | |
| US9612758B1 (en) | Performing a pre-warm-up procedure via intelligently forecasting as to when a host computer will access certain host data | |
| CN104484130A (en) | Construction method of horizontal expansion storage system | |
| CN102982182B (en) | Data storage planning method and device | |
| EP3671423B1 (en) | Data access method and storage array | |
| CN103150128A (en) | Implementation method of solid state drive (SSD) and disk-based reliable mixed storage system | |
| CN103713861A (en) | File processing method and system based on hierarchical division | |
| CN109739696B (en) | Double-control storage array solid state disk caching acceleration method | |
| US10853252B2 (en) | Performance of read operations by coordinating read cache management and auto-tiering | |
| US11561695B1 (en) | Using drive compression in uncompressed tier | |
| CN106775453B (en) | A kind of construction method mixing storage array | |
| KR20150127434A (en) | Memory management apparatus and control method thereof | |
| US11947803B2 (en) | Effective utilization of different drive capacities | |
| CN117348789A (en) | Data access method, storage device, hard disk, storage system and storage medium | |
| US20250245149A1 (en) | Generating a logical to physical data structure for a solid state drive using sectors of different sizes | |
| US20250315163A1 (en) | System and Method for Machine Learning-Based Forecasting of Active Data Sets in Storage Systems | |
| TWI522797B (en) | System and method for energy-efficient distributed storage with fault tolerance | |
| CN119225651A (en) | Solid state drive use method, device and equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17833422 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 17833422 Country of ref document: EP Kind code of ref document: A1 |
|
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 250719) |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 17833422 Country of ref document: EP Kind code of ref document: A1 |