CN116185949B - Cache storage methods and related devices - Google Patents

Cache storage methods and related devices

Info

Publication number
CN116185949B
Authority
CN
China
Prior art keywords
area
file
metadata
current
partition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211717784.0A
Other languages
Chinese (zh)
Other versions
CN116185949A (en)
Inventor
刘日新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN202211717784.0A
Publication of CN116185949A
Application granted
Publication of CN116185949B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/113 Details of archiving
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164 File meta data generation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The embodiments of the application disclose a cache storage method and related equipment for reducing the competition pressure of metadata updates caused by mass cache storage. The cache comprises a metadata area and a data area, and the metadata area comprises a file identification area and a metadata update area. The method comprises: acquiring a current file identifier of a current file; storing the current data of the current file change in the data area, and determining a first partition corresponding to the current file in the file identification area based on the current storage condition of the file identification area, wherein the first partition is used for recording the current file identifier; storing file identification update information in the metadata update area, wherein the file identification update information comprises the identifier of the first partition and the current file identifier; and, if the metadata update area meets a preset metadata update condition, writing the current file identifier into the first partition according to the identifier of the first partition.

Description

Cache storage method and related equipment
Technical Field
The embodiment of the application relates to the field of computer storage, in particular to a cache storage method and related equipment.
Background
A cache is a type of high-speed memory with faster access than ordinary random access memory. It typically does not use the dynamic random access memory (DRAM) technology used for system main memory, but rather the more expensive and faster static random access memory (SRAM) technology. The provision of a cache is one of the important factors in the high performance of all modern computer systems.
The cache includes a data area for storing data and a metadata area for describing the data stored in the data area. If there is a large amount of data to be stored, a correspondingly large amount of metadata to be stored in the metadata area is generated. In the prior art, once the metadata to be written and its writing position have been determined, the metadata is written directly into that position.
When facing a large amount of metadata to be updated, the continuous writing of a large amount of metadata can affect the query performance of the cache, thereby bringing about competitive pressure during metadata updating.
Disclosure of Invention
The embodiment of the application provides a cache storage method and related equipment, which are used for reducing competitive pressure generated during metadata updating.
A first aspect of an embodiment of the present application provides a cache storage method, where the cache includes a metadata area and a data area, the metadata area includes a file identification area and a metadata update area, and the method includes:
acquiring a current file identifier of a current file;
storing the current data of the current file change in the data area, and determining a first partition corresponding to the current file in the file identification area based on the current storage condition of the file identification area, wherein the first partition is used for recording the current file identifier;
storing file identification update information in the metadata update area, wherein the file identification update information comprises the identifier of the first partition and the current file identifier;
and if the metadata update area meets a preset metadata update condition, writing the current file identifier into the first partition according to the identifier of the first partition.
In a specific implementation manner, the metadata area further includes a file partition area and a file block area, and the method further includes:
if the file identification area does not contain a file identifier consistent with the current file identifier, determining at least one second partition corresponding to the current file in the file partition area and at least one third partition corresponding to the current file in the file block area;
storing at least one piece of file fragment update information and at least one piece of file block update information in the metadata update area, wherein each piece of file fragment update information comprises the identifier of a second partition and the association between that second partition and the current file, and each piece of file block update information comprises the identifier of a third partition, the association between that third partition and its corresponding second partition, and the offset of the data corresponding to the third partition in the data area relative to the data corresponding to that second partition;
and if the metadata update area meets the preset metadata update condition, writing the association between the second partition and the current file into the second partition according to the identifier of the second partition, and writing the association between the third partition and its corresponding second partition into the third partition according to the identifier of the third partition.
In a specific implementation manner, the file block update information further includes a dirty data identifier for identifying whether the first portion of data of the current file corresponding to the third block is dirty data, a heat identifier for identifying a frequency of use of the first portion of data of the current file corresponding to the third block, and an association relationship between the third block and a fourth block in the data area.
In a specific implementation, each piece of metadata update information further includes an update order, the metadata update information including the file identification update information, the file fragment update information, and the file block update information, the method further including:
and if the cache meets the preset abnormal recovery condition, sequentially updating the metadata according to the updating sequence of each piece of metadata updating information in the metadata updating area.
In a specific implementation, the method further includes:
if the file identification area contains a file identifier consistent with the current file identifier, determining, from the second partitions corresponding to the current file identifier, a second partition that has remaining storage space in its corresponding fourth partitions of the data area as the second partition corresponding to the current data;
and determining, from the fourth partitions of the data area corresponding to that second partition, a target fourth partition with remaining storage space and/or an idle fourth partition, and storing the current data in the target fourth partition and/or the idle fourth partition.
In a specific implementation, the size of each third partition of the data area is any value from 8k to 64k.
In a specific implementation manner, the metadata updating condition includes that the current free space of the metadata updating area is smaller than or equal to a preset free space threshold, or the time length from the last updating time of the metadata updating area to the current time meets a preset updating time threshold.
A second aspect of an embodiment of the present application provides a cache, including:
The acquisition unit is used for acquiring the current file identifier of the current file;
The determining unit is used for storing the current data of the current file change in the data area, and determining a first partition corresponding to the current file in the file identification area based on the current storage condition of the file identification area, wherein the first partition is used for recording the current file identifier;
The storage unit is used for storing file identification update information in the metadata update area, wherein the file identification update information comprises the identifier of the first partition and the current file identifier;
The writing unit is used for writing the current file identifier into the first partition according to the identifier of the first partition if the metadata update area meets the preset metadata update condition.
In a specific implementation manner, the metadata area further includes a file partition area and a file block area, and the determining unit is further configured to determine at least one second partition corresponding to the current file in the file partition area and at least one third partition corresponding to the current file in the file block area if the file identification area does not contain a file identifier consistent with the current file identifier;
The storage unit is further configured to store at least one piece of file fragment update information and at least one piece of file block update information in the metadata update area, where each piece of file fragment update information includes the identifier of a second partition and the association between that second partition and the current file, and each piece of file block update information includes the identifier of a third partition, the association between that third partition and its corresponding second partition, and the offset of the data corresponding to the third partition in the data area relative to the data corresponding to that second partition;
The writing unit is further configured to, if the metadata update area meets the preset metadata update condition, write the association between the second partition and the current file into the second partition according to the identifier of the second partition, and write the association between the third partition and its corresponding second partition into the third partition according to the identifier of the third partition.
In a specific implementation manner, the file block update information further includes a dirty data identifier for identifying whether the first portion of data of the current file corresponding to the third block is dirty data, a heat identifier for identifying a frequency of use of the first portion of data of the current file corresponding to the third block, and an association relationship between the third block and a fourth block in the data area.
In a specific implementation manner, each piece of metadata update information further includes an update order, where the metadata update information includes the file identification update information, the file fragment update information, and the file block update information, and the writing unit is further configured to perform metadata updates sequentially according to the update order of each piece of metadata update information in the metadata update area if the cache meets a preset abnormal recovery condition.
In a specific implementation manner, the determining unit is further configured to, if the file identification area contains a file identifier consistent with the current file identifier, determine, from the second partitions corresponding to the current file identifier, a second partition that has remaining storage space in its corresponding fourth partitions of the data area as the second partition corresponding to the current data;
The determining unit is further configured to determine, from the fourth partitions of the data area corresponding to that second partition, a target fourth partition with remaining storage space and/or an idle fourth partition, and store the current data in the target fourth partition and/or the idle fourth partition.
In a specific implementation, the size of each third partition of the data area is any value from 8k to 64k.
In a specific implementation manner, the metadata updating condition includes that the current free space of the metadata updating area is smaller than or equal to a preset free space threshold, or the time length from the last updating time of the metadata updating area to the current time meets a preset updating time threshold.
A third aspect of an embodiment of the present application provides a cache, including:
a central processing unit, a memory and an input/output interface;
the memory is a short-term memory or a persistent memory;
The central processor is configured to communicate with the memory and to execute instruction operations in the memory to perform the method of the first aspect.
A fourth aspect of the embodiments of the application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to the first aspect.
A fifth aspect of an embodiment of the present application provides a computer storage medium having instructions stored therein, which when executed on a computer, cause the computer to perform the method according to the first aspect.
As can be seen from the above technical scheme, the embodiment of the application has the following advantages. After the current file identifier of the current file is obtained, the current data can be stored directly in the data area. Then, after the first partition corresponding to the current file in the file identification area is determined, the identifier of the first partition and the current file identifier are stored as file identification update information in the metadata update area. Finally, when the metadata update area meets the preset metadata update condition, the current file identifier is written into the first partition of the file identification area according to the identifier of the first partition. Because writing a large amount of metadata creates competition pressure, the metadata update (namely, the file identification update) is actually performed only when the metadata update area meets the preset metadata update condition, which greatly reduces the competition pressure during metadata updating.
Drawings
FIG. 1 is a schematic flow chart of a cache storage method according to an embodiment of the present application;
FIG. 2 is a diagram illustrating an exemplary structure of a cache according to an embodiment of the present application;
FIG. 3 is another flow chart of a cache storage method according to an embodiment of the present application;
FIG. 4 is a diagram showing an exemplary structure of a metadata update area according to an embodiment of the present application;
FIG. 5 is a diagram illustrating an exemplary structure of a file offset index according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a structure of a cache according to an embodiment of the present application;
FIG. 7 is a schematic diagram of another structure of a cache according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order to better explain the technical solution of the embodiments of the present application, some technical concepts that appear later are explained below.
Cache: originally, a high-speed memory with faster access than ordinary random access memory (RAM). It typically does not use the DRAM technology used for system main memory, but rather the more expensive and faster SRAM technology. The provision of a cache is one of the important factors in the high performance of all modern computer systems.
Hybrid storage is a compromise storage solution. Specifically, by storing critical data on high-performance flash media and other data on lower-cost tiered storage, hybrid storage enables organizations to manage data in a unified storage system while still balancing performance and cost.
Layering: a software system needs to isolate different concerns (points of concern) through layers, so that changes in different requirements can be handled and managed independently. For example, a hybrid storage system composed of storage media with different performance is managed in layers according to the storage system's separation of hot and cold data.
A logical block address (LBA) is a common mechanism used on PC data storage devices to indicate where data is located; the most common device using this mechanism is the hard disk. An LBA may refer to the address of a data block or to the data block located at that address. In short, an LBA is comparable to an everyday house-number address.
Relative to the LBA, a physical block address (PBA) is comparable to the latitude and longitude used for GPS positioning. For example, the location of the house-number address might be longitude 113°16'40.0621'' and north latitude 23°07'37.6129''.
The embodiment of the application provides a cache storage method and related equipment, which are used for reducing the competitive pressure during metadata updating.
Referring to fig. 1, an embodiment of the present application provides a cache storage method, which includes the following steps:
101. Obtaining the current file identifier of the current file.
In order to better explain the technical scheme of the embodiment of the application, each piece of cached data that needs to be written to disk is treated as the current data of a change to the corresponding current file, and the cache storage flow of the embodiment of the application is executed on that current data to complete its storage.
It will be appreciated that each piece of cached data is associated with a file in the system; that is, each piece of cached data is the change data of some file in the system. Each piece of cached data therefore has a corresponding current file identifier. For example, if cache A is the current change data of the system file user, then the current file identifier corresponding to cache A is the file identifier of the system file user.
In practical application, any preset digest algorithm can be used to generate the file identifier of the system file user, so as to ensure the uniqueness of the file identifier.
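As an illustration of generating a file identifier with a digest algorithm, the following is a minimal sketch. The choice of SHA-256, the use of the file path as input, and the name `make_file_id` are assumptions made for the example; the application does not specify the algorithm or its input.

```python
import hashlib


def make_file_id(file_path: str) -> str:
    """Derive a fixed-length file identifier from a file path.

    Assumption: the path uniquely names the file. SHA-256 is used only
    as one example of a digest algorithm.
    """
    return hashlib.sha256(file_path.encode("utf-8")).hexdigest()


# Hypothetical path for the system file "user" mentioned in the text.
file_id = make_file_id("/system/user")
```

The same path always yields the same identifier, and distinct paths yield distinct identifiers except in the practically negligible case of a digest collision.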
102. Storing the current data of the current file change in the data area, and determining a first partition corresponding to the current file in the file identification area based on the current storage condition of the file identification area, wherein the first partition is used for recording the current file identifier.
Once the file identifier of the current file has been determined, the file to which the current data belongs is known, and the first partition corresponding to the current file in the file identification area can be determined based on the current storage condition of the file identification area. The first partition is used for recording the current file identifier.
Specifically, the first partition is one of a plurality of idle partitions in the file identification area, and may be determined according to the current usage of the file identification area and a preset storage mode (such as breadth-first storage and/or compact storage).
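One possible way to pick the first partition can be sketched as follows. The sketch reads "compact storage" as "take the lowest-numbered idle partition", which is an assumption; the function name and the boolean occupancy list are also hypothetical.

```python
from typing import List, Optional


def pick_first_partition(in_use: List[bool]) -> Optional[int]:
    """Return the index of the lowest-numbered idle partition.

    in_use[i] is True when partition i of the file identification area
    already records a file identifier. Returns None when no partition
    is idle.
    """
    for index, used in enumerate(in_use):
        if not used:
            return index
    return None
```

For example, `pick_first_partition([True, False, False])` selects partition 1, which keeps the occupied partitions packed toward the start of the area.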
It should be noted that once the current data of the current file change has been stored in the corresponding region of the data area, the initial storage of the changed data is considered complete. The storage is only truly complete, however, once the corresponding metadata has been stored in the corresponding metadata area (for example, once the file identifier has been stored in the file identification area).
103. Storing file identification update information in a metadata update area, wherein the file identification update information comprises an identification of a first partition and a current file identification.
After the first partition for storing the current file identifier is determined, the current file identifier need not be written into the first partition immediately. Instead, the identifier of the first partition and the current file identifier are stored in the metadata update area as file identification update information, so as to avoid the metadata competition pressure caused by real-time updating.
104. And if the metadata updating area meets the preset metadata updating condition, writing the current file identification into the first partition according to the identification of the first partition.
The preset metadata updating conditions include, but are not limited to, that the current free space of the metadata updating area is smaller than or equal to a preset free space threshold, or that the time length from the last updating time of the metadata updating area to the current time meets a preset updating time threshold. That is, when the free space of the metadata update area is insufficient and/or the metadata update area does not process any update information for a long time, the metadata update writing is performed.
Specifically, processing a piece of update information means writing the metadata carried by the update information into the partition that the update information designates. For example, the current file identifier in the file identification update information is written into the first partition designated by the identifier of the first partition in the file identification update information.
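The deferred write described in steps 103 and 104 can be sketched as a small in-memory model. This is a minimal sketch under stated assumptions, not the application's implementation: the class name, the slot-based capacity, the threshold values, and the dictionary standing in for the file identification area are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MetadataUpdateArea:
    capacity: int = 4             # hypothetical number of record slots
    free_threshold: int = 1       # flush when free slots <= this value
    max_age_seconds: float = 5.0  # flush when this long since last flush
    pending: List[Tuple[int, str]] = field(default_factory=list)
    last_flush: float = field(default_factory=time.monotonic)

    def add(self, partition_id: int, file_id: str) -> None:
        """Record (first-partition id, file id) instead of writing at once."""
        self.pending.append((partition_id, file_id))

    def should_flush(self, now: float) -> bool:
        """Check the two update conditions: low free space or elapsed time."""
        free_slots = self.capacity - len(self.pending)
        too_full = free_slots <= self.free_threshold
        too_old = bool(self.pending) and (
            now - self.last_flush >= self.max_age_seconds
        )
        return too_full or too_old

    def flush(self, file_id_area: Dict[int, str], now: float) -> None:
        """Apply pending records in insertion order, then empty the area."""
        for partition_id, file_id in self.pending:
            file_id_area[partition_id] = file_id
        self.pending.clear()
        self.last_flush = now


file_id_area: Dict[int, str] = {}
area = MetadataUpdateArea()
for pid, fid in [(0, "file-a"), (1, "file-b"), (2, "file-c")]:
    area.add(pid, fid)
    if area.should_flush(area.last_flush):
        area.flush(file_id_area, area.last_flush)
```

In this run the first two records only accumulate; the third leaves one free slot, which meets the free-space condition, so all three identifiers are written to the file identification area in a single batch.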
In this embodiment, after the current file identifier of the current file is obtained, the current data can be stored directly in the data area. Then, after the first partition corresponding to the current file in the file identification area is determined, the identifier of the first partition and the current file identifier are stored as file identification update information in the metadata update area. Finally, when the metadata update area meets the preset metadata update condition, the current file identifier is written into the first partition of the file identification area according to the identifier of the first partition. Because writing a large amount of metadata creates competition pressure, the metadata update (namely, the file identification update) is actually performed only when the metadata update area meets the preset metadata update condition, which greatly reduces the competition pressure during metadata updating.
Furthermore, to implement multi-level management of metadata, the metadata area of the embodiment of the application may further include a file partition area and a file block area, where a partition in the file partition area and a partition in the file block area record metadata of the file's data at different granularities. For example, if the size of the system file user is 4G, one corresponding partition in the file partition area may describe the data information of the whole 4G of the system file user (e.g., the PBA of the 4G), and one corresponding partition in the file block area may describe the data information of each 4k of the 4G of the system file user (e.g., the LBA of the 4G). The metadata of the file partition area and the file block area may be updated as follows. If the file identification area does not contain a file identifier consistent with the current file identifier, at least one second partition corresponding to the current file in the file partition area and at least one third partition corresponding to the current file in the file block area are determined. At least one piece of file fragment update information and at least one piece of file block update information are stored in the metadata update area, where each piece of file fragment update information includes the identifier of a second partition and the association between that second partition and the current file, and each piece of file block update information includes the identifier of a third partition, the association between that third partition and its corresponding second partition, and the offset of the data corresponding to the third partition in the data area relative to the data corresponding to that second partition. If the metadata update area meets the preset metadata update condition, the association between the second partition and the current file is written into the second partition according to the identifier of the second partition, and the association between the third partition and its corresponding second partition is written into the third partition according to the identifier of the third partition.
Specifically, if the file identification area does not contain a file identifier consistent with the current file identifier, the cache corresponding to the current file is being stored for the first time; that is, the current file is recorded in the cache for the first time. Accordingly, the file partition area and the file block area hold no metadata record of the current file. The second partition in the file partition area that should record the information for the current data of the current file change is therefore determined according to a preset partition update rule, and the corresponding third partition in the file block area is likewise determined according to the preset partition update rule. The file fragment update information and the file block update information are applied when the metadata update area meets the metadata update condition. The data range described by each second partition may be 4G, and the data range described by each third partition may be any value from 8k to 64k.
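To make the two granularities concrete, the arithmetic below maps a byte offset within a file to a second partition, a third partition inside it, and the remaining offset. It is a sketch under assumptions: "4G" and "64k" are read as binary units, 64k is picked from the 8k to 64k range, and the function and constant names are hypothetical.

```python
from typing import Tuple

SECOND_PARTITION_RANGE = 4 * 1024**3  # 4G described by one second partition
THIRD_PARTITION_RANGE = 64 * 1024     # assumed 64k, top of the 8k-64k range


def locate(file_offset: int) -> Tuple[int, int, int]:
    """Map a file offset to (second index, third index, offset in third).

    The third index counts third partitions within the enclosing
    second partition, not within the whole file.
    """
    second_index, within_second = divmod(file_offset, SECOND_PARTITION_RANGE)
    third_index, within_third = divmod(within_second, THIRD_PARTITION_RANGE)
    return second_index, third_index, within_third


# An offset 1 GiB + 70000 bytes into the second 4G range of a file.
position = locate(5 * 1024**3 + 70_000)
```

Here `position` is `(1, 16385, 4464)`: the offset lies in the second 4G range, in the 16386th 64k range of that 4G, 4464 bytes in.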
It can be understood that the metadata update area stores multiple pieces of update information. If the metadata update area meets the metadata update condition, the pieces of update information are processed automatically and sequentially, in the order in which they were added to the metadata update area; not all pending update information needs to be processed in each pass. In addition, when an abnormality occurs in the disk, abnormal recovery of the metadata update area is triggered (i.e., the abnormal recovery condition is satisfied), and metadata updates should then be performed sequentially in the order in which each piece of metadata update information was added to the metadata update area (i.e., the corresponding update order).
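The ordered replay during abnormal recovery can be sketched as follows. The record layout, the integer `order` field, and the dictionary standing in for the metadata area are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UpdateRecord:
    order: int         # update order assigned when the record was added
    partition_id: int  # target partition in the metadata area
    value: str         # metadata to write into that partition


def replay(records: List[UpdateRecord], metadata_area: Dict[int, str]) -> None:
    """Apply records in ascending update order, so later updates win."""
    for record in sorted(records, key=lambda r: r.order):
        metadata_area[record.partition_id] = record.value


recovered: Dict[int, str] = {}
replay(
    [
        UpdateRecord(order=2, partition_id=7, value="new"),
        UpdateRecord(order=1, partition_id=7, value="old"),
        UpdateRecord(order=3, partition_id=8, value="other"),
    ],
    recovered,
)
```

Replaying in update order matters when two records target the same partition: partition 7 ends up holding "new" regardless of the order in which the records are read back.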
Furthermore, building on the file partition and file block design, each piece of file block update information further includes a dirty data identifier indicating whether the first part of data of the current file corresponding to the third partition is dirty data, a heat identifier indicating how frequently that first part of data is used, and the association between the third partition and a fourth partition in the data area.
In other implementations, if the current file is not being stored for the first time, that is, the file identification area already contains a file identifier matching the current file identifier, a second block whose described data range still has remaining room may be determined from the second blocks corresponding to the current file in the file partition area; this remaining second block is used to record the metadata of the current data. The current data is then recorded in a fourth partition of the data area that has free space, for example a fourth partition that does not yet store any content, or a fourth partition whose used space is less than half of its maximum space; this is not specifically limited here.
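As a purely illustrative sketch of the selection just described, the following Python function picks a fourth partition with free space. The function name, the representation of usage as a list of used byte counts, and the preference for empty partitions over partially used ones are assumptions made for the example; the text itself leaves the choice open.

```python
def pick_fourth_partition(used, max_space):
    """Return the index of a fourth partition with free space, or None.

    `used` lists the bytes already used in each fourth partition.
    Empty partitions are tried first (an assumed preference); otherwise the
    first partition that is less than half full is chosen.
    """
    for index, amount in enumerate(used):
        if amount == 0:
            return index
    for index, amount in enumerate(used):
        if amount < max_space / 2:
            return index
    return None
```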
The cache storage method according to the embodiment of the present application is described below in some specific scenarios.
First, an SSD is formatted as a layered cache device according to the layered metadata structure shown in fig. 2, and metadata and data are managed in the manner shown in the following table.
Specifically, the metadata design of the embodiment of the present application can be understood by analogy with a file system. A file system includes metadata (superblock, inode, directory entry, index, and so on) and data (service data content). The present application resembles a file system with an added cache space (corresponding to the back-end storage). Each piece of cache data stored on the disk must be accurately indexable, so its index includes the back-end physical disk (brick) to which the current data belongs, the current file (inode, also called the file identifier) to which it belongs, the slice position (shard) to which it belongs, the position on the cache device (extension) to which it belongs, and the position where the cache data is stored (data, the data area).
Referring to fig. 3, a single write operation of the hierarchical cache proceeds as follows. For example, a file named problem list doc is created and is assigned, by an internal algorithm, a back-end storage location (brick) and a unique file identifier (inode), for example brick-/dev/sdx and inode-uuidx (sangford txt equivalent). Next, the file ID (inode-uuidx) associated with the back-end device (brick-/dev/sdx) is created, and the inode data is stored on the cache device (index inode-uuidx). After the inode is created, the shard area must be created to store the inode -> shard index, which contains the slice information of the file: for an 8 GB file, two pieces of slice index information are generated at a slice granularity of 4 GB, namely shard-001 and shard-002, and this index information is written to the SSD cache device. Finally, an extension area entry, that is, a piece of data index information (offset=0, len=4KB), is allocated, and the data is written to the corresponding data area. It should be noted that, depending on actual requirements, a cache update does not necessarily update the metadata in every metadata area (file identification area, file partition area and/or file block area).
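The slice arithmetic in this example can be checked with a short Python sketch. The helper names `shard_names` and `shard_for_offset` are hypothetical; only the 4 GB slice granularity and the shard-001 / shard-002 naming are taken from the example above.

```python
SHARD_SIZE = 4 * 1024**3  # 4 GB slice granularity, as stated in the example


def shard_names(file_size):
    """List the slice index entries needed to cover a file of `file_size` bytes."""
    count = -(-file_size // SHARD_SIZE)  # ceiling division
    return [f"shard-{i:03d}" for i in range(1, count + 1)]


def shard_for_offset(offset):
    """Name of the slice that contains the given logical file offset."""
    return f"shard-{offset // SHARD_SIZE + 1:03d}"
```

For an 8 GB file this yields the two entries named in the text, and a write at offset 0 falls into the first of them.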
Specifically, each 4 KB extension metadata region holds 512 entries, which store the mapping between LBAs and PBAs. Each third partition of the file block area may include the following:
1. The PBA -> LBA mapping: records the logical offset of the block within the business file fragment (4 GB). LBAs are aligned to the minimum block granularity of 8 KB, so the LBA needs only 19 bits;
2. shard: stores the extension -> shard mapping, identifying which shard the extension belongs to; 19 bits are needed because the maximum number of supported files is 26144;
3. dirty: identifies whether the cache block data is dirty, that is, whether it has yet to be flushed back to the back-end storage device (the metadata design allows the cache to be shared as a read-write cache);
4. On top of the block granularity, data is stored at a finer granularity (4 KB) to improve space utilization; at the maximum block granularity of 64 KB, 64 KB / 4 KB = 16 bits are needed;
5. hot: identifies the heat of the block and provides a basis for eviction by the replacement algorithm;
6. reserved: 1 bit of reserved space.
In total, 8 bytes identify the index from the logical offset of the business file to the cache data location, that is, the LBA -> PBA mapping and the PBA -> LBA mapping.
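To make the 8-byte layout concrete, the following Python sketch packs the listed fields into one 64-bit word. Several details are assumptions and are not stated in the text: the field names `lba` and `bitmap`, the order of the fields inside the word, the little-endian encoding, and the 8-bit width of `hot`, which is simply the remainder 64 - (19 + 19 + 1 + 16 + 1).

```python
import struct

# (name, width in bits). All widths except "hot" come from the list above;
# the 8-bit "hot" width and the field order are assumptions for illustration.
FIELDS = [("lba", 19), ("shard", 19), ("dirty", 1),
          ("bitmap", 16), ("hot", 8), ("reserved", 1)]
assert sum(width for _, width in FIELDS) == 64


def pack_entry(**values):
    """Pack the named fields into 8 bytes; missing fields default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return struct.pack("<Q", word)


def unpack_entry(raw):
    """Inverse of pack_entry: return a dict of field values."""
    (word,) = struct.unpack("<Q", raw)
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


entry = pack_entry(lba=524287, shard=5, dirty=1, bitmap=0b1010, hot=200)
entries_per_region = 4096 // len(entry)  # entries that fit in one 4 KB region
```

The sketch also reproduces two figures from the text: a 4 GB fragment addressed in 8 KB units has 2^19 positions, which is why 19 bits suffice for the LBA, and 4 KB divided by 8 bytes gives the 512 entries per metadata region.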
The inode area, the extension area, and the shard area are stored compactly, and updates to these three areas must be transactional, so a dedicated Journal area serves as the journal for these areas.
Specifically, referring to fig. 4, the entire Journal area is divided into a super block area (super) and a journal data area (comprising meta and data). The meta part is one sector in size (4 KB) and contains all metadata information of one transaction or batch (the block numbers being updated [hierarchical metadata is numbered in 4 KB units] and the unique number of the current request), while the data part stores the updated data content in sequence. The Journal space is managed as a circular queue: the head pointer is incremented when data is inserted, and the tail pointer is incremented when data is written back (compare WAL). The data in the range [tail, head] is the data to be played back, and the [tail, head] range is written to disk with one IO at a time. In addition, when the process terminates abnormally and is restarted, all metadata blocks are scanned, the largest `seq_id` is taken as the final valid ID, and the valid log within [tail, head] is played back. Playback is performed asynchronously within the business process and is triggered by thresholds such as Journal area usage (25% of total capacity) and a timer (30 min).
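The circular-queue management of the Journal space can be sketched as follows. This is a simplified in-memory model under stated assumptions: the class `JournalRing`, the slot-based capacity, the monotonically increasing head and tail counters, and the explicit `now` argument are all hypothetical; only the head/tail behaviour and the 25% and 30 min thresholds come from the description.

```python
class JournalRing:
    """Hypothetical model of the Journal area as a circular queue of records."""

    def __init__(self, capacity, ratio=0.25, interval=30 * 60):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.ratio = ratio        # play back once 25% of the journal is in use
        self.interval = interval  # or once 30 minutes (in seconds) have passed
        self.head = 0             # advances when a record is inserted
        self.tail = 0             # advances when a record is written back
        self.last_playback = 0.0

    def used(self):
        return self.head - self.tail

    def append(self, record):
        if self.used() == self.capacity:
            raise RuntimeError("journal full; playback required first")
        self.slots[self.head % self.capacity] = record
        self.head += 1

    def needs_playback(self, now):
        return (self.used() >= self.capacity * self.ratio
                or now - self.last_playback >= self.interval)

    def play_back(self, apply, now):
        """Write back every record in [tail, head) in order, then advance tail."""
        while self.tail < self.head:
            apply(self.slots[self.tail % self.capacity])
            self.tail += 1
        self.last_playback = now
```

Keeping head and tail as ever-increasing counters and reducing them modulo the capacity only when indexing is one common way to tell a full ring from an empty one; the embodiment does not say how it handles that case.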
The following example illustrates a service-triggered inode update event (the trigger is an increase in file size). The data is written through the Journal module, which fills in the head index, that is, the address within the Journal area, and the data content of this write (a 4 KB inode update); together these amount to 8 KB of data. The ID of this write is added, and everything is written to disk in one operation. If power is suddenly lost or the disk is pulled, the metadata must be reloaded from the cache device after recovery, and the Journal area must be loaded first. If power was lost after the write succeeded, the valid Journal data (validity is determined by the maximum seq_id) is read and the state after the successful write can be recovered; if the write did not succeed, the state before the power loss is recovered. After the Journal has been loaded successfully, the data in the Journal area must be written to the area corresponding to the inode; this is called playback, and it restores the entire metadata to its original state.
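A minimal sketch of one such 8 KB journal record and of its playback is given below. The on-disk layout is an assumption made for illustration: the text only says that the meta sector carries the updated block number and the unique ID of the request, so the `<QQ` encoding, the zero padding, and the function names are hypothetical.

```python
import struct

SECTOR = 4096
META_FORMAT = "<QQ"  # assumed layout: seq_id, then the metadata block number


def build_record(seq_id, block_no, payload):
    """Return one 8 KB record: a 4 KB meta sector followed by 4 KB of data."""
    if len(payload) != SECTOR:
        raise ValueError("payload must be exactly one 4 KB block")
    meta = struct.pack(META_FORMAT, seq_id, block_no).ljust(SECTOR, b"\0")
    return meta + payload


def parse_record(record):
    seq_id, block_no = struct.unpack_from(META_FORMAT, record, 0)
    return seq_id, block_no, record[SECTOR:]


def replay(journal_bytes, blocks):
    """Apply all records in ascending seq_id order; return the largest seq_id."""
    step = 2 * SECTOR
    records = [parse_record(journal_bytes[i:i + step])
               for i in range(0, len(journal_bytes), step)]
    last_valid = 0
    for seq_id, block_no, payload in sorted(records, key=lambda r: r[0]):
        blocks[block_no] = payload
        last_valid = seq_id
    return last_valid
```

The sketch does not model how a torn or incomplete write is detected, because the description does not specify it; a real implementation would need some validity check before a record is replayed.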
The index of one piece of cache data contains the back-end storage it belongs to (brick), the file it belongs to (inode), the fragment it belongs to (shard), and its position on the SSD (extension); each piece of cache data is thus identified by a tuple (ssd_id, brick_id, inode_id, shard_id, extension_id).
In practical application, a cache lookup proceeds as follows. First, the SSD device ID is found through the configuration file, and the in-memory structure of the SSD to which the data belongs is located (the same host may have multiple SSD cache devices). Next, the brick structure is looked up in the SSD index table (the lookup key is brickid, the unique identifier of the back-end storage; the index table may be a hash table). The inode structure, that is, the file to which the cached data belongs, is then looked up in the brick index table (this index table may also be a hash table). Then the shard structure is looked up in the inode index (each file may be split into multiple slices; the index table is a hash table). Finally, in the shard index structure, the extension metadata is indexed by the logical offset of the service request (the index table may be a red-black tree); the extension metadata stores the physical disk location of the data, so the cache data can be read. The file offset index is determined by the LBA to PBA index; see fig. 5 for the index relationship among shard, extension, and inode.
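The lookup chain can be sketched in Python as nested hash tables followed by an ordered index. This is only an illustration: the class `Shard`, the function `lookup`, and the byte-addressed physical positions are hypothetical, and a sorted list searched with `bisect` stands in for the red-black tree mentioned in the text.

```python
import bisect


class Shard:
    """Ordered index from logical offset to physical position (tree stand-in)."""

    def __init__(self):
        self._offsets = []  # sorted logical start offsets
        self._entries = []  # (length, physical start), parallel to _offsets

    def insert(self, offset, length, physical):
        i = bisect.bisect_left(self._offsets, offset)
        self._offsets.insert(i, offset)
        self._entries.insert(i, (length, physical))

    def find(self, offset):
        """Return the physical position for `offset`, or None on a cache miss."""
        i = bisect.bisect_right(self._offsets, offset) - 1
        if i < 0:
            return None
        start = self._offsets[i]
        length, physical = self._entries[i]
        if offset < start + length:
            return physical + (offset - start)
        return None


def lookup(ssds, ssd_id, brick_id, inode_id, shard_id, offset):
    """Walk SSD -> brick -> inode -> shard (hash tables), then the offset index."""
    try:
        shard = ssds[ssd_id][brick_id][inode_id][shard_id]
    except KeyError:
        return None
    return shard.find(offset)


shard = Shard()
shard.insert(0, 4096, 1_000_000)
shard.insert(8192, 8192, 2_000_000)
ssds = {"ssd0": {"brick-/dev/sdx": {"inode-uuidx": {"shard-001": shard}}}}
hit = lookup(ssds, "ssd0", "brick-/dev/sdx", "inode-uuidx", "shard-001", 100)
```

A miss at any level, including an offset that falls into a gap between cached ranges, returns `None`, which a caller would treat as a request to read from the back-end storage.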
Referring to fig. 6, an embodiment of the present application provides a cache, where the cache includes a metadata area and a data area, the metadata area includes a file identification area and a metadata update area, and the cache further includes:
An obtaining unit 601, configured to obtain a current file identifier of a current file;
A determining unit 602, configured to store current data of a current file change in a data area, and determine a first partition corresponding to the current file in the file identification area based on a current storage condition of the file identification area, where the first partition is used to record a current file identification;
a storage unit 603, configured to store file identification update information in a metadata update area, where the file identification update information includes an identification of a first partition and a current file identification;
and the writing unit 604 is configured to write the current file identifier into the first partition according to the identifier of the first partition if the metadata update area meets a preset metadata update condition.
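How the four units cooperate can be sketched with a small in-memory model. Everything concrete here is an assumption made for illustration: the class `CacheSketch`, the choice of the first free slot as the first partition, the slot-counted update area, and the use of remaining free space as the only update condition.

```python
class CacheSketch:
    """Hypothetical model of units 601 to 604; not the patented implementation."""

    def __init__(self, id_slots, update_capacity, free_threshold):
        self.file_id_area = [None] * id_slots  # first partitions
        self.data_area = []                    # stored (file id, data) pairs
        self.update_area = []                  # pending (first partition index, file id)
        self.update_capacity = update_capacity
        self.free_threshold = free_threshold

    def write(self, file_id, data):
        # Determining unit: store the current data in the data area.
        self.data_area.append((file_id, data))
        known = file_id in self.file_id_area or any(
            pending_id == file_id for _, pending_id in self.update_area)
        if not known:
            reserved = {index for index, _ in self.update_area}
            free = [i for i, value in enumerate(self.file_id_area)
                    if value is None and i not in reserved]
            if not free:
                raise RuntimeError("file identification area is full")
            # Storage unit: record the update instead of writing metadata directly.
            self.update_area.append((free[0], file_id))
        # Writing unit: apply the updates once the update area runs out of room.
        if self.update_capacity - len(self.update_area) <= self.free_threshold:
            self.flush()

    def flush(self):
        for index, file_id in self.update_area:
            self.file_id_area[index] = file_id
        self.update_area.clear()


cache = CacheSketch(id_slots=4, update_capacity=2, free_threshold=0)
cache.write("inode-a", b"x")  # buffered only; the file identification area is untouched
cache.write("inode-b", b"y")  # update area is now full, so both identifiers are written
```

The point of the indirection is that data writes never wait on scattered metadata writes; the file identification area is touched only when the buffered updates are applied as a batch.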
In a specific implementation manner, the metadata area further includes a file partition area and a file block area, and the determining unit 602 is further configured to determine at least one second block corresponding to the current file in the file partition area and at least one third block corresponding to the current file in the file block area if the file identifier area does not have a file identifier consistent with the current file identifier;
The storage unit 603 is further configured to store at least one piece of file fragment update information and at least one piece of file block update information in the metadata update area, where each piece of file fragment update information includes an identifier of a second fragment and an association relationship between the second fragment and the current file, and each piece of file block update information includes an identifier of a third fragment, an association relationship between the third fragment and a second fragment corresponding to the third fragment, data corresponding to the third fragment in the data area, and an offset relative to data corresponding to the second fragment corresponding to the third fragment in the data area;
The writing unit 604 is further configured to, if the metadata update area meets a preset metadata update condition, write the association relationship between the second partition and the current file into the second partition according to the second partition identifier, and write the third partition and the association relationship between the second partition corresponding to the third partition into the third partition according to the third partition identifier.
In a specific implementation mode, each file block updating information further comprises a dirty data identifier for identifying whether the first part of data of the current file corresponding to the third block is dirty data, a heat identifier for identifying the use frequency of the first part of data of the current file corresponding to the third block, and an association relation between the third block and a fourth block in the data area.
In a specific implementation manner, each piece of metadata update information further includes an update order, where the metadata update information includes file identification update information, file fragment update information, and file block update information, and the writing unit 604 is further configured to sequentially perform metadata update according to the update order of each piece of metadata update information in the metadata update area if the cache meets a preset abnormal recovery condition.
In a specific implementation manner, the determining unit 602 is further configured to, if the file identification area has a file identifier consistent with the current file identifier, determine, from the second partitions corresponding to the current file identifier, a second partition whose corresponding fourth partition in the data area has remaining storage space, as the second partition corresponding to the current data;
The determining unit 602 is further configured to determine, from the fourth partitions in the data area that correspond to that second partition, a target fourth partition in which remaining storage space exists and/or an idle fourth partition, and store the current data in the target fourth partition and/or the idle fourth partition.
In one specific implementation, each third partition of the data region has a size of any one of 8k to 64 k.
In a specific implementation mode, the metadata updating condition comprises that the current free space of the metadata updating area is smaller than or equal to a preset free space threshold value, or the time length from the last updating time of the metadata updating area to the current time meets a preset updating time threshold value.
Fig. 7 is a schematic diagram of a cache structure provided in an embodiment of the present application, where the cache 700 may include one or more central processing units (central processing units, CPU) 701 and a memory 705, where the memory 705 stores one or more applications or data.
Wherein the memory 705 may be volatile storage or persistent storage. The program stored in the memory 705 may include one or more modules, each of which may include a series of instruction operations in a cache. Still further, the central processor 701 may be configured to communicate with the memory 705 and execute a series of instruction operations in the memory 705 on the cache 700.
Cache 700 may also include one or more power supplies 702, one or more wired or wireless network interfaces 703, one or more input/output interfaces 704, and/or one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The central processing unit 701 may perform the operations performed by the cache in the embodiments shown in fig. 1 to 6, which are not described in detail again here.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied, in essence or in whole or in part, in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Embodiments of the present application also provide a computer program product comprising instructions which, when executed on a computer, cause the computer to perform a cache storage method as described above.

Claims (10)

1. A cache storage method, wherein the cache includes a metadata area and a data area, the metadata area includes a file identification area and a metadata update area, the method comprising:
acquiring a current file identifier of a current file;
Storing the current data of the current file change in the data area, and determining a first block corresponding to the current file in the file identification area based on the current storage condition of the file identification area, wherein the first block is used for recording the current file identification;
Storing file identification updating information in the metadata updating area, wherein the file identification updating information comprises the identification of the first block and the current file identification;
And if the metadata updating area meets the preset metadata updating condition, writing the current file identification into the first partition according to the identification of the first partition.
2. The method of claim 1, wherein the metadata area further comprises a file partition area and a file block area, the method further comprising:
If the file identification area does not have the file identification consistent with the current file identification, determining at least one second partition corresponding to the current file in the file partition area and at least one third partition corresponding to the current file in the file block area;
Storing at least one file fragment update information and at least one file block update information in the metadata update area, wherein each file fragment update information comprises an identifier of a second fragment and an association relationship between the second fragment and the current file, and each file block update information comprises an identifier of a third fragment, an association relationship between the third fragment and a second fragment corresponding to the third fragment, data corresponding to the third fragment in the data area and offset of data corresponding to the data area relative to the second fragment corresponding to the third fragment;
And if the metadata updating area meets preset metadata updating conditions, writing the association relation between the second block and the current file into the second block according to the second block identification, and writing the association relation between the third block and the second block corresponding to the third block into the third block according to the third block identification.
3. The method of claim 2, wherein each file block update information further includes a dirty data identifier for identifying whether the first portion of data of the current file corresponding to the third block is dirty data, a hot identifier for identifying a frequency of use of the first portion of data of the current file corresponding to the third block, and an association of the third block with a fourth block in the data area.
4. A method according to claim 1 or 3, wherein each metadata update information further comprises an update order, the metadata update information comprising the file identification update information, the file fragment update information, and the file block update information, the method further comprising:
and if the cache meets the preset abnormal recovery condition, sequentially updating the metadata according to the updating sequence of each piece of metadata updating information in the metadata updating area.
5. The method according to claim 1, wherein the method further comprises:
If the file identification area has the file identification consistent with the current file identification, determining that a second partition with the residual storage space exists in a fourth partition corresponding to the data area from second partitions corresponding to the current file identification as a second partition corresponding to the current data;
and determining a target fourth partition and/or an idle fourth partition with a residual storage space from the second partition in each fourth partition corresponding to the data area, and storing the current data in the target fourth partition and/or the idle fourth partition.
6. A method according to any one of claims 2 to 3, wherein the size of each third partition of the data area is any one of 8k to 64 k.
7. The method of claim 1, wherein the metadata update condition includes a current free space of the metadata update area being equal to or less than a preset free space threshold, or a length of time from a last update time of the metadata update area to a current time satisfying a preset update time threshold.
8. A cache comprising a metadata area and a data area, the metadata area comprising a file identification area and a metadata update area, the cache further comprising:
The acquisition unit is used for acquiring the current file identification of the current file;
The determining unit is used for storing the current data of the current file change in the data area, determining a first block corresponding to the current file in the file identification area based on the current storage condition of the file identification area, and recording the current file identification;
The storage unit is used for storing file identification updating information in the metadata updating area, wherein the file identification updating information comprises the identification of the first block and the current file identification;
And the writing unit is used for writing the current file identifier into the first block according to the identifier of the first block if the metadata updating area meets the preset metadata updating condition.
9. A cache, comprising:
a central processing unit, a memory and an input/output interface;
the memory is a short-term memory or a persistent memory;
the central processor is configured to communicate with the memory and to execute instruction operations in the memory to perform the method of any of claims 1 to 7.
10. A computer storage medium having instructions stored therein, which when executed on a computer, cause the computer to perform the method of any of claims 1 to 7.
CN202211717784.0A 2022-12-29 2022-12-29 Cache storage methods and related devices Active CN116185949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211717784.0A CN116185949B (en) 2022-12-29 2022-12-29 Cache storage methods and related devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211717784.0A CN116185949B (en) 2022-12-29 2022-12-29 Cache storage methods and related devices

Publications (2)

Publication Number Publication Date
CN116185949A CN116185949A (en) 2023-05-30
CN116185949B true CN116185949B (en) 2025-12-30

Family

ID=86441523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211717784.0A Active CN116185949B (en) 2022-12-29 2022-12-29 Cache storage methods and related devices

Country Status (1)

Country Link
CN (1) CN116185949B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119336276B (en) * 2024-12-19 2025-03-21 北京大道云行科技有限公司 High-performance bare disk management method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488128A (en) * 2019-12-30 2020-08-04 北京浪潮数据技术有限公司 Method, device, equipment and medium for updating metadata
CN111857556A (en) * 2019-04-30 2020-10-30 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for managing metadata of storage objects

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10324804B1 (en) * 2015-09-30 2019-06-18 EMC IP Holding Company LLC Incremental backup with eventual name space consistency
CN113050893B (en) * 2021-03-30 2022-08-30 重庆紫光华山智安科技有限公司 High-concurrency file storage method, system, medium and electronic terminal
CN119106035A (en) * 2023-06-07 2024-12-10 腾讯科技(深圳)有限公司 A block storage method and related equipment

Also Published As

Publication number Publication date
CN116185949A (en) 2023-05-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant