[go: up one dir, main page]

CN116719813B - Hash table processing method and device and electronic equipment - Google Patents

Hash table processing method and device and electronic equipment

Info

Publication number
CN116719813B
CN116719813B CN202310630033.3A CN202310630033A CN116719813B CN 116719813 B CN116719813 B CN 116719813B CN 202310630033 A CN202310630033 A CN 202310630033A CN 116719813 B CN116719813 B CN 116719813B
Authority
CN
China
Prior art keywords
key
hash table
hash
value pair
bucket
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310630033.3A
Other languages
Chinese (zh)
Other versions
CN116719813A (en
Inventor
冯丹
胡燏翀
肖仁智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202310630033.3A priority Critical patent/CN116719813B/en
Publication of CN116719813A publication Critical patent/CN116719813A/en
Application granted granted Critical
Publication of CN116719813B publication Critical patent/CN116719813B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2272Management thereof
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明提供一种哈希表的处理方法、装置及电子设备,哈希表位于持久数据层,包括多个哈希桶,每个哈希桶存储多个键值对,在易失性过滤层设置布隆过滤器;其中,哈希桶存储的每个键值对的键被插入到布隆过滤器中,与布隆过滤器中N个位的数据相互映射,哈希桶内任意两个键值对的键在布隆过滤器内映射的数据位不能完全重合,布隆过滤器内一个数据位能够被至少一个键映射;当需要对哈希表内某个键值对内的值进行查找或者删除时,先在布隆过滤器中查找对应的键,若对应的键不存在,则对应的键值对不存在于哈希表内,返回查找失败或删除失败指示,以降低对持久数据层的访问开销,提高对哈希表的负查询性能。本发明提高了对哈希表的处理性能。

The present invention provides a method, device, and electronic device for processing a hash table. The hash table is located in a persistent data layer and includes multiple hash buckets. Each hash bucket stores multiple key-value pairs. A Bloom filter is provided in a volatile filter layer. The key of each key-value pair stored in the hash bucket is inserted into the Bloom filter and mapped to N bits of data in the Bloom filter. The keys of any two key-value pairs in the hash bucket cannot completely overlap in the data bits mapped in the Bloom filter. A data bit in the Bloom filter can be mapped to at least one key. When a value in a key-value pair in the hash table needs to be searched or deleted, the corresponding key is first searched in the Bloom filter. If the corresponding key does not exist, the corresponding key-value pair does not exist in the hash table, and a search failure or deletion failure indication is returned, thereby reducing access overhead to the persistent data layer and improving negative query performance of the hash table. The present invention improves the processing performance of the hash table.

Description

Hash table processing method and device and electronic equipment
Technical Field
The invention belongs to the field of computer storage, and in particular relates to a hash table processing method, a hash table processing device and electronic equipment.
Background
Persistent memory (PERSISTENT MEMORY, PM) provides a new way to build large-scale, low-latency memory and storage systems. PM, such as phase change Memory and Intel Optane DC persistent Memory modules (DATA CENTER PERSISTENT Memory Module, DCPMM), provide desirable characteristics, such as high capacity, high performance, non-volatility, and byte addressing capability. PM is expected to replace DRAM to become a next generation candidate memory, and fills the performance gap between a mechanical hard disk (HARD DISK DRIVE, HDD) and a dynamic random access memory (Dynamic Random Access Memory, DRAM) to enrich the storage layer. Intel issued DCPMM single device capacity up to 256GB, with a maximum single device capacity of 8TB (512 GB/DIMM 16 DIMM), can provide a high throughput, low latency, and low total cost of ownership (Total Cost of Ownership, TCO) real-time processing system for various applications.
Hash indexes are widely used in memory index structures of key-value storage systems, such as Memcached and Redis, due to their constant level of fast lookup performance. One unique weakness of hash indexes is that if multiple entries have key collisions (i.e., hash collisions), then the hash table needs to be re-hashed or resized if it is full. As PM evolves, many researchers have been working on designing efficient PM-based persistent hash indexes, such as LEVEL HASHING, CCEH, PCLHT, SOFT, clevel, and Dash.
Positive searches find existing elements in the hash table, while negative searches find elements that do not exist in the hash table. However, the PM-based hash index described above still has poor negative search performance. Our study of their negative search analysis shows that the negative query throughput of a persistent hash index is only 45.1% -79.8% of its positive query throughput. Negative queries in PM-based hash index systems can result in large PM accesses, which can significantly reduce system performance, especially when handling workloads where frequent queries do not have elements present, and thus improving negative query performance is also important. And when the data load is too large, the average barrel chain length of the hash barrel is too long, so that the positive query throughput of the hash table is too large, and the system performance is affected.
In summary, the processing performance of the existing hash table needs to be improved.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a processing method, a device and electronic equipment of a hash table, and aims to solve the problem that the processing performance of the existing hash table is to be improved.
To achieve the above object, in a first aspect, the present invention provides a method for processing a hash table, where the hash table includes a plurality of hash buckets, each of which stores a plurality of key value pairs, the method including the steps of:
Setting a bloom filter in a volatile filter layer, wherein keys of each key value pair stored in the hash bucket are inserted into the bloom filter and are mapped with data of N bits in the bloom filter, and N is an integer greater than 1;
when a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, if the corresponding key does not exist, returning a searching failure or deleting failure indication, so as to reduce the access cost to a persistent data layer when the key value pair does not exist in the hash table, and improve the negative query performance of the hash table.
In one possible implementation, the method further includes the steps of:
When a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, and if the corresponding key exists, searching or deleting the corresponding key in the hash table according to the corresponding key value pair.
In one possible implementation, the method further includes the steps of:
When a certain key value pair needs to be inserted into the hash table, judging whether the key value pair to be inserted exists in the hash table, and if so, returning an insertion failure indication;
If the hash table does not have the key value pair to be inserted, calculating a corresponding head barrel number according to the key, and locking the hash barrel inner range corresponding to the head barrel number;
searching whether an idle key value pair slot exists in the locked hash bucket or not, if so, inserting the key value pair into the idle slot, and persisting the inserted key value pair into a nonvolatile memory by adopting a merging refreshing mode to finish key value pair insertion, wherein the merging refreshing mode is used for reducing cache line refreshing times and reducing the insertion delay cost of a hash table.
In one possible implementation manner, whether an idle key value pair slot exists is searched in a locked hash bucket, if not, a new chain bucket is distributed from a nonvolatile memory space by adopting a head insertion method and assigned to a temporary bucket pointer, a first slot position of the new chain bucket is set to be a position where a key value pair is to be inserted, the key value pair to be inserted is inserted, the inserted key value pair is persisted to the nonvolatile memory by adopting a merging refreshing mode, then the temporary bucket pointer is assigned to a next bucket pointer of the head bucket pointer, and the next bucket pointer of the head bucket is persisted to the nonvolatile memory by using a cache line refreshing instruction.
In one possible implementation, the method further includes the steps of:
And dynamically setting the initial head barrel length of the hash table according to the data load of the persistent data layer, wherein when the data load is relatively large, the initial head barrel length is relatively long, and the average barrel chain length corresponding to each head barrel is relatively short so as to improve the positive query performance of the hash table.
In a second aspect, the present invention provides a processing apparatus for a hash table, the hash table including a persistent data layer, including a plurality of hash buckets, each hash bucket storing a plurality of key-value pairs, the apparatus comprising:
a bloom filter setting unit for setting a bloom filter in the volatile filter layer; the key of each key value pair stored in the hash bucket is inserted into a bloom filter and is mapped with N bits of data in the bloom filter, wherein N is an integer greater than 1; the data bits mapped by keys of any two key-value pairs in the hash bucket in the bloom filter cannot be completely overlapped, and one data bit in the bloom filter can be mapped by at least one key;
And the hash table negative query unit is used for searching corresponding keys in the bloom filter when the value in a certain key value pair in the hash table is required to be searched or deleted, returning a search failure or deletion failure indication if the corresponding key value pair does not exist in the hash table, reducing the access cost to the persistent data layer when the key value pair does not exist in the hash table, and improving the negative query performance of the hash table.
In one possible implementation, the apparatus further includes:
The hash table inserting unit is used for firstly judging whether a key value pair to be inserted exists in the hash table or not when a certain key value pair needs to be inserted in the hash table, returning an insertion failure indication if the key value pair to be inserted exists in the hash table, calculating a corresponding head barrel number according to a key and locking a hash barrel inner range corresponding to the head barrel number if the key value pair does not exist in the hash table, searching whether an idle key value pair slot exists in the locked hash barrel inner range, inserting the key value pair into the idle slot if the idle key value pair exists in the locked hash barrel inner range, and persisting the inserted key value pair into a nonvolatile memory by adopting a merging refreshing mode to complete key value pair insertion, wherein the merging refreshing mode is used for reducing cache line refreshing times and reducing the insertion delay expenditure of the hash table.
In one possible implementation manner, the hash table inserting unit is configured to search whether an idle key value pair slot exists in the locked hash bucket, if not, allocate a new chain bucket from the nonvolatile memory space by using a head insertion method and assign a temporary bucket pointer, set a first slot of the new chain bucket as a position to be inserted with a key value pair, insert the key value pair to be inserted, and persistence the inserted key value pair to the nonvolatile memory by using a merging refresh method, and then assign the temporary bucket pointer to a next bucket pointer of the head bucket pointer, and persistence the next bucket pointer of the head bucket to the nonvolatile memory by using a cache line refresh instruction.
In one possible implementation, the apparatus further includes:
The head barrel length setting unit is used for dynamically setting the initial head barrel length of the hash table according to the data load capacity of the persistent data layer, when the data load capacity is relatively large, the initial head barrel length is relatively long, and the average barrel chain length corresponding to each head barrel is relatively short so as to improve the positive query performance of the hash table.
In a third aspect, the invention provides an electronic device comprising at least one memory for storing a program, and at least one processor for executing the program stored in the memory, the processor being adapted to perform the method described in the first aspect or any one of the possible implementations of the first aspect when the program stored in the memory is executed.
In a fourth aspect, the present invention provides a computer readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, the invention provides a computer program product which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In general, the above technical solutions conceived by the present invention have the following beneficial effects compared with the prior art:
The invention provides a processing method, a device and electronic equipment of a hash table, when key value pairs which do not exist in the hash table are required to be searched or deleted, the traditional method is that negative inquiry is carried out in the hash table of a persistent data layer, all the key value pairs in the hash table are searched once, and when the key value pairs do not exist, the searching cost is large and unnecessary. The invention can rapidly locate the non-existing key through the bloom filter positioned in the volatile filter layer, and avoid the search of the non-existing key value pair in the persistent data layer, thereby improving the negative query performance and the deletion operation of the non-existing key value pair. According to the invention, when the hash table of the persistent data layer inserts key value pairs, cache line refreshing times are reduced through a head insertion method and a combined refreshing operation, and the insertion performance is improved. The invention reduces the average chain barrel length of each head barrel by setting the relatively longer head barrel length, thereby improving the positive inquiry performance.
Drawings
FIG. 1 is a diagram of a nonvolatile memory structure according to an embodiment of the present invention;
FIG. 2 is a diagram of a hash bucket architecture provided by an embodiment of the present invention;
FIG. 3 is a mapping relationship diagram of bloom filters and keys provided by an embodiment of the present invention;
FIG. 4 is a flowchart of a hash table processing method according to an embodiment of the present invention;
FIG. 5 is a hash bucket concurrency control diagram based on lock provided by an embodiment of the present invention;
FIG. 6 is a key-value pair insertion flow chart of a hash table provided by an embodiment of the present invention;
FIG. 7 is a flow chart of a hash table lookup provided by an embodiment of the present invention;
FIG. 8 is a flowchart for deleting hash tables provided by an embodiment of the present invention;
fig. 9 is a diagram of a hash table processing apparatus according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In embodiments of the invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present invention, unless otherwise indicated, the meaning of "a plurality" means two or more.
Aiming at the defects and improvement demands of the prior art, the invention provides a processing method of a hash table to reduce negative search to accelerate a persistent hash index, and aims to improve the negative query performance of the persistent hash index, provide high multithreading expandability and simultaneously ensure low data consistency overhead.
The architecture diagram of the nonvolatile memory provided by the embodiment of the invention is shown in fig. 1, wherein the hash table can be a traditional chained hash or other hash varieties such as an extensible hash (extendible hashing). Volatile filter layer (Volatile FILTER LAYER, VFL) is a DRAM-based bloom filter that acts to reduce negative lookup operations on hash tables in PM, greatly enhancing the negative lookup performance of hash tables and reducing PM access overhead that is more expensive than DRAM. The persistent data layer (PERSISTENT DATA LAYER, PDL) is a hash table (may be a chain hash or other hash tables such as an extensible hash) based on a persistent memory, where, taking the chain hash as an example, the initial header Bucket length of the chain hash is set to N, where N may be dynamically set according to a scenario, and the chain Bucket is divided into a header Bucket (Head Bucket) and a chain Bucket (Chained Bucket), where each of the header Bucket and the chain Bucket is a multi-slot Bucket. In general, the longer the initial head barrel length, the better, so that the smaller the average barrel chain length of each corresponding head barrel, and the relatively smaller the positive query overhead. However, when the length of the head barrel is too long and the load capacity is relatively small, the memory space is wasted, so that a person skilled in the art can set the length of the head barrel according to the actual situation to integrate the memory and the forward query cost.
In addition, the persistent data layer needs to ensure multi-threaded lock-based concurrency correctness and low data consistency overhead at system failure, while the VFL needs to quickly recover from the consistent persistent data layer at system failure.
Fig. 2 is a mapping relationship diagram of bloom filter and key provided in an embodiment of the present invention. As shown in fig. 2, y 1、y2 and y 3 represent three keys, each key mapped to three bits of the bloom filter, the three bits interspersed with different locations. As can be seen from fig. 3, one bit can be mapped by at least one key, and the three bits of different key maps do not overlap completely. If the key is present, the data of the corresponding mapped three bits is set to 1, and if the key is not present, the data of the corresponding bits is set to 0.
To increase cache line efficiency, the size of the header and chain buckets may be an integer multiple of the 64 byte cache line size, may be 64 bytes in size, may be 128 bytes or 256 bytes, and the like. Here, a header bucket and a chain bucket with a size of 64 bytes are illustrated, for example, as shown in fig. 3. Where Token represents a tag, KV represents a key pair, next BP represents the Next Bucket Pointer (BP), and Dummy is the extra complement space added for 64 byte cache line alignment. A flag value of 0 indicates that the corresponding key-value pair slot is not used, and a value of 1 indicates that the corresponding key-value pair slot is occupied. The KV may be a fixed-length key-value pair size, or an 8-byte pointer to a variable-length key-value pair may be stored, where the fixed-length is taken as an example, and the key and value sizes are both 8 bytes.
Fig. 4 is a flowchart of a hash table processing method provided by an embodiment of the present invention, where the hash table includes a persistent data layer, and includes a plurality of hash buckets, each of which stores a plurality of key-value pairs, and the method includes the following steps:
s101, setting a bloom filter in a volatile filter layer, wherein keys of each key value pair stored in the hash bucket are inserted into the bloom filter and are mapped with N bits of data in the bloom filter, and N is an integer greater than 1;
S102, when a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, if the corresponding key does not exist, returning a searching failure or deleting failure indication, so that access cost to a persistent data layer is reduced when the key value pair does not exist in the hash table, and negative query performance of the hash table is improved.
In one example, FIG. 5 is a hash bucket concurrency control diagram based on a lock provided by an embodiment of the present invention, and it can be seen that read operations employ shared read locks and write operations employ exclusive write locks, each lock size containing a range of 256 64 byte header buckets.
In one example, fig. 6 is a key-value pair insertion flow chart of a hash table according to an embodiment of the present invention, including the following steps:
(1) And calculating a hash value and a head bucket number according to a given key, and finding a bucket pointer corresponding to the chain hash in the persistent data layer.
(2) Before inserting the key value pairs, judging whether the key value pairs to be inserted exist in the chain hash of the nonvolatile memory. If yes, executing the step (3), otherwise executing the step (4).
(3) The hash table has key value pairs to be inserted, and the insertion failure is directly returned.
(4) And according to the calculated specific head barrel number, locking the lock range of the head barrel by adopting an intelligent mutual exclusion lock.
(5) Searching whether a free key value pair slot exists in the initial head barrel and the first chain barrel (if the free key value pair slot exists), executing the step (6) if the free slot position exists, and executing the step (7) if the free key value pair slot does not exist.
(6) And modifying the key value pair slot position value and the key into the value and the key of the key value pair to be inserted in sequence, and setting the mark of the slot position as 1. The modifications are all persisted to nonvolatile memory at a bucket granularity of 64 byte cache line size using a merge refresh approach. At this point the insertion was successful.
(7) A new chain bucket is distributed from a nonvolatile memory space by adopting a head insertion method and assigned to a temporary bucket pointer, a first slot position of the chain bucket is set as a position to be inserted with a key value pair, the value of the slot position, a key and a corresponding flag bit are modified in the same way, and the modification is persisted into PM in a combined cache line refreshing mode based on 8 byte atom update.
(8) And setting a memory fence instruction, assigning the temporary bucket pointer to the next bucket pointer of the head bucket pointer, and using a cache line refreshing instruction to persist the next bucket pointer of the head bucket into the nonvolatile memory.
(9) Setting a memory barrier instruction and inserting keys of the key value pair to be inserted into a bloom filter located in the volatile filter layer. Returning to successful insertion.
In one example, fig. 7 is a flowchart of a hash table lookup provided in an embodiment of the present invention, including the steps of:
(1) searching whether the bloom filter contains the key to be searched or not, if not, executing the step (2), and if so, executing the step (3);
(2) The key value pair to be searched does not exist, and the blank is directly returned;
(3) Calculating a hash value and a head bucket number according to a given key, finding a bucket pointer corresponding to chain hash in a persistent data layer, and locking by adopting a shared read lock;
(4) It is determined whether the bucket pointer is empty. If the air is empty, the circulation is finished to execute the step (2), otherwise, the step (5) is executed;
(5) Sequentially traversing a plurality of slots in the barrel, executing the step (6) if the corresponding mark is found to be 1 and the key value of the key value to the slots is equal to the key to be searched, otherwise executing the step (7);
(6) There are key-value pairs to be found, and the corresponding values are returned.
(7) The updated bucket pointer points to the next bucket pointer. And (4) executing the step (4).
In one example, fig. 8 is a flowchart of hash table deletion provided in an embodiment of the present invention, including the following steps:
(1) searching whether the bloom filter contains the key to be searched or not, if not, executing the step (2), and if so, executing the step (3);
(2) The key value pair to be deleted does not exist, and deletion failure is directly returned;
(3) Calculating a hash value and a head bucket number according to a given key, finding a bucket pointer corresponding to chain hash in a persistent data layer, and locking by adopting an intelligent mutual exclusion lock;
(4) It is determined whether the bucket pointer is empty. If the air is empty, the circulation is finished to execute the step (2), otherwise, the step (5) is executed;
(5) Sequentially traversing a plurality of slots in the barrel, executing the step (6) if the corresponding mark is found to be 1 and the key value of the key value to the slots is equal to the key to be searched, otherwise executing the step (7);
(6) And if the key value pair to be deleted exists, setting the mark of the corresponding key value pair slot to 0, persisting the key value pair to a nonvolatile memory through a cache line refreshing instruction, and returning that the deletion is successful.
(7) The updated bucket pointer points to the next bucket pointer. And (4) executing the step (4).
Fig. 9 is a schematic diagram of a hash table processing apparatus according to an embodiment of the present invention, as shown in fig. 9, including:
A bloom filter setting unit 910, configured to set a bloom filter in the volatile filter layer, where keys of each key value pair stored in the hash bucket are inserted into the bloom filter and mapped with data of N bits in the bloom filter, where N is an integer greater than 1;
The hash table negative query unit 920 is configured to, when a value in a certain key value pair in the hash table needs to be searched or deleted, search a corresponding key in the bloom filter, and if the corresponding key does not exist, return a search failure or deletion failure indication to reduce access overhead to the persistent data layer when the key value pair does not exist in the hash table, thereby improving the negative query performance of the hash table.
The hash table inserting unit 930 is configured to firstly determine whether a key value pair to be inserted exists in the hash table when a certain key value pair needs to be inserted in the hash table, if so, return an insertion failure indication, if not, calculate a corresponding header bucket number according to a key and lock an inner range of the hash bucket corresponding to the header bucket number, find whether an idle key value pair slot exists in the locked inner range of the hash bucket, if so, insert the key value pair into the idle slot, and persist the inserted key value pair into the nonvolatile memory by adopting a merging refresh mode to complete key value pair insertion, where the merging refresh mode is used for reducing cache line refresh times and reducing insertion delay overhead of the hash table.
The hash table inserting unit 930 is configured to find whether there is an idle key value pair slot in the locked hash bucket, if not, allocate a new chain bucket from the nonvolatile memory space by using a head insertion method and assign a temporary bucket pointer, set the first slot of the new chain bucket as a position to be inserted with a key value pair, insert the key value pair to be inserted, and persist the inserted key value pair to the nonvolatile memory by using a merging refresh method, and then assign the temporary bucket pointer to the next bucket pointer of the head bucket pointer, and persist the next bucket pointer of the head bucket to the nonvolatile memory by using a cache line refresh instruction.
The header length setting unit 940 is configured to dynamically set an initial header length of the hash table according to a data load amount of the persistent data layer, where the initial header length is relatively longer and an average bucket chain length corresponding to each header is relatively shorter when the data load amount is relatively larger, so as to improve a positive query performance of the hash table.
It should be understood that, the foregoing apparatus is used to perform the method in the foregoing embodiment, and corresponding program modules in the apparatus implement principles and technical effects similar to those described in the foregoing method, and reference may be made to corresponding processes in the foregoing method for the working process of the apparatus, which are not repeated herein.
Based on the method in the above embodiment, the embodiment of the invention provides an electronic device. The apparatus may include at least one memory for storing a program and at least one processor for executing the program stored by the memory. Wherein the processor is adapted to perform the method described in the above embodiments when the program stored in the memory is executed.
Based on the method in the above embodiment, the embodiment of the present invention provides a computer-readable storage medium storing a computer program, which when executed on a processor, causes the processor to perform the method in the above embodiment.
Based on the method in the above embodiments, an embodiment of the present invention provides a computer program product, which when run on a processor causes the processor to perform the method in the above embodiments.
It is to be appreciated that the processor in embodiments of the invention may be a central processing unit (centralprocessing unit, CPU), but may also be other general purpose processors, digital signal processors (digital signalprocessor, DSP), application Specific Integrated Circuits (ASIC), field programmable gate arrays (field programmable GATE ARRAY, FPGA), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. The general purpose processor may be a microprocessor, but in the alternative, it may be any conventional processor.
The method steps in the embodiments of the present invention may be implemented by hardware, or may be implemented by executing software instructions by a processor. The software instructions may be comprised of corresponding software modules that may be stored in random access memory (random access memory, RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (erasable PROM, EPROM), electrically Erasable Programmable ROM (EEPROM), registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Drive (SSD)), etc.
It will be appreciated that the various numerical numbers referred to in the embodiments of the present invention are merely for ease of description and are not intended to limit the scope of the embodiments of the present invention.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (8)

1. A method of processing a hash table, the hash table comprising a plurality of hash buckets, each hash bucket storing a plurality of key-value pairs, the method comprising the steps of:
Setting a bloom filter in a volatile filter layer, wherein keys of each key value pair stored in the hash bucket are inserted into the bloom filter and are mapped with data of N bits in the bloom filter, and N is an integer greater than 1;
When a value in a certain key value pair in the hash table is required to be searched or deleted, searching a corresponding key in the bloom filter, if the corresponding key does not exist, returning a searching failure or deleting failure indication, so as to reduce the access cost to a persistent data layer when the key value pair does not exist in the hash table and improve the negative query performance of the hash table;
When a certain key value pair needs to be inserted into the hash table, judging whether the key value pair to be inserted exists in the hash table, and if so, returning an insertion failure indication;
If the hash table does not have the key value pair to be inserted, calculating a corresponding head barrel number according to the key, and locking the hash barrel inner range corresponding to the head barrel number;
searching whether an idle key value pair slot exists in the locked hash bucket or not, if so, inserting the key value pair into the idle slot, and persisting the inserted key value pair into a nonvolatile memory by adopting a merging refreshing mode to finish key value pair insertion, wherein the merging refreshing mode is used for reducing cache line refreshing times and reducing the insertion delay cost of a hash table.
2. The method of claim 1, further comprising the step of:
When a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, and if the corresponding key exists, searching or deleting the corresponding key in the hash table according to the corresponding key value pair.
3. The method of claim 2 wherein a range lookup is performed in the locked hash bucket as to whether there is a free key pair slot, if not, a head insertion method is used to allocate a new chain bucket from the nonvolatile memory space and assign a temporary bucket pointer, the first slot of the new chain bucket is set as the location where the key pair is to be inserted, the key pair to be inserted is inserted, and the inserted key pair is persisted to the nonvolatile memory in a merge refresh mode, and then the temporary bucket pointer is assigned to the next bucket pointer of the head bucket pointer and the next bucket pointer of the head bucket is persisted to the nonvolatile memory using a cache line refresh instruction.
4. A method according to any one of claims 1 to 3, further comprising the step of:
And dynamically setting the initial head barrel length of the hash table according to the data load of the persistent data layer, wherein when the data load is relatively large, the initial head barrel length is relatively long, and the average barrel chain length corresponding to each head barrel is relatively short so as to improve the positive query performance of the hash table.
5. A processing apparatus for a hash table, the hash table comprising a persistent data layer including a plurality of hash buckets, each hash bucket storing a plurality of key-value pairs, the apparatus comprising:
a bloom filter setting unit for setting a bloom filter in the volatile filter layer; the key of each key value pair stored in the hash bucket is inserted into a bloom filter and is mapped with N bits of data in the bloom filter, wherein N is an integer greater than 1; the data bits mapped by keys of any two key-value pairs in the hash bucket in the bloom filter cannot be completely overlapped, and one data bit in the bloom filter can be mapped by at least one key;
The hash table negative query unit is used for searching corresponding keys in the bloom filter when a value in a certain key value pair in the hash table is required to be searched or deleted, if the corresponding keys do not exist, the corresponding key value pair does not exist in the hash table, and returning a searching failure or deleting failure indication so as to reduce the access cost to a persistent data layer when the key value pair does not exist in the hash table and improve the negative query performance of the hash table;
The hash table inserting unit is used for firstly judging whether a key value pair to be inserted exists in the hash table or not when a certain key value pair needs to be inserted in the hash table, returning an insertion failure indication if the key value pair to be inserted exists in the hash table, calculating a corresponding head barrel number according to a key and locking a hash barrel inner range corresponding to the head barrel number if the key value pair does not exist in the hash table, searching whether an idle key value pair slot exists in the locked hash barrel inner range, inserting the key value pair into the idle slot if the idle key value pair exists in the locked hash barrel inner range, and persisting the inserted key value pair into a nonvolatile memory by adopting a merging refreshing mode to complete key value pair insertion, wherein the merging refreshing mode is used for reducing cache line refreshing times and reducing the insertion delay expenditure of the hash table.
6. The apparatus of claim 5, wherein the hash table insertion unit is configured to search a locked hash bucket for a free key pair slot, if not, allocate a new chain bucket from the nonvolatile memory space by using a head insertion method and assign the new chain bucket to a temporary bucket pointer, set a first slot of the new chain bucket as a location where a key pair is to be inserted, insert the key pair to be inserted, and persist the inserted key pair to the nonvolatile memory by using a merge refresh method, and then assign the temporary bucket pointer to a next bucket pointer of the head bucket pointer and persist the next bucket pointer of the head bucket to the nonvolatile memory by using a cache line refresh instruction.
7. The apparatus according to claim 5 or 6, further comprising:
The head barrel length setting unit is used for dynamically setting the initial head barrel length of the hash table according to the data load capacity of the persistent data layer, when the data load capacity is relatively large, the initial head barrel length is relatively long, and the average barrel chain length corresponding to each head barrel is relatively short so as to improve the positive query performance of the hash table.
8. An electronic device, comprising:
at least one memory for storing a program;
at least one processor for executing the memory-stored program, which processor is adapted to perform the method according to any of claims 1-4 when the memory-stored program is executed.
CN202310630033.3A 2023-05-26 2023-05-26 Hash table processing method and device and electronic equipment Active CN116719813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310630033.3A CN116719813B (en) 2023-05-26 2023-05-26 Hash table processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310630033.3A CN116719813B (en) 2023-05-26 2023-05-26 Hash table processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN116719813A CN116719813A (en) 2023-09-08
CN116719813B true CN116719813B (en) 2025-08-19

Family

ID=87867130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310630033.3A Active CN116719813B (en) 2023-05-26 2023-05-26 Hash table processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116719813B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117539408B (en) * 2024-01-09 2024-03-12 华中科技大学 Integrated index system for memory and calculation and key value pair memory system
US12314224B1 (en) 2024-01-09 2025-05-27 Huazhong University Of Science And Technology Storage-computation integrated indexing system and key-value pair storage system
CN118606327B (en) * 2024-08-08 2024-12-06 芯云晟(杭州)电子科技有限公司 Processing method, device, equipment and medium of elastic hash table

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342828A (en) * 2021-07-02 2021-09-03 广东唯审信息科技有限公司 Hash table conflict resolution method based on d-dimensional mapping
CN113342706A (en) * 2021-05-13 2021-09-03 武汉大学 Write-optimized extensible hash index structure based on nonvolatile memory and inserting, refreshing and deleting methods

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108287840B (en) * 2017-01-09 2022-05-03 北京大学 A Data Storage and Query Method Based on Matrix Hash

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342706A (en) * 2021-05-13 2021-09-03 武汉大学 Write-optimized extensible hash index structure based on nonvolatile memory and inserting, refreshing and deleting methods
CN113342828A (en) * 2021-07-02 2021-09-03 广东唯审信息科技有限公司 Hash table conflict resolution method based on d-dimensional mapping

Also Published As

Publication number Publication date
CN116719813A (en) 2023-09-08

Similar Documents

Publication Publication Date Title
CN116719813B (en) Hash table processing method and device and electronic equipment
US10706101B2 (en) Bucketized hash tables with remap entries
CN110888886B (en) Index structure, construction method, key value storage system and request processing method
JP6764359B2 (en) Deduplication DRAM memory module and its memory deduplication method
JP6356675B2 (en) Aggregation / grouping operation: Hardware implementation of hash table method
CN110377531B (en) Persistent memory storage engine device based on log structure and control method
US7805427B1 (en) Integrated search engine devices that support multi-way search trees having multi-column nodes
KR101599177B1 (en) Data migration for composite non-volatile storage device
CN104426770A (en) Routing lookup method, routing lookup device and method for constructing B-Tree tree structure
JP6893805B2 (en) Memory module duplicate memory removal method and DRAM memory module for that purpose
CN109683811A (en) A kind of request processing method mixing memory key-value pair storage system
US8086641B1 (en) Integrated search engine devices that utilize SPM-linked bit maps to reduce handle memory duplication and methods of operating same
CN104573112B (en) Page interrogation method and data processing node in OLTP Cluster Databases
US11747998B1 (en) Indexing technique for large scale distributed key-value systems
US20160203180A1 (en) Index tree search method and computer
US7987205B1 (en) Integrated search engine devices having pipelined node maintenance sub-engines therein that support database flush operations
CN106294189B (en) Memory defragmentation method and device
CN107766478A (en) A kind of design method of concurrent index structure towards high competition scene
CN114328500A (en) Data access method, device, equipment and computer readable storage medium
US10013442B2 (en) Database value identifier hash map
US7953721B1 (en) Integrated search engine devices that support database key dumping and methods of operating same
CN116974955A (en) Method, system, equipment and storage medium for caching data of persistent memory file
CN116340266A (en) Fine-grained file system and file read-write method
CN114207602B (en) Using probabilistic data structures to reduce requests
CN116401416A (en) A Persistent Mutable Cardinality Tree Access System Supporting Lock-Free Concurrent Access

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant