Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a processing method, a device and electronic equipment of a hash table, and aims to solve the problem that the processing performance of the existing hash table is to be improved.
To achieve the above object, in a first aspect, the present invention provides a method for processing a hash table, where the hash table includes a plurality of hash buckets, each of which stores a plurality of key value pairs, the method including the steps of:
Setting a bloom filter in a volatile filter layer, wherein keys of each key value pair stored in the hash bucket are inserted into the bloom filter and are mapped with data of N bits in the bloom filter, and N is an integer greater than 1;
when a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, if the corresponding key does not exist, returning a searching failure or deleting failure indication, so as to reduce the access cost to a persistent data layer when the key value pair does not exist in the hash table, and improve the negative query performance of the hash table.
In one possible implementation, the method further includes the steps of:
When a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, and if the corresponding key exists, searching or deleting the corresponding key in the hash table according to the corresponding key value pair.
In one possible implementation, the method further includes the steps of:
When a certain key value pair needs to be inserted into the hash table, judging whether the key value pair to be inserted exists in the hash table, and if so, returning an insertion failure indication;
If the hash table does not have the key value pair to be inserted, calculating a corresponding head barrel number according to the key, and locking the hash barrel inner range corresponding to the head barrel number;
searching whether an idle key value pair slot exists in the locked hash bucket or not, if so, inserting the key value pair into the idle slot, and persisting the inserted key value pair into a nonvolatile memory by adopting a merging refreshing mode to finish key value pair insertion, wherein the merging refreshing mode is used for reducing cache line refreshing times and reducing the insertion delay cost of a hash table.
In one possible implementation manner, whether an idle key value pair slot exists is searched in a locked hash bucket, if not, a new chain bucket is distributed from a nonvolatile memory space by adopting a head insertion method and assigned to a temporary bucket pointer, a first slot position of the new chain bucket is set to be a position where a key value pair is to be inserted, the key value pair to be inserted is inserted, the inserted key value pair is persisted to the nonvolatile memory by adopting a merging refreshing mode, then the temporary bucket pointer is assigned to a next bucket pointer of the head bucket pointer, and the next bucket pointer of the head bucket is persisted to the nonvolatile memory by using a cache line refreshing instruction.
In one possible implementation, the method further includes the steps of:
And dynamically setting the initial head barrel length of the hash table according to the data load of the persistent data layer, wherein when the data load is relatively large, the initial head barrel length is relatively long, and the average barrel chain length corresponding to each head barrel is relatively short so as to improve the positive query performance of the hash table.
In a second aspect, the present invention provides a processing apparatus for a hash table, the hash table including a persistent data layer, including a plurality of hash buckets, each hash bucket storing a plurality of key-value pairs, the apparatus comprising:
a bloom filter setting unit for setting a bloom filter in the volatile filter layer; the key of each key value pair stored in the hash bucket is inserted into a bloom filter and is mapped with N bits of data in the bloom filter, wherein N is an integer greater than 1; the data bits mapped by keys of any two key-value pairs in the hash bucket in the bloom filter cannot be completely overlapped, and one data bit in the bloom filter can be mapped by at least one key;
And the hash table negative query unit is used for searching corresponding keys in the bloom filter when the value in a certain key value pair in the hash table is required to be searched or deleted, returning a search failure or deletion failure indication if the corresponding key value pair does not exist in the hash table, reducing the access cost to the persistent data layer when the key value pair does not exist in the hash table, and improving the negative query performance of the hash table.
In one possible implementation, the apparatus further includes:
The hash table inserting unit is used for firstly judging whether a key value pair to be inserted exists in the hash table or not when a certain key value pair needs to be inserted in the hash table, returning an insertion failure indication if the key value pair to be inserted exists in the hash table, calculating a corresponding head barrel number according to a key and locking a hash barrel inner range corresponding to the head barrel number if the key value pair does not exist in the hash table, searching whether an idle key value pair slot exists in the locked hash barrel inner range, inserting the key value pair into the idle slot if the idle key value pair exists in the locked hash barrel inner range, and persisting the inserted key value pair into a nonvolatile memory by adopting a merging refreshing mode to complete key value pair insertion, wherein the merging refreshing mode is used for reducing cache line refreshing times and reducing the insertion delay expenditure of the hash table.
In one possible implementation manner, the hash table inserting unit is configured to search whether an idle key value pair slot exists in the locked hash bucket, if not, allocate a new chain bucket from the nonvolatile memory space by using a head insertion method and assign a temporary bucket pointer, set a first slot of the new chain bucket as a position to be inserted with a key value pair, insert the key value pair to be inserted, and persistence the inserted key value pair to the nonvolatile memory by using a merging refresh method, and then assign the temporary bucket pointer to a next bucket pointer of the head bucket pointer, and persistence the next bucket pointer of the head bucket to the nonvolatile memory by using a cache line refresh instruction.
In one possible implementation, the apparatus further includes:
The head barrel length setting unit is used for dynamically setting the initial head barrel length of the hash table according to the data load capacity of the persistent data layer, when the data load capacity is relatively large, the initial head barrel length is relatively long, and the average barrel chain length corresponding to each head barrel is relatively short so as to improve the positive query performance of the hash table.
In a third aspect, the invention provides an electronic device comprising at least one memory for storing a program, and at least one processor for executing the program stored in the memory, the processor being adapted to perform the method described in the first aspect or any one of the possible implementations of the first aspect when the program stored in the memory is executed.
In a fourth aspect, the present invention provides a computer readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, the invention provides a computer program product which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In general, the above technical solutions conceived by the present invention have the following beneficial effects compared with the prior art:
The invention provides a processing method, a device and electronic equipment of a hash table, when key value pairs which do not exist in the hash table are required to be searched or deleted, the traditional method is that negative inquiry is carried out in the hash table of a persistent data layer, all the key value pairs in the hash table are searched once, and when the key value pairs do not exist, the searching cost is large and unnecessary. The invention can rapidly locate the non-existing key through the bloom filter positioned in the volatile filter layer, and avoid the search of the non-existing key value pair in the persistent data layer, thereby improving the negative query performance and the deletion operation of the non-existing key value pair. According to the invention, when the hash table of the persistent data layer inserts key value pairs, cache line refreshing times are reduced through a head insertion method and a combined refreshing operation, and the insertion performance is improved. The invention reduces the average chain barrel length of each head barrel by setting the relatively longer head barrel length, thereby improving the positive inquiry performance.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In embodiments of the invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present invention, unless otherwise indicated, the meaning of "a plurality" means two or more.
Aiming at the defects and improvement demands of the prior art, the invention provides a processing method of a hash table to reduce negative search to accelerate a persistent hash index, and aims to improve the negative query performance of the persistent hash index, provide high multithreading expandability and simultaneously ensure low data consistency overhead.
The architecture diagram of the nonvolatile memory provided by the embodiment of the invention is shown in fig. 1, wherein the hash table can be a traditional chained hash or other hash varieties such as an extensible hash (extendible hashing). Volatile filter layer (Volatile FILTER LAYER, VFL) is a DRAM-based bloom filter that acts to reduce negative lookup operations on hash tables in PM, greatly enhancing the negative lookup performance of hash tables and reducing PM access overhead that is more expensive than DRAM. The persistent data layer (PERSISTENT DATA LAYER, PDL) is a hash table (may be a chain hash or other hash tables such as an extensible hash) based on a persistent memory, where, taking the chain hash as an example, the initial header Bucket length of the chain hash is set to N, where N may be dynamically set according to a scenario, and the chain Bucket is divided into a header Bucket (Head Bucket) and a chain Bucket (Chained Bucket), where each of the header Bucket and the chain Bucket is a multi-slot Bucket. In general, the longer the initial head barrel length, the better, so that the smaller the average barrel chain length of each corresponding head barrel, and the relatively smaller the positive query overhead. However, when the length of the head barrel is too long and the load capacity is relatively small, the memory space is wasted, so that a person skilled in the art can set the length of the head barrel according to the actual situation to integrate the memory and the forward query cost.
In addition, the persistent data layer needs to ensure multi-threaded lock-based concurrency correctness and low data consistency overhead at system failure, while the VFL needs to quickly recover from the consistent persistent data layer at system failure.
Fig. 2 is a mapping relationship diagram of bloom filter and key provided in an embodiment of the present invention. As shown in fig. 2, y 1、y2 and y 3 represent three keys, each key mapped to three bits of the bloom filter, the three bits interspersed with different locations. As can be seen from fig. 3, one bit can be mapped by at least one key, and the three bits of different key maps do not overlap completely. If the key is present, the data of the corresponding mapped three bits is set to 1, and if the key is not present, the data of the corresponding bits is set to 0.
To increase cache line efficiency, the size of the header and chain buckets may be an integer multiple of the 64 byte cache line size, may be 64 bytes in size, may be 128 bytes or 256 bytes, and the like. Here, a header bucket and a chain bucket with a size of 64 bytes are illustrated, for example, as shown in fig. 3. Where Token represents a tag, KV represents a key pair, next BP represents the Next Bucket Pointer (BP), and Dummy is the extra complement space added for 64 byte cache line alignment. A flag value of 0 indicates that the corresponding key-value pair slot is not used, and a value of 1 indicates that the corresponding key-value pair slot is occupied. The KV may be a fixed-length key-value pair size, or an 8-byte pointer to a variable-length key-value pair may be stored, where the fixed-length is taken as an example, and the key and value sizes are both 8 bytes.
Fig. 4 is a flowchart of a hash table processing method provided by an embodiment of the present invention, where the hash table includes a persistent data layer, and includes a plurality of hash buckets, each of which stores a plurality of key-value pairs, and the method includes the following steps:
s101, setting a bloom filter in a volatile filter layer, wherein keys of each key value pair stored in the hash bucket are inserted into the bloom filter and are mapped with N bits of data in the bloom filter, and N is an integer greater than 1;
S102, when a value in a certain key value pair in the hash table needs to be searched or deleted, searching a corresponding key in the bloom filter, if the corresponding key does not exist, returning a searching failure or deleting failure indication, so that access cost to a persistent data layer is reduced when the key value pair does not exist in the hash table, and negative query performance of the hash table is improved.
In one example, FIG. 5 is a hash bucket concurrency control diagram based on a lock provided by an embodiment of the present invention, and it can be seen that read operations employ shared read locks and write operations employ exclusive write locks, each lock size containing a range of 256 64 byte header buckets.
In one example, fig. 6 is a key-value pair insertion flow chart of a hash table according to an embodiment of the present invention, including the following steps:
(1) And calculating a hash value and a head bucket number according to a given key, and finding a bucket pointer corresponding to the chain hash in the persistent data layer.
(2) Before inserting the key value pairs, judging whether the key value pairs to be inserted exist in the chain hash of the nonvolatile memory. If yes, executing the step (3), otherwise executing the step (4).
(3) The hash table has key value pairs to be inserted, and the insertion failure is directly returned.
(4) And according to the calculated specific head barrel number, locking the lock range of the head barrel by adopting an intelligent mutual exclusion lock.
(5) Searching whether a free key value pair slot exists in the initial head barrel and the first chain barrel (if the free key value pair slot exists), executing the step (6) if the free slot position exists, and executing the step (7) if the free key value pair slot does not exist.
(6) And modifying the key value pair slot position value and the key into the value and the key of the key value pair to be inserted in sequence, and setting the mark of the slot position as 1. The modifications are all persisted to nonvolatile memory at a bucket granularity of 64 byte cache line size using a merge refresh approach. At this point the insertion was successful.
(7) A new chain bucket is distributed from a nonvolatile memory space by adopting a head insertion method and assigned to a temporary bucket pointer, a first slot position of the chain bucket is set as a position to be inserted with a key value pair, the value of the slot position, a key and a corresponding flag bit are modified in the same way, and the modification is persisted into PM in a combined cache line refreshing mode based on 8 byte atom update.
(8) And setting a memory fence instruction, assigning the temporary bucket pointer to the next bucket pointer of the head bucket pointer, and using a cache line refreshing instruction to persist the next bucket pointer of the head bucket into the nonvolatile memory.
(9) Setting a memory barrier instruction and inserting keys of the key value pair to be inserted into a bloom filter located in the volatile filter layer. Returning to successful insertion.
In one example, fig. 7 is a flowchart of a hash table lookup provided in an embodiment of the present invention, including the steps of:
(1) searching whether the bloom filter contains the key to be searched or not, if not, executing the step (2), and if so, executing the step (3);
(2) The key value pair to be searched does not exist, and the blank is directly returned;
(3) Calculating a hash value and a head bucket number according to a given key, finding a bucket pointer corresponding to chain hash in a persistent data layer, and locking by adopting a shared read lock;
(4) It is determined whether the bucket pointer is empty. If the air is empty, the circulation is finished to execute the step (2), otherwise, the step (5) is executed;
(5) Sequentially traversing a plurality of slots in the barrel, executing the step (6) if the corresponding mark is found to be 1 and the key value of the key value to the slots is equal to the key to be searched, otherwise executing the step (7);
(6) There are key-value pairs to be found, and the corresponding values are returned.
(7) The updated bucket pointer points to the next bucket pointer. And (4) executing the step (4).
In one example, fig. 8 is a flowchart of hash table deletion provided in an embodiment of the present invention, including the following steps:
(1) searching whether the bloom filter contains the key to be searched or not, if not, executing the step (2), and if so, executing the step (3);
(2) The key value pair to be deleted does not exist, and deletion failure is directly returned;
(3) Calculating a hash value and a head bucket number according to a given key, finding a bucket pointer corresponding to chain hash in a persistent data layer, and locking by adopting an intelligent mutual exclusion lock;
(4) It is determined whether the bucket pointer is empty. If the air is empty, the circulation is finished to execute the step (2), otherwise, the step (5) is executed;
(5) Sequentially traversing a plurality of slots in the barrel, executing the step (6) if the corresponding mark is found to be 1 and the key value of the key value to the slots is equal to the key to be searched, otherwise executing the step (7);
(6) And if the key value pair to be deleted exists, setting the mark of the corresponding key value pair slot to 0, persisting the key value pair to a nonvolatile memory through a cache line refreshing instruction, and returning that the deletion is successful.
(7) The updated bucket pointer points to the next bucket pointer. And (4) executing the step (4).
Fig. 9 is a schematic diagram of a hash table processing apparatus according to an embodiment of the present invention, as shown in fig. 9, including:
A bloom filter setting unit 910, configured to set a bloom filter in the volatile filter layer, where keys of each key value pair stored in the hash bucket are inserted into the bloom filter and mapped with data of N bits in the bloom filter, where N is an integer greater than 1;
The hash table negative query unit 920 is configured to, when a value in a certain key value pair in the hash table needs to be searched or deleted, search a corresponding key in the bloom filter, and if the corresponding key does not exist, return a search failure or deletion failure indication to reduce access overhead to the persistent data layer when the key value pair does not exist in the hash table, thereby improving the negative query performance of the hash table.
The hash table inserting unit 930 is configured to firstly determine whether a key value pair to be inserted exists in the hash table when a certain key value pair needs to be inserted in the hash table, if so, return an insertion failure indication, if not, calculate a corresponding header bucket number according to a key and lock an inner range of the hash bucket corresponding to the header bucket number, find whether an idle key value pair slot exists in the locked inner range of the hash bucket, if so, insert the key value pair into the idle slot, and persist the inserted key value pair into the nonvolatile memory by adopting a merging refresh mode to complete key value pair insertion, where the merging refresh mode is used for reducing cache line refresh times and reducing insertion delay overhead of the hash table.
The hash table inserting unit 930 is configured to find whether there is an idle key value pair slot in the locked hash bucket, if not, allocate a new chain bucket from the nonvolatile memory space by using a head insertion method and assign a temporary bucket pointer, set the first slot of the new chain bucket as a position to be inserted with a key value pair, insert the key value pair to be inserted, and persist the inserted key value pair to the nonvolatile memory by using a merging refresh method, and then assign the temporary bucket pointer to the next bucket pointer of the head bucket pointer, and persist the next bucket pointer of the head bucket to the nonvolatile memory by using a cache line refresh instruction.
The header length setting unit 940 is configured to dynamically set an initial header length of the hash table according to a data load amount of the persistent data layer, where the initial header length is relatively longer and an average bucket chain length corresponding to each header is relatively shorter when the data load amount is relatively larger, so as to improve a positive query performance of the hash table.
It should be understood that, the foregoing apparatus is used to perform the method in the foregoing embodiment, and corresponding program modules in the apparatus implement principles and technical effects similar to those described in the foregoing method, and reference may be made to corresponding processes in the foregoing method for the working process of the apparatus, which are not repeated herein.
Based on the method in the above embodiment, the embodiment of the invention provides an electronic device. The apparatus may include at least one memory for storing a program and at least one processor for executing the program stored by the memory. Wherein the processor is adapted to perform the method described in the above embodiments when the program stored in the memory is executed.
Based on the method in the above embodiment, the embodiment of the present invention provides a computer-readable storage medium storing a computer program, which when executed on a processor, causes the processor to perform the method in the above embodiment.
Based on the method in the above embodiments, an embodiment of the present invention provides a computer program product, which when run on a processor causes the processor to perform the method in the above embodiments.
It is to be appreciated that the processor in embodiments of the invention may be a central processing unit (centralprocessing unit, CPU), but may also be other general purpose processors, digital signal processors (digital signalprocessor, DSP), application Specific Integrated Circuits (ASIC), field programmable gate arrays (field programmable GATE ARRAY, FPGA), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. The general purpose processor may be a microprocessor, but in the alternative, it may be any conventional processor.
The method steps in the embodiments of the present invention may be implemented by hardware, or may be implemented by executing software instructions by a processor. The software instructions may be comprised of corresponding software modules that may be stored in random access memory (random access memory, RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (erasable PROM, EPROM), electrically Erasable Programmable ROM (EEPROM), registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Drive (SSD)), etc.
It will be appreciated that the various numerical numbers referred to in the embodiments of the present invention are merely for ease of description and are not intended to limit the scope of the embodiments of the present invention.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.