CN109815225B

CN109815225B - Parallelized prefix data retrieval method and system based on prefix tree structure

Info

Publication number: CN109815225B
Application number: CN201811511690.1A
Authority: CN
Inventors: 景全亮; 孙树冲; 毕经平
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2018-12-11
Filing date: 2018-12-11
Publication date: 2021-07-09
Anticipated expiration: 2038-12-11
Also published as: CN109815225A

Abstract

The invention relates to a method and system for retrieving prefix data, including: acquiring a network prefix represented by binary, constructing a prefix tree for the network prefix, segmenting the prefix tree to obtain a multi-segment segment tree; Retrieve network prefixes, process the to-be-retrieved network prefixes into segments, obtain multiple segment prefixes, perform concurrent retrieval on the segment tree corresponding to each segment prefix, and obtain the retrieval result r _i , indicating that the i-th segment is segmented. For the retrieval result of the segment prefix in the segment tree of the i-th segment, perform a logical AND operation on all retrieval results, and use the operation result as the retrieval result of the network prefix to be retrieved. The invention can use the segmented prefix tree structure to construct the information base for the prefix information, and can perform parallelized prefix matching search for the specific prefix, so as to realize the significant improvement of the prefix matching efficiency.

Description

Parallel prefix data retrieval method and system based on prefix tree structure

Technical Field

The invention relates to the field of information retrieval, in particular to a method and a system for retrieving parallelized prefix data based on a prefix tree structure.

Background

With the popularization of networks in the information era and the rapid development of emerging services such as big data and the like, network information shows an explosive growth trend. There is a huge amount of network prefix data in the global internet. In order to perform good management on the internet and perform timely detection and alarm on network abnormality, organization and construction of a network prefix information base become abnormally important.

A network prefix is a collection of a set of IP addresses, each prefix representing a small network. In order to perform good management on the internet and perform timely detection and alarm on network abnormality, organization and construction of a network prefix information base are indispensable.

There are two main ways for constructing the information base of the network prefix:

1. and constructing the information base in a hash table mode. In the mode, the prefix is used as a Key Value, the information to be stored is used as a Value, and the information is stored in a Key Value pair mode;

2. and constructing an information base based on the form of the binary prefix tree. The method expresses prefix information in a binary form and constructs a binary tree for the network prefix information. Each binary tree node represents a network prefix, and data information is stored in the node. The prefix query and matching process is carried out in the binary tree, namely, the branches of the binary tree are selected downwards layer by layer according to the bits of the binary representation of the prefix to be retrieved until the target prefix is found.

For the two modes, when the hash table mode faces massive network prefixes, the occupied storage space is large, byte-by-byte comparison is needed for processing the matching of the sub-prefixes or the longest prefix, and the real-time or quasi-real-time requirements under a big data scene are difficult to meet; the traditional prefix tree construction method solves the problems of the hash table method to a certain extent, but for an IPV4 network, the binary representation of the network prefix is 32 bits, and the binary representation always needs to be searched layer by layer from the root node in the prefix matching process, 8 layers need to be searched for the prefix with the length of 8, and 24 layers need to be searched for the prefix with the length of 24, so that the search time for the prefix with the longer length is longer.

Disclosure of Invention

The invention aims to solve the problems that in the storage and retrieval of massive network prefixes, the traditional prefix information base construction method (a hash table and a traditional prefix tree) has low efficiency in the execution of searching, inserting and deleting operations due to huge data volume, is difficult to maintain and the like, and provides a parallel prefix matching query technology based on a prefix tree structure. The technology of the invention is oriented to massive network prefixes, and the prefix tree structure is segmented to obtain the retrieval entries at each segment at the same time, so that the operations of searching, inserting and deleting are carried out concurrently, and the execution efficiency is greatly improved. The technology can remarkably improve the prefix information retrieval work under massive network prefixes in a parallelization mode, and greatly improves the retrieval efficiency.

In order to solve the above technical problem, an object of the present invention is to provide a prefix data retrieval method, including:

step 1, obtaining a network prefix represented by binary, constructing a prefix tree for the network prefix, and performing segmentation processing on the prefix tree to obtain a multi-segment segmented tree;

step 2, obtaining the binary expressed network prefix to be retrieved, carrying out segmentation processing on the network prefix to be retrieved to obtain a plurality of segmented prefixes, carrying out concurrent retrieval on the segmented trees corresponding to each segmented prefix to obtain a retrieval result r_iAnd representing the retrieval result of the ith segment prefix in the ith segment tree, performing logic AND operation on all the retrieval results, and taking the operation result as the retrieval result of the network prefix to be retrieved.

The prefix data retrieval method, wherein the segmentation processing in step 1 specifically includes: segmenting the prefix tree according to the preset segmentation quantity, wherein the obtained segmentation tree quantity is equal to the preset segmentation quantity;

the segmentation treatment in the step 2 specifically comprises the following steps: and segmenting the network prefix to be retrieved, wherein the number of the obtained approximate segmentation prefixes is equal to the preset segmentation number.

The prefix data retrieval method for performing segmentation processing on the prefix tree specifically comprises the following steps: the structure of the prefix tree is divided into one segment according to each 8 layers, wherein the ith segment is a segmented tree, and 256 segments exist^i-1And the index entry is used for being used as a matching entry of each parallelized segmented prefix during parallel retrieval, the matching entry of the 1 st segmented prefix is a root node of the prefix tree, the matching entry is 0, and the matching entry of the ith segmented prefix except the 1 st segmented prefix is an integer value of binary representation of the preceding (i-1) segment of the segmented prefix.

The prefix data retrieval method further comprises:

and 3, acquiring the network prefix to be inserted, carrying out segmentation processing on the network prefix to be inserted, and respectively carrying out parallel insertion operation on the segmented trees of the corresponding segments.

The prefix data retrieval method is characterized in that the network prefix is an IPV4 network prefix.

The invention also discloses a prefix data retrieval system, which comprises:

the prefix tree segmentation module is used for acquiring a network prefix represented by a binary system, constructing a prefix tree for the network prefix, and performing segmentation processing on the prefix tree to obtain a multi-segment segmented tree;

a retrieval module for obtaining binary expressed network prefix to be retrieved, carrying out segmentation processing on the network prefix to be retrieved to obtain a plurality of segmented prefixes, carrying out concurrent retrieval on the segmented tree corresponding to each segmented prefix to obtain retrieval result r_iAnd representing the retrieval result of the ith segment prefix in the ith segment tree, performing logic AND operation on all the retrieval results, and taking the operation result as the retrieval result of the network prefix to be retrieved.

In the prefix data retrieval system, the segmentation process in the prefix tree segmentation module specifically includes: segmenting the prefix tree according to the preset segmentation quantity, wherein the obtained segmentation tree quantity is equal to the preset segmentation quantity;

the segmentation processing in the retrieval module specifically comprises: and segmenting the network prefix to be retrieved, wherein the number of the obtained approximate segmentation prefixes is equal to the preset segmentation number.

The prefix data retrieval system, which performs segmentation processing on the prefix tree, specifically includes: the structure of the prefix tree is divided into one segment according to each 8 layers, wherein the ith segment is a segmented tree, and 256 segments exist^i-1And the index entry is used for being used as a matching entry of each parallelized segmented prefix during parallel retrieval, the matching entry of the 1 st segmented prefix is a root node of the prefix tree, the matching entry is 0, and the matching entry of the ith segmented prefix except the 1 st segmented prefix is an integer value of binary representation of the preceding (i-1) segment of the segmented prefix.

The prefix data retrieval system further includes:

and the inserting module is used for acquiring the network prefix to be inserted, carrying out segmentation processing on the network prefix to be inserted, and respectively carrying out parallel inserting operation on the segmented trees of the corresponding segments.

The prefix data retrieval system, wherein the network prefix is an IPV4 network prefix.

The invention thus has the technical advances comprising:

the network prefix and the segmentation representation method of the prefix tree structure can carry out segmentation representation on the traditional prefix tree structure, and a retrieval inlet of each segment can be determined; the parallelization prefix matching technology provided by the invention can perform prefix matching and searching in parallel on each section of prefix tree structure, thereby realizing the efficiency improvement of approximate linearity. The general technical effect is that the information base construction can be carried out on the prefix information by using a segmented prefix tree structure, the specific prefix can be parallelized to obtain prefix matching search, and the prefix matching efficiency is greatly improved.

Drawings

FIG. 1 is a schematic diagram of a segmented prefix tree;

FIG. 2 is a flow chart of the method of the present invention.

Detailed Description

When the inventor conducts prefix tree algorithm optimization research, the matching search algorithm of the traditional prefix tree is found to be highly efficient for processing prefixes with short length, and the processing time is gradually increased along with the increase of the prefix length. Now, with the improvement of CPU performance and the rapid development of parallel computing, the computing power of a computer is no longer the bottleneck of improving system efficiency, and parallel programming becomes a fundamental and important part of system design. The inventors consider applying parallel programming to algorithmic optimization of prefix trees. For all prefixes, each plurality of bits is divided into a plurality of sections, and index entries in each section can be found through the prefixes during prefix matching, so that parallel searching, inserting and deleting operations can be performed on each section, and the retrieval time of the prefixes with the length of 8 can be approximated even if the longest prefixes are matched, and the retrieval efficiency is improved.

In practical application, the invention can process matching query of massive network prefixes. The technology provided by the invention firstly segments the network prefix information so as to complete a prefix information base and store the prefix information in a segmented prefix tree; when searching and inserting a new prefix, the parallelization prefix matching technology is used for operating in the prefix tree to complete the parallelization matching query of the prefix.

As shown in fig. 2, the method of the present invention comprises: step 1, segmented representation of network prefixes and prefix tree structures. And acquiring a network prefix represented by a binary system, constructing a prefix tree for the network prefix, and performing segmentation processing on the prefix tree to obtain a multi-segment segmented tree.

And performing binary representation on the network prefix, and dividing each plurality of bits into a plurality of sections. Taking IPV4 network prefix as an example, each 8 bits is a segment, and its binary representation and corresponding prefix tree structure have 32 levels, it is divided into 4 segments, i.e. the prefix tree structure is also divided into 4 segments each 8 levels, where the 32 levels is determined by the network prefix itself of IPV 4. The network binary of IPV4 requires 32 bits, namely 32 layers; if 8 bits are taken as 1 segment, 4 segments are needed; for segments where i is a positive integer, there are 256^i-1An index entry, and for a particular prefix its index entry is specified by an integer number of binary representations of the first (i-1) segment prefix. In fig. 1, a prefix tree structure divided into 4 segments is shown, with dashed lines indicating the index of each segment and numbers above the nodes indicating the cable entries. 256 are leaf nodes of a full binary tree with a depth of 8. For a prefix tree with 32 layers of depth, every 8 layers are one segment, and the number of leaf nodes is 256, and the total number is 4 segments. Segment 1 has a root node as an index entry, 256^1-11 inlet; paragraph 2 depends on the leaf node index of the first paragraph for a total of 256^2-1256 inlets; similarly, the 3 rd section and the 4 th section respectively have 256^3-1、256^4-1An inlet. The index entry serves as a matching entry for each part (i.e., segment) of the parallelization when performing parallelization prefix matching.

The specific prefix refers to a target prefix given to be retrieved by a query. Taking 124.26.3.0/24, a 24-bit decimal network prefix as an example, because the length is 24, the invention only needs to consider the first 3 segments for each 8 bits: the 1 st segment index entry is the root node, and the index position (matching entry) is 0; the 2 nd segment index entry depends on the integer value of the 1 st segment, the index entry position is 124-1-123, the 3 rd segment index entry depends on the previous two segments, and the index entry position is 124-26-1-3223. Note: the index position value starts from 0.

And (3) representing the network prefix by segments, correspondingly segmenting the prefix tree, finding a retrieval inlet of each segment, and laying a foundation for parallel searching and matching. The tree structure layering is the operation corresponding to the prefix segmentation, and aims to perform parallelization retrieval on each layer after layering; it is the network prefix that is segmented that naturally leads to tree structure layering. The tree structure and the network prefix are in a corresponding relationship, and the 32-bit prefix corresponds to 32 layers of the tree structure. Every 8 bits of the prefix is a segment, and every 8 layers of the corresponding tree structure are segments.

The prefix tree is used as a storage structure for storing network prefixes, and the structure supports insertion and retrieval of prefixes. The prefix tree may be different in content in different scenarios. For example, a system needs a large amount of network prefix information to construct a prefix tree, and then prefixes used for constructing the prefix tree in the early stage are few (in an extreme case, the prefix tree is empty initially); the prefixes stored in the tree then grow progressively over time. At any point in time, a search can be made in the prefix tree.

And 2, matching parallelized prefixes. Based on the constructed segmented prefix tree structure, when a given prefix is subjected to matching query, concurrent search can be performed on each segment of the prefix tree structure, and after the search of each segment is completed, the search result of each segment is operated to obtain a final result. Obtaining binary expressed network prefix to be retrieved, carrying out segmentation processing on the network prefix to be retrieved to obtain a plurality of segmented prefixes, carrying out concurrent retrieval on the segmented trees corresponding to each segmented prefix to obtain retrieval result r_iAnd representing the retrieval result of the ith segment prefix in the ith segment tree, performing logic AND operation on all the retrieval results, and taking the operation result as the retrieval result of the network prefix to be retrieved.

The step 2 specifically comprises the following steps: and step 21, parallelizing prefix search. Segmenting prefixes to be searched, respectively searching on prefix trees of corresponding segments, and respectively obtaining results r_i(i is 1,2,3 … N, N is the number of segments of the prefix division, and if 8 bits are one segment, N is 4), the search is performed in parallel, and the segments are performed simultaneously. Still taking the prefix 124.26.3.0/24, 8 bits as a segment as an example, the specific search process:segment 1 search 124, binary represented as 01111100, sequentially finds whether nodes exist in the segment 1 prefix tree, if so, the result r₁Returning true, otherwise false; the 2 nd section and the 3 rd section are simultaneously carried out in the same way to obtain r₂And r₃. "corresponding" refers to a segment corresponding to a prefix segment to be searched. r is_iRepresenting the search result in the ith segment, true represents that the search is successful, false represents that the search is failed, and the final search result is:

r＝r₁&&r₂&&r₃…&&r_N

wherein "& &" represents a logical AND.

And 3, parallelizing prefix insertion. And segmenting the prefix to be inserted, and respectively performing insertion operation on the prefix trees of the corresponding segments. Taking the prefix 124.26.3.0/24 to be inserted, 8 bits as a segment, the specific insertion process is as follows: inserting 124 the 1 st segment, inserting 01111100 binary representation bits into the 1 st segment prefix tree according to 0/1, if the node exists, then continuing downward without repeated insertion; similarly, the 2 nd and 3 rd sections are inserted with 26(00011010) and 3(00000011) at the same time. Thus, the parallelized insert operation is complete. The method and the device realize the matching query of the prefixes by segmentation and concurrence, realize the parallelization search and insertion operation and improve the operation efficiency.

The following are system examples corresponding to the above method examples, and this embodiment can be implemented in cooperation with the above embodiments. The related technical details mentioned in the above embodiments are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the above-described embodiments.

The invention also discloses a prefix data retrieval system, which comprises:

retrieval module for obtaining a binary representation to be examinedNetwork prefix is required, the network prefix to be retrieved is segmented to obtain a plurality of segmented prefixes, concurrent retrieval is carried out on the segmented tree corresponding to each segmented prefix to obtain a retrieval result r_iAnd representing the retrieval result of the ith segment prefix in the ith segment tree, performing logic AND operation on all the retrieval results, and taking the operation result as the retrieval result of the network prefix to be retrieved.

The prefix data retrieval system, which performs segmentation processing on the prefix tree specifically includes that the structure of the prefix tree is divided into one segment according to each 8 layers, wherein the ith segment of the segmented tree has 256 segments^i-1And the index entry is used for being used as a matching entry of each parallelized segmented prefix during parallel retrieval, the matching entry of the 1 st segmented prefix is a root node of the prefix tree, the matching entry is 0, and the matching entry of the ith segmented prefix except the 1 st segmented prefix is an integer value of binary representation of the preceding (i-1) segment of the segmented prefix.

The prefix data retrieval system further includes:

Claims

1. a prefix data retrieval method, is characterized in that, comprises:

Step 1, obtaining the network prefix represented by binary, and constructing a prefix tree for the network prefix, and segmenting the prefix tree to obtain a multi-segment segment tree;

Step 2: Obtain the network prefix to be retrieved in binary representation, perform segment processing on the to-be-retrieved network prefix to obtain multiple segment prefixes, perform concurrent retrieval on the segment tree corresponding to each segment prefix, and obtain a retrieval result r _i , represents the retrieval result of the segment prefix in the i-th segment in the segment-tree of the i-th segment, perform a logical AND operation on all the retrieval results, and use the operation result as the retrieval result of the network prefix to be retrieved.

2. The method for retrieving prefix data according to claim 1, wherein the segmentation processing in step 1 specifically comprises: segmenting the prefix tree according to a preset number of segments, and obtaining the segment tree of the segment tree. The number is equal to the preset number of segments;

The segmentation processing in step 2 specifically includes: segmenting the network prefix to be retrieved, and the obtained number of segment prefixes is equal to the preset number of segments.

3. The method for retrieving prefix data according to claim 2, wherein performing segmentation processing on the prefix tree specifically comprises: the structure of the prefix tree is a segment according to every 8 layers, wherein the i-th segment tree is divided into segments. , there are 256 ^i-1 index entries, the index entry is used as the matching entry of each segment prefix of parallelization during parallel retrieval, the matching entry of the segment prefix of the first segment is the root node of the prefix tree, and the matching entry is 0, the matching entry of the i-th segment prefix except the first segment segment prefix is the integer value of the binary representation of the preceding (i-1) segment segment prefix.

4. The prefix data retrieval method according to claim 1 or 3, further comprising:

Step 3: Obtain the network prefix to be inserted, perform segment processing on the network prefix to be inserted, and perform parallel insertion operations on the segment tree of the corresponding segment respectively.

5. The method for retrieving prefix data according to claim 1, wherein the network prefix is an IPV4 network prefix.

6. A prefix data retrieval system, characterized in that, comprising:

The prefix tree segmentation module is used to obtain the network prefix represented by binary, construct a prefix tree for the network prefix, and segment the prefix tree to obtain a multi-segment segment tree;

The retrieval module is used to obtain the binary representation of the to-be-retrieved network prefix, perform segment processing on the to-be-retrieved network prefix to obtain a plurality of segment prefixes, perform concurrent retrieval on the segment tree corresponding to each segment prefix, and obtain retrieval The result ri represents the retrieval result of the segment prefix _i in the segment tree of the i segment, perform logical AND operation on all the retrieval results, and use the operation result as the retrieval result of the network prefix to be retrieved.

7. The prefix data retrieval system according to claim 6, wherein the segmentation processing in the prefix tree segmentation module specifically comprises: segmenting the prefix tree according to a preset number of segments, and the obtained segment is obtained by segmenting the prefix tree. The number of segment trees is equal to the preset number of segments;

The segmentation processing in the retrieval module specifically includes: segmenting the network prefix to be retrieved, and the obtained number of segment prefixes is equal to the preset number of segments.

8 . The prefix data retrieval system according to claim 7 , wherein the segment processing of the prefix tree specifically comprises: the structure of the prefix tree is a segment according to every 8 layers, wherein the i-th segment tree is divided into one segment. 9 . , there are 256 ^i-1 index entries, the index entry is used as the matching entry of each segment prefix of parallelization during parallel retrieval, the matching entry of the segment prefix of the first segment is the root node of the prefix tree, and the matching entry is 0, the matching entry of the i-th segment prefix except the first segment segment prefix is the integer value of the binary representation of the preceding (i-1) segment segment prefix.

9. The prefix data retrieval system according to claim 6 or 8, further comprising:

The inserting module is used to obtain the network prefix to be inserted, perform segment processing on the network prefix to be inserted, and perform parallel insert operations on the segment tree of the corresponding segment respectively.

10. The prefix data retrieval system of claim 6, wherein the network prefix is an IPV4 network prefix.