[go: up one dir, main page]

CN119988694A - Graph partition retrieval method, system and readable storage medium based on distance perception - Google Patents

Graph partition retrieval method, system and readable storage medium based on distance perception Download PDF

Info

Publication number
CN119988694A
CN119988694A CN202510472804.XA CN202510472804A CN119988694A CN 119988694 A CN119988694 A CN 119988694A CN 202510472804 A CN202510472804 A CN 202510472804A CN 119988694 A CN119988694 A CN 119988694A
Authority
CN
China
Prior art keywords
graph
distance
initial
query
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510472804.XA
Other languages
Chinese (zh)
Inventor
王鑫炜
李恒
黄炎
陈书俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Aikesheng Information Technology Co ltd
Original Assignee
Shanghai Aikesheng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Aikesheng Information Technology Co ltd filed Critical Shanghai Aikesheng Information Technology Co ltd
Priority to CN202510472804.XA priority Critical patent/CN119988694A/en
Publication of CN119988694A publication Critical patent/CN119988694A/en
Pending legal-status Critical Current

Links

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a graph division retrieval method, a system and a readable storage medium based on distance perception, wherein the method comprises the steps of constructing original data into an initial graph structure; selecting distance measurement, distributing weight for each edge, merging nearest neighbor node pairs based on the weight to obtain a coarsened graph structure, carrying out hierarchical segmentation on the coarsened graph structure, projecting back to the initial graph structure based on segmentation conditions to obtain a plurality of initial cutting edge sets, finding out corresponding nodes according to each initial cutting edge in the initial cutting edge sets, randomly placing the nodes into different partitions, optimizing each initial cutting edge set by taking the sum of minimized cutting edge weights as an objective function to obtain a graph index structure, inputting a query, calling the graph index structure, respectively calculating the distances between the query and the central nodes of each partition of the graph index structure, and selecting the partition with the nearest distance for retrieval. The method reduces the times of cross-partition query and improves the query efficiency.

Description

Graph division retrieval method and system based on distance perception and readable storage medium
Technical Field
The invention relates to the technical field of vector databases, in particular to a graph division retrieval method and system based on distance perception and a readable storage medium.
Background
In the field of computer vision, image matching and retrieval are core tasks, and are widely applied to the fields of face recognition, automatic driving, medical image analysis and the like. With explosive growth of image data volume (such as social media, medical image library, etc.), traditional retrieval methods based on text labels or low-level visual features (color, texture) gradually expose limitations:
1) The manual annotation has strong dependence, the traditional method needs to manually annotate the image content, has high cost and is difficult to cover complex scenes;
2) The feature dimension is high, the image features (such as SIFT and ORB) are usually high-dimensional vectors, the direct matching calculation complexity is as high as O (N2), and large-scale data are difficult to deal with;
semantic gap-low-level features are difficult to capture image semantic information, so that cross-modal retrieval effect is limited.
In order to solve the problems, an Approximate Nearest Neighbor (ANN) method becomes a research hotspot, and the core aim is to remarkably improve the retrieval efficiency on the premise of ensuring a certain precision by a dimension reduction or structured indexing technology.
The currently prevailing ANN methods can be divided into four classes:
1. tree-based methods (e.g., KD-Tree, ball-Tree) accelerate nearest neighbor search by spatial partitioning, but have limited effect on high-dimensional data;
2. based on quantization methods (e.g., LSH, PQ) by compressing the feature dimension through quantization, but quantization errors may affect the matching accuracy;
3. hash-based methods (e.g., CNNH, DPSH) achieve fast matching by hash coding, but are not robust to complex geometric deformations and illumination changes.
4. The graph-based method (such as Graph Neural Networks) realizes efficient retrieval by constructing an image similarity graph structure, has optimal performance, but has the following bottlenecks:
the construction cost is high, and a large amount of computation resources and storage space are consumed for constructing the graph of a large data set (such as millions of images);
the graph division defect is that a traditional graph division algorithm (such as METIS) aims at minimizing edge cut numbers, and can divide high-similarity nodes into different subgraphs, so that index performance is reduced;
Therefore, it is necessary to study a new graph partitioning algorithm to maintain the spatial proximity of data so that the query result is more accurate.
Disclosure of Invention
The invention aims to provide a graph division searching method, a system and a readable storage medium based on distance perception, which are used for solving the technical problems of the existing graph division algorithm, reducing the number of times of cross-partition query and improving the query efficiency and the accuracy of query results.
In order to achieve the above object, the present invention provides a graph division searching method based on distance sensing, comprising the following steps:
constructing the original data into an initial graph structure;
Selecting a distance measure, distributing weight for each edge, merging nearest neighbor node pairs based on the weight to gradually simplify the initial graph structure, and obtaining a simplified coarsened graph structure;
Hierarchical segmentation is carried out on the coarsened graph structure, and based on the segmentation condition, the coarsened graph structure is projected back to the initial graph structure in an iterative mode, and a plurality of initial cutting edge sets are finally obtained;
Finding out corresponding nodes according to each initial cutting edge in the initial cutting edge set, randomly putting the nodes into different partitions, and optimizing each initial cutting edge set one by taking the sum of minimized cutting edge weights as an objective function to obtain a final graph index structure;
and inputting a query, calling the graph index structure, respectively calculating the distances between the query and the central nodes of each partition of the graph index structure, and selecting the partition closest to the query for retrieval.
Optionally, the distance measure is used to measure the proximity between adjacent nodes, and the distance measure is euclidean distance, cosine similarity or manhattan distance.
Optionally, the weights are expressed as follows:
where w uv represents the weight, d uv represents the distance between adjacent nodes u and v, ε is a constant.
Based on the same inventive concept, the invention also provides a graph division retrieval system based on distance perception, which comprises:
the diagram construction module is used for constructing the original data into an initial diagram structure;
the coarsening module is used for selecting a distance measure, distributing weight to each edge, merging nearest neighbor node pairs based on the weight so as to gradually simplify the initial graph structure, and obtaining a simplified coarsened graph structure;
the graph segmentation module is used for carrying out hierarchical segmentation on the coarsened graph structure, and carrying out back projection on the initial graph structure in an iterative mode based on the segmentation condition to finally obtain a plurality of initial cutting edge sets;
The cutting edge optimizing module is used for finding out corresponding nodes according to each initial cutting edge in the initial cutting edge set, randomly placing the nodes into different partitions, and optimizing each initial cutting edge set one by taking the sum of minimized cutting edge weights as an objective function to obtain a final graph index structure;
And the retrieval module is used for inputting a query, calling the graph index structure, respectively calculating the distances between the query and the central nodes of the partitions of the graph index structure, and selecting the partition closest to the query for retrieval.
Optionally, the distance measure is used to measure the proximity between adjacent nodes, and the distance measure is euclidean distance, cosine similarity or manhattan distance.
Optionally, the weights are expressed as follows:
where w uv represents the weight, d uv represents the distance between adjacent nodes u and v, ε is a constant.
Based on the same inventive concept, the present invention also provides a readable storage medium having stored thereon a computer program which, when executed, enables the distance-aware based graph division search method as described above.
According to the graph division retrieval method, system and readable storage medium based on distance perception, the distance information among the nodes is considered in the division process, so that the nodes with the relatively close distance are effectively divided into the same partition, the number of times of cross-partition query is reduced, and the query efficiency is improved. Meanwhile, the spatial proximity of the data can be maintained by adopting a node partition mode of minimizing the sum of the edge cutting weights, so that the query result is more accurate.
Drawings
It will be appreciated by those skilled in the art that the drawings are provided for a better understanding of the invention and do not constitute any limitation on the scope of the invention. Wherein:
fig. 1 is a flowchart of a graph division searching method based on distance sensing according to an embodiment of the present invention.
Detailed Description
As described in the background, conventional graph partitioning methods have focused on minimizing the number of edges, often ignoring the distance between partitioned adjacent nodes, which may result in closer nodes being assigned to different partitions. In order to better maintain a local neighborhood structure, the invention provides a graph division retrieval method based on distance perception, which effectively divides nodes with relatively close distances into the same partition by considering the distance information among the nodes in the division process, thereby reducing the times of cross-partition query and improving the query efficiency. Meanwhile, the spatial proximity of the data can be maintained by adopting a node partition mode of minimizing the sum of the edge cutting weights, so that the query result is more accurate.
The invention will be described in further detail with reference to the drawings and the specific embodiments thereof in order to make the objects, advantages and features of the invention more apparent. It should be noted that the drawings are in a very simplified form and are all to a non-precise scale, merely for the purpose of facilitating and clearly aiding in the description of embodiments of the invention. For a better understanding of the invention with objects, features and advantages, refer to the drawings. It should be understood that the structures, proportions, sizes, etc. shown in the drawings are shown only in connection with the present disclosure for the understanding and reading of the present disclosure, and are not intended to limit the scope of the invention, which is defined by the appended claims, and any structural modifications, proportional changes, or dimensional adjustments, which may be made by the present disclosure, should fall within the scope of the present disclosure under the same or similar circumstances as the effects and objectives attained by the present invention.
As used in this disclosure, the singular forms "a," "an," and "the" include plural referents unless the content clearly dictates otherwise. As used in this disclosure, the term "or" is generally employed in its sense including "and/or" unless the content clearly dictates otherwise.
Referring to fig. 1, the present embodiment provides a graph division searching method based on distance sensing, which includes the following steps:
S1, constructing original data into an initial graph structure;
S2, selecting a distance measure, distributing weight to each edge, and merging nearest neighbor node pairs based on the weight to gradually simplify the initial graph structure, so as to obtain a simplified coarsened graph structure;
s3, carrying out hierarchical segmentation on the coarsened graph structure, and carrying out iterative projection on the initial graph structure based on segmentation conditions to finally obtain a plurality of initial cutting edge sets;
S4, finding out corresponding nodes according to each initial cutting edge in the initial cutting edge set, randomly placing the nodes into different partitions, and optimizing each initial cutting edge set one by taking the sum of minimized cutting edge weights as an objective function to obtain a final graph index structure;
S5, inputting a query, calling the graph index structure, respectively calculating the distances between the query and the central nodes of all the partitions of the graph index structure, and selecting the partition closest to the query for retrieval.
First, S1 is performed on the original data to construct an initial graph structure. In this embodiment, the original data may be constructed into the initial graph structure by using a conventional graph construction algorithm, which is not limited in the present invention.
And S2, selecting a distance measure, distributing weight to each edge, and merging nearest neighbor node pairs based on the weight to gradually simplify the initial graph structure, so as to obtain a simplified coarsened graph structure.
In this embodiment, the distance measure is used to measure the proximity between neighboring nodes, and common choices include euclidean distance, cosine similarity, and manhattan distance, according to the nature of the data set. Depending on the distance metric selected, each edge in the graph may be assigned a weight that helps prioritize the connections between neighboring nodes during the partitioning process.
After selecting the distance metric, each edge is assigned a weight that is inversely proportional to the distance connecting the nodes. This ensures that the edges connecting the closely related nodes get a higher weight. The weights are expressed as follows:
Where w uv represents the weight, d uv represents the distance between adjacent nodes u and v, ε is a constant to avoid dividing by zero.
This approach ensures that edges between closer nodes are prioritized during partitioning, thereby preserving local connectivity more efficiently.
The S2 is actually a stepwise simplification of the graph by merging node pairs. In implementations, node pairs that are weighted may be preferentially merged, the weights indicating that the node pairs are closer. By merging the nodes with their nearest neighbors, this stage preserves the critical neighborhood relationships, maintaining the local structure of the graph, which is important for accurate segmentation of the graph. This weighted coarsening method allows the graph to be reduced in size while also maintaining its proximity-based structure.
It should be noted that, the simplified diagrams mentioned herein refer to not only one simplification but also a plurality of simplifications. For example, the initial graph structure includes 100 nodes, 50 nodes remain after the first merging of node pairs, and 25 nodes remain after the second merging of node pairs, and so on, until a minimum and simplest version of the coarsened graph structure is obtained.
And then executing S3, carrying out hierarchical segmentation on the coarsened graph structure, and carrying out iterative mode back projection on the initial graph structure based on the segmentation condition to finally obtain a plurality of initial cutting edge sets.
In this embodiment, the partitions are divided according to the number of data points on the final coarsened graph structure, and the corresponding number of partitions is divided by how many points.
And S4, finding out corresponding nodes according to each initial cutting edge in the initial cutting edge set, randomly placing the nodes into different partitions, and optimizing each initial cutting edge set one by taking the sum of minimized cutting edge weights as an objective function to obtain a final graph index structure.
It is emphasized that the conventional graph partitioning algorithm minimizes the total number of the cut edges by not considering the intensity or importance of each edge, ignores the distance information between the nodes, and easily causes adjacent nodes to be partitioned into different partitions, thereby affecting the query efficiency and recall rate. In the technology, the optimization target is adjusted to minimize the sum of the edge cutting weights instead of the number of edges by introducing the distance-based weights, so that the possibility of separating closely connected nodes is effectively reduced by the design, the quality of the segmentation in the aspect of maintaining a local structure is improved, and meanwhile, the spatial proximity of data can be maintained, so that the query result is more accurate.
In this embodiment, the modified objective function becomes:
where E cut represents the initial set of edges and w uv represents the weight of each edge, reflecting the proximity between connected nodes.
In this embodiment, each initial cutting edge set is optimized in turn. The optimization process is further described below using one of the sets of cut edges as an example.
1) And finding out the corresponding graph nodes according to the initial cutting edge set, randomly placing the nodes into different partitions so as to update the initial cutting edge set, then respectively calculating which corresponding objective function value in each updated cutting edge set is the smallest, and ending the optimization of the initial cutting edge set after finding out the smallest value.
2) The optimization of the second initial set of cutting edges is then started, the method being consistent with the procedure described above.
3) And (3) until all the initial cutting edge sets are optimized, ending the whole optimization process.
It should be noted that, the process of updating the initial cutting edge set is the most accurate to traverse all the possibilities, but considering that the points involved are more (for example, the number of involved nodes is m), the schemes of moving the partitions are also correspondingly more (the schemes involved are 2 m), so a threshold value x can be set, and the minimum value of the objective function is selected after n partition schemes are calculated. The specific x value needs to be set according to the number of nodes involved, for example x=1/3×2 m.
And finally, executing S5, inputting a query, calling the graph index structure, respectively calculating the distances between the query and the central nodes of all the partitions of the graph index structure, and selecting the partition closest to the query for retrieval.
For example, after a query (data) is obtained, the query is first computed with the center node of each partition of the graph index structure obtained by the graph partitioning and searching method based on distance perception, so as to select which partition should be entered for computation, and finally top-k values are returned as final search results, so that the computation steps can be reduced, and the index efficiency is improved.
The recall rate and the query number per second of the graph division retrieval method based on distance perception provided by the invention are compared with those of the existing method through specific experimental data.
Experimental environment:
Ubuntu 20.04.5 LTS
CPU:Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz
256G memory
Data set SIFT1M, GIST
The recall rates for each of the methods for SIFT128, 1M when calculating 12000 distance calculations are shown in table 1 below:
TABLE 1
The number of queries per second for each method for SIFT128, 1M when querying 1000 queries is shown in table 2 below:
TABLE 2
GIST960, 1M recall rates for each method when calculating 12000 distance calculations are shown in table 3 below:
TABLE 3 Table 3
GIST960, 1M the number of queries per second for each method when querying 1000 queries is shown in table 4 below:
TABLE 4 Table 4
As can be seen from tables 1 to 4, the method of the present invention is superior to the conventional method in Recall (Recall) and Query Per Second (QPS) when the initial degrees of the same edges are identical.
Based on the same inventive concept, the embodiment of the invention also provides a graph division retrieval system based on distance perception, which comprises the following steps:
the diagram construction module is used for constructing the original data into an initial diagram structure;
the coarsening module is used for selecting a distance measure, distributing weight to each edge, merging nearest neighbor node pairs based on the weight so as to gradually simplify the initial graph structure, and obtaining a simplified coarsened graph structure;
the graph segmentation module is used for carrying out hierarchical segmentation on the coarsened graph structure, and carrying out back projection on the initial graph structure in an iterative mode based on the segmentation condition to finally obtain a plurality of initial cutting edge sets;
The cutting edge optimizing module is used for finding out corresponding nodes according to each initial cutting edge in the initial cutting edge set, randomly placing the nodes into different partitions, and optimizing each initial cutting edge set one by taking the sum of minimized cutting edge weights as an objective function to obtain a final graph index structure;
And the retrieval module is used for inputting a query, calling the graph index structure, respectively calculating the distances between the query and the central nodes of the partitions of the graph index structure, and selecting the partition closest to the query for retrieval.
In this embodiment, the distance measure is used to measure the proximity between adjacent nodes, and the distance measure is euclidean distance, cosine similarity, or manhattan distance.
Preferably, the weights are expressed as follows:
Where w uv represents the weight, d uv represents the distance between adjacent nodes u and v, ε is a constant to avoid dividing by zero.
Based on the same inventive concept, the embodiment of the invention also provides a readable storage medium, on which a computer program is stored, which when executed can implement the graph division searching method based on distance perception as described above.
The readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device, such as, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the preceding. More specific examples of a readable storage medium (a non-exhaustive list) include a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical encoding device, punch cards or in-groove protrusion structures such as those having instructions stored thereon, and any suitable combination of the foregoing. The computer program described herein may be downloaded from a readable storage medium to a respective computing/processing device or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives the computer program from the network and forwards the computer program for storage in a readable storage medium in the respective computing/processing device. Computer programs for performing the operations of the present invention can be assembly instructions, instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as SMALLTALK, C ++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer program may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuits, field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information for a computer program, which can execute computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, systems, and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer programs. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the programs, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer programs may also be stored in a readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the readable storage medium storing the computer program includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the computer program which is executed on the computer, other programmable apparatus or other devices implements the functions/acts specified in the flowchart and/or block diagram block or blocks.
In summary, the embodiments of the present invention provide a graph division search method, a system and a readable storage medium based on distance perception, which effectively divide nodes with a relatively close distance into the same partition by considering distance information between the nodes in the division process, thereby reducing the number of times of cross-partition query and improving the query efficiency. Meanwhile, the spatial proximity of the data can be maintained by adopting a node partition mode of minimizing the sum of the edge cutting weights, so that the query result is more accurate.
It should also be appreciated that while the present invention has been disclosed in the context of a preferred embodiment, the above embodiments are not intended to limit the invention. Many possible variations and modifications of the disclosed technology can be made by anyone skilled in the art without departing from the scope of the technology, or the technology can be modified to be equivalent. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims (7)

1.一种基于距离感知的图划分检索方法,其特征在于,包括以下步骤:1. A distance-aware graph partitioning retrieval method, comprising the following steps: 将原始数据构建为初始图结构;Construct the original data into an initial graph structure; 选择一个距离度量,为每条边分配权重,并基于所述权重对最近邻的节点对进行合并以逐步简化所述初始图结构,得到简化后的粗化图结构;Selecting a distance metric, assigning a weight to each edge, and merging nearest neighbor node pairs based on the weight to gradually simplify the initial graph structure to obtain a simplified coarsened graph structure; 对所述粗化图结构进行层次化分割,并基于分割情况以迭代方式回投影到所述初始图结构上,最终得到若干个初始割边集合;The coarsened graph structure is hierarchically segmented, and based on the segmentation situation, it is iteratively back-projected onto the initial graph structure, and finally a plurality of initial cut edge sets are obtained; 根据所述初始割边集合中的各条初始割边找到对应的节点,随机将所述节点放入不同的分区,以最小化割边权重的总和为目标函数,对各个初始割边集合逐一进行优化,得到最终的图索引结构;Find the corresponding node according to each initial cut edge in the initial cut edge set, randomly put the node into different partitions, and optimize each initial cut edge set one by one by minimizing the sum of the cut edge weights as the objective function to obtain the final graph index structure; 输入一查询query,调用所述图索引结构并分别计算所述查询query与所述图索引结构的各个分区的中心节点的距离,选择距离最近的分区进行检索。A query is input, the graph index structure is called, and the distances between the query and the central nodes of each partition of the graph index structure are calculated respectively, and the partition with the closest distance is selected for retrieval. 2.根据权利要求1所述的基于距离感知的图划分检索方法,其特征在于,所述距离度量用于衡量相邻节点之间的邻近度,所述距离度量为欧几里得距离、余弦相似度或曼哈顿距离。2. According to the distance-aware graph partitioning retrieval method according to claim 1, it is characterized in that the distance metric is used to measure the proximity between adjacent nodes, and the distance metric is Euclidean distance, cosine similarity or Manhattan distance. 3.根据权利要求1所述的基于距离感知的图划分检索方法,其特征在于,所述权重表示如下:3. The distance-aware graph partitioning retrieval method according to claim 1, wherein the weight is expressed as follows: 其中,wuv表示权重,duv表示相邻节点u和v之间的距离,ε为常数。Among them, w uv represents the weight, d uv represents the distance between adjacent nodes u and v, and ε is a constant. 4.一种基于距离感知的图划分检索系统,其特征在于,包括:4. A distance-aware graph partitioning retrieval system, comprising: 图构建模块,用于将原始数据构建为初始图结构;A graph construction module, used to construct the original data into an initial graph structure; 粗化模块,用于选择一个距离度量,为每条边分配权重,并基于所述权重对最近邻的节点对进行合并以逐步简化所述初始图结构,得到简化后的粗化图结构;A coarsening module, configured to select a distance metric, assign a weight to each edge, and merge nearest neighbor node pairs based on the weight to gradually simplify the initial graph structure to obtain a simplified coarsening graph structure; 图分割模块,用于对所述粗化图结构进行层次化分割,并基于分割情况以迭代方式回投影到所述初始图结构上,最终得到若干个初始割边集合;A graph segmentation module, used for hierarchically segmenting the coarsened graph structure, and iteratively back-projecting it onto the initial graph structure based on the segmentation situation, and finally obtaining a plurality of initial cut edge sets; 割边优化模块,用于根据所述初始割边集合中的各条初始割边找到对应的节点,随机将所述节点放入不同的分区,以最小化割边权重的总和为目标函数,对各个初始割边集合逐一进行优化,得到最终的图索引结构;The edge cutting optimization module is used to find the corresponding node according to each initial edge cutting in the initial edge cutting set, randomly put the nodes into different partitions, and optimize each initial edge cutting set one by one by minimizing the sum of edge cutting weights as the objective function to obtain the final graph index structure; 检索模块,用于输入一查询query,调用所述图索引结构并分别计算所述查询query与所述图索引结构的各个分区的中心节点的距离,选择距离最近的分区进行检索。The retrieval module is used to input a query, call the graph index structure and respectively calculate the distance between the query and the central node of each partition of the graph index structure, and select the partition with the closest distance for retrieval. 5.根据权利要求4所述的基于距离感知的图划分检索系统,其特征在于,所述距离度量用于衡量相邻节点之间的邻近度,所述距离度量为欧几里得距离、余弦相似度或曼哈顿距离。5. According to the distance-aware graph partitioning retrieval system of claim 4, it is characterized in that the distance metric is used to measure the proximity between adjacent nodes, and the distance metric is Euclidean distance, cosine similarity or Manhattan distance. 6.根据权利要求4所述的基于距离感知的图划分检索系统,其特征在于,所述权重表示如下:6. The distance-aware graph partitioning retrieval system according to claim 4, wherein the weight is expressed as follows: 其中,wuv表示权重,duv表示相邻节点u和v之间的距离,ε为常数。Among them, w uv represents the weight, d uv represents the distance between adjacent nodes u and v, and ε is a constant. 7.一种可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被执行时能实现根据权利要求1-3中任一项所述的基于距离感知的图划分检索方法。7. A readable storage medium having a computer program stored thereon, characterized in that when the computer program is executed, it can implement the distance-aware graph partitioning retrieval method according to any one of claims 1-3.
CN202510472804.XA 2025-04-16 2025-04-16 Graph partition retrieval method, system and readable storage medium based on distance perception Pending CN119988694A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510472804.XA CN119988694A (en) 2025-04-16 2025-04-16 Graph partition retrieval method, system and readable storage medium based on distance perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510472804.XA CN119988694A (en) 2025-04-16 2025-04-16 Graph partition retrieval method, system and readable storage medium based on distance perception

Publications (1)

Publication Number Publication Date
CN119988694A true CN119988694A (en) 2025-05-13

Family

ID=95632096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510472804.XA Pending CN119988694A (en) 2025-04-16 2025-04-16 Graph partition retrieval method, system and readable storage medium based on distance perception

Country Status (1)

Country Link
CN (1) CN119988694A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130230255A1 (en) * 2012-03-02 2013-09-05 Microsoft Corporation Image Searching By Approximate k-NN Graph
CN109272467A (en) * 2018-09-25 2019-01-25 南京大学 A Hierarchical Image Segmentation Method Based on Multiscale Edge Clues
CN114626331A (en) * 2022-03-14 2022-06-14 中山大学 Hypergraph segmentation method, hypergraph segmentation system and hypergraph segmentation medium based on hyperedge clustering coarsening
CN116992091A (en) * 2023-07-19 2023-11-03 中国人民解放军国防科技大学 A learning index construction method for massive high-dimensional data based on partition hierarchical graph
CN118916757A (en) * 2024-07-16 2024-11-08 武汉大学深圳研究院 Node classification method, system, equipment and product based on improved graph converter model

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130230255A1 (en) * 2012-03-02 2013-09-05 Microsoft Corporation Image Searching By Approximate k-NN Graph
CN109272467A (en) * 2018-09-25 2019-01-25 南京大学 A Hierarchical Image Segmentation Method Based on Multiscale Edge Clues
CN114626331A (en) * 2022-03-14 2022-06-14 中山大学 Hypergraph segmentation method, hypergraph segmentation system and hypergraph segmentation medium based on hyperedge clustering coarsening
CN116992091A (en) * 2023-07-19 2023-11-03 中国人民解放军国防科技大学 A learning index construction method for massive high-dimensional data based on partition hierarchical graph
CN118916757A (en) * 2024-07-16 2024-11-08 武汉大学深圳研究院 Node classification method, system, equipment and product based on improved graph converter model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
余开亮: "清代的粮价与市场空间结构", 30 September 2023, 上海:上海社会科学院出版社, pages: 119 - 124 *
刘国华: "西安:西安电子科技大学出版社", 31 May 2023, 西安:西安电子科技大学出版社, pages: 122 - 125 *
陈可心;陈业斌;: "基于4-叉树结构的路网数据最近邻查询算法", 安徽工业大学学报(自然科学版), no. 03, 15 September 2020 (2020-09-15), pages 83 - 86 *

Similar Documents

Publication Publication Date Title
Peng et al. Efficient approximate nearest neighbor search in multi-dimensional databases
US9442929B2 (en) Determining documents that match a query
Yang et al. Pase: Postgresql ultra-high-dimensional approximate nearest neighbor search extension
CN105912611B (en) A Fast Image Retrieval Method Based on CNN
US10452691B2 (en) Method and apparatus for generating search results using inverted index
CN113821657B (en) Image processing model training method and image processing method based on artificial intelligence
JP2007521565A (en) Multidimensional data object search using bit vector index
WO2018184305A1 (en) Group search method based on social network, device, server and storage medium
WO2020006488A1 (en) Corpus generating method and apparatus, and human-machine interaction processing method and apparatus
CN110209895B (en) Vector retrieval method, device and equipment
CN108304585A (en) A kind of result data choosing method and relevant apparatus based on spatial key search
CN118964686A (en) Vector retrieval method, device, equipment and storage medium
Mandarapu et al. Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing
CN116705192A (en) Drug virtual screening method and device based on deep learning
CN118585626A (en) A large model knowledge question answering method and device for the power field
CN113448957B (en) A data query method and device
CN115129871B (en) Text category determining method, apparatus, computer device and storage medium
CN112395462A (en) Method, device, equipment and storage medium for searching matching subgraph in graph data stream
CN117454206B (en) Method, system, device and computer readable medium for clustering defects of wafers
CN114663579A (en) Twin three-dimensional model generation method and device, electronic device and storage medium
CN118964364A (en) A dynamic nearest neighbor search method for high-dimensional space vectors based on tree-graph structure
CN118964422A (en) Vector retrieval method, device, equipment and storage medium
CN118965081A (en) A method for identifying multiple accounts for one person based on online local sensitive hashing algorithm
US11386143B2 (en) Searching for analogue subsurface structures based on topological knowledge representation (TKR)
CN119988694A (en) Graph partition retrieval method, system and readable storage medium based on distance perception

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination