
CN116166755B - Search engine database management method, device and search engine database - Google Patents

Publication number
CN116166755B
Authority
CN
China
Prior art keywords
computer device
device node
new
current
data storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310445218.7A
Other languages
Chinese (zh)
Other versions
CN116166755A (en
Inventor
秦朝阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310445218.7A
Publication of CN116166755A
Application granted
Publication of CN116166755B
Priority to PCT/CN2023/132695 (WO2024221868A1)
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a search engine database management method, a device, and a search engine database. The method includes: obtaining the current data storage amount of each computer device node corresponding to the search engine database, the current storage usage rate of each computer device node, and the number of new shards; determining a target computer device node according to the current data storage amount, the current storage usage rate, and the number of new shards; and assigning the new shards of the search engine database to the target computer device node. Because the target computer device node is selected by jointly considering each node's current data storage amount, its current storage usage rate, and the number of new shards, data storage across the computer device nodes tends toward balance after the new shards are assigned. This improves resource utilization and lays a foundation for improving the search efficiency of the ES database.

Description

Search engine database management method, device and search engine database

Technical Field

The present application relates to the technical field of databases, and in particular to a search engine database management method, a device, and a search engine database.

Background

The Elasticsearch (ES) database is a widely used and powerful distributed full-text search engine. Its index data can be stored across multiple computer device nodes, and each of the distributed data blocks is called a shard. The mechanism that allocates shards across computer devices determines how balanced ES database storage is. How to assign a newly created shard to a suitable computer device node has therefore become a key research topic.

In the prior art, the list of computer device nodes is usually sorted in ascending order by the number of shards each node already holds, the node that currently holds the fewest shards is identified, and the newly created shard is assigned to that node.

In a real production environment, however, ES indices are opened and closed, computer device nodes go online and offline, and parts of the data are deleted, all in alternation. If newly created shards are allocated with the prior-art approach under these conditions, data storage across the computer device nodes becomes severely unbalanced, which reduces resource utilization and makes it harder to guarantee the search efficiency of the ES database.

Summary of the Invention

The present application provides a search engine database management method, a device, and a search engine database, to address the defect that the prior art leaves data storage severely unbalanced across computer device nodes, which reduces resource utilization and makes it harder to guarantee the search efficiency of the ES database.

A first aspect of the present application provides a search engine database management method, including:

obtaining the current data storage amount of each computer device node corresponding to the search engine database, the current storage usage rate of each computer device node, and the number of new shards;

determining a target computer device node according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards;

assigning the new shards of the search engine database to the target computer device node.

Optionally, determining the target computer device node according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards includes:

determining a current balance evaluation index among the computer device nodes according to the current storage usage rate of each computer device node;

determining a node selection constraint according to the current balance evaluation index among the computer device nodes;

determining the target computer device node, subject to the node selection constraint, according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards.

Optionally, determining the current balance evaluation index among the computer device nodes according to the current storage usage rate of each computer device node includes:

determining the standard deviation of the current storage usage rates across the computer device nodes;

using that standard deviation as the current balance evaluation index among the computer device nodes.
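As a minimal sketch of this balance evaluation index, the snippet below computes the standard deviation of per-node storage usage rates. The function name `balance_index` is hypothetical, and the choice of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not say which variant is meant.

```python
from statistics import pstdev


def balance_index(usage_rates):
    """Standard deviation of per-node storage usage rates (fractions in 0..1).

    Assumption: population standard deviation; the source only says
    "standard deviation". A lower value means more balanced storage.
    """
    return pstdev(list(usage_rates))


# Three nodes at 20%, 40% and 60% usage are less balanced than three at 40%.
uneven = balance_index([0.2, 0.4, 0.6])
even = balance_index([0.4, 0.4, 0.4])
```

A value of zero means every node has the same usage rate, so the index can be compared before and after a planned allocation.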

Optionally, determining the target computer device node, subject to the node selection constraint, according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards includes:

determining candidate computer device nodes according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards;

predicting the latest balance evaluation index among the computer device nodes that would result from assigning the new shards of the search engine database to the candidate computer device nodes;

when the latest balance evaluation index is smaller than the current balance evaluation index, determining the candidate computer device nodes as the target computer device nodes.
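The acceptance check described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: `improves_balance`, its parameters, and the use of the population standard deviation as the balance evaluation index are all hypothetical choices, and each candidate is assumed to receive exactly one new shard of the predicted size.

```python
from statistics import pstdev


def improves_balance(stored, capacity, candidates, shard_size):
    """Return True if placing one new shard on each candidate lowers the index.

    stored / capacity: per-node stored data amount and storage space (same unit).
    candidates: indices of the candidate nodes.
    shard_size: predicted data storage amount of one new shard.
    """
    current = pstdev([s / c for s, c in zip(stored, capacity)])
    predicted = list(stored)
    for i in candidates:
        predicted[i] += shard_size  # simulate the allocation
    latest = pstdev([s / c for s, c in zip(predicted, capacity)])
    return latest < current
```

For example, with two nodes at 10% and 50% usage, adding a shard to the first node narrows the gap and passes the check, while adding it to the second node widens the gap and fails. The text does not state what happens when the check fails, so that case is left to the caller.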

Optionally, determining the candidate computer device nodes according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards includes:

predicting the data storage amount of a new shard according to the current data storage amount of each computer device node;

determining the candidate computer device nodes according to the current storage usage rate of each computer device node, the number of new shards, and the data storage amount of the new shard.

Optionally, predicting the data storage amount of the new shard according to the current data storage amount of each computer device node includes:

predicting the data storage amount of the new shard according to the current data storage amount of each computer device node and the current number of shards on each computer device node.

Optionally, predicting the data storage amount of the new shard according to the current data storage amount and the current number of shards of each computer device node includes:

determining the average data storage amount of a single shard of the search engine database according to the current data storage amount and the current number of shards of each computer device node;

using the average data storage amount of a single shard as the data storage amount of the new shard.
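One way to read this prediction is total stored data divided by total shard count. The sketch below assumes that reading; the function name `average_shard_size` and the error on an empty cluster are illustrative choices not found in the text.

```python
def average_shard_size(stored_per_node, shards_per_node):
    """Arithmetic mean size of an existing shard across all nodes.

    Assumption: computed as (sum of stored data) / (sum of shard counts).
    """
    total_shards = sum(shards_per_node)
    if total_shards == 0:
        raise ValueError("no existing shards to average over")
    return sum(stored_per_node) / total_shards


# Two nodes storing 100 and 300 units in 2 and 6 shards respectively.
predicted_size = average_shard_size([100, 300], [2, 6])
```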

Optionally, predicting the data storage amount of the new shard according to the current data storage amount and the current number of shards of each computer device node includes:

determining the geometric mean data storage amount of a single shard of the search engine database according to the current data storage amount and the current number of shards of each computer device node;

using the geometric mean data storage amount of a single shard as the data storage amount of the new shard.
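The text does not spell out which values enter the geometric mean. The sketch below assumes one plausible reading: take each node's average shard size (stored data divided by shard count) and compute the geometric mean of those per-node values. The function name `geometric_mean_shard_size` and the decision to skip nodes without shards or data are assumptions.

```python
from statistics import geometric_mean


def geometric_mean_shard_size(stored_per_node, shards_per_node):
    """Geometric mean of per-node average shard sizes.

    Assumption: nodes holding no shards or no data are skipped, because the
    geometric mean is only defined for positive values.
    """
    per_node = [
        stored / shards
        for stored, shards in zip(stored_per_node, shards_per_node)
        if shards > 0 and stored > 0
    ]
    if not per_node:
        raise ValueError("no node holds any shard data")
    return geometric_mean(per_node)
```

Compared with the arithmetic mean, the geometric mean is less sensitive to a single node with unusually large shards, which may be why it is offered as an alternative.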

Optionally, determining the candidate computer device nodes according to the current storage usage rate of each computer device node, the number of new shards, and the data storage amount of the new shard includes:

constructing a storage usage rate summary table according to the current storage usage rate of each computer device node;

judging whether the number of new shards is greater than the total number of computer device nodes;

when the number of new shards is greater than the total number of computer device nodes, selecting, according to the storage usage rate summary table, the candidate computer device nodes for a first number of new shards;

updating the storage usage rate of each candidate computer device node according to its current data storage amount and the data storage amount of the new shard, while leaving the storage usage rates of all other computer device nodes unchanged, so as to iteratively update the storage usage rate summary table, and using the updated table as the new storage usage rate summary table;

reducing the number of new shards by the first number, so as to iteratively update the number of new shards, and returning to the step of judging whether the number of new shards is greater than the total number of computer device nodes.

Optionally, selecting, when the number of new shards is greater than the total number of computer device nodes, the candidate computer device nodes for the first number of new shards according to the storage usage rate summary table includes:

when the number of new shards is greater than the total number of computer device nodes, sorting the computer device nodes in ascending order of the current storage usage rates recorded in the storage usage rate summary table;

selecting the first number of candidate computer device nodes from the sorted result, starting with the lowest storage usage rate.

Optionally, the method further includes:

determining the first number according to the total number of computer device nodes and a preset first allocation ratio.
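The two steps above, deriving the first number from the node total and a preset ratio and then picking the least-used nodes, can be sketched as below. The names `batch_size` and `pick_least_used` are hypothetical. The text does not say how the product of node total and ratio is rounded, so rounding down with a minimum of one is an assumption.

```python
import math


def batch_size(node_total, ratio):
    """Number of nodes to select per round.

    Assumption: floor(node_total * ratio), but never less than 1.
    """
    return max(1, math.floor(node_total * ratio))


def pick_least_used(usage_by_node, count):
    """Return the `count` node names with the lowest storage usage rate."""
    ranked = sorted(usage_by_node, key=usage_by_node.get)  # ascending order
    return ranked[:count]


usage = {"node-a": 0.7, "node-b": 0.2, "node-c": 0.4, "node-d": 0.9}
chosen = pick_least_used(usage, batch_size(len(usage), 0.5))
```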

Optionally, the method further includes:

when the number of new shards is not greater than the total number of computer device nodes, judging whether the number of new shards is equal to 1;

when the number of new shards is equal to 1, determining the computer device node with the lowest current storage usage rate in the storage usage rate summary table as the candidate computer device node for that new shard.

Optionally, the method further includes:

when the number of new shards is not greater than the total number of computer device nodes and is greater than 1, selecting, according to the storage usage rate summary table, the candidate computer device nodes for a second number of new shards;

updating the storage usage rate of each candidate computer device node according to its current data storage amount and the data storage amount of the new shard, while leaving the storage usage rates of all other computer device nodes unchanged, so as to iteratively update the storage usage rate summary table, and using the updated table as the new storage usage rate summary table;

reducing the number of new shards by the second number, so as to iteratively update the number of new shards, and returning to the step of judging whether the number of new shards is equal to 1.

Optionally, selecting, when the number of new shards is not greater than the total number of computer device nodes and is greater than 1, the candidate computer device nodes for the second number of new shards according to the storage usage rate summary table includes:

when the number of new shards is not greater than the total number of computer device nodes and is greater than 1, sorting the computer device nodes in ascending order of the current storage usage rates recorded in the storage usage rate summary table;

selecting the second number of candidate computer device nodes from the sorted result, starting with the lowest storage usage rate.

Optionally, the method further includes:

determining the second number according to the total number of computer device nodes and a preset second allocation ratio.
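Taken together, the candidate selection steps form a loop over the storage usage rate summary table. The sketch below is one possible reading, with several assumptions that are not in the text: the names `plan_candidates`, `ratio1` and `ratio2` are hypothetical, the first and second numbers are rounded down with a minimum of one, and a round never selects more nodes than there are shards remaining or nodes available.

```python
def plan_candidates(stored, capacity, new_shards, shard_size, ratio1, ratio2):
    """Return the candidate node chosen for each new shard, in planning order."""
    if not stored:
        raise ValueError("at least one node is required")
    stored = dict(stored)  # work on a copy; the caller's data is not modified
    # Storage usage rate summary table: node name -> usage rate.
    usage = {node: stored[node] / capacity[node] for node in stored}
    node_total = len(usage)
    plan = []
    while new_shards > 0:
        if new_shards > node_total:
            batch = max(1, int(node_total * ratio1))  # "first number"
        elif new_shards == 1:
            batch = 1  # a single shard goes to the least-used node
        else:
            batch = max(1, int(node_total * ratio2))  # "second number"
        # Assumption: cap the batch at the remaining shards and the node count.
        batch = min(batch, new_shards, node_total)
        chosen = sorted(usage, key=usage.get)[:batch]  # lowest usage first
        for node in chosen:
            # Only the chosen nodes change; all other rates stay as they are.
            stored[node] += shard_size
            usage[node] = stored[node] / capacity[node]
        plan.extend(chosen)
        new_shards -= batch
    return plan


plan = plan_candidates(
    stored={"a": 10, "b": 50, "c": 35, "d": 22},
    capacity={"a": 100, "b": 100, "c": 100, "d": 100},
    new_shards=5, shard_size=10, ratio1=0.5, ratio2=0.5,
)
```

Updating the table after every round means a node that was just chosen is re-ranked before the next round, so repeated selections reflect the planned load rather than the original one.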

Optionally, obtaining the current storage usage rate of each computer device node corresponding to the search engine database includes:

obtaining the data storage space of each computer device node;

for any computer device node, determining its current storage usage rate according to its current data storage amount and its data storage space.
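The text does not give the formula, but the natural reading is stored data divided by storage space. The sketch below assumes that ratio; the function name `storage_usage_rate` and the choice of bytes as the unit are illustrative.

```python
def storage_usage_rate(stored_bytes, space_bytes):
    """Fraction of a node's data storage space that is currently occupied.

    Assumption: usage rate = current data storage amount / data storage space.
    """
    if space_bytes <= 0:
        raise ValueError("data storage space must be positive")
    return stored_bytes / space_bytes


# A node holding 250 GB of data on 1000 GB of storage space.
rate = storage_usage_rate(250 * 10**9, 1000 * 10**9)
```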

A second aspect of the present application provides a search engine database management device, including:

an obtaining module, configured to obtain the current data storage amount of each computer device node corresponding to the search engine database, the current storage usage rate of each computer device node, and the number of new shards;

a determining module, configured to determine a target computer device node according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards;

a management module, configured to assign the new shards of the search engine database to the target computer device node.

Optionally, the determining module is specifically configured to:

determine a current balance evaluation index among the computer device nodes according to the current storage usage rate of each computer device node;

determine a node selection constraint according to the current balance evaluation index among the computer device nodes;

determine the target computer device node, subject to the node selection constraint, according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards.

Optionally, the determining module is specifically configured to:

determine the standard deviation of the current storage usage rates across the computer device nodes;

use that standard deviation as the current balance evaluation index among the computer device nodes.

Optionally, the determining module is specifically configured to:

determine candidate computer device nodes according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards;

predict the latest balance evaluation index among the computer device nodes that would result from assigning the new shards of the search engine database to the candidate computer device nodes;

when the latest balance evaluation index is smaller than the current balance evaluation index, determine the candidate computer device nodes as the target computer device nodes.

Optionally, the determining module is specifically configured to:

predict the data storage amount of a new shard according to the current data storage amount of each computer device node;

determine the candidate computer device nodes according to the current storage usage rate of each computer device node, the number of new shards, and the data storage amount of the new shard.

Optionally, the determining module is specifically configured to:

predict the data storage amount of the new shard according to the current data storage amount of each computer device node and the current number of shards on each computer device node.

Optionally, the determining module is specifically configured to:

determine the average data storage amount of a single shard of the search engine database according to the current data storage amount and the current number of shards of each computer device node;

use the average data storage amount of a single shard as the data storage amount of the new shard.

Optionally, the determining module is specifically configured to:

determine the geometric mean data storage amount of a single shard of the search engine database according to the current data storage amount and the current number of shards of each computer device node;

use the geometric mean data storage amount of a single shard as the data storage amount of the new shard.

Optionally, the determining module is specifically configured to:

construct a storage usage rate summary table according to the current storage usage rate of each computer device node;

judge whether the number of new shards is greater than the total number of computer device nodes;

when the number of new shards is greater than the total number of computer device nodes, select, according to the storage usage rate summary table, the candidate computer device nodes for a first number of new shards;

update the storage usage rate of each candidate computer device node according to its current data storage amount and the data storage amount of the new shard, while leaving the storage usage rates of all other computer device nodes unchanged, so as to iteratively update the storage usage rate summary table, and use the updated table as the new storage usage rate summary table;

reduce the number of new shards by the first number, so as to iteratively update the number of new shards, and return to the step of judging whether the number of new shards is greater than the total number of computer device nodes.

Optionally, the determining module is specifically configured to:

when the number of new shards is greater than the total number of computer device nodes, sort the computer device nodes in ascending order of the current storage usage rates recorded in the storage usage rate summary table;

select the first number of candidate computer device nodes from the sorted result, starting with the lowest storage usage rate.

Optionally, the determining module is further configured to:

determine the first number according to the total number of computer device nodes and a preset first allocation ratio.

Optionally, the determining module is further configured to:

when the number of new shards is not greater than the total number of computer device nodes, judge whether the number of new shards is equal to 1;

when the number of new shards is equal to 1, determine the computer device node with the lowest current storage usage rate in the storage usage rate summary table as the candidate computer device node for that new shard.

Optionally, the determining module is further configured to:

when the number of new shards is not greater than the total number of computer device nodes and is greater than 1, select, according to the storage usage rate summary table, the candidate computer device nodes for a second number of new shards;

update the storage usage rate of each candidate computer device node according to its current data storage amount and the data storage amount of the new shard, while leaving the storage usage rates of all other computer device nodes unchanged, so as to iteratively update the storage usage rate summary table, and use the updated table as the new storage usage rate summary table;

reduce the number of new shards by the second number, so as to iteratively update the number of new shards, and return to the step of judging whether the number of new shards is equal to 1.

Optionally, the determining module is specifically configured to:

when the number of new shards is not greater than the total number of computer device nodes and is greater than 1, sort the computer device nodes in ascending order of the current storage usage rates recorded in the storage usage rate summary table;

select the second number of candidate computer device nodes from the sorted result, starting with the lowest storage usage rate.

Optionally, the determining module is further configured to:

determine the second number according to the total number of computer device nodes and a preset second allocation ratio.

Optionally, the obtaining module is specifically configured to:

obtain the data storage space of each computer device node;

for any computer device node, determine its current storage usage rate according to its current data storage amount and its data storage space.

A third aspect of the present application provides a search engine database, including several computer device nodes;

Any one of the computer device nodes executes the method described in the first aspect above and in the various possible designs of the first aspect.

A fourth aspect of the present application provides an electronic device, including at least one processor and a memory;

The memory stores computer-executable instructions;

The at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to execute the method described in the first aspect above and in the various possible designs of the first aspect.

A fifth aspect of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method described in the first aspect above and in the various possible designs of the first aspect.

The technical solution of the present application has the following advantages:

The present application provides a search engine database management method, a device, and a search engine database. The method includes: acquiring the current data storage amount of each computer device node corresponding to the search engine database, the current storage usage rate of each computer device node, and the number of new shards; determining target computer device nodes according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards; and allocating the new shards of the search engine database to the target computer device nodes. By jointly considering each node's current data storage amount, its current storage usage rate, and the number of new shards when selecting the target computer device node for each new shard, the method ensures that, once the new shards have been allocated, data storage across the computer device nodes tends toward balance. This improves resource utilization and lays a foundation for improving the search efficiency of the ES database.

Description of the drawings

To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present application, and those of ordinary skill in the art can derive other drawings from them.

FIG. 1 is a schematic structural diagram of the search engine database management system on which the embodiments of the present application are based;

FIG. 2 is a schematic flowchart of the search engine database management method provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of an exemplary search engine database management method provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of another exemplary search engine database management method provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of the search engine database management device provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of the search engine database provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.

The above drawings show specific embodiments of the present application, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the concept of the present disclosure in any way, but to explain the concept of the present application to those skilled in the art by reference to specific embodiments.

Detailed description

To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.

First, the terms used in the present application are explained:

Shard: a shard is the key to how an ES database distributes data across a cluster. It is the smallest unit of work and stores part of the index data. A shard can be a primary shard or a replica shard.

In addition, the terms "first", "second", and the like are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly specifying the number of the technical features referred to. In the description of the following embodiments, "multiple" means two or more, unless otherwise expressly and specifically defined.

In the prior art, the list of computer device nodes is usually sorted in ascending order of the number of shards each node currently holds, the node currently holding the fewest shards is identified, and the newly created shard is allocated to that node. In a real production environment, however, ES indexes are opened and closed, computer device nodes go online and offline, and some data is deleted, all in alternation. Allocating newly created shards in the prior-art manner under these conditions leads to a serious imbalance of data storage across the computer device nodes. This not only reduces resource utilization and can degrade online service quality or stall the business, but also makes it harder to guarantee the search efficiency of the ES database.

To address the above problems, the embodiments of the present application provide a search engine database management method, a device, and a search engine database. The method includes: acquiring the current data storage amount of each computer device node corresponding to the search engine database, the current storage usage rate of each computer device node, and the number of new shards; determining target computer device nodes according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards; and allocating the new shards of the search engine database to the target computer device nodes. By jointly considering each node's current data storage amount, its current storage usage rate, and the number of new shards when selecting the target computer device node for each new shard, the method ensures that, once the new shards have been allocated, data storage across the computer device nodes tends toward balance. This improves resource utilization and lays a foundation for improving the search efficiency of the ES database.

The following specific embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present invention are described below with reference to the drawings.

First, the structure of the search engine database management system on which the present application is based is described:

The search engine database management method, device, and search engine database provided by the embodiments of the present application are suited to allocating new shards to target computer device nodes when new shards are added to the search engine database, so that the shards of the search engine database are allocated in a balanced way. FIG. 1 is a schematic structural diagram of the search engine database management system on which the embodiments are based. The system mainly includes a data acquisition device and a search engine database. The search engine database includes several computer device nodes, each of which is provided with a search engine database management device. The data acquisition device collects the current data storage amount of each computer device node corresponding to the search engine database, the current storage usage rate of each computer device node, and the number of new shards, and sends the collected information to the search engine database management device in any one of the computer device nodes, which then determines the target computer device nodes for the new shards according to the information received.

An embodiment of the present application provides a search engine database management method for allocating new shards to target computer device nodes when new shards are added to the search engine database, so that the shards of the search engine database are allocated in a balanced way. The method is executed by an electronic device, such as a server, a desktop computer, a notebook computer, a tablet computer, or any other electronic device that can be used to manage a search engine database.

FIG. 2 is a schematic flowchart of the search engine database management method provided by an embodiment of the present application. The method includes:

Step 201, acquiring the current data storage amount of each computer device node corresponding to the search engine database, the current storage usage rate of each computer device node, and the number of new shards.

Here, the current data storage amount is the total amount of data currently stored on a computer device node, and the current storage usage rate is the proportion of that node's storage space that is occupied.

Specifically, in an embodiment, the data storage space of each computer device node may be acquired; for any computer device node, the current storage usage rate of that node is determined according to its current data storage amount and its data storage space.

Specifically, the ratio of the current data storage amount to the data storage space may be taken as the current storage usage rate of the computer device node.
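
The ratio described above can be sketched in a few lines of Python. This is only an illustration: the function name `storage_usage`, the node names, and the figures (in GB) are hypothetical and do not come from the application.

```python
def storage_usage(stored: float, capacity: float) -> float:
    """Current storage usage rate = current data storage amount / data storage space."""
    if capacity <= 0:
        raise ValueError("data storage space must be positive")
    return stored / capacity


# Hypothetical nodes: (current data storage amount, data storage space), both in GB.
nodes = {"node-a": (300, 1000), "node-b": (450, 500), "node-c": (100, 800)}

# One usage rate per node; node-b is the fullest even though node-a stores more
# data than node-c, which is why the rate rather than the raw amount is compared.
usage = {name: storage_usage(stored, capacity) for name, (stored, capacity) in nodes.items()}
```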

Step 202, determining the target computer device nodes according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards.

Specifically, the storage space usage of each computer device node may be analyzed according to its current storage usage rate. The target computer device node for each new shard is then determined in combination with the current data storage amount of each node and the number of new shards to be allocated, so that after the new shards are allocated to their target nodes, the computer device nodes of the search engine database remain balanced in data storage.

Step 203, allocating the new shards of the search engine database to the target computer device nodes.

Specifically, after the target computer device node for each new shard is determined, the new shards are allocated in turn to their corresponding target nodes. The specific allocation mechanism is not limited by the embodiments of the present application.

On the basis of the above embodiments, to further ensure the reliability of the selected target computer device nodes, as one implementable manner, in an embodiment, determining the target computer device nodes according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards includes:

Step 2021, determining a current balance evaluation index among the computer device nodes according to the current storage usage rate of each computer device node;

Step 2022, determining a node selection constraint according to the current balance evaluation index among the computer device nodes;

Step 2023, determining the target computer device nodes under the node selection constraint, according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards.

It should be noted that the balance evaluation index represents the degree to which data storage differs among the computer device nodes.

Specifically, the node selection constraint may be determined with the goal of reducing the degree to which data storage differs among the computer device nodes.

Specifically, in an embodiment, to further ensure the objectivity of the balance evaluation index, the standard deviation of the current storage usage rates of the computer device nodes may be determined, and this standard deviation is taken as the current balance evaluation index among the computer device nodes.

The standard deviation of the current storage usage rates may be calculated with the usual standard deviation formula; the specific calculation is not limited by the embodiments of the present application.
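
As a minimal sketch of such a balance evaluation index, the snippet below uses the population standard deviation from Python's standard library. The application does not say whether the population or the sample form is meant, so the choice of `pstdev` is an assumption, and the name `balance_index` is hypothetical.

```python
from statistics import pstdev


def balance_index(usage_rates):
    """Population standard deviation of the per-node storage usage rates.

    A smaller value means the nodes are closer to one another in usage,
    that is, data storage is more balanced.
    """
    return pstdev(list(usage_rates))


balanced = balance_index([0.5, 0.5, 0.5])    # identical usage on every node
unbalanced = balance_index([0.2, 0.4, 0.6])  # usage spread around a mean of 0.4
```

Either form of the standard deviation ranks allocations the same way as long as the number of nodes is fixed, so the choice mainly affects the absolute value of the index.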

Specifically, in an embodiment, candidate computer device nodes may be determined according to the current data storage amount of each computer device node, the current storage usage rate of each computer device node, and the number of new shards; the latest balance evaluation index among the computer device nodes after the new shards of the search engine database are allocated to the candidate nodes is then predicted; and when the latest balance evaluation index is smaller than the current balance evaluation index, the candidate computer device nodes are determined as the target computer device nodes.

Specifically, when the data storage amount of a new shard is known, the storage usage rate of any candidate computer device node after it receives the new shard can be predicted from the data storage amount of the new shard and the node's current data storage amount. The balance evaluation index among the computer device nodes is then recalculated, which gives the predicted latest balance evaluation index after the new shards of the search engine database are allocated to the candidate nodes. Because the embodiments of the present application use the standard deviation of the current storage usage rates as the balance evaluation index, a latest balance evaluation index smaller than the current one indicates that the difference in data storage among the computer device nodes has been reduced, and the candidate computer device nodes are therefore determined as the target computer device nodes.
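
The acceptance check described above can be sketched as follows. The function names, the dictionary layout, and the sample figures are hypothetical, the population standard deviation is an assumption, and each candidate node is assumed to receive exactly one new shard.

```python
from statistics import pstdev


def predicted_index(stored, capacity, candidates, shard_size):
    """Balance index after placing one new shard on each candidate node."""
    rates = []
    for name in stored:
        extra = shard_size if name in candidates else 0
        rates.append((stored[name] + extra) / capacity[name])
    return pstdev(rates)


def accept_candidates(stored, capacity, candidates, shard_size):
    """Accept the candidates only if the predicted index is below the current one."""
    current = pstdev([stored[n] / capacity[n] for n in stored])
    latest = predicted_index(stored, capacity, candidates, shard_size)
    return latest < current


# Hypothetical cluster: data amounts and capacities in GB.
stored = {"a": 100, "b": 400, "c": 400}
capacity = {"a": 1000, "b": 1000, "c": 1000}

fills_emptiest = accept_candidates(stored, capacity, {"a"}, 100)
fills_fullest = accept_candidates(stored, capacity, {"b"}, 100)
```

With these figures, placing the shard on the emptiest node lowers the standard deviation and is accepted, while placing it on an already full node raises it and is rejected.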

Specifically, in an embodiment, when the data storage amount of a new shard is unknown, it may be predicted according to the current data storage amount of each computer device node; the candidate computer device nodes are then determined according to the current storage usage rate of each computer device node, the number of new shards, and the data storage amount of a new shard.

Specifically, in an embodiment, the data storage amount of a new shard may be predicted according to the current data storage amount of each computer device node and the current number of shards on each computer device node.

Specifically, when the data storage amount of each shard on each computer device node can be obtained directly, the data storage amount of every shard in the search engine database and the total number of shards may be acquired. The total data storage amount of the search engine database is determined from the per-shard data storage amounts, and the data storage amount of a new shard is then predicted from the total data storage amount and the total number of shards.

Specifically, the data storage amount of a new shard may be predicted according to the current data storage amount of each computer device node and the current number of shards on each computer device node, and the candidate computer device nodes are then determined in combination with the current storage usage rate of each computer device node, the number of new shards, and the data storage amount of a new shard.

Specifically, in an embodiment, the average data storage amount of a single shard of the search engine database may be determined according to the current data storage amount of each computer device node and the current number of shards on each computer device node, and this average is taken as the data storage amount of a new shard.
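
A minimal sketch of this estimate, assuming the per-node data amounts and shard counts are available as dictionaries (the names and figures are hypothetical): the cluster-wide total is divided by the total shard count.

```python
def average_shard_size(stored_per_node, shards_per_node):
    """Estimate a new shard's size as total stored data / total number of shards."""
    total_shards = sum(shards_per_node.values())
    if total_shards == 0:
        raise ValueError("no existing shards to average over")
    return sum(stored_per_node.values()) / total_shards


# Hypothetical cluster: 800 GB spread over 8 existing shards.
stored_per_node = {"a": 300, "b": 500}
shards_per_node = {"a": 3, "b": 5}
estimated_size = average_shard_size(stored_per_node, shards_per_node)
```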

Similarly, in an embodiment, the geometric mean data storage amount of a single shard of the search engine database may be determined according to the current data storage amount of each computer device node and the current number of shards on each computer device node, and this geometric mean is taken as the data storage amount of a new shard.

Taking the geometric mean as the data storage amount of a new shard reduces the influence of extreme values and can thereby improve the accuracy of the predicted data storage amount.
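
The effect of extreme values can be seen in a small sketch. It assumes that individual shard sizes are available, as in the directly obtainable case described earlier; the application does not spell out how the geometric mean is formed from per-node totals, so this input is an assumption, and the sizes are hypothetical.

```python
from statistics import fmean, geometric_mean

# Hypothetical per-shard sizes in GB: three small shards and one very large outlier.
shard_sizes = [10, 10, 10, 1000]

arithmetic = fmean(shard_sizes)         # pulled strongly toward the outlier
geometric = geometric_mean(shard_sizes)  # stays much closer to the typical shard
```

Here the arithmetic mean is 257.5 GB, far above three of the four shards, while the geometric mean is about 31.6 GB.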

On the basis of the above embodiments, to avoid the extreme case in which all new shards pile up on a single node, which would reduce search efficiency and create a hotspot risk, as one implementable manner, in an embodiment, determining the candidate computer device nodes according to the current storage usage rate of each computer device node, the number of new shards, and the data storage amount of a new shard includes:

Step 301, building a storage usage summary table according to the current storage usage rate of each computer device node;

Step 302, judging whether the number of new shards is greater than the total number of computer device nodes;

Step 303, when the number of new shards is greater than the total number of computer device nodes, selecting, according to the current storage usage rate of each computer device node, the candidate computer device nodes for a first quantity of new shards;

Step 304, updating the storage usage rate of each candidate computer device node according to that node's current data storage amount and the data storage amount of a new shard, leaving the storage usage rates of all other computer device nodes unchanged, so as to iteratively update the storage usage summary table, and taking the updated table as the new storage usage summary table;

Step 305, decreasing the number of new shards by the first quantity, so as to iteratively update the number of new shards, and returning to the step of judging whether the number of new shards is greater than the total number of computer device nodes.

Specifically, in an embodiment, when the number of new shards is greater than the total number of computer device nodes, the computer device nodes may be sorted in ascending order of their current storage usage rates as recorded in the storage usage summary table; according to the ascending sort result, the first quantity of candidate computer device nodes is selected, starting from the lowest storage usage rate.

Specifically, storage usage statistics are first gathered for the physical nodes of the search engine database according to the current storage usage rate of each computer device node, yielding the storage usage summary table. The table includes at least the unique identifier of each computer device node and its corresponding current storage usage rate. When the number of new shards is greater than the total number of computer device nodes, the new shards can be allocated iteratively in rounds. In each round, the candidate computer device nodes for the first quantity of new shards are determined from the storage usage summary table, with nodes of lower storage usage rate selected first. After each round of candidate selection, the storage usage summary table is updated once, so that the storage usage rate it records for each computer device node is always the latest. To avoid the extreme case in which all new shards pile up on a single node, the embodiments of the present application allocate in a dispersed manner: the candidate computer device nodes determined in each round correspond one-to-one to new shards.

Specifically, in an embodiment, the first quantity may be determined according to the total number of computer device nodes and a preset first allocation ratio.

Specifically, the product of the total number of computer device nodes and the preset first allocation ratio may be taken as the first quantity.

For example, when the total number of computer device nodes is 5 and the preset first allocation ratio is 2/3, the first quantity is determined to be 3.
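
The product 5 × 2/3 is not an integer, so some rounding is implied. The sketch below assumes rounding to the nearest integer, which reproduces this example (3) as well as the 1/3 example given later in the text (2). The rounding rule, the lower bound of 1, and the name `round_quantity` are assumptions rather than details stated in the application.

```python
import math
from fractions import Fraction


def round_quantity(total_nodes: int, ratio: Fraction) -> int:
    """Node total times allocation ratio, rounded half up, and at least 1."""
    # Exact rational arithmetic avoids floating-point surprises at the .5 boundary.
    return max(1, math.floor(total_nodes * ratio + Fraction(1, 2)))


quantity_two_thirds = round_quantity(5, Fraction(2, 3))  # 10/3 rounds to 3
quantity_one_third = round_quantity(5, Fraction(1, 3))   # 5/3 rounds to 2
```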

Specifically, in an embodiment, when the number of new shards is not greater than the total number of computer device nodes, it is judged whether the number of new shards is equal to 1; when it is equal to 1, the computer device node with the lowest current storage usage rate in the storage usage summary table is determined as the candidate computer device node for that new shard.

Specifically, whether the number of new shards equals 1 in the first round of allocation or is reduced to 1 after several rounds of iterative allocation, the computer device node with the lowest current storage usage rate can be determined directly as the candidate computer device node for that new shard.

Specifically, in an embodiment, when the number of new shards is not greater than the total number of computer device nodes and is greater than 1, the candidate computer device nodes for a second quantity of new shards are selected according to the storage usage summary table; the storage usage rate of each candidate computer device node is updated according to that node's current data storage amount and the data storage amount of a new shard, leaving the storage usage rates of all other computer device nodes unchanged, so as to iteratively update the storage usage summary table, and the updated table is taken as the new storage usage summary table; the number of new shards is decreased by the second quantity, so as to iteratively update the number of new shards, and the process returns to the step of judging whether the number of new shards is equal to 1.

Specifically, in an embodiment, when the number of new shards is not greater than the total number of computer device nodes and is greater than 1, the computer device nodes are sorted in ascending order of their current storage usage rates as recorded in the storage usage summary table; according to the ascending sort result, the second quantity of candidate computer device nodes is selected, starting from the lowest storage usage rate.

Specifically, if in the first round of allocation the number of new shards is not greater than the total number of computer device nodes and is greater than 1, or if after several rounds of allocation the number of new shards has been reduced to such a value, the candidate computer device nodes for the second quantity of new shards are selected according to the current storage usage rates recorded in the current storage usage summary table, with nodes of lower storage usage rate selected first. The second quantity is smaller than the first quantity.

具体地,在一实施例中,可以根据计算机设备节点总量及预设第二分配比例,确定第二数量。Specifically, in an embodiment, the second quantity may be determined according to the total number of computer equipment nodes and a preset second allocation ratio.

具体地,可以将计算机设备节点总量与预设第二分配比例的乘积计算结果,作为第二数量。Specifically, the calculation result of the product of the total number of computer device nodes and the preset second distribution ratio may be used as the second quantity.

其中,预设第二分配比例小于预设第一分配比例。当计算机设备节点总量为5,预设第一分配比例为1/3时,确定第一数量为2。Wherein, the preset second distribution ratio is smaller than the preset first distribution ratio. When the total number of computer equipment nodes is 5 and the preset first distribution ratio is 1/3, the first number is determined to be 2.
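The selection step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the round-to-nearest rule is an assumption based on the "approximately rounded" wording of the worked examples, and the node usage values are taken from those examples.

```python
import math

def candidate_count(total_nodes, ratio):
    """Node total times the preset ratio, rounded to the nearest integer.

    Rounding to nearest (and never returning less than 1) is an assumption.
    """
    return max(1, math.floor(total_nodes * ratio + 0.5))

def pick_candidates(usage, count):
    """Sort nodes by storage usage in ascending order and keep the `count` lowest."""
    ranked = sorted(usage, key=usage.get)
    return ranked[:count]

# Storage usage in percent, as in the worked examples.
usage = {"D1": 10.0, "D2": 15.0, "D3": 20.0, "D4": 17.0, "D5": 3.0}

count = candidate_count(len(usage), 1 / 3)
print(count, pick_candidates(usage, count))
```

With 5 nodes, a ratio of 1/3 yields 2 candidates and a ratio of 2/3 yields 3, which matches the node counts used in the examples.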

By way of example, FIG. 3 is a schematic flowchart of an exemplary search engine database management method provided by an embodiment of the present application. In practice, if the updated number of new shards is negative, it can be set directly to zero; when the number of new shards equals 0, the flow ends.

Specifically, FIG. 4 is a schematic flowchart of another exemplary search engine database management method provided by an embodiment of the present application. The method involves the operation of an expected standard deviation calculation module, a node selector, an iterative allocator, and related algorithms. Here, the expected standard deviation is the balance evaluation index; the single-shard size is the data storage amount of a new shard; the storage rate is the storage usage of a computer device node; the node selector selects the target computer device node based on the method provided by the above embodiments; the iterative allocator determines the candidate computer device nodes for each new shard based on the method provided by the above embodiments; and the update node range is the number of candidate computer device nodes (the first quantity or the second quantity).

To help those skilled in the art understand the search engine database management method provided by the embodiments of the present application more clearly, the following examples are provided:

Example 1

Consider an ElasticSearch cluster of 5 storage nodes (computer device nodes), numbered D1, D2, D3, D4, and D5, whose storage usage is 10%, 15%, 20%, 17%, and 3% respectively, and whose stored data size (current data storage amount) is 10GB, 15GB, 20GB, 17GB, and 3GB respectively. The current standard deviation of storage usage (the current balance evaluation index) is then 5.20%.
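As a check on the arithmetic, the short sketch below computes two common dispersion measures for the five usage values. It is an observation rather than a statement of the applicant's formula: for these inputs the quoted 5.20% is reproduced by the mean absolute deviation from the mean, whereas the root-mean-square (population) standard deviation comes out near 5.97%. The helper name `mean_abs_dev` is hypothetical.

```python
from statistics import fmean, pstdev

usage = [10.0, 15.0, 20.0, 17.0, 3.0]  # D1..D5, storage usage in percent

def mean_abs_dev(values):
    """Average absolute distance of each value from the mean."""
    m = fmean(values)
    return fmean([abs(v - m) for v in values])

print(round(mean_abs_dev(usage), 2))  # mean absolute deviation
print(round(pstdev(usage), 2))        # population standard deviation
```

Either measure can serve as a balance index in a sketch, since both fall as the nodes become more evenly loaded; only the mean absolute deviation matches the figures quoted in this example.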

Suppose the calculated geometric mean shard size (the data storage amount of a new shard) is 1.85GB and 3 new shards need to be added. Allocation proceeds as follows: the shard count 3 is less than the node count 5, so the 1/3 of nodes with the lowest storage usage are selected; 5 multiplied by 1/3 rounds to 2, giving D5 and D1, and these 2 nodes are chosen for shard allocation.

Iterative allocation calculation. The expected standard deviation (the latest balance evaluation index) is calculated according to the following tables:

First round of allocation:

Figure SMS_1

Here, the expected storage usage is the updated storage usage.

As the table above shows, the expected standard deviation of storage usage after the first round of allocation is 4.31%, which is less than the current 5.20%. The data storage balance improves, so this round of allocation can be accepted. One of the 3 shards now remains.

In the second round of allocation, node D5 is selected directly. The expected result of the second round is calculated as follows:

Figure SMS_2

All 3 shards have now been allocated. The final expected standard deviation of storage usage is 3.87%, lower than the original 5.20%, so the allocation can be accepted. The final shard allocation is: 2 new shards on D5 and 1 new shard on D1.
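The two rounds of Example 1 can be replayed with a small simulation. This is a sketch under stated assumptions, not the patented implementation: the function `allocate` and its parameters are hypothetical; each node is assumed to have 100GB of storage space (inferred from 10GB corresponding to 10%); the ratios 2/3 and 1/3 and the round-to-nearest rule are taken from the worked examples; the balance index is computed as the mean absolute deviation, which is the measure that reproduces the 4.31% and 3.87% quoted above. The sketch only records the index after each round and does not model what happens when a round would worsen it.

```python
import math

def mean_abs_dev(values):
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)

def allocate(used_gb, capacity_gb, shard_gb, shards,
             first_ratio=2 / 3, second_ratio=1 / 3):
    """Place shards round by round on the least-used nodes.

    Returns (shards placed per node, balance index after each round).
    """
    used = dict(used_gb)
    placed = {name: 0 for name in used}
    history = []
    n = len(used)
    while shards > 0:
        if shards > n:            # more shards than nodes: first quantity
            k = max(1, math.floor(n * first_ratio + 0.5))
        elif shards > 1:          # between 2 and n shards: second quantity
            k = max(1, math.floor(n * second_ratio + 0.5))
        else:                     # a single shard goes to the least-used node
            k = 1
        k = min(k, shards)
        ranked = sorted(used, key=lambda name: used[name] / capacity_gb[name])
        for name in ranked[:k]:
            used[name] += shard_gb
            placed[name] += 1
        shards = max(0, shards - k)  # clamp at zero, as in the FIG. 3 flow
        rates = [100 * used[x] / capacity_gb[x] for x in used]
        history.append(round(mean_abs_dev(rates), 2))
    return placed, history

used = {"D1": 10.0, "D2": 15.0, "D3": 20.0, "D4": 17.0, "D5": 3.0}
capacity = {name: 100.0 for name in used}  # assumed, not stated in the source
placed, history = allocate(used, capacity, shard_gb=1.85, shards=3)
print(placed)
print(history)
```

The run ends with 2 shards on D5 and 1 on D1, as in the text. The sketch has been checked only against Example 1.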

Example 2

Consider an ElasticSearch cluster of 5 storage nodes, numbered D1, D2, D3, D4, and D5, whose storage usage is 10%, 15%, 20%, 17%, and 3% respectively, and whose stored data size is 10GB, 15GB, 20GB, 17GB, and 3GB respectively. The current standard deviation of storage usage is then 5.20%.

12 new shards are to be added, and the geometric mean size of a single shard is 3.1GB.

The first round of iterative allocation is performed first:

The shard count 12 is greater than the node count 5, so the 2/3 of nodes with the lowest storage usage are selected first; 5 multiplied by 2/3 rounds to 3, giving D5, D1, and D2. The expected allocation is calculated as follows:

Figure SMS_3

The expected standard deviation falls to 4.06%. 9 shards remain, which is greater than the node count 5, so selection continues; D5, D1, and D3 are selected for the second round of allocation:

Figure SMS_4

The expected standard deviation falls to 3.22%. 6 shards remain, which is greater than the node count 5, so selection continues; D5, D1, and D2 are selected for the third round of allocation:

Figure SMS_5

The expected standard deviation falls to 2.51%. 3 shards remain, which is less than the node count 5, so 2 nodes should be selected; D5 and D1 are selected for the fourth round of allocation:

Figure SMS_6

The expected standard deviation falls to 1.77%. Only 1 shard remains, so D5, which has the lowest storage usage, is selected directly for the final allocation:

Figure SMS_7

All 12 shards have now been allocated. The final expected standard deviation of storage usage falls to 1.09%, far below the initial 5.20%, and the expected imbalance among the nodes is reduced by 79%.
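The 79% figure follows directly from the initial and final values of the balance index quoted in this example:

```latex
\frac{5.20\% - 1.09\%}{5.20\%} = \frac{4.11}{5.20} \approx 0.79
```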

The search engine database management method provided by the embodiments of the present application obtains the current data storage amount of each computer device node corresponding to the search engine database, the current storage usage of each computer device node, and the number of new shards; determines a target computer device node according to these three quantities; and allocates the new shards of the search engine database to the target computer device node. By jointly considering the current data storage amount and current storage usage of each computer device node and the number of new shards when selecting the target computer device node for a new shard, the method ensures that the data storage of the computer device nodes tends toward balance once the new shards are allocated. This improves resource utilization and lays a foundation for improving the search efficiency of the ES database. In addition, a second selection that takes node dispersion into account is performed iteratively, so that new shards are spread across nodes and the extreme case of all new shards piling up on a single node is avoided, which further improves the data storage balance of the computer device nodes.

An embodiment of the present application provides a search engine database management device for executing the search engine database management method provided by the above embodiments.

FIG. 5 is a schematic structural diagram of the search engine database management device provided by an embodiment of the present application. The search engine database management device 50 includes an acquisition module 501, a determination module 502, and a management module 503.

The acquisition module is configured to obtain the current data storage amount of each computer device node corresponding to the search engine database, the current storage usage of each computer device node, and the number of new shards. The determination module is configured to determine a target computer device node according to the current data storage amount of each computer device node, the current storage usage of each computer device node, and the number of new shards. The management module is configured to allocate the new shards of the search engine database to the target computer device node.

Specifically, in an embodiment, the determination module is specifically configured to:

determine a current balance evaluation index among the computer device nodes according to the current storage usage of each computer device node;

determine a node selection constraint according to the current balance evaluation index among the computer device nodes;

determine the target computer device node, under the node selection constraint, according to the current data storage amount of each computer device node, the current storage usage of each computer device node, and the number of new shards.

Specifically, in an embodiment, the determination module is specifically configured to:

determine the standard deviation of the current storage usage among the computer device nodes according to the current storage usage of each computer device node;

use the standard deviation of the current storage usage as the current balance evaluation index among the computer device nodes.

Specifically, in an embodiment, the determination module is specifically configured to:

determine candidate computer device nodes according to the current data storage amount of each computer device node, the current storage usage of each computer device node, and the number of new shards;

predict the latest balance evaluation index among the computer device nodes after the new shards of the search engine database are allocated to the candidate computer device nodes;

when the latest balance evaluation index is smaller than the current balance evaluation index, determine the candidate computer device nodes as the target computer device node.
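The acceptance rule in the last step can be sketched as a single comparison. The function name `accept_candidates` is hypothetical, the use of the population standard deviation (`statistics.pstdev`) as the balance index is an assumption about which variant of standard deviation is meant, and the usage values assume 100GB nodes receiving 1.85GB shards as in Example 1.

```python
from statistics import pstdev

def accept_candidates(current_rates, predicted_rates, metric=pstdev):
    """Return True when the predicted (latest) index is below the current one."""
    return metric(predicted_rates) < metric(current_rates)

current = [10.0, 15.0, 20.0, 17.0, 3.0]         # D1..D5, percent
to_low_nodes = [11.85, 15.0, 20.0, 17.0, 4.85]  # one shard each on D1 and D5
to_high_node = [10.0, 15.0, 21.85, 17.0, 3.0]   # hypothetical: one shard on D3

print(accept_candidates(current, to_low_nodes))
print(accept_candidates(current, to_high_node))
```

Placing shards on the least-used nodes lowers the index and is accepted, while the hypothetical placement on the most-used node raises it and is rejected. Passing a different `metric` changes the index without changing the rule.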

Specifically, in an embodiment, the determination module is specifically configured to:

predict the data storage amount of a new shard according to the current data storage amount of each computer device node;

determine the candidate computer device nodes according to the current storage usage of each computer device node, the number of new shards, and the data storage amount of a new shard.

Specifically, in an embodiment, the determination module is specifically configured to:

predict the data storage amount of a new shard according to the current data storage amount of each computer device node and the current number of shards on each computer device node.

Specifically, in an embodiment, the determination module is specifically configured to:

determine the average data storage amount of a single shard of the search engine database according to the current data storage amount of each computer device node and the current number of shards on each computer device node;

use the average data storage amount of a single shard as the data storage amount of a new shard.

Specifically, in an embodiment, the determination module is specifically configured to:

determine the geometric mean data storage amount of a single shard of the search engine database according to the current data storage amount of each computer device node and the current number of shards on each computer device node;

use the geometric mean data storage amount of a single shard as the data storage amount of a new shard.
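The two size estimates above might be computed as in the sketch below. The text does not spell out how the per-node quantities are combined, so this is only one plausible reading: the arithmetic estimate divides total stored data by total shard count, and the geometric estimate takes the geometric mean of each node's average shard size. Both function names and the sample inputs are hypothetical.

```python
from statistics import geometric_mean

def arithmetic_shard_size(storage_gb, shard_counts):
    """Total stored data divided by the total number of existing shards."""
    return sum(storage_gb) / sum(shard_counts)

def geometric_shard_size(storage_gb, shard_counts):
    """Geometric mean of the per-node average shard size (nodes without shards skipped)."""
    per_node = [s / c for s, c in zip(storage_gb, shard_counts) if c > 0]
    return geometric_mean(per_node)

storage_gb = [8.0, 2.0]   # hypothetical data stored on two nodes
shard_counts = [2, 2]     # hypothetical shards currently on each node

print(arithmetic_shard_size(storage_gb, shard_counts))
print(geometric_shard_size(storage_gb, shard_counts))
```

The geometric mean is pulled less by one node holding unusually large shards, which is a common reason to prefer it for skewed size distributions.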

Specifically, in an embodiment, the determination module is specifically configured to:

construct a storage usage summary table according to the current storage usage of each computer device node;

judge whether the number of new shards is greater than the total number of computer device nodes;

when the number of new shards is greater than the total number of computer device nodes, select, according to the storage usage summary table, candidate computer device nodes for a first quantity of new shards;

update the storage usage of each candidate computer device node according to its current data storage amount and the data storage amount of a new shard, leaving the storage usage of all other computer device nodes unchanged, so as to iteratively update the storage usage summary table, and use the updated table as the new storage usage summary table;

reduce the number of new shards by the first quantity, so as to iteratively update the number of new shards, and return to the step of judging whether the number of new shards is greater than the total number of computer device nodes.

Specifically, in an embodiment, the determination module is specifically configured to:

when the number of new shards is greater than the total number of computer device nodes, sort the computer device nodes in ascending order of the current storage usage represented by the storage usage summary table;

select the first quantity of candidate computer device nodes from the lowest usage upward according to the ascending sort result.

Specifically, in an embodiment, the determination module is further configured to:

determine the first quantity according to the total number of computer device nodes and a preset first allocation ratio.

Specifically, in an embodiment, the determination module is further configured to:

when the number of new shards is not greater than the total number of computer device nodes, judge whether the number of new shards equals 1;

when the number of new shards equals 1, determine the computer device node with the lowest current storage usage represented by the storage usage summary table as the candidate computer device node for that new shard.

Specifically, in an embodiment, the determination module is further configured to:

when the number of new shards is not greater than the total number of computer device nodes and is greater than 1, select, according to the storage usage summary table, candidate computer device nodes for a second quantity of new shards;

update the storage usage of each candidate computer device node according to its current data storage amount and the data storage amount of a new shard, leaving the storage usage of all other computer device nodes unchanged, so as to iteratively update the storage usage summary table, and use the updated table as the new storage usage summary table;

reduce the number of new shards by the second quantity, so as to iteratively update the number of new shards, and return to the step of judging whether the number of new shards equals 1.

Specifically, in an embodiment, the determination module is specifically configured to:

when the number of new shards is not greater than the total number of computer device nodes and is greater than 1, sort the computer device nodes in ascending order of the current storage usage represented by the storage usage summary table;

select the second quantity of candidate computer device nodes from the lowest usage upward according to the ascending sort result.

Specifically, in an embodiment, the determination module is further configured to:

determine the second quantity according to the total number of computer device nodes and a preset second allocation ratio.

Specifically, in an embodiment, the acquisition module is specifically configured to:

obtain the data storage space of each computer device node;

for any computer device node, determine the current storage usage of that node according to its current data storage amount and its data storage space.
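The usage calculation performed by the acquisition module amounts to a ratio of stored data to storage space. The sketch below expresses it as a percentage; the function name is hypothetical, and the 100GB storage space per node is an assumption inferred from the worked examples, where 10GB of data corresponds to 10% usage.

```python
def storage_usage_percent(used_gb, capacity_gb):
    """Current data storage amount as a percentage of the node's storage space."""
    if capacity_gb <= 0:
        raise ValueError("storage space must be positive")
    return 100.0 * used_gb / capacity_gb

used = {"D1": 10, "D2": 15, "D3": 20, "D4": 17, "D5": 3}  # GB, from the examples
capacity = {name: 100 for name in used}                   # GB, assumed

rates = {name: storage_usage_percent(used[name], capacity[name]) for name in used}
print(rates)
```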

Regarding the search engine database management device in this embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.

The search engine database management device provided by the embodiment of the present application is used to execute the search engine database management method provided by the above embodiments; its implementation and principle are the same and are not repeated here.

An embodiment of the present application provides a search engine database for executing the search engine database management method provided by the above embodiments.

FIG. 6 is a schematic structural diagram of the search engine database provided by an embodiment of the present application. The search engine database 60 includes several computer device nodes 601.

Any of the computer device nodes executes the search engine database management method provided by the above embodiments.

The search engine database provided by the embodiment of the present application is used to execute the search engine database management method provided by the above embodiments; its implementation and principle are the same and are not repeated here.

An embodiment of the present application provides an electronic device for executing the search engine database management method provided by the above embodiments.

FIG. 7 is a schematic structural diagram of the electronic device provided by an embodiment of the present application. The electronic device 70 includes at least one processor 71 and a memory 72.

The memory stores computer-executable instructions. The at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the search engine database management method provided by the above embodiments.

The electronic device provided by the embodiment of the present application is used to execute the search engine database management method provided by the above embodiments; its implementation and principle are the same and are not repeated here.

An embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions. When a processor executes the computer-executable instructions, the search engine database management method provided by any of the above embodiments is implemented.

The storage medium containing computer-executable instructions according to the embodiment of the present application can be used to store the computer-executable instructions of the search engine database management method provided by the foregoing embodiments; its implementation and principle are the same and are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

An integrated unit implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Those skilled in the art will clearly understand that, for convenience and brevity of description, only the division into the functional modules described above is used as an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, and some or all of their technical features may be replaced with equivalents, without the essence of the corresponding technical solutions departing from the scope of the technical solutions of the embodiments of the present application.

Claims (16)

1. A search engine database management method, characterized by comprising:
obtaining the current data storage amount of each computer device node corresponding to a search engine database, the current storage usage of each computer device node, and the number of new shards;
determining a target computer device node according to the current data storage amount of each computer device node, the current storage usage of each computer device node, and the number of new shards;
allocating the new shards of the search engine database to the target computer device node;
wherein determining the target computer device node according to the current data storage amount of each computer device node, the current storage usage of each computer device node, and the number of new shards comprises:
determining a current balance evaluation index among the computer device nodes according to the current storage usage of each computer device node;
determining a node selection constraint according to the current balance evaluation index among the computer device nodes;
determining the target computer device node, under the node selection constraint, according to the current data storage amount of each computer device node, the current storage usage of each computer device node, and the number of new shards;
wherein determining the target computer device node, under the node selection constraint, according to the current data storage amount of each computer device node, the current storage usage of each computer device node, and the number of new shards comprises:
determining candidate computer device nodes according to the current data storage amount of each computer device node, the current storage usage of each computer device node, and the number of new shards;
predicting the latest balance evaluation index among the computer device nodes after the new shards of the search engine database are allocated to the candidate computer device nodes;
when the latest balance evaluation index is smaller than the current balance evaluation index, determining the candidate computer device nodes as the target computer device node;
wherein determining the candidate computer device nodes according to the current data storage amount of each computer device node, the current storage usage of each computer device node, and the number of new shards comprises:
predicting the data storage amount of the new shard according to the current data storage amount of each computer device node;
determining the candidate computer device nodes according to the current storage usage of each computer device node, the number of new shards, and the data storage amount of the new shard;
wherein determining the candidate computer device nodes according to the current storage usage of each computer device node, the number of new shards, and the data storage amount of the new shard comprises:
constructing a storage usage summary table according to the current storage usage of each computer device node;
judging whether the number of new shards is greater than the total number of computer device nodes;
when the number of new shards is greater than the total number of computer device nodes, selecting, according to the storage usage summary table, candidate computer device nodes for a first quantity of new shards;
updating the storage usage of each candidate computer device node according to the current data storage amount of that candidate computer device node and the data storage amount of the new shard, the storage usage of the computer device nodes other than the candidate computer device nodes remaining unchanged, so as to iteratively update the storage usage summary table, and using the updated storage usage summary table as the new storage usage summary table;
reducing the number of new shards by the first quantity, so as to iteratively update the number of new shards, and returning to the step of judging whether the number of new shards is greater than the total number of computer device nodes.
2. The method according to claim 1, wherein determining the current balance evaluation index among the computer device nodes according to the current storage usage of each computer device node comprises:
determining the standard deviation of the current storage usage among the computer device nodes according to the current storage usage of each computer device node;
using the standard deviation of the current storage usage as the current balance evaluation index among the computer device nodes.
3. The method according to claim 1, wherein predicting the data storage amount of the new shard according to the current data storage amount of each computer device node comprises:
predicting the data storage amount of the new shard according to the current data storage amount of each computer device node and the current number of shards on each computer device node.
4.
The method according to claim 3, characterized in that, predicting the data of the new fragmentation according to the current data storage capacity of each of the computer equipment nodes and the current fragmentation quantity of each of the computer equipment nodes Storage capacity, including: 根据各所述计算机设备节点的当前数据存储量及各所述计算机设备节点的当前分片数量,确定所述搜索引擎数据库单个分片的平均数据存储量;According to the current data storage capacity of each described computer device node and the current fragmentation quantity of each described computer device node, determine the average data storage capacity of a single fragment of the search engine database; 将所述单个分片的平均数据存储量,作为所述新分片的数据存储量。The average data storage capacity of the single shard is used as the data storage capacity of the new shard. 5.根据权利要求3所述的方法,其特征在于,所述根据各所述计算机设备节点的当前数据存储量及各所述计算机设备节点的当前分片数量,预测所述新分片的数据存储量,包括:5. The method according to claim 3, characterized in that, predicting the data of the new fragmentation according to the current data storage capacity of each of the computer equipment nodes and the current fragmentation quantity of each of the computer equipment nodes Storage capacity, including: 根据各所述计算机设备节点的当前数据存储量及各所述计算机设备节点的当前分片数量,确定所述搜索引擎数据库单个分片的几何平均数据存储量;According to the current data storage capacity of each described computer device node and the current fragmentation quantity of each described computer device node, determine the geometric mean data storage capacity of a single fragment of the search engine database; 将所述单个分片的几何平均数据存储量,作为所述新分片的数据存储量。The geometric mean data storage capacity of the single shard is used as the data storage capacity of the new shard. 6.根据权利要求1所述的方法,其特征在于,所述当所述新分片的数量大于所述计算机设备节点总量时,根据所述存储使用率汇总表,筛选第一数量的新分片对应的待选计算机设备节点,包括:6. The method according to claim 1, wherein when the number of the new fragments is greater than the total number of nodes of the computer device, according to the storage usage summary table, the first number of new fragments is screened. 
The computer equipment nodes to be selected corresponding to the fragments, including: 当所述新分片的数量大于所述计算机设备节点总量时,根据所述存储使用率汇总表表征的各所述计算机设备节点的当前存储使用率,对各所述计算机设备节点进行升序排序;When the number of new fragments is greater than the total number of computer device nodes, sort each computer device node in ascending order according to the current storage usage rate of each computer device node represented by the storage usage summary table ; 根据各所述计算机设备节点的升序排序结果,从小到大筛选第一数量的待选计算机设备节点。According to the ascending sorting results of each of the computer device nodes, the first number of computer device nodes to be selected is screened from small to large. 7.根据权利要求1所述的方法,其特征在于,所述方法还包括:7. The method according to claim 1, further comprising: 根据所述计算机设备节点总量及预设第一分配比例,确定所述第一数量。The first quantity is determined according to the total number of computer equipment nodes and a preset first distribution ratio. 8.根据权利要求1所述的方法,其特征在于,所述方法还包括:8. The method according to claim 1, further comprising: 当所述新分片的数量不大于所述计算机设备节点总量时,判断所述新分片的数量是否等于1;When the number of the new fragments is not greater than the total number of computer device nodes, it is judged whether the number of the new fragments is equal to 1; 当所述新分片的数量等于1时,将所述存储使用率汇总表表征的当前存储使用率最低的计算机设备节点,确定为该新分片对应的待选计算机设备节点。When the number of new fragments is equal to 1, the computer device node with the lowest current storage utilization represented by the storage utilization summary table is determined as the candidate computer device node corresponding to the new fragment. 9.根据权利要求8所述的方法,其特征在于,所述方法还包括:9. 
The method of claim 8, further comprising: 当所述新分片的数量不大于所述计算机设备节点总量,且所述新分片的数量大于1时,根据所述存储使用率汇总表,筛选第二数量的新分片对应的待选计算机设备节点;When the number of the new fragments is not greater than the total number of computer equipment nodes, and the number of the new fragments is greater than 1, according to the storage usage summary table, screen the second number of new fragments corresponding to the pending Select a computer device node; 根据各所述待选计算机设备节点的当前数据存储量及所述新分片的数据存储量,更新各所述待选计算机设备节点的存储使用率,除所述待选计算机设备节点以外的其他计算机设备节点的存储使用率不变,以对所述存储使用率汇总表进行迭代更新,并将更新后的存储使用率汇总表,作为新的存储使用率汇总表;According to the current data storage capacity of each computer device node to be selected and the data storage capacity of the new fragment, update the storage usage rate of each computer device node to be selected, and other than the computer device node to be selected The storage utilization rate of the computer device node remains unchanged, so as to iteratively update the storage utilization rate summary table, and use the updated storage utilization rate summary table as a new storage utilization rate summary table; 更新所述新分片的数量减少第二数量,以对所述新分片的数量进行迭代更新,并返回至所述判断所述新分片的数量是否等于1的步骤。Updating the number of new fragments by a second quantity, so as to iteratively update the number of new fragments, and return to the step of judging whether the number of new fragments is equal to 1. 10.根据权利要求9所述的方法,其特征在于,所述当所述新分片的数量不大于所述计算机设备节点总量,且所述新分片的数量大于1时,根据所述存储使用率汇总表,筛选第二数量的新分片对应的待选计算机设备节点,包括:10. 
The method according to claim 9, wherein when the number of the new fragments is not greater than the total number of nodes of the computer equipment, and the number of the new fragments is greater than 1, according to the A summary table of storage utilization ratios, filtering candidate computer device nodes corresponding to the second number of new fragments, including: 当所述新分片的数量不大于所述计算机设备节点总量,且所述新分片的数量大于1时,根据所述存储使用率汇总表表征的各所述计算机设备节点的当前存储使用率,对各所述计算机设备节点进行升序排序;When the number of new fragments is not greater than the total number of computer equipment nodes, and the number of new fragments is greater than 1, the current storage usage of each computer equipment node represented by the storage usage summary table rate, sort each computer device node in ascending order; 根据各所述计算机设备节点的升序排序结果,从小到大筛选第二数量的待选计算机设备节点。According to the ascending sorting results of each of the computer equipment nodes, the second number of computer equipment nodes to be selected is screened from small to large. 11.根据权利要求9所述的方法,其特征在于,还包括:11. The method of claim 9, further comprising: 根据所述计算机设备节点总量及预设第二分配比例,确定所述第二数量。The second quantity is determined according to the total amount of computer equipment nodes and a preset second distribution ratio. 12.根据权利要求1所述的方法,其特征在于,所述获取搜索引擎数据库对应的各计算机设备节点的当前存储使用率,包括:12. The method according to claim 1, wherein the obtaining the current storage utilization rate of each computer device node corresponding to the search engine database comprises: 获取各所述计算机设备节点的数据存储空间;Acquiring the data storage space of each computer device node; 针对任一所述计算机设备节点,根据该计算机设备节点的当前数据存储量和数据存储空间,确定该计算机设备节点的当前存储使用率。For any of the computer device nodes, the current storage usage rate of the computer device node is determined according to the current data storage capacity and data storage space of the computer device node. 13.一种搜索引擎数据库管理装置,其特征在于,包括:13. 
A search engine database management device, characterized in that it comprises: 获取模块,用于获取搜索引擎数据库对应的各计算机设备节点的当前数据存储量、各计算机设备节点的当前存储使用率以及新分片的数量;An acquisition module, configured to acquire the current data storage capacity of each computer device node corresponding to the search engine database, the current storage usage rate of each computer device node, and the number of new fragments; 确定模块,用于根据各计算机设备节点的当前数据存储量、各计算机设备节点的当前存储使用率以及新分片的数量,确定目标计算机设备节点;A determining module, configured to determine the target computer device node according to the current data storage capacity of each computer device node, the current storage usage rate of each computer device node, and the number of new fragments; 管理模块,用于将所述搜索引擎数据库的新分片,分配至所述目标计算机设备节点;A management module, configured to assign new fragments of the search engine database to the target computer device node; 所述确定模块,具体用于:The determination module is specifically used for: 根据各所述计算机设备节点的当前存储使用率,确定各所述计算机设备节点之间的当前平衡评价指标;According to the current storage usage rate of each of the computer equipment nodes, determine the current balance evaluation index between each of the computer equipment nodes; 根据各所述计算机设备节点之间的当前平衡评价指标,确定节点选择约束条件;Determine node selection constraints according to the current balance evaluation indicators among the computer equipment nodes; 按照所述节点选择约束条件,根据各计算机设备节点的当前数据存储量、各计算机设备节点的当前存储使用率以及新分片的数量,确定目标计算机设备节点;According to the node selection constraints, determine the target computer device node according to the current data storage capacity of each computer device node, the current storage usage rate of each computer device node, and the number of new fragments; 所述确定模块,具体用于:The determination module is specifically used for: 根据各计算机设备节点的当前数据存储量、各计算机设备节点的当前存储使用率以及新分片的数量,确定待选计算机设备节点;According to the current data storage capacity of each computer device node, the current storage usage rate of each computer device node and the number of new fragments, determine the computer device node to be selected; 
预测将所述搜索引擎数据库的新分片分配至所述待选计算机设备节点后,各所述计算机设备节点之间的最新平衡评价指标;Predicting the latest balance evaluation index between each computer device node after the new fragment of the search engine database is allocated to the computer device node to be selected; 当所述最新平衡评价指标小于所述当前平衡评价指标时,将所述待选计算机设备节点,确定为所述目标计算机设备节点;When the latest balance evaluation index is smaller than the current balance evaluation index, determine the computer device node to be selected as the target computer device node; 所述确定模块,具体用于:The determination module is specifically used for: 根据各所述计算机设备节点的当前数据存储量,预测所述新分片的数据存储量;Predicting the data storage capacity of the new fragment according to the current data storage capacity of each of the computer device nodes; 根据各所述计算机设备节点的当前存储使用率、新分片的数量及所述新分片的数据存储量,确定待选计算机设备节点;Determine the computer equipment node to be selected according to the current storage usage rate of each of the computer equipment nodes, the number of new fragments and the data storage capacity of the new fragments; 所述确定模块,具体用于:The determination module is specifically used for: 根据各所述计算机设备节点的当前存储使用率,构建存储使用率汇总表;According to the current storage utilization rate of each described computer device node, construct a storage utilization rate summary table; 判断所述新分片的数量是否大于所述计算机设备节点总量;Judging whether the number of the new fragments is greater than the total number of computer device nodes; 当所述新分片的数量大于所述计算机设备节点总量时,根据所述存储使用率汇总表,筛选第一数量的新分片对应的待选计算机设备节点;When the number of the new fragments is greater than the total number of computer equipment nodes, according to the storage usage summary table, screen candidate computer equipment nodes corresponding to the first number of new fragments; 根据各所述待选计算机设备节点的当前数据存储量及所述新分片的数据存储量,更新各所述待选计算机设备节点的存储使用率,除所述待选计算机设备节点以外的其他计算机设备节点的存储使用率不变,以对所述存储使用率汇总表进行迭代更新,并将更新后的存储使用率汇总表,作为新的存储使用率汇总表;According to the current data storage capacity of each computer device node to be selected and the data storage capacity of the new fragment, update the storage usage rate of each computer device node to 
be selected, and other than the computer device node to be selected The storage utilization rate of the computer device node remains unchanged, so as to iteratively update the storage utilization rate summary table, and use the updated storage utilization rate summary table as a new storage utilization rate summary table; 更新所述新分片的数量减少第一数量,以对所述新分片的数量进行迭代更新,并返回至所述判断所述新分片的数量是否大于所述计算机设备节点总量的步骤。Updating the number of new shards to reduce the first amount, so as to iteratively update the number of new shards, and return to the step of judging whether the number of new shards is greater than the total number of computer device nodes . 14.一种搜索引擎数据库,其特征在于,包括:若干个计算机设备节点;14. A search engine database, characterized in that it comprises: several computer equipment nodes; 任一所述计算机设备节点执行如权利要求1至12任一项所述的方法。Any one of the computer equipment nodes executes the method according to any one of claims 1 to 12. 15.一种电子设备,其特征在于,包括:至少一个处理器和存储器;15. An electronic device, comprising: at least one processor and a memory; 所述存储器存储计算机执行指令;the memory stores computer-executable instructions; 所述至少一个处理器执行所述存储器存储的计算机执行指令,使得所述至少一个处理器执行如权利要求1至12任一项所述的方法。The at least one processor executes the computer-implemented instructions stored in the memory, causing the at least one processor to perform the method according to any one of claims 1 to 12. 16.一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机执行指令,当处理器执行所述计算机执行指令时,实现如权利要求1至12任一项所述的方法。16. A computer-readable storage medium, wherein computer-readable instructions are stored in the computer-readable storage medium, and when the processor executes the computer-executable instructions, the computer-readable storage medium according to any one of claims 1 to 12 is implemented. described method.
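
The selection loop recited in claims 1, 2, 4, 6, 8 to 10 and 12 can be illustrated with a minimal Python sketch. It is not the patentee's implementation. The names `Node`, `balance_index`, `predict_shard_size`, `pick_candidates`, `improves_balance`, `first_qty` and `second_qty` are hypothetical, the population form of the standard deviation is assumed, and the clamping of the batch size is an added assumption, since the claims do not say what happens when the first or second quantity exceeds the number of remaining new shards (fragments).

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    stored: float    # current data storage amount, e.g. in GB
    capacity: float  # data storage space, same unit
    shards: int      # current number of shards on the node

    @property
    def usage(self) -> float:
        # Claim 12: usage rate = stored amount / storage space.
        return self.stored / self.capacity


def balance_index(usages):
    # Claim 2: standard deviation of the usage rates (population form assumed).
    mean = sum(usages) / len(usages)
    return math.sqrt(sum((u - mean) ** 2 for u in usages) / len(usages))


def predict_shard_size(nodes):
    # Claim 4: arithmetic mean amount of data held by one existing shard.
    return sum(n.stored for n in nodes) / sum(n.shards for n in nodes)


def pick_candidates(nodes, new_shards, first_qty, second_qty):
    """Return (one node name per new shard, predicted usage table)."""
    size = predict_shard_size(nodes)
    stored = {n.name: n.stored for n in nodes}
    capacity = {n.name: n.capacity for n in nodes}
    table = {n.name: n.usage for n in nodes}  # storage usage rate summary table
    picks = []
    while new_shards > 0:
        if new_shards > len(nodes):
            batch = first_qty       # claim 1 branch
        elif new_shards == 1:
            batch = 1               # claim 8 branch
        else:
            batch = second_qty      # claim 9 branch
        # Assumption: never take more than the remaining shards or the node count.
        batch = max(1, min(batch, new_shards, len(nodes)))
        # Claims 6 and 10: ascending sort by usage, take the least-used nodes.
        chosen = sorted(table, key=table.get)[:batch]
        for name in chosen:
            stored[name] += size
            table[name] = stored[name] / capacity[name]
        picks.extend(chosen)
        new_shards -= batch
    return picks, table


def improves_balance(nodes, predicted_table):
    # Claim 1: accept only if the predicted index is smaller than the current one.
    current = balance_index([n.usage for n in nodes])
    latest = balance_index(list(predicted_table.values()))
    return latest < current


nodes = [Node("A", 80.0, 100.0, 8), Node("B", 20.0, 100.0, 2), Node("C", 50.0, 100.0, 5)]
picks, predicted = pick_candidates(nodes, new_shards=4, first_qty=2, second_qty=2)
print(picks, improves_balance(nodes, predicted))
```

With these made-up figures the predicted shard size is 10 GB, the four new shards go to B, C, B, C, and the usage-rate standard deviation drops, so the constraint of claim 1 is met. The sketch does not model what happens when the constraint fails, because the claims leave that open.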
CN202310445218.7A 2023-04-24 2023-04-24 Search engine database management method, device and search engine database Active CN116166755B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202310445218.7A CN116166755B (en) 2023-04-24 2023-04-24 Search engine database management method, device and search engine database
PCT/CN2023/132695 WO2024221868A1 (en) 2023-04-24 2023-11-20 Search engine database management method and apparatus, and search engine database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310445218.7A CN116166755B (en) 2023-04-24 2023-04-24 Search engine database management method, device and search engine database

Publications (2)

Publication Number Publication Date
CN116166755A CN116166755A (en) 2023-05-26
CN116166755B true CN116166755B (en) 2023-07-14

Family

ID=86416762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310445218.7A Active CN116166755B (en) 2023-04-24 2023-04-24 Search engine database management method, device and search engine database

Country Status (2)

Country Link
CN (1) CN116166755B (en)
WO (1) WO2024221868A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116166755B (en) * 2023-04-24 2023-07-14 苏州浪潮智能科技有限公司 Search engine database management method, device and search engine database

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8909615B2 (en) * 2011-08-30 2014-12-09 Open Text S.A. System and method of managing capacity of search index partitions
CN106528683B (en) * 2016-10-25 2018-04-06 深圳市盛凯信息科技有限公司 A kind of the big data cloud search system and its method balanced based on index burst
CN109857710B (en) * 2019-01-04 2023-10-27 平安科技(深圳)有限公司 File storage method and terminal equipment
CN109831487B (en) * 2019-01-08 2022-05-13 平安科技(深圳)有限公司 Fragmented file verification method and terminal equipment
CN111475108B (en) * 2020-03-20 2023-11-28 深圳赛安特技术服务有限公司 Distributed storage method, computer equipment and computer readable storage medium
CN112434039A (en) * 2020-11-30 2021-03-02 浙江大华技术股份有限公司 Data storage method, device, storage medium and electronic device
CN113177050B (en) * 2021-05-18 2023-04-25 浙江大华技术股份有限公司 Data equalization method, device, query system and storage medium
CN113590703B (en) * 2021-08-10 2023-11-07 平安银行股份有限公司 ES data importing method and device, electronic equipment and readable storage medium
CN115129768A (en) * 2022-05-25 2022-09-30 网易(杭州)网络有限公司 Node capacity expansion method of distributed search engine
CN115396447B (en) * 2022-08-17 2025-01-24 天元大数据信用管理有限公司 A distributed database load balancing method, device, equipment and medium
CN116166755B (en) * 2023-04-24 2023-07-14 苏州浪潮智能科技有限公司 Search engine database management method, device and search engine database

Also Published As

Publication number Publication date
CN116166755A (en) 2023-05-26
WO2024221868A1 (en) 2024-10-31

Similar Documents

Publication Publication Date Title
WO2021017269A1 (en) Data migration method and apparatus, computer device, and storage medium
CN113849273B (en) Access processing method, device, storage medium and program product
EP2480974A1 (en) Distributed content storage and retrieval
CN108595268A (en) Data distribution method and device based on MapReduce and computer-readable storage medium
US11442632B2 (en) Rebalancing of user accounts among partitions of a storage service
JP6135509B2 (en) Information system, management method and program thereof, data processing method and program, and data structure
CN109196807B (en) Network node and method of operating a network node for resource distribution
CN105051725A (en) Graph data query method and device
CN110309143B (en) Data similarity determination method, device and processing device
CN116166755B (en) Search engine database management method, device and search engine database
US9898518B2 (en) Computer system, data allocation management method, and program
CN111949736A (en) Database load balancing method and device, electronic equipment and storage medium
CN117472889A (en) An adaptive tuning method and system for LSM-Tree key value index
CN115470210A (en) Data query method, device, equipment and medium in OA system
CN120315860A (en) Request allocation method, device, non-volatile storage medium and electronic device
CN116595015B (en) Data processing method, device, equipment and storage medium
CN118051498A (en) Hbase-based data management method, hbase-based data management device, hbase-based data management equipment and Hbase-based data management medium
CN119441169A (en) Server log document processing method and device
CN117407921A (en) Differential privacy histogram release method and system based on must-connect and don-connect constraints
CN115242662B (en) Data resource allocation method and device based on cloud computing
JP2017107300A (en) Data management program and data management method
CN108345699A (en) Obtain the method, apparatus and storage medium of multi-medium data
CN115766589B (en) A Virtual Network Mapping System Based on High Fault Tolerance
CN113760850A (en) Symmetric storage cluster directory quota management method, daemon device and system
CN117478304B (en) Block chain management method, system and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 215000 Building 9, No.1 guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province

Patentee after: Suzhou Yuannao Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: 215000 Building 9, No.1 guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province

Patentee before: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Country or region before: China
