CN107357659A - Towards the group technology and querying method of Storm successive ranges inquiry GSLB - Google Patents
Towards the group technology and querying method of Storm successive ranges inquiry GSLB Download PDFInfo
- Publication number
- CN107357659A CN107357659A CN201710536098.6A CN201710536098A CN107357659A CN 107357659 A CN107357659 A CN 107357659A CN 201710536098 A CN201710536098 A CN 201710536098A CN 107357659 A CN107357659 A CN 107357659A
- Authority
- CN
- China
- Prior art keywords
- query
- range
- subquery
- cost
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24549—Run-time optimisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5019—Workload prediction
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
技术领域technical field
本发明涉及通信网络技术领域,尤其涉及一种面向Storm连续范围查询全局负载均衡的分组方法及查询方法。The invention relates to the technical field of communication networks, in particular to a grouping method and a query method for querying global load balancing in a continuous range of Storm.
背景技术Background technique
随着定位手段的多样化、移动终端的普及与通讯基础设施的完备,以基于位置服务(Location Based Service,LBS)为代表的移动应用已经步入移动大数据时代。移动大数据环境下,数据规模更大、传播速度更快、多样性更加广泛,呈现出鲜明的流式特征,传统LBS技术面临多种新的挑战。基于位置服务的连续范围查询,具有高并发、低延迟特点,因此需要更高效的针对具有流式特征的移动大数据的处理能力。移动大数据时代的数据处理不仅需要存储与处理能力更强更灵活的计算平台,还需依托于计算平台的处理和优化技术。With the diversification of positioning methods, the popularization of mobile terminals and the completion of communication infrastructure, mobile applications represented by Location Based Service (LBS) have entered the era of mobile big data. In the mobile big data environment, the data scale is larger, the transmission speed is faster, and the diversity is more extensive, showing distinct streaming characteristics. Traditional LBS technology faces many new challenges. Continuous range query based on location services has the characteristics of high concurrency and low latency, so more efficient processing capabilities for mobile big data with streaming characteristics are required. Data processing in the era of mobile big data requires not only a computing platform with stronger and more flexible storage and processing capabilities, but also processing and optimization technologies based on the computing platform.
然而,在分布式系统中普遍存在着著名的“短板理论”,一个系统如果出现了负载不均衡问题,那么负载最大的节点往往将成为影响系统整体表现的瓶颈和短板。由于经济发展,地理位置等因素,人口密度在不同区域是不相同的,相应的,和LBS应用相对应的移动对象在地理分布上也是不均匀的。ApacheStorm本身作为一个分布式流处理系统,系统内部并没有提供有效的负载均衡机制,而且Storm自带的分组策略如Shuffle Grouping、FieldsGrouping都是基于一种通用思想而设计的分组策略,而没有考虑处理的任务所包含的语义,如连续范围查询具有查询范围、移动对象密度、范围重叠等时空语义,移动对象在地理分布上是不均匀的,因而范围查询的代价也不尽相同,这样很容易导致处理范围查询的各计算单元之间的负载不均衡,性能下降,所以Storm自带分组策略也不能满足系统负载均衡的需要,这无疑对整个系统的性能表现是一种挑战,针对云计算环境中的在线流处理的负载均衡研究相对较少,传统的和针对批处理的负载均衡技术无法直接应用到流处理系统中。However, there is a well-known "short board theory" in distributed systems. If a system has a load imbalance problem, the node with the largest load will often become the bottleneck and short board that affects the overall performance of the system. Due to factors such as economic development and geographical location, the population density is different in different regions, and correspondingly, the geographical distribution of mobile objects corresponding to LBS applications is also uneven. As a distributed stream processing system, ApacheStorm itself does not provide an effective load balancing mechanism, and Storm's own grouping strategies such as Shuffle Grouping and FieldsGrouping are grouping strategies designed based on a general idea without considering the processing The semantics contained in the task, such as continuous range query has time and space semantics such as query range, moving object density, range overlap, etc., moving objects are not evenly distributed geographically, so the cost of range query is also different, which can easily lead to The load between computing units that process range queries is unbalanced and the performance is degraded. Therefore, Storm's own grouping strategy cannot meet the needs of system load balancing. This is undoubtedly a challenge to the performance of the entire system. For cloud computing environments There are relatively few load balancing studies on online stream processing, and traditional and batch-oriented load balancing techniques cannot be directly applied to stream processing systems.
发明内容Contents of the invention
针对上述问题,本发明的目的在于提供一种面向Storm连续范围查询全局负载均衡的分组方法及查询方法。In view of the above problems, the object of the present invention is to provide a grouping method and a query method for querying global load balancing in the continuous range of Storm.
为了解决背景技术中所存在的问题,本发明的技术方案为:In order to solve the existing problems in the background technology, the technical solution of the present invention is:
一种面向Storm连续范围查询全局负载均衡的分组方法,包括:A grouping method for global load balancing of Storm continuous range queries, including:
1)、获取范围查询信息,所述范围查询信息包括查询范围和网格重叠量;1), obtaining range query information, the range query information includes query range and grid overlap;
2)、根据查询范围和网格重叠量,将范围查询分为多个子查询,每个子查询的查询范围只和一个网格重叠;2), according to the query range and grid overlap, the range query is divided into multiple sub-queries, and the query range of each sub-query only overlaps with one grid;
3)、读取Redis中存储的与子查询查询范围重叠网格中移动对象的密度,并根据子查询的查询范围以及网格中移动对象的密度,计算子查询的代价;3), read the density of moving objects in the grid overlapping with the subquery query range stored in Redis, and calculate the cost of the subquery according to the query range of the subquery and the density of moving objects in the grid;
4)、根据查询代价,将子查询映射到相应的组,然后从轮询计数器表中获取该组计数器的值;4), according to the query cost, map the subquery to the corresponding group, and then obtain the value of the group counter from the polling counter table;
5)、根据计数器的值对下游worker的数量取模,得到目标worker id,并将目标worker id下发到下游。5) Take the modulus of the number of downstream workers according to the value of the counter to obtain the target worker id, and send the target worker id to the downstream.
所述步骤2)中根据查询范围和网格重叠量,将范围查询分为多个子查询具体包括:In the step 2), according to the query range and grid overlap, the range query is divided into multiple sub-queries specifically including:
2.1、设定分组数量,根据分组数量计算第一次分组粒度;2.1. Set the number of groups, and calculate the first grouping granularity according to the number of groups;
2.2、设定二次分组数量,根据第一次分组粒度和二次分组的数量计算二次分组的粒度,计算公式为:2.2. Set the number of secondary groups, and calculate the granularity of the secondary group according to the granularity of the first group and the number of secondary groups. The calculation formula is:
minGrain=grain/minGroupminGrain=grain/minGroup
其中,所述minGrain为二次分组粒度,grain为第一次分组粒度,minGroup为二次分组数量。Wherein, the minGrain is the granularity of the secondary grouping, the grain is the granularity of the first grouping, and minGroup is the quantity of the secondary grouping.
所述子查询的代价公式为:The cost formula of the subquery is:
C(q)=r×dC(q)=r×d
其中,所述r为子查询的查询范围大小,d为网格中移动对象的密度。Wherein, the r is the size of the query range of the sub-query, and d is the density of moving objects in the grid.
所述根据查询代价,将子查询映射到相应的组的具体步骤为:The specific steps of mapping subqueries to corresponding groups according to the query cost are as follows:
4.1、设定查询代价阈值范围,判断查询代价与查询代价阈值范围的大小,若查询代价大于或小于查询代价阈值范围,则直接根据查询代价和第一次分组粒度计算所在组;4.1. Set the query cost threshold range, determine the query cost and the size of the query cost threshold range, if the query cost is greater than or less than the query cost threshold range, directly calculate the group according to the query cost and the first grouping granularity;
4.2、对于其他的范围子查询,由于其数量过多,所以将其做更细粒度的分组,则根据查询代价、第一次分组粒度和第二次分组粒度计算所在组。4.2. For other range sub-queries, because there are too many of them, they are grouped at a finer granularity, and the group is calculated according to the query cost, the granularity of the first grouping and the granularity of the second grouping.
所述轮询计算器存储在Redis中。The polling calculator is stored in Redis.
一种面向Storm连续范围查询全局负载均衡的查询方法,包括:A query method for global load balancing for continuous range query in Storm, including:
1)、获取范围查询信息,所述范围查询信息包括查询范围和网格重叠量;将范围查询按照查询范围与网格重叠情况,划分为相应的子查询,将所述子查询按照全局分组轮询的分组策略分发到下游目标worker;1), obtain the range query information, the range query information includes the query range and grid overlap; the range query is divided into corresponding sub-queries according to the query range and grid overlap, and the sub-query is divided into corresponding sub-queries according to the global grouping round The grouping strategy of the query is distributed to the downstream target worker;
2)、执行范围查询的子查询,并以范围查询的时间戳为key,按照Fields Grouping的分组策略分发到下游worker,确保属于同一个范围查询的子查询被分发到同一个worker;2) Execute the subquery of the range query, and use the timestamp of the range query as the key, and distribute it to downstream workers according to the grouping strategy of Fields Grouping, so as to ensure that the subqueries belonging to the same range query are distributed to the same worker;
3)、将属于同一个范围查询的子查询的查询结果合并,得到最终的查询结果,最后将最终的查询结果以Shuffle Grouping的分组方式分发到下游Bolt;3) Combine the query results of the subqueries belonging to the same range query to obtain the final query result, and finally distribute the final query result to the downstream Bolt in the grouping mode of Shuffle Grouping;
4)、将范围查询结果发布到Kafka,由客户端订阅。4) Publish the range query results to Kafka and subscribe to them by the client.
所述将所述子查询按照全局分组轮询的分组策略分发到下游目标worker具体包括:The distributing the subquery to the downstream target worker according to the grouping strategy of global grouping polling specifically includes:
1.1、获取范围查询信息,所述范围查询信息包括查询范围和网格重叠量;1.1. Obtain range query information, the range query information includes query range and grid overlap;
1.2、根据查询范围和网格重叠量,将范围查询分为多个子查询,每个子查询的查询范围只和一个网格重叠;1.2. According to the query range and grid overlap, the range query is divided into multiple subqueries, and the query range of each subquery only overlaps with one grid;
1.3、读取Redis中存储的与子查询查询范围重叠网格中移动对象的密度,并根据子查询的查询范围以及网格中移动对象的密度,计算子查询的代价;1.3. Read the density of moving objects stored in Redis in the grid overlapping with the subquery query range, and calculate the cost of the subquery according to the query range of the subquery and the density of moving objects in the grid;
1.4、根据查询代价,将子查询映射到相应的组,然后从轮询计数器表中获取该组计数器的值;1.4. According to the query cost, map the subquery to the corresponding group, and then obtain the value of the group counter from the polling counter table;
1.5、根据计数器的值对下游worker的数量取模,得到目标worker id,并将目标worker id下发到下游。1.5. Take the modulus of the number of downstream workers according to the value of the counter, obtain the target worker id, and send the target worker id to the downstream.
与现有技术相比较,本发明的有益效果为:Compared with prior art, the beneficial effects of the present invention are:
本发明提供了一种面向Storm连续范围查询全局负载均衡的分组方法,对网格索引下的连续范围查询,结合范围查询语义,利用Redis存储并负责更新网格内移动对象数量,将查询范围内的移动对象数量作为评估代价,按照查询代价将范围查询任务分到相应的组,同一个组内的范围查询任务轮询地分发到下游worker,轮询计数器由Redis维护,实现全局分组轮询的分组策略,能够根据分组策略有效地提高了系统负载均衡度,从而提高了系统的资源利用率。The present invention provides a grouping method for global load balancing of Storm's continuous range query. For the continuous range query under the grid index, combined with the range query semantics, Redis is used to store and update the number of moving objects in the grid. The number of moving objects is used as the evaluation cost, and the range query tasks are divided into corresponding groups according to the query cost. The range query tasks in the same group are distributed to the downstream workers in a round-robin manner. The polling counter is maintained by Redis to realize global group polling. The grouping strategy can effectively improve the system load balance according to the grouping strategy, thereby improving the resource utilization rate of the system.
本发明提供了一种面向Storm连续范围查询全局负载均衡的查询方法,针对服务系统在处理范围查询时负载均衡度不高的问题,结合范围查询的语义,提出了全局分组轮询的分组策略,并结合Storm的系统特点将其应用到服务系统中,对源源不断的范围查询任务按此策略进行分组,使处理范围查询任务的worker之间的负载差值尽量小,提高服务系统的负载均衡度。The present invention provides a query method for global load balancing of continuous range query in Storm, aiming at the problem that the load balance degree of the service system is not high when processing range query, combined with the semantics of range query, a grouping strategy of global group polling is proposed, Combined with the system characteristics of Storm, it is applied to the service system, and the continuous range query tasks are grouped according to this strategy, so that the load difference between workers processing range query tasks is as small as possible, and the load balance of the service system is improved. .
附图说明Description of drawings
图1是本发明图面向Storm连续范围查询全局负载均衡的分组方法流程图;Fig. 1 is the flow chart of the grouping method of the present invention's graph-oriented Storm continuous range query global load balancing;
图2是本发明图全局分组轮询的分组策略执行过程框图;Fig. 2 is a block diagram of the grouping policy execution process of the global grouping polling of the present invention;
图3是本发明面向Storm连续范围查询全局负载均衡的查询方法过程框图。Fig. 3 is a process block diagram of the query method of the present invention for querying global load balancing in the continuous range of Storm.
具体实施方式detailed description
下面结合附图对本发明做详细描述。The present invention will be described in detail below in conjunction with the accompanying drawings.
云环境下移动对象连续查询服务系统的连续范围查询存在的负载不均衡问题,直接影响到系统的资源利用率和性能,因此必须针对连续范围查询的特点,自定义分组策略以保证系统的负载均衡。本发明提出了一种全局分组轮询的分组策略(Global GroupingShuffle Grouping,GGSG)针对网格索引下的连续范围查询,结合范围查询语义,利用Redis存储并负责更新网格内移动对象数量,将查询范围内的移动对象数量作为评估代价,按照查询代价将范围查询任务分到相应的组,同一个组内的范围查询任务轮询地分发到下游worker,轮询计数器由Redis维护,实现全局分组轮询的分组策略。The unbalanced load of the continuous range query of the mobile object continuous query service system in the cloud environment directly affects the resource utilization and performance of the system. Therefore, it is necessary to customize the grouping strategy according to the characteristics of the continuous range query to ensure the load balance of the system. . The present invention proposes a global grouping polling grouping strategy (Global GroupingShuffle Grouping, GGSG) for the continuous range query under the grid index, combined with the range query semantics, using Redis to store and update the number of moving objects in the grid, and query The number of moving objects in the range is used as the evaluation cost, and the range query tasks are divided into corresponding groups according to the query cost. The range query tasks in the same group are distributed to the downstream workers in a round-robin manner. The polling counter is maintained by Redis to realize the global grouping round query grouping strategy.
如图1、2所示,本发明提供了一种面向Storm连续范围查询全局负载均衡的分组方法,包括:As shown in Figures 1 and 2, the present invention provides a grouping method for querying global load balancing in the continuous range of Storm, including:
1)、获取范围查询信息,所述范围查询信息包括查询范围和网格重叠量;1), obtaining range query information, the range query information includes query range and grid overlap;
2)、根据查询范围和网格重叠量,将范围查询分为多个子查询,每个子查询的查询范围只和一个网格重叠;2), according to the query range and grid overlap, the range query is divided into multiple sub-queries, and the query range of each sub-query only overlaps with one grid;
根据查询范围和网格重叠量,将范围查询分为多个子查询具体包括:According to the query range and the amount of grid overlap, the range query is divided into multiple subqueries, including:
2.1、设定分组数量,根据分组数量计算第一次分组粒度;2.1. Set the number of groups, and calculate the first grouping granularity according to the number of groups;
2.2、设定二次分组数量,根据第一次分组粒度和二次分组数量计算二次分组的粒度,计算公式为:2.2. Set the number of secondary groups, and calculate the granularity of the secondary group according to the granularity of the first group and the number of secondary groups. The calculation formula is:
minGrain=grain/minGroupminGrain=grain/minGroup
其中,所述minGrain为二次分组粒度,grain为第一次分组粒度,minGroup为二次分组数量。Wherein, the minGrain is the granularity of the secondary grouping, the grain is the granularity of the first grouping, and minGroup is the quantity of the secondary grouping.
1)分组数量越多,分组粒度越小,每组中包含的范围查询之间的代价相差也就越小,这种情况下当范围查询总量能达到一定数量时,每组包含的范围查询数量也就足够多,每组按照轮询分组策略分发范围查询任务,能够保证worker之间的负载相差较小,系统有较高的负载均衡度,而当范围查询总量达不到一定数量的时候,每组包含的范围查询数量较少,轮询分组过程中有可能导致分发到每个worker的范围查询数量相差较多,容易导致系统的负载均衡度不高;1) The larger the number of groups, the smaller the grouping granularity, and the smaller the cost difference between the range queries contained in each group. In this case, when the total amount of range queries can reach a certain number, the range queries contained in each group The number is sufficient. Each group distributes range query tasks according to the polling grouping strategy, which can ensure that the load difference between workers is small, and the system has a high degree of load balance. When the total amount of range queries does not reach a certain number Sometimes, the number of range queries contained in each group is small, and the number of range queries distributed to each worker may vary greatly during the polling grouping process, which may easily lead to low load balance of the system;
2)分组数量越少,分组粒度越大,每组中包含的范围查询之间的代价相差较大,当分组数量为1时,相当于Shuffle Grouping。综上所述,单纯的分组数量太多或者太少,都不是合适的选择,而根据连续范围查询总量或者连续范围查询到来的速度,选择合适的分组,能更好的提高系统的负载均衡度。2) The smaller the number of groups, the larger the grouping granularity, and the larger the cost difference between the range queries contained in each group. When the number of groups is 1, it is equivalent to Shuffle Grouping. To sum up, simply having too many or too few groups is not an appropriate choice, but selecting the appropriate group according to the total amount of continuous range queries or the speed at which continuous range queries arrive can better improve the load balance of the system Spend.
3)、读取Redis中存储的与子查询查询范围重叠网格中移动对象的密度,并根据子查询的查询范围以及网格中移动对象的密度,计算子查询的代价;所述子查询的代价公式为:3), read the density of moving objects stored in Redis overlapping with the subquery query range, and calculate the cost of the subquery according to the query range of the subquery and the density of the moving objects in the grid; the cost formula of the subquery for:
C(q)=r×dC(q)=r×d
其中,所述r为子查询的查询范围大小,d为网格中移动对象的密度;Wherein, the r is the size of the query range of the sub-query, and d is the density of moving objects in the grid;
4)、根据查询代价,将子查询映射到相应的组,然后从轮询计数器表中获取该组计数器的值;所述轮询计算器存储在Redis中。4) According to the query cost, the subquery is mapped to the corresponding group, and then the value of the group counter is obtained from the polling counter table; the polling calculator is stored in Redis.
所述根据查询代价,将子查询映射到相应的组的具体步骤为:The specific steps of mapping subqueries to corresponding groups according to the query cost are as follows:
4.1、设定查询代价阈值范围,判断查询代价与查询代价阈值范围的大小,若查询代价大于或小于查询代价阈值范围,则直接根据查询代价和第一次分组粒度计算所在组;4.1. Set the query cost threshold range, determine the query cost and the size of the query cost threshold range, if the query cost is greater than or less than the query cost threshold range, directly calculate the group according to the query cost and the first grouping granularity;
4.2、对于其他的范围子查询,由于其数量过多,所以将其做更细粒度的分组,则根据查询代价、第一次分组粒度和第二次分组粒度计算所在组。4.2. For other range sub-queries, because there are too many of them, they are grouped at a finer granularity, and the group is calculated according to the query cost, the granularity of the first grouping and the granularity of the second grouping.
对于映射函数,由于查询代价符合正态分布,代价偏大和代价偏小的范围子查询任务数量少,所以对于代价偏大和代价偏小的范围子查询,直接按照其查询代价计算所在组,对于其他的范围子查询,由于其数量过多,所以可以将其做更细粒度的分组,这样可以使得下游负载更为均衡。For the mapping function, since the query cost conforms to the normal distribution, the number of range subquery tasks with relatively high cost and relatively small cost is small, so for the range subquery with relatively large cost and relatively small cost, the group is directly calculated according to its query cost, and for other Due to the large number of range subqueries, they can be grouped into finer-grained groups, which can make the downstream load more balanced.
5)、根据计数器的值对下游worker的数量取模,得到目标worker id,并将目标worker id下发到下游。5) Take the modulus of the number of downstream workers according to the value of the counter to obtain the target worker id, and send the target worker id to the downstream.
最后对该组的计数器执行操作worker_j=(counter_i++)mod n,即得子查询的下游目标worker。轮询计算器存储在Redis中,可以对计数器的值执行原子更新操作。Finally, the operation worker_j=(counter_i++) mod n is performed on the counters of the group to obtain the downstream target worker of the subquery. Polling counters are stored in Redis and atomic update operations can be performed on the counter's value.
如图2所示,GetCoveredCellsBolt和CellScanBolt之间使用了全局轮询分组的分组策略(GGSG),GGSG的具体执行过程。As shown in Figure 2, the global polling group grouping strategy (GGSG) is used between GetCoveredCellsBolt and CellScanBolt, and the specific execution process of GGSG.
本发明还提供了一种面向Storm连续范围查询全局负载均衡的查询方法,如图3所示,包括:The present invention also provides a query method for querying global load balancing facing the continuous range of Storm, as shown in Figure 3, including:
1)、获取范围查询信息,所述范围查询信息包括查询范围和网格重叠量;将范围查询按照查询范围与网格重叠情况,划分为相应的子查询,将所述子查询按照全局分组轮询的分组策略(GGSG)分发到下游目标worker;1), obtain the range query information, the range query information includes the query range and grid overlap; the range query is divided into corresponding sub-queries according to the query range and grid overlap, and the sub-query is divided into corresponding sub-queries according to the global grouping round The grouping strategy (GGSG) of the query is distributed to the downstream target worker;
2)、执行范围查询的子查询,并以范围查询的时间戳为key,按照Fields Grouping的分组策略分发到下游worker,确保属于同一个范围查询的子查询被分发到同一个worker;2) Execute the subquery of the range query, and use the timestamp of the range query as the key, and distribute it to downstream workers according to the grouping strategy of Fields Grouping, so as to ensure that the subqueries belonging to the same range query are distributed to the same worker;
3)、将属于同一个范围查询的子查询的查询结果合并,得到最终的查询结果,最后将最终的查询结果以Shuffle Grouping的分组方式分发到下游Bolt;3) Combine the query results of the subqueries belonging to the same range query to obtain the final query result, and finally distribute the final query result to the downstream Bolt in the grouping mode of Shuffle Grouping;
4)、将范围查询结果发布到Kafka,由客户端订阅。4) Publish the range query results to Kafka and subscribe to them by the client.
所述将所述子查询按照全局分组轮询的分组策略分发到下游目标worker具体包括:The distributing the subquery to the downstream target worker according to the grouping strategy of global grouping polling specifically includes:
1.1、获取范围查询信息,所述范围查询信息包括查询范围和网格重叠量;1.1. Obtain range query information, the range query information includes query range and grid overlap;
1.2、根据查询范围和网格重叠量,将范围查询分为多个子查询,每个子查询的查询范围只和一个网格重叠;1.2. According to the query range and grid overlap, the range query is divided into multiple subqueries, and the query range of each subquery only overlaps with one grid;
1.3、读取Redis中存储的与子查询查询范围重叠网格中移动对象的密度,并根据子查询的查询范围以及网格中移动对象的密度,计算子查询的代价;1.3. Read the density of moving objects stored in Redis in the grid overlapping with the subquery query range, and calculate the cost of the subquery according to the query range of the subquery and the density of moving objects in the grid;
1.4、根据查询代价,将子查询映射到相应的组,然后从轮询计数器表中获取该组计数器的值;1.4. According to the query cost, map the subquery to the corresponding group, and then obtain the value of the group counter from the polling counter table;
1.5、根据计数器的值对下游worker的数量取模,得到目标worker id,并将目标worker id下发到下游。1.5. Take the modulus of the number of downstream workers according to the value of the counter, obtain the target worker id, and send the target worker id to the downstream.
首先,GetCoveredCellsBolt将一个到来的范围查询,按照其与网格的重叠情况,分为多个子查询,子查询的查询范围大小用r表示,每个子查询的查询范围只和一个网格重叠。然后读取Redis中存储的和子查询的查询范围有重叠的网格中移动对象的密度d,用公式3.1计算子查询的代价,根据查询代价,将子查询映射到相应的组。对于映射函数,由于查询代价符合正态分布,代价偏大和代价偏小的范围子查询任务数量少,所以对于代价偏大和代价偏小的范围子查询,直接按照其查询代价计算所在组,对于其他的范围子查询,由于其数量过多,所以可以将其做更细粒度的分组,这样可以使得下游负载更为均衡。First, GetCoveredCellsBolt divides an incoming range query into multiple sub-queries according to its overlap with the grid. The size of the query range of the sub-query is represented by r, and the query range of each sub-query only overlaps with one grid. Then read the density d of moving objects in the grid that overlaps the query scope of the subquery stored in Redis, calculate the cost of the subquery with formula 3.1, and map the subquery to the corresponding group according to the query cost. For the mapping function, since the query cost conforms to the normal distribution, the number of range subquery tasks with relatively high cost and relatively small cost is small, so for the range subquery with relatively large cost and relatively small cost, the group is directly calculated according to its query cost, and for other Due to the large number of range subqueries, they can be grouped into finer-grained groups, which can make the downstream load more balanced.
最后对该组的计数器执行操作:And finally perform an operation on the group's counter:
worker_j=(counter_i++)mod n,worker_j = (counter_i++) mod n,
即得子查询的下游目标worker。轮询计算器存储在Redis中,可以对计数器的值执行原子更新操作。That is, the downstream target worker of the subquery. Polling counters are stored in Redis and atomic update operations can be performed on the counter's value.
最终,通过实验显示全局分组轮询的分组策略比Fields Grouping和ShuffleGrouping有更好的均衡度,可以有效的提高系统资源利用率,提升系统吞吐量。Finally, experiments show that the grouping strategy of global group polling has a better balance than Fields Grouping and ShuffleGrouping, which can effectively improve system resource utilization and system throughput.
对于本领域技术人员而言,显然能了解到上述具体实施例只是本发明的优选方案,因此本领域的技术人员对本发明中的某些部分所可能作出的改进、变动,体现的仍是本发明的原理,实现的仍是本发明的目的,均属于本发明所保护的范围。For those skilled in the art, it is obvious that the above-mentioned specific embodiments are only preferred solutions of the present invention, so those skilled in the art may make improvements and changes to some parts of the present invention, which still reflect the present invention The principle of the present invention is still the object of the present invention, and all belong to the protection scope of the present invention.
Claims (7)
- A kind of 1. group technology towards Storm successive ranges inquiry GSLB, it is characterised in that including:1) range query information, is obtained, the range query information includes query context and mesh overlay amount;2), according to query context and mesh overlay amount, range query is divided into multiple subqueries, the query context of each subquery Only and a mesh overlay;3) density of mobile object in the grid overlapping with subquery query context stored in Redis, is read, and according to subquery Query context and grid in mobile object density, calculate the cost of subquery;4), according to Query Cost, subquery is mapped to corresponding group, then obtains this group of counter from poll counter device table Value;5), the quantity modulus according to the value of counter to downstream worker, obtains target worker id, and by target worker Id is issued to downstream.
- 2. according to claim 1 inquire about the overall situation towards Storm successive ranges according to the grouping strategy of global packet poll The group technology of load balancing, it is characterised in that according to query context and mesh overlay amount in the step 2), by range query It is divided into multiple subqueries to specifically include:2.1st, number of packet is set, first time Packet granularity is calculated according to number of packet;2.2nd, secondary number of packet is set, the granularity of secondary packet is calculated according to first time Packet granularity and secondary number of packet, Calculation formula is:MinGrain=grain/minGroupWherein, the minGrain is secondary Packet granularity, and grain is first time Packet granularity, and minGroup is secondary packet Quantity.
- 3. the group technology according to claim 1 towards Storm successive ranges inquiry GSLB, its feature exists In the cost formula of the subquery is:C (q)=r × dWherein, the r is the query context size of subquery, and d is the density of mobile object in grid.
- 4. the group technology according to claim 1 towards Storm successive ranges inquiry GSLB, its feature exists In, it is described according to Query Cost, subquery is mapped to concretely comprising the following steps of organizing accordingly:4.1st, Query Cost threshold range is set, the size of Query Cost and Query Cost threshold range is judged, if Query Cost More than or less than Query Cost threshold range, then place group is directly calculated according to Query Cost and first time Packet granularity;4.2nd, for other scope subqueries, because its quantity is excessive, so being done more fine-grained packet, then basis is looked into Ask group where cost, first time Packet granularity and second of Packet granularity calculate.
- 5. the group technology according to claim 1 towards Storm successive ranges inquiry GSLB, its feature exists In the poll calculator is stored in Redis.
- A kind of 6. querying method towards Storm successive ranges inquiry GSLB, it is characterised in that including:1) range query information, is obtained, the range query information includes query context and mesh overlay amount;Range query is pressed According to query context and mesh overlay situation, corresponding subquery is divided into, point by the subquery according to global packet poll Group policy is distributed to downstream targets worker;2) subquery of range query, is performed, and using the timestamp of range query as key, according to FieldsGrouping point Group policy is distributed to downstream worker, it is ensured that the subquery for belonging to same range query is distributed to same worker;3) Query Result that, will belong to the subquery of same range query merges, and obtains final Query Result, finally will most Whole Query Result is distributed to downstream Bolt with Shuffle Grouping packet mode;4) range query result, is published to Kafka, by client subscription.
- 7. the querying method according to claim 6 towards Storm successive ranges inquiry GSLB, its feature exists In described the subquery is distributed into downstream targets worker according to the grouping strategy of global packet poll to specifically include:1.1st, range query information is obtained, the range query information includes query context and mesh overlay amount;1.2nd, according to query context and mesh overlay amount, range query is divided into multiple subqueries, the inquiry model of each subquery Enclose only and a mesh overlay;1.3rd, the density of mobile object in the grid overlapping with subquery query context stored in Redis is read, and is looked into according to son The density of mobile object in the query context and grid of inquiry, calculate the cost of subquery;1.4th, according to Query Cost, subquery is mapped to corresponding group, group counting is then obtained from poll counter device table The value of device;1.5th, the quantity modulus according to the value of counter to downstream worker, obtains target worker id, and by target worker Id is issued to downstream.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710536098.6A CN107357659B (en) | 2017-07-04 | 2017-07-04 | Grouping method and query method of global load balancing for continuous range query of Storm |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710536098.6A CN107357659B (en) | 2017-07-04 | 2017-07-04 | Grouping method and query method of global load balancing for continuous range query of Storm |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN107357659A true CN107357659A (en) | 2017-11-17 |
| CN107357659B CN107357659B (en) | 2020-09-29 |
Family
ID=60292067
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201710536098.6A Expired - Fee Related CN107357659B (en) | 2017-07-04 | 2017-07-04 | Grouping method and query method of global load balancing for continuous range query of Storm |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN107357659B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109726225A (en) * | 2019-01-11 | 2019-05-07 | 广东工业大学 | A Storm-based Distributed Streaming Data Storage and Query Method |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105701209A (en) * | 2016-01-13 | 2016-06-22 | 广西师范大学 | Load balancing method for improving parallel connection performance on big data |
| CN105868218A (en) * | 2015-01-23 | 2016-08-17 | 中国移动通信集团河北有限公司 | Data processing method and electronic device |
-
2017
- 2017-07-04 CN CN201710536098.6A patent/CN107357659B/en not_active Expired - Fee Related
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105868218A (en) * | 2015-01-23 | 2016-08-17 | 中国移动通信集团河北有限公司 | Data processing method and electronic device |
| CN105701209A (en) * | 2016-01-13 | 2016-06-22 | 广西师范大学 | Load balancing method for improving parallel connection performance on big data |
Non-Patent Citations (5)
| Title |
|---|
| JING ZHANG等: "The Real-time Scheduling Strategy Based on Traffic and Load Balancing in Storm", 《2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS》 * |
| YINGHAI LI等: "Grouping-Shuffling particle swarm optimization: an improved PSO for continuous optimization", 《ICSI10 PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON ADVANCES IN SWARM INTELLIGENCE》 * |
| 李浩: "基于Twitter Storm的云平台监控系统研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
| 王波涛等: "基于storm的连续范围查询优化技术", 《计算机工程与 科学》 * |
| 黄容等: "基于Storm slot使用率低优先的动态负载均衡策略", 《电脑知识与技术》 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109726225A (en) * | 2019-01-11 | 2019-05-07 | 广东工业大学 | A Storm-based Distributed Streaming Data Storage and Query Method |
| CN109726225B (en) * | 2019-01-11 | 2023-08-01 | 广东工业大学 | A Storm-based distributed stream data storage and query method |
Also Published As
| Publication number | Publication date |
|---|---|
| CN107357659B (en) | 2020-09-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Xu et al. | Stela: Enabling stream processing systems to scale-in and scale-out on-demand | |
| CN106462578B (en) | Methods for querying and updating database entries | |
| US10268726B1 (en) | Partition key management for improved throughput | |
| US8635250B2 (en) | Methods and systems for deleting large amounts of data from a multitenant database | |
| US9813516B2 (en) | Transparent sharding of traffic across messaging brokers | |
| US11381506B1 (en) | Adaptive load balancing for distributed systems | |
| US12399900B2 (en) | Real-time streaming data ingestion into database tables | |
| US10158709B1 (en) | Identifying data store requests for asynchronous processing | |
| US11061930B1 (en) | Dynamic management of storage object partitioning | |
| CN102880557B (en) | look-up method of multistage distribution type high-speed cache of heterogeneous data source | |
| US20190034084A1 (en) | Selecting controllers based on affinity between access devices and storage segments | |
| US9137325B2 (en) | Efficiently isolating malicious data requests | |
| CN111913670B (en) | Processing method and device for load balancing, electronic equipment and storage medium | |
| CN109510852B (en) | Method and device for gray scale publishing | |
| Brinkmann et al. | Scalable monitoring system for clouds | |
| Xie et al. | Pandas: robust locality-aware scheduling with stochastic delay optimality | |
| WO2023231339A1 (en) | Transaction execution method and node in blockchain system, and blockchain system | |
| US11816511B1 (en) | Virtual partitioning of a shared message bus | |
| US9998865B2 (en) | Method for performing distributed geographic event processing and geographic event processing system | |
| CN117785952A (en) | Data query method, device, server and medium | |
| CN108696571A (en) | Cloud storage service system, method, cloud service smart machine and electronic device | |
| Hefny et al. | Comparative study load balance algorithms for map reduce environment | |
| US20230344796A1 (en) | Secure message exchange between deployments | |
| Costa et al. | Towards an adaptive and distributed architecture for managing workflow provenance data | |
| CN107357659A (en) | Towards the group technology and querying method of Storm successive ranges inquiry GSLB |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20200929 |
|
| CF01 | Termination of patent right due to non-payment of annual fee |