CN107544999B

CN107544999B - Synchronization device and synchronization method for retrieval system, and retrieval system and method

Info

Publication number: CN107544999B
Application number: CN201610487175.9A
Authority: CN
Inventors: 马振; 冯咀志; 吴鹏
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2016-06-28
Filing date: 2016-06-28
Publication date: 2022-10-21
Anticipated expiration: 2036-06-28
Also published as: CN107544999A

Abstract

The invention discloses a synchronization device and a synchronization method for a retrieval system, as well as a retrieval system and a retrieval method. The retrieval system comprises a synchronization node and a cluster node. The synchronization node is independent of the cluster node and comprises one or more synchronization devices, and each synchronization device comprises at least one synchronization trigger, a data grabber, and a dump device. The synchronization trigger generates a synchronization trigger indication containing data identification information of the data to be synchronized when it detects that a synchronization trigger condition is met; the data grabber grabs the data corresponding to the data identification information from the corresponding data source according to the synchronization trigger indication and transmits the data to the dump device; and the dump device dumps the grabbed data to the corresponding data table in the cluster node. The cluster node stores the data table and provides the retrieval service. With the invention, the cluster node does not need to be restarted during upgrading or data transfer; the actual retrieval requirement is met, retrieval efficiency is improved, and the system offers high availability and high concurrency.

Description

Synchronization device and synchronization method for retrieval system, and retrieval system and method

Technical Field

The present invention relates to a retrieval technology, and in particular, to a synchronization apparatus and a synchronization method for a retrieval system, and a retrieval system and a retrieval method.

Background

Many existing application scenarios need to provide retrieval services to users. For example, the Nuomi business operates around stores, and many applications need to retrieve store information, including multi-field query and sorting, Chinese-language retrieval of store names and brand names, and distance retrieval based on geographic coordinates.

At present, a retrieval system providing a retrieval service can be implemented in many ways, for example with tools or components such as Elasticsearch (ES), River, and elasticsearch-jetty.

Taking ES as an example, the ES retrieval architecture provides retrieval services through a plurality of cluster nodes. Each cluster node provides a database for storing data, and the database is composed of data tables (Types). The database is provided with an index (Index); based on the index, the cluster nodes respond to a user's data retrieval request, query the corresponding data in the database and the data tables, and return the data to the user. The data in a data table can be provided by different types of data sources, including MySQL, MongoDB, and RabbitMQ, and the data table acquires data by calling the data sources.

The retrieval service involves two main stages: data synchronization and user data operations (which can include reading, writing, querying, and the like). Existing retrieval systems have shortcomings in both stages.

For the data synchronization stage, data synchronization implemented on an ES system needs a different synchronization mode for each type of data source, and the program that performs the synchronization is installed on the cluster nodes as a plug-in. This leads to two defects: (1) a separate synchronization plug-in has to be written for each data source; (2) before a synchronization plug-in can run, the cluster nodes have to stop working, the plug-in has to be installed, and the nodes have to be started again. In short, whenever a function is added or upgraded in plug-in form, the cluster needs to be restarted, which makes the service unstable.

Meanwhile, for the user data operation stage, the cluster nodes of an ES system cannot perform permission control; that is, any user data operation instruction sent to a cluster node is executed. Moreover, the data of different users is not physically isolated when stored in the cluster nodes, so misoperation or malicious operation can easily occur.

Disclosure of Invention

The invention provides a synchronization device and a synchronization method for a retrieval system, and the retrieval system and the retrieval method, which are used for realizing that a user can flexibly configure a synchronization strategy according to different design requirements to obtain the retrieval system matched with the design requirements.

According to an aspect of the present invention, there is provided a synchronization device applied to a retrieval system, including: a synchronization trigger, configured to generate a synchronization trigger indication when it detects that a synchronization trigger condition is met, the synchronization trigger indication including data identification information of the data to be synchronized; a data grabber, configured to grab the data corresponding to the data identification information from the corresponding data source according to the synchronization trigger indication and to transmit the grabbed data to a dump device; and the dump device, configured to dump the grabbed data to the corresponding data table in a cluster node, the cluster node being configured to store the data table and provide a retrieval service.

According to another aspect of the present invention, there is provided a retrieval system comprising a cluster node and a synchronization node, the synchronization node being independent of the cluster node and comprising one or more synchronization devices as described above, the cluster node being configured to store data tables and provide retrieval services.

According to another aspect of the present invention, there is provided a retrieval system, including a cluster node and a synchronization node, the cluster node being configured to store a data table and provide a retrieval service, the synchronization node being independent of the cluster node and configured to synchronize the data table in the cluster node, the cluster node further including: the permission control module is used for judging whether an account corresponding to the retrieval request has operation permission or not according to an account identifier, operation content, an operation object and a permission configuration table in the received user retrieval request after receiving the user retrieval request, wherein the permission configuration table stores a mapping relation among the account identifier, the operation content, the operation object and the operation permission; if the operation authority is provided, transmitting the retrieval request to a retrieval application program interface of the cluster node, and searching a matched result in a data table of the cluster node; and if the operation authority is not available, shielding the retrieval request.

According to another aspect of the present invention, there is provided a synchronization method applied to a retrieval system, including: generating, when at least one synchronization trigger detects that a synchronization trigger condition is met, a synchronization trigger indication that includes data identification information of the data to be synchronized; grabbing, by a data grabber and according to the synchronization trigger indication, the data corresponding to the data identification information from a data source, and transmitting the grabbed data to a dump device; and dumping, by the dump device, the grabbed data to the corresponding data table in a cluster node, wherein the cluster node is configured to store the data table and provide a retrieval service.

According to another aspect of the present invention, there is provided a retrieval method including: after receiving a user retrieval request, searching a data table of a cluster node for a matching result, wherein the data table is synchronized by a synchronization device according to the synchronization method, and the synchronization device is independent of the cluster node.

According to another aspect of the present invention, there is provided a retrieval method including: after receiving a user retrieval request, judging whether an account corresponding to the retrieval request has an operation authority or not according to an account identifier, operation content, an operation object and an authority configuration table in the received user retrieval request, wherein the authority configuration table stores a mapping relation among the account identifier, the operation content, the operation object and the operation authority; if the operation authority is provided, transmitting the retrieval request to a retrieval application program interface of the cluster node, and searching a matched result in a data table of the cluster node; and if the operation authority is not available, shielding the retrieval request.

According to the invention, the synchronization node is set up independently of the cluster nodes, and the cluster nodes do not need to be restarted when the synchronization node is installed or updated, which improves the stability of the service. Different synchronization trigger conditions are configured according to different design requirements, and different synchronization trigger indications are generated to control the data grabber and the dump device to execute different data synchronization strategies, so that actual retrieval requirements are met. Under the same conditions, a single synchronization trigger synchronizes 100 records from a data source in real time in less than 1 second, a single synchronization trigger synchronizes 10 million records in batch in less than 1 hour, and the response time of the retrieval system is less than 500 milliseconds.

Drawings

Fig. 1 is a schematic structural diagram of a retrieval system according to a first embodiment of the present invention;

Fig. 2 is a schematic structural diagram illustrating an example of a synchronization apparatus according to the first embodiment of the present invention;

Fig. 3 is a flowchart showing an example of a synchronization method for a retrieval system according to the first embodiment of the present invention;

Fig. 4 is a schematic structural diagram of another example of a synchronization apparatus according to the first embodiment of the present invention;

Fig. 5 is a flowchart illustrating another example of a synchronization method for a retrieval system according to the first embodiment of the present invention;

Fig. 6 is a flowchart of a retrieval method according to the first embodiment of the present invention;

Figs. 7A-7D are schematic diagrams illustrating an example of one implementation of a retrieval method according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of a retrieval system according to a second embodiment of the present invention;

Fig. 9 is a flowchart showing an example of a retrieval method according to the second embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be further noted that, for the convenience of description, only some structures related to the present invention are shown in the drawings, not all of them.

Embodiment One

Fig. 1 is a schematic structural diagram of a retrieval system 1 according to an embodiment of the present invention, which is applicable to a situation where a synchronization trigger configured by a user is used to generate a synchronization trigger indication, and data is synchronized to a set cluster according to different synchronization policies.

As shown in fig. 1, the retrieval system 1 comprises a synchronization node 110 and a cluster node 120, the synchronization node 110 being independent of the cluster node 120. The cluster node 120 stores a data table 121, and is configured to perform a search in the data table 121 and provide a search result after receiving a search request sent by a user. The synchronization node 110 includes one or more synchronization devices for synchronizing the data tables 121 in the cluster nodes 120.

The cluster nodes 120 form a cluster of servers that store the data table 121. The cluster is built in a distributed manner from at least two cluster nodes 120 and features sharded storage, multiple replicas, load balancing, and a decentralized architecture, which makes the cluster easy to expand. For example, an Elasticsearch cluster may be adopted; since an Elasticsearch cluster itself offers hot standby, horizontal scaling, decentralization, and the like, additional load balancing and hot standby strategies may not be necessary.

The synchronization node 110 is a control terminal that is provided independently of the cluster node 120 and can exchange data with the cluster in order to synchronize data from a data source to the cluster. One or more synchronization devices may be created in the synchronization node 110 to perform data synchronization operations. While one of the synchronization devices is being upgraded, the other synchronization devices may continue to provide service, and the cluster node 120 does not need to be taken down for maintenance.

Fig. 2 is a schematic structural diagram illustrating an example of a synchronization apparatus according to a first embodiment of the present invention. As shown in Fig. 2, the synchronization apparatus includes at least one synchronization trigger 111-1, 111-2, …, 111-n, a data grabber 112, and a dump device 113.

The synchronization trigger 111 is configured to generate a synchronization trigger indication when it detects that the synchronization trigger condition is satisfied, and to send the generated indication to the data grabber 112. The synchronization trigger indication includes data identification information of the data to be synchronized. In an example of the present invention, once the synchronization trigger 111 has picked up information by monitoring, it may obtain the data identification information of the data to be synchronized from its own configuration information together with the monitored information.

In an example of the present invention, the synchronization trigger 111 may include at least one of a timing trigger, a message trigger, and a log trigger. The timing trigger is a functional module that generates a synchronization trigger instruction for synchronizing the data to be synchronized of the set data source to the target data table 121 when the current time meets the set time or reaches the set period. The message trigger is a functional module that generates a synchronization trigger instruction for synchronizing data to be synchronized of the set data source to the target data table 121 when a set message is generated in the operation message for the data source. The log trigger is a functional module that generates a synchronization trigger instruction for synchronizing data of a set data source to the target data table 121 when the operation log of the data source includes data update of the data source.

The configuration information of the timing trigger may include a trigger period, which may be, for example, a set time or a set period, and information on the data identification field to be monitored, which may be unique identification information of the data to be monitored, such as a data ID. The configuration information of the message trigger may include a message cluster, a message queue name, and a message token; this information configures the channel through which messages are received. The configuration information of the log trigger may include the databus source and information on the log data to be monitored.

Generating a synchronization trigger indication when the synchronization trigger condition is detected to be met includes at least one of the following:

(1) when the timing trigger detects that the current time matches the set time or that a set period has elapsed, generating a synchronization trigger indication containing the unique identification information of the data to be synchronized, based on the monitored information about which data needs to be synchronized;

(2) when the message trigger detects that a set message has been generated among the operation messages of a data source, extracting the unique identification information of the data to be synchronized from the set message, and generating a synchronization trigger indication containing that unique identification information;

(3) when the log trigger acquires and parses the operation log of the data source and detects a data update of the data source, generating a synchronization trigger indication containing the unique identification information of the data to be synchronized.
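The three trigger conditions can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the class names, the `SyncIndication` fields, and the shape of the message and log entries (dictionaries with `type`, `op`, and `id` keys) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class SyncIndication:
    """Synchronization trigger indication: carries identifiers only, no payload."""
    trigger_type: str            # hypothetical field: "timing", "message" or "log"
    data_ids: Tuple[str, ...]    # unique identification information of the data


class TimingTrigger:
    """Fires each time the set period has elapsed."""

    def __init__(self, period_s: float, watched_ids: List[str], start: float):
        self.period_s = period_s
        self.watched_ids = tuple(watched_ids)
        self._next_due = start + period_s

    def poll(self, now: float) -> Optional[SyncIndication]:
        if now >= self._next_due:
            self._next_due = now + self.period_s
            return SyncIndication("timing", self.watched_ids)
        return None


class MessageTrigger:
    """Fires when a configured ("set") message appears among operation messages."""

    def __init__(self, set_message_types: Set[str]):
        self.set_message_types = set_message_types

    def on_message(self, message: dict) -> Optional[SyncIndication]:
        if message.get("type") in self.set_message_types:
            # The data ID is extracted from the message itself.
            return SyncIndication("message", (message["id"],))
        return None


class LogTrigger:
    """Fires when a parsed operation-log entry records a data update."""

    UPDATE_OPS = {"insert", "update", "delete"}  # assumed set of update operations

    def on_log_entry(self, entry: dict) -> Optional[SyncIndication]:
        if entry.get("op") in self.UPDATE_OPS:
            return SyncIndication("log", (entry["id"],))
        return None
```

In each case the indication holds only the identifiers of the data to be synchronized; fetching the data itself is left to the data grabber.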

After receiving the synchronization trigger indication, the data grabber 112 grabs the data corresponding to the data identification information from the corresponding data source and transmits the grabbed data to the dump device 113, for example to a dump queue of the dump device 113. Grabbing may use the distributed service framework Dubbo or SQL. The grabbing strategy may be batch grabbing or time-limited grabbing. For batch grabbing, the number of records fetched from the data source by one grab operation is preset; each grab operation fetches that set number of records and transmits them to the dump device 113. For time-limited grabbing, the execution time of one grab operation is also preset; if the number of records grabbed so far has not reached the set number but the current grab operation has run for the preset execution time, the records grabbed so far are transmitted to the dump device 113. When the data volume to be synchronized is larger than a set threshold, the batch grabbing strategy is adopted, which reduces long-term occupation of network and disk IO. When the data volume to be synchronized is smaller than the set threshold, the time-limited grabbing strategy is adopted to meet the timeliness requirement.
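The interplay of the two grabbing strategies can be illustrated with a small generator. This is a sketch under assumptions: the function names, the injectable `clock` parameter, and the rule that the time limit is checked only when a record arrives are choices made for the example, not details given in the text.

```python
import time
from typing import Callable, Iterable, Iterator, List


def grab_in_chunks(
    records: Iterable[dict],
    batch_size: int,
    time_limit_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[List[dict]]:
    """Yield chunks of records to hand over to the dump device.

    A chunk is flushed when it holds `batch_size` records (batch grabbing) or
    when the current grab has run for `time_limit_s` seconds (time-limited
    grabbing), whichever happens first. The time check runs only when a
    record arrives, which keeps the sketch single-threaded.
    """
    chunk: List[dict] = []
    started = clock()
    for record in records:
        chunk.append(record)
        if len(chunk) >= batch_size or clock() - started >= time_limit_s:
            yield chunk
            chunk = []
            started = clock()
    if chunk:  # flush the remainder once the source is exhausted
        yield chunk


def choose_strategy(pending_records: int, threshold: int) -> str:
    """Pick the strategy from the data volume, following the rule in the text."""
    return "batch" if pending_records > threshold else "time-limited"
```

Passing a fake `clock` makes the time-limited path deterministic in tests; with the default `time.monotonic` the same code follows wall-clock time.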

In an example of the present invention, the correspondence between the data fetcher and the data source may be obtained based on configuration information of the data fetcher. The configuration information of the data fetcher may include corresponding data source address information. In one example of the invention, the data fetcher may include an SQL fetcher and a Dubbo fetcher. Where the data fetcher is an SQL fetcher, the configuration information may include a database address. In addition, preferably, the configuration information may further include a database user name, a database password, and a monitoring statement. When the data fetcher is a Dubbo fetcher, the configuration information may include a remote service address. In addition, preferably, the configuration information may further include a service name, a method name, and a parameter.
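The configuration fields listed above map naturally onto two small record types. The following is a hedged sketch: the class and field names are invented for illustration, and only the split between the required address and the optional ("preferably") fields comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class SqlFetcherConfig:
    database_address: str                     # required: identifies the data source
    database_user: Optional[str] = None       # optional fields named in the text
    database_password: Optional[str] = None
    monitoring_statement: Optional[str] = None


@dataclass
class DubboFetcherConfig:
    remote_service_address: str               # required: identifies the data source
    service_name: Optional[str] = None        # optional fields named in the text
    method_name: Optional[str] = None
    parameters: List[str] = field(default_factory=list)


FetcherConfig = Union[SqlFetcherConfig, DubboFetcherConfig]


def data_source_address(config: FetcherConfig) -> str:
    """Resolve the data source a fetcher is bound to from its configuration."""
    if isinstance(config, SqlFetcherConfig):
        return config.database_address
    if isinstance(config, DubboFetcherConfig):
        return config.remote_service_address
    raise TypeError(f"unsupported fetcher configuration: {type(config).__name__}")
```

Keeping the address as the only required field mirrors the idea that the fetcher-to-source correspondence is derived purely from configuration.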

The dump device 113 is used to dump the grabbed data into the corresponding data table 121 in the cluster node 120, so that the cluster node 120 can provide retrieval services. In an example of the present invention, the correspondence between the dump device 113 and the data table 121 in the cluster node 120 may be obtained from the configuration information of the dump device 113. The configuration information of the dump device 113 may include a data structure of index data.

Furthermore, preferably, the synchronization device may further include a configuration module (not shown) through which the user provides the configuration information of the synchronization trigger, the configuration information of the data grabber, and the configuration information of the dump device.

Fig. 3 is a flowchart illustrating an example of a synchronization method for a retrieval system according to a first embodiment of the present invention. As shown in fig. 3, in step S310, when it is monitored by at least one synchronization trigger that a synchronization trigger condition is met, the synchronization trigger generates a synchronization trigger indication, where the synchronization trigger indication includes data identification information of data to be synchronized, and sends the synchronization trigger indication to the data grabber. After receiving the synchronization trigger instruction sent by the synchronization trigger, in step S320, the data grabber grabs the data corresponding to the data identification information from the corresponding data source according to the synchronization trigger instruction, and transmits the grabbed data to the dump device. Next, in step S330, the dump memory dumps the captured data into a corresponding data table in the cluster node, so that the cluster node can provide the retrieval service.
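Steps S310 to S330 can be sketched as three small functions connected by a dump queue. This is a minimal illustration that assumes in-memory dictionaries stand in for the data source and the cluster data table; the function names and data shapes are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical in-memory stand-ins for a data source and a cluster data table.
DataSource = Dict[str, dict]
DataTable = Dict[str, dict]
DumpQueue = Deque[Tuple[str, dict]]


def trigger(changed_ids: List[str]) -> dict:
    """S310: build an indication holding only the IDs of the data to synchronize."""
    return {"data_ids": list(changed_ids)}


def grab(indication: dict, source: DataSource, dump_queue: DumpQueue) -> None:
    """S320: fetch the current record for each ID and pass it to the dump queue."""
    for data_id in indication["data_ids"]:
        if data_id in source:
            dump_queue.append((data_id, dict(source[data_id])))


def dump(dump_queue: DumpQueue, table: DataTable) -> int:
    """S330: write queued records into the target data table; return the count."""
    written = 0
    while dump_queue:
        data_id, record = dump_queue.popleft()
        table[data_id] = record
        written += 1
    return written


source: DataSource = {"s1": {"name": "Store A"}, "s2": {"name": "Store B"}}
table: DataTable = {}
queue: DumpQueue = deque()
grab(trigger(["s1", "s3"]), source, queue)  # "s3" does not exist in the source
written = dump(queue, table)
```

Because each stage only hands data to the next one, a stage can be replaced or upgraded without touching the cluster node that hosts the table.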

Preferably, before the synchronization trigger generates the synchronization trigger indication, the method may further include: and receiving configuration information of the synchronous trigger, configuration information of the data grabber and configuration information of the dump device, which are provided by a user. The configuration information of the synchronization trigger, the data grabber and the dump device is as described above, and is not described herein again.

After receiving the configuration information, the data identification information of the data to be synchronized may be obtained based on the configuration information of the synchronization trigger and the information monitored by the synchronization trigger, the correspondence between the data grabber and the data source may be obtained based on the configuration information of the data grabber, and the correspondence between the dump device and the data table in the cluster node may be obtained based on the configuration information of the dump device.

Fig. 4 is a schematic structural diagram illustrating another example of a synchronization apparatus according to a first embodiment of the present invention. As shown in Fig. 4, the synchronization apparatus includes at least one synchronization trigger 111'-1, 111'-2, …, 111'-n, a data grabber 112', a dump device 113', and a scheduler 114.

The synchronization trigger 111' is configured to generate a synchronization trigger indication when it is monitored that the synchronization trigger condition is satisfied, the synchronization trigger indication including data identification information of data to be synchronized, and send the generated synchronization trigger indication to the scheduler 114. After receiving at least one synchronization trigger indication sent by the synchronization trigger, the scheduler 114 sends the synchronization trigger indication to the data fetcher 112' to obtain corresponding data according to a scheduling policy in the scheduler.

The data fetcher 112' fetches data corresponding to the data identification information from a corresponding data source according to the received synchronization trigger indication, and transmits the fetched data to the scheduler 114. The scheduler 114 sends the received data to the dump device 113' according to the scheduling policy in the scheduler, and the dump device 113' dumps the data into the corresponding data table.

In an example of the present invention, after receiving at least one synchronization trigger indication sent by the synchronization trigger, scheduling, by a scheduler, the data fetcher to acquire corresponding data according to a scheduling policy in the scheduler may include: after receiving at least one synchronous trigger instruction sent by the synchronous trigger, putting the received at least one synchronous trigger instruction into a task pool as a task; acquiring a synchronous trigger instruction to be distributed to the data grabber according to a scheduling strategy in a scheduler; and allocating the acquired synchronous trigger instruction to be allocated to the data grabber to grab the corresponding data.

In one example of the present invention, the scheduling policy may include: a maximum number of allocated synchronization trigger indications, a maximum dump data amount, and a synchronization trigger indication allocation mechanism. Here, the maximum number of allocated synchronization trigger indications is the maximum number of synchronization trigger indications assigned to the data fetcher at a time, that is, the maximum number that the data fetcher can process at a time. Generally, it corresponds to the maximum number of threads that the data fetcher can process concurrently at a time. The maximum dump data amount is the maximum amount of data that the dump device dumps at a time. Generally, it corresponds to the maximum number of threads that the dump device can process concurrently at a time. Furthermore, preferably, the synchronization trigger indication allocation mechanism may include a priority allocation mechanism and/or a resource-saving allocation mechanism.

Under the priority allocation mechanism, the scheduler determines the synchronization trigger indications to be allocated according to their priority; for example, it selects indications in order of priority from high to low, up to the maximum number of allocated synchronization trigger indications. Specifically, after receiving a synchronization trigger indication, the scheduler may assign it a priority according to the type of trigger that produced it. For example, a synchronization trigger indication produced by the timing trigger has normal priority, while synchronization trigger indications produced by the message trigger and the log trigger have high priority. Synchronization trigger indications produced by the message trigger and the log trigger are ordered by trigger time.
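A task pool with priority allocation might look like the following sketch. It assumes a binary heap as the task pool and numeric priorities (0 for message and log triggers, 1 for the timing trigger); the `Scheduler` class and its method names are illustrative, not taken from the source.

```python
import heapq
import itertools
from typing import List, Tuple

# Lower number is served first: message/log indications outrank timing ones.
PRIORITY = {"message": 0, "log": 0, "timing": 1}


class Scheduler:
    def __init__(self, max_allocation: int):
        # Maximum number of indications handed to the data fetcher at a time.
        self.max_allocation = max_allocation
        self._pool: List[Tuple[int, float, int, dict]] = []  # the task pool
        self._seq = itertools.count()  # tie-breaker so dicts are never compared

    def submit(self, trigger_type: str, triggered_at: float, indication: dict) -> None:
        """Put an indication into the task pool as a task."""
        entry = (PRIORITY[trigger_type], triggered_at, next(self._seq), indication)
        heapq.heappush(self._pool, entry)

    def next_allocation(self) -> List[dict]:
        """Take up to max_allocation indications, highest priority first,
        then earliest trigger time."""
        allocated: List[dict] = []
        while self._pool and len(allocated) < self.max_allocation:
            allocated.append(heapq.heappop(self._pool)[3])
        return allocated


scheduler = Scheduler(max_allocation=2)
scheduler.submit("timing", 1.0, {"id": "t1"})
scheduler.submit("log", 3.0, {"id": "l1"})
scheduler.submit("message", 2.0, {"id": "m1"})
first_round = scheduler.next_allocation()
second_round = scheduler.next_allocation()
```

Here the message indication (triggered at 2.0) and the log indication (triggered at 3.0) are allocated in the first round, and the timing indication waits for the second round even though it was triggered earliest.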

The resource saving allocation mechanism is to collect single synchronous instructions into a batch of synchronous instructions to capture and dump in batch, so that the execution times of a grabber and a dump device are reduced, and the consumption of a CPU, a network IO and a disk IO is further reduced.
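One way to picture the resource-saving mechanism is a function that merges single indications into one batch per target data table. The sketch below makes two assumptions that the text does not state: indications are represented as `(table, data_id)` pairs, and duplicate IDs within a batch are dropped.

```python
from typing import Dict, Iterable, List, Tuple


def coalesce(indications: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Merge single (target_table, data_id) indications into one ID list per table.

    Duplicate IDs are dropped (an assumption of this sketch) and first-seen
    order is kept, so each table needs one grab and one dump per batch.
    """
    merged: Dict[str, Dict[str, None]] = {}
    for table, data_id in indications:
        # A dict is used as an ordered set: keys keep insertion order.
        merged.setdefault(table, {})[data_id] = None
    return {table: list(ids) for table, ids in merged.items()}


pending = [("shops", "s1"), ("brands", "b1"), ("shops", "s2"), ("shops", "s1")]
batches = coalesce(pending)
```

Four single indications collapse into two batched operations, which is where the savings in CPU, network IO, and disk IO would come from.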

Preferably, when the synchronization apparatus 110' includes a configuration module, the configuration module can be further used to configure a scheduling policy in the scheduler.

Fig. 5 is a flowchart illustrating another example of a synchronization method for a retrieval system according to the first embodiment of the present invention.

As shown in Fig. 5, in step S510, when at least one synchronization trigger detects that a synchronization trigger condition is satisfied, the synchronization trigger generates a synchronization trigger indication containing data identification information of the data to be synchronized and sends the generated indication to a scheduler. In step S520, after receiving at least one synchronization trigger indication sent by the synchronization trigger, the scheduler sends the synchronization trigger indication to the data grabber according to the scheduling policy in the scheduler, so that the data grabber obtains the corresponding data. Next, in step S530, the data grabber grabs the data corresponding to the data identification information from the corresponding data source according to the received synchronization trigger indication and transmits the grabbed data to the scheduler. Subsequently, in step S540, the scheduler sends the received data to the dump device according to the scheduling policy in the scheduler. Then, in step S550, the dump device dumps the data into the corresponding data table, so that the cluster node can provide the retrieval service.

Preferably, before the synchronization trigger generates the synchronization trigger indication, the synchronization method may further include: and receiving configuration information of the synchronization trigger, configuration information of the data grabber, configuration information of the dump memory and a scheduling strategy of the scheduler, wherein the configuration information of the synchronization trigger, the configuration information of the data grabber and the configuration information of the dump memory are provided by a user. The configuration information of the synchronization trigger, the data grabber and the dump device and the scheduling policy of the scheduler are as described above, and are not described herein again.

Fig. 6 shows a flowchart of a retrieval method according to a first embodiment of the present invention, which is executed by the retrieval system shown in fig. 1. As shown in fig. 6, a retrieval request input by a user is received at step S610, and then, a data table of a cluster node, which is synchronized according to the synchronization method as described above by a synchronization means independent of the cluster node, is searched for a matching result based on the received retrieval request at step S620.

Figs. 7A to 7D are schematic diagrams showing an implementation example of the retrieval method according to the embodiment of the present invention. Fig. 7A is a schematic diagram of registering an application and acquiring a token on a web page; Fig. 7B is a schematic diagram of creating, on a web page, the index mapping and synchronization configuration under the application; Fig. 7C is a schematic diagram of the system automatically completing real-time synchronization of the retrieval data; Fig. 7D is a schematic diagram of the application retrieving the index data under the application through the API.

In this example, a user with a retrieval requirement first registers in the RTS system to obtain an app and a token, as shown in Fig. 7A. The rules for data synchronization and the data structure of the index are specified through the configuration module, as shown in Fig. 7B. The system automatically completes the synchronization of the retrieval data, as shown in Fig. 7C. Users can retrieve data through the native ES API, which facilitates the migration of legacy projects, as shown in Fig. 7D.

In summary, the synchronization trigger, the grabber, and the dump device in the synchronization node 110 are split from one another: the synchronization trigger is responsible for indicating the data to be synchronized, the grabber grabs the data and passes it to a dump queue, and the dump device takes the data from the dump queue and synchronizes it to the corresponding data table 121 of the cluster node 120. This gives the synchronization node 110 its data-stateless characteristic. Therefore, a plurality of synchronization strategies can be configured for synchronizing the same batch of data (data corresponding to the same index). For example, at least one synchronization trigger can be configured for the same index, with different synchronization trigger conditions used to generate synchronization trigger indications, so that the grabber and the dump device execute data synchronization according to those indications. Because of the data-stateless characteristic, the synchronization strategies can be executed concurrently, and a difference between the execution order of the synchronization strategies and the update order of the data does not cause version errors in the synchronized data.
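The order-independence claimed here can be checked with a toy model. The sketch assumes that an indication carries only a data ID, that the grabber always reads the current record from the source, and that the source does not change while the indications are processed; all names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterable


def run_sync(order: Iterable[str], source: Dict[str, dict]) -> Dict[str, dict]:
    """Process indications in the given order; each one carries only a data ID."""
    table: Dict[str, dict] = {}
    for data_id in order:
        # The grabber reads the *current* record, not a value captured when
        # the trigger fired, so a late indication cannot write stale data.
        if data_id in source:
            table[data_id] = dict(source[data_id])
    return table


source = {"s1": {"name": "Store A", "rev": 3}, "s2": {"name": "Store B", "rev": 1}}
# Three indications for the same index, e.g. from a timing, a message and a log trigger.
indications = ["s1", "s2", "s1"]
results = [run_sync(order, source) for order in permutations(indications)]
all_equal = all(result == results[0] for result in results)
```

Every ordering of the three indications yields the same table contents, which is the property that lets several synchronization strategies run side by side under these assumptions.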

According to the technical scheme, the synchronization node is set independently of the cluster node, the cluster node does not need to be restarted during installation or updating of the synchronization node, and the stability of service is improved. Different synchronous trigger conditions are configured according to different design requirements, different synchronous trigger instructions are generated to control the data grabber and the data dump device to execute different data synchronization strategies, actual retrieval requirements are met, retrieval efficiency is improved, and the method has the advantages of being high in availability and high in concurrency.

Embodiment two

Fig. 8 is a schematic structural diagram of a retrieval system in the second embodiment of the present invention. The technical solution of this embodiment is based on the above embodiment, and further includes a configuration platform 130 and an authority control module 122.

The authority control module 122 is configured in the cluster node 120 and is used to determine, according to the account identifier, operation content and operation object in a received user retrieval request and an authority configuration table, whether the account corresponding to the request has the operation authority. The authority configuration table stores the mapping relationship among account identifiers, operation content and operation objects. If the account has the operation authority, the retrieval request is passed to the retrieval application program interface of the cluster node, and matching results are searched for in the data table of the cluster node; if not, the retrieval request is blocked. The advantage of this arrangement is that access to the data in the cluster node 120 is partitioned by authority, which meets the requirements of data isolation and sharing when multiple users work cooperatively.

Preferably, the account identifier may include an application name and an account password, the operation object includes database index information and data table information, and the operation content includes an operation authority, such as read or write authority. The authority configuration table includes the mapping relationship among application names, account passwords, database index information (i.e., index information), data table information (i.e., type information) and operation authorities. For example, a type under a dedicated index is defined to store ACL mapping rules. Each ACL mapping rule defines an application name (app), a key (key), indexes (indices), data tables (types) and permissions (permissions). When an operation request input by a user is detected, it is checked against the ACL mapping rules. If the operation request satisfies an ACL mapping rule, it is executed; otherwise, the user is informed that they have no operation authority. The authority configuration table may be stored in the authority control module or elsewhere in the cluster node.

In this way, the user's account can be given read-write authority for any index or data table that the user has created. The user has no operation authority for an index or data table created by someone else. To access such an index or data table, the user must apply to the administrator, who assigns read authority to the user's account. The advantage of this arrangement is that authority can be controlled at the index level and the type level, so that each user's read and write authority for each index and type can be controlled.
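An ACL mapping rule of the kind described above might be modeled as follows. This is a minimal Python sketch under stated assumptions: the `AclRule` class, the `grant` and `is_allowed` helpers and the sample account values are hypothetical, and the rules are held in a plain list rather than in a type of a dedicated index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRule:
    app: str                # application name
    key: str                # account password / key
    index: str              # database index
    doc_type: str           # data table (type)
    permissions: frozenset  # subset of {"read", "write"}


def grant(rules, app, key, index, doc_type, permissions):
    # Record that the account may perform the given operations on index/type.
    rules.append(AclRule(app, key, index, doc_type, frozenset(permissions)))


def is_allowed(rules, app, key, index, doc_type, operation):
    # An operation passes if some rule matches the account, the object and the operation.
    return any(
        r.app == app and r.key == key and r.index == index
        and r.doc_type == doc_type and operation in r.permissions
        for r in rules
    )


rules = []
# The creator of an index/type receives read-write authority on it.
grant(rules, "app_a", "key_a", "news", "article", {"read", "write"})
# Another account has no authority until the administrator assigns read authority.
before = is_allowed(rules, "app_b", "key_b", "news", "article", "read")
grant(rules, "app_b", "key_b", "news", "article", {"read"})
after = is_allowed(rules, "app_b", "key_b", "news", "article", "read")
```

Keeping one rule per (account, index, type) combination is what allows authority to be decided at both the index level and the type level.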

The configuration platform 130 provides a WEB-based human-computer interaction configuration interface through which a user inputs configuration information to configure the authority configuration table. The configuration information includes authority configuration parameters (a user account and key, and the operable indexes), data table structure parameters (a data table name and a data table type), data capture configuration parameters (a data source, a network address (URL), an account, a key and a capture mode), and data synchronization configuration parameters (a data source, a network address (URL), an account, a key, a trigger mode and a trigger frequency). Because partitioned authority control and data synchronization are both configured through the configuration platform 130, users do not need to develop code themselves, which improves research and development efficiency.

According to the technical scheme of this embodiment, the authority control module 122 applies partitioned authority control to the data in the cluster node 120, which meets the requirements of data isolation and sharing when multiple users work cooperatively. The configuration platform 130 handles the configuration of partitioned authority control and data synchronization, so users do not need to develop code themselves, which improves research and development efficiency.
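The four groups of configuration parameters listed above could be represented, for example, as a nested mapping. The Python sketch below is illustrative only: the group names, field names and sample values are assumptions rather than a format defined by the invention, and `missing_fields` is a hypothetical helper that reports parameters a user has not yet supplied.

```python
# Hypothetical layout; group names, field names and values are illustrative only.
REQUIRED_FIELDS = {
    "authority": ["account", "key", "indices"],
    "table_structure": ["table_name", "table_type"],
    "capture": ["data_source", "url", "account", "key", "capture_mode"],
    "synchronization": [
        "data_source", "url", "account", "key",
        "trigger_mode", "trigger_frequency",
    ],
}

example_config = {
    "authority": {"account": "app_a", "key": "key_a", "indices": ["news"]},
    "table_structure": {"table_name": "article", "table_type": "article"},
    "capture": {
        "data_source": "mysql",
        "url": "http://example.invalid/source",
        "account": "reader",
        "key": "reader_key",
        "capture_mode": "full",
    },
    "synchronization": {
        "data_source": "mysql",
        "url": "http://example.invalid/source",
        "account": "reader",
        "key": "reader_key",
        "trigger_mode": "timing",
        "trigger_frequency": "60s",
    },
}


def missing_fields(config):
    # Return "group.field" names that the user has not provided.
    missing = []
    for group, fields in REQUIRED_FIELDS.items():
        section = config.get(group, {})
        for field in fields:
            if field not in section:
                missing.append(f"{group}.{field}")
    return missing
```

A configuration interface could run such a check before saving, so that an incomplete configuration is rejected at input time rather than at synchronization time.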

Furthermore, it is noted here that preferably, the configuration module in the synchronization apparatus may also be provided in the configuration platform 130.

As shown in Fig. 9, in step S910, the authority control module in the cluster node receives a user's retrieval request through an authority control plug-in running in the cluster node. The retrieval request includes an account identifier, operation content and an operation object. In one example, the account identifier may include an application name and an account password, the operation object includes database index information and data table information, and the operation content includes an operation authority.

After the user's retrieval request is received, in step S920 the account identifier, operation content and operation object information are extracted from the retrieval request. Then, in step S930, the authority configuration table in the authority control module is queried with the extracted account identifier, operation content and operation object information to determine whether the account has the operation authority. The authority configuration table stores the mapping relationship among account identifiers, operation content and operation objects. For example, the authority configuration table includes the mapping relationship among application names, account passwords, database index information, data table information and operation authorities.

For example, a type under a dedicated index is defined to store the ACL mapping rules, i.e., the authority configuration table. Each ACL mapping rule defines an application name (app), a key (key), indexes (indices), data tables (types) and permissions (permissions).

After intercepting a user's operation instruction, the authority control plug-in extracts the account identifier from the instruction and queries the authority configuration table with it to determine which indexes or data tables the user may operate on (write and/or read). It then extracts the operation object from the instruction and matches it against those indexes or data tables. Finally, it extracts the operation content of the instruction and compares it with the user's operation authority for the successfully matched index or data table, thereby judging whether the user has the operation authority.

If the account has the operation authority, in step S940 the retrieval request is passed to the retrieval application program interface of the cluster node, and matching results are searched for in the data table of the cluster node.

If the account does not have the operation authority, in step S950 the retrieval request is blocked.

An operation instruction input by a user is invalid when the user has no operation authority for the data table named in the instruction, or when the user's operation authority for that data table does not cover the operation content of the instruction. For example, the user has only read authority for the data table, but the instruction requests a write operation on it.

When an operation instruction input by the user is invalid, the instruction is blocked and the user is informed that they have no operation authority.

According to the technical scheme of this embodiment, the authority control plug-in runs in the cluster node and controls each account's authority at the index level or the data table level. The data of different users is physically isolated in storage, and users cannot operate on data for which they have no operation authority, which prevents misoperation and malicious operation.
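The flow of steps S910 to S950 can be sketched in Python as follows. This is an assumption-laden illustration, not the plug-in itself: `AUTHORITY_TABLE`, the request field names and the `search_api` callable standing in for the retrieval application program interface are all hypothetical.

```python
# Hypothetical authority configuration table:
# (application name, key) -> {(index, type): set of permitted operations}
AUTHORITY_TABLE = {
    ("app_a", "key_a"): {("news", "article"): {"read"}},
}


def handle_request(request, search_api):
    # S920: extract the account identifier, operation object and operation content.
    account = (request["app"], request["key"])
    target = (request["index"], request["type"])
    operation = request["operation"]

    # S930: query the authority configuration table.
    allowed = AUTHORITY_TABLE.get(account, {}).get(target, set())

    if operation in allowed:
        # S940: pass the request on to the retrieval interface.
        return {"status": "ok", "result": search_api(request)}

    # S950: block the request and report the missing authority.
    return {"status": "blocked", "reason": "no operation authority"}


def fake_search_api(request):
    # Stand-in for the cluster node's retrieval interface.
    return ["hit for " + request["index"] + "/" + request["type"]]


read_request = {"app": "app_a", "key": "key_a", "index": "news",
                "type": "article", "operation": "read"}
write_request = dict(read_request, operation="write")

read_response = handle_request(read_request, fake_search_api)
write_response = handle_request(write_request, fake_search_api)
```

The write request here corresponds to the invalid instruction in the example above: the account has only read authority for the data table, so the request is blocked before it reaches the retrieval interface.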

It will be appreciated that the invention also extends to computer programs, particularly computer programs on or in a carrier, adapted for putting the invention into practice. The program may be in the form of source code, object code, or code intermediate between source and object code, such as a partially compiled form, or in any other form suitable for use in the implementation of the method according to the invention. It will also be noted that such programs may have many different architectural designs. For example, program code implementing the functionality of a method or system according to the invention may be subdivided into one or more subroutines.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in some detail by the above embodiments, the invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the invention, and the scope of the invention is determined by the scope of the appended claims.

Claims

1. A synchronization apparatus for use in a search system, comprising:

a synchronization trigger, used for generating a synchronization trigger indication when it is monitored that a synchronization trigger condition is met, the synchronization trigger indication comprising data identification information of data to be synchronized;

a data grabber, used for grabbing the data corresponding to the data identification information from a corresponding data source according to the synchronization trigger indication, and transmitting the grabbed data to a dump device;

the dump device, used for dumping the grabbed data to a corresponding data table in a cluster node, the cluster node being used for storing the data table and providing a retrieval service;

a configuration module, used for a user to provide configuration information of the synchronization trigger, configuration information of the data grabber and configuration information of the dump device; wherein the correspondence between the data grabber and the data source and the correspondence between the dump device and the data table in the cluster node are obtained based on the configuration information of the data grabber and the configuration information of the dump device, respectively;

wherein the retrieval system comprises a synchronization node and the cluster node; the synchronization node is used for creating one or more synchronization devices to execute data synchronization operations;

the synchronization node is a control terminal which is independent of the cluster node, and the control terminal is used for carrying out data transmission with the cluster and synchronizing data from the data source to the cluster; the cluster is composed of at least two cluster nodes, and the cluster nodes are server clusters for storing data tables.

2. The synchronization apparatus according to claim 1, further comprising:

the data identification information of the data to be synchronized is obtained based on the configuration information of the synchronization trigger and the information monitored by the synchronization trigger.

3. The synchronization apparatus of claim 1 or 2, wherein the synchronization trigger comprises at least one of a timing trigger, a message trigger, and a log trigger;

wherein generating the synchronization trigger indication when it is monitored that the synchronization trigger condition is met comprises at least one of the following:

generating the synchronization trigger indication when it is monitored through the timing trigger that the current time matches a set moment or that a set period has been reached;

generating the synchronization trigger indication when it is monitored through the message trigger that a set message is generated among the operation messages of a data source;

and generating the synchronization trigger indication when, through the log trigger, the operation log of the data source is acquired and analyzed and a data update of the data source is detected.

4. The synchronization apparatus according to claim 2, further comprising:

a scheduler, used for, after receiving at least one synchronization trigger indication sent by the synchronization trigger and/or the data grabbed by the data grabber, scheduling the data grabber to acquire the corresponding data and/or scheduling the dump device to store the grabbed data into the corresponding data table, according to a scheduling strategy in the scheduler.

5. The synchronization apparatus according to claim 4, wherein the scheduler is specifically configured to:

after receiving at least one synchronization trigger indication sent by the synchronization trigger, put the received at least one synchronization trigger indication into a task pool as a task;

acquire, according to the scheduling strategy in the scheduler, a synchronization trigger indication to be allocated to the data grabber;

and allocate the acquired synchronization trigger indication to the data grabber so that it grabs the corresponding data.

6. The synchronization apparatus of claim 5, wherein the scheduling policy comprises: a maximum number of synchronization trigger indications to be allocated, a maximum number of data items to be dumped, and a synchronization trigger indication allocation mechanism.

7. The synchronization apparatus of claim 6, wherein the synchronization trigger indication allocation mechanism comprises:

a priority assignment mechanism; and/or

a resource saving allocation mechanism.

8. The synchronization apparatus of claim 4, wherein the configuration module is further configured to:

and configuring a scheduling strategy in the scheduler.

9. A retrieval system, characterized in that the retrieval system comprises a synchronization node comprising one or more synchronization devices according to any of claims 1 to 8.

10. The retrieval system of claim 9, further comprising:

the authority control module is configured in the cluster node and used for judging whether an account corresponding to the retrieval request has an operation authority or not according to an account identifier, operation content, an operation object and an authority configuration table in the received user retrieval request, wherein the authority configuration table stores a mapping relation among the account identifier, the operation content and the operation object;

if the account has the operation authority, transmitting the retrieval request to a retrieval application program interface of the cluster node, and searching for a matched result in a data table of the cluster node; and if the account does not have the operation authority, blocking the retrieval request.

11. The retrieval system of claim 10, wherein the account identifier comprises an application name and an account password, the operation object comprises database index information and data table information, the operation content comprises an operation right, and the right configuration table comprises a mapping relationship of the application name, the account password, the database index information, the data table information and the operation right.

12. The retrieval system of claim 10 or 11, further comprising:

a configuration platform, used for providing a WEB-based human-computer interaction configuration interface through which a user inputs configuration information to configure the authority configuration table.

13. Retrieval system according to any of claims 9 to 11, characterised in that the cluster node is an ElasticSearch cluster node.

14. A retrieval system, characterized in that the retrieval system comprises a synchronization node and a cluster node, said synchronization node being adapted to synchronize data tables in said cluster node, said synchronization node comprising one or more synchronization devices according to any of claims 1 to 8; the cluster node further comprises: the permission control module is used for judging whether an account corresponding to the retrieval request has operation permission or not according to an account identifier, operation content, an operation object and a permission configuration table in the received user retrieval request after receiving the user retrieval request, wherein the permission configuration table stores a mapping relation among the account identifier, the operation content, the operation object and the operation permission;

15. A synchronization method for use in a search system, comprising:

generating a synchronization trigger indication when it is monitored through at least one synchronization trigger that a synchronization trigger condition is met, the synchronization trigger indication comprising data identification information of data to be synchronized;

grabbing, through a data grabber and according to the synchronization trigger indication, the data corresponding to the data identification information from a corresponding data source, and transmitting the grabbed data to a dump device;

dumping the grabbed data to a corresponding data table in a cluster node through the dump device, wherein the cluster node is used for storing the data table and providing a retrieval service;

the synchronization node is a control terminal which is independent of the cluster node, and the control terminal is used for carrying out data transmission with the cluster and synchronizing data from the data source to the cluster; the cluster is composed of at least two cluster nodes, and the cluster nodes are server clusters for storing data tables;

wherein the method further comprises:

receiving configuration information of the synchronous trigger, configuration information of the data grabber and configuration information of the dump device, which are provided by a user;

and acquiring the corresponding relation between the data grabber and the data source and the corresponding relation between the dump device and the data table in the cluster node respectively based on the configuration information of the synchronous trigger, the configuration information of the data grabber and the configuration information of the dump device.

16. The synchronization method according to claim 15, characterized in that the method further comprises:

17. The synchronization method according to claim 15 or 16, wherein the synchronization trigger comprises at least one of a timing trigger, a message trigger, and a log trigger;

wherein generating the synchronization trigger indication when it is monitored through at least one synchronization trigger that the synchronization trigger condition is met comprises at least one of the following:

generating the synchronization trigger indication when it is monitored through the timing trigger that the current time matches a set moment or that a set period has been reached;

generating the synchronization trigger indication when it is monitored through the message trigger that a set message is generated among the operation messages of a data source;

18. The synchronization method according to claim 15, further comprising:

after receiving at least one synchronization trigger indication sent by the synchronization trigger and/or data grabbed by the data grabber, scheduling, by a scheduler and according to a scheduling strategy in the scheduler, the data grabber to acquire the corresponding data and/or the dump device to store the grabbed data into the corresponding data table.

19. The synchronization method according to claim 18, wherein, after receiving at least one synchronization trigger indication sent by the synchronization trigger, scheduling the data grabber by the scheduler to acquire the corresponding data according to the scheduling strategy in the scheduler comprises:

acquiring, according to the scheduling strategy in the scheduler, a synchronization trigger indication to be allocated to the data grabber;

and allocating the acquired synchronization trigger indication to the data grabber so that it grabs the corresponding data.

20. The synchronization method of claim 18, wherein the scheduling policy comprises: a maximum number of synchronization trigger indications to be allocated, a maximum number of data items to be dumped, and a synchronization trigger indication allocation mechanism.

21. The synchronization method of claim 20, wherein the synchronization trigger indication allocation mechanism comprises:

a resource saving allocation mechanism.

22. The synchronization method of claim 18, wherein the scheduling policy is configured based on user-provided configuration information.

23. A method of searching, the method comprising:

after receiving the user retrieval request, searching the data table of the cluster node for a matched result,

wherein the data table is synchronized with a synchronization method according to any of claims 15 to 22 by a synchronization means, which is independent of the cluster nodes.

24. The retrieval method of claim 23, wherein the method further comprises:

after receiving a user retrieval request, judging whether an account corresponding to the retrieval request has an operation authority or not according to an account identifier, operation content, an operation object and an authority configuration table in the received user retrieval request, wherein the authority configuration table stores a mapping relation among the account identifier, the operation content, the operation object and the operation authority;

25. A method of searching, the method comprising:

if the operation authority is provided, transmitting the retrieval request to a retrieval application program interface of the cluster node, and searching a matching result in a data table of the cluster node, wherein the data table is synchronized by a synchronization device according to the synchronization method of any one of claims 15 to 22, and the synchronization device is independent of the cluster node;

and if the account does not have the operation authority, blocking the retrieval request.