WO2010098034A1

WO2010098034A1 - Distributed database management system and distributed database management method

Info

Publication number: WO2010098034A1
Application number: PCT/JP2010/000935
Authority: WO
Inventors: 上村純平; 柏木岳彦
Original assignee: 日本電気株式会社
Priority date: 2009-02-24
Filing date: 2010-02-16
Publication date: 2010-09-02
Also published as: US20110307470A1; JPWO2010098034A1

Abstract

Provided is a shared nothing database system which can effectively perform data manipulation on distributed databases. A distributed database management system is provided with a query receiving unit (load balancer) to receive a query, and a plurality of storage processing units to cooperatively perform data manipulation on the distributed databases on the basis of the received query. Each of the plurality of storage processing units comprises a storage device to store one of a plurality of partial databases which constitute the distributed database, and a data manipulation unit to perform data manipulation on the partial database stored in the storage device on the basis of the query.

Description

Distributed database management system and distributed database management method

The present invention relates to a technique for executing data operations on a distributed database.

In database processing, a cluster configuration using a plurality of processors, such as a plurality of servers, is widely adopted in order to distribute a large transaction processing load. Shared disk type systems and shared nothing type systems are known as database systems with a cluster configuration. The shared disk type is a shared system that shares computer resources such as a CPU and storage, and the shared nothing type is a non-shared system that does not share computer resources. Here, the computer resources include not only real computer resources but also virtual computer resources. The advantage of the shared nothing type is that, because processors (servers) do not compete for computer resources, processing efficiency commensurate with the number of processors can be achieved, and scalability (system expandability) is superior to that of the shared disk type.

A shared nothing type database system is disclosed in, for example, Patent Document 1 (Japanese Patent Laid-Open No. 2007-025785) and Patent Document 2 (Japanese Patent Laid-Open No. 2005-078394).

Patent Document 1: JP 2007-025785 A
Patent Document 2: JP 2005-078394 A

However, in a shared nothing type (non-shared) database system, a plurality of processors each control a non-shared computer resource group, and the database is distributed and stored in these non-shared computer resource groups. Therefore, when query processing using the entire data group distributedly stored in the non-shared computer resource group is executed, there is a problem that the processing speed is reduced.

For example, the non-shared database system of Patent Document 2 includes a plurality of database nodes and a load balancer that manages these database nodes. When the load balancer executes a transaction using a plurality of data groups distributed and stored in a plurality of database nodes in response to a processing request from a client terminal, the load balancer requests these database nodes to transfer the data. Thereafter, the load balancer executes the transaction using the data groups transferred from these database nodes. However, the load balancer cannot complete the transaction until all the necessary data groups have been transferred from the database nodes, which reduces the processing speed.

In view of the above, an object of the present invention is to provide a non-shared database system and a database management method capable of efficiently executing data operations on a distributed database.

According to the present invention, a distributed database management system for performing data operations on a distributed database is provided. The distributed database management system includes a query receiving unit that receives a query, and a plurality of storage processing units that cooperatively execute data operations on the distributed database based on the received query. Each of the storage processing units includes a storage device storing one of a plurality of partial databases constituting the distributed database, and a data operation unit that executes data operations on the partial database stored in the storage device based on the query.

According to the present invention, there is also provided a distributed database management method in a distributed database management system that has a plurality of storage processing units that cooperatively execute data operations on a distributed database based on a query, each of the storage processing units including a storage device storing one of a plurality of partial databases constituting the distributed database. This distributed database management method includes: (a) when a first storage processing unit among the plurality of storage processing units does not store, in its partial database, a data set necessary for executing a data operation based on the query, issuing a data transfer request for the data set to one or more second storage processing units, different from the first storage processing unit, among the plurality of storage processing units; (b) in the second storage processing unit, acquiring the data set from the partial database in response to the data transfer request and transferring the data set to the first storage processing unit; and (c) in the first storage processing unit, executing the data operation using the data set transferred from the second storage processing unit.

According to the present invention, the plurality of storage processing units execute data operations on the partial databases that they respectively manage in parallel and in cooperation, so that a distributed database management system that efficiently executes data operations on the distributed database is provided.

The above-described object and other objects, features, and advantages will be further clarified by a preferred embodiment described below and the following drawings attached thereto.

FIG. 1 is a functional block diagram schematically showing the configuration of a distributed database management system according to an embodiment of the present invention.
FIG. 2 is a diagram schematically showing an example of a database table constituting a distributed database.
FIG. 3 is a functional block diagram schematically showing the configuration of a storage processing unit.
FIG. 4 is a flowchart schematically showing the procedure of transaction processing by the data operation unit of a storage processing unit.
FIG. 5 is a flowchart schematically showing the processing procedure of a data operation unit that has received a data transfer request.
FIG. 6 is a diagram schematically showing an example of a communication sequence.
FIG. 7 is a diagram schematically showing another example of a communication sequence.
FIG. 8 is a diagram schematically showing still another example of a communication sequence.
FIG. 9 is a diagram schematically showing still another example of a communication sequence.
FIG. 10 is a diagram schematically showing still another example of a communication sequence.
FIG. 11 is a diagram schematically showing an example of the structure of a partial database.
FIG. 12 is a diagram schematically showing an example of a real table.
FIGS. 13(A) and 13(B) are diagrams showing logical data structures constituting a partial database.
FIG. 14 is a diagram schematically showing the structure of a partial database.
FIG. 15 is a diagram schematically showing the structure of a partial database.
FIG. 16 is a diagram for explaining the aggregation and adjustment function of a router.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In all the drawings, the same components are denoted by the same reference numerals, and detailed description thereof is appropriately omitted so as not to overlap.

FIG. 1 is a functional block diagram schematically showing the configuration of a distributed database management system 10 according to an embodiment of the present invention. As shown in FIG. 1, the distributed database management system 10 includes a load balancer 11, query servers 20A, 20B, and 20C, data servers 22_1 to 22_N, and a management server 30. Each of the data servers 22_1 to 22_N stores a partial database constituting the distributed database, and the distributed database management system 10 performs data operations on the distributed database.

As will be described later, the distributed database has at least one table structure, and each partial database constitutes a subset of the table structure. FIG. 2 is a diagram schematically showing an example of the database table TBL constituting the distributed database. As shown in FIG. 2, the database table TBL includes a plurality of tuples (rows) defined in the row direction and columns (attribute fields) A_1, A_2, ..., A_P defined in the column direction. Data is stored in the regions defined by the intersections of the tuples and the columns A_1, A_2, ..., A_P. As shown in FIG. 2, a plurality of subsets TG_1, TG_2, ..., TG_N can be configured by dividing this database table TBL in the row direction (horizontal division). Such subsets TG_1, TG_2, ..., TG_N can be stored in the data servers 22_1 to 22_N as tables of partial databases, respectively.

A plurality of partial database tables may instead be configured by dividing the database table TBL in the column direction (vertical division), or by a combination of horizontal division and vertical division.
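The horizontal and vertical division described above can be sketched in Python as follows. This is only an illustration: the sample rows, the helper names `horizontal_split` and `vertical_split`, and the round-robin assignment of tuples to subsets are assumptions, since the description does not prescribe how tuples are assigned to the subsets TG_1, ..., TG_N.

```python
from typing import Dict, List

Row = Dict[str, object]

def horizontal_split(table: List[Row], n: int) -> List[List[Row]]:
    """Divide a table in the row direction into n subsets (round-robin is an assumption)."""
    subsets: List[List[Row]] = [[] for _ in range(n)]
    for i, row in enumerate(table):
        subsets[i % n].append(row)
    return subsets

def vertical_split(table: List[Row], column_groups: List[List[str]]) -> List[List[Row]]:
    """Divide a table in the column direction; each group keeps only its own columns."""
    return [[{c: row[c] for c in group} for row in table] for group in column_groups]

# Hypothetical table TBL with columns A_1 to A_3.
TBL = [
    {"A_1": 1, "A_2": "x", "A_3": 10.0},
    {"A_1": 2, "A_2": "y", "A_3": 20.0},
    {"A_1": 3, "A_2": "z", "A_3": 30.0},
    {"A_1": 4, "A_2": "w", "A_3": 40.0},
]

TG = horizontal_split(TBL, 2)  # TG[0] and TG[1] play the role of TG_1 and TG_2
VG = vertical_split(TBL, [["A_1", "A_2"], ["A_1", "A_3"]])  # A_1 repeated as a key
```

In the vertical case the sketch repeats column A_1 in both groups so that the pieces could be rejoined; whether the system keeps such a shared key is likewise an assumption.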

As shown in FIG. 1, a distributed database management system 10 and a client terminal T1 are connected to the communication network NW. In addition to the distributed database management system 10 and the client terminal T1, many client terminals (not shown) are connected to the communication network NW. Examples of the network NW include a wide area network such as the Internet, but are not limited thereto.

The client terminal T1 has a function of generating a query described in a database language (data manipulation language) such as SQL (Structured Query Language) or XQuery (XML Query Language) for the database of the distributed database management system 10, and transmitting the query to the distributed database management system 10. The query is described in a database language that prescribes data operations such as data search, insertion, update, or deletion for the distributed database.

The load balancer 11 has a function of receiving, as a data processing request, a query transmitted from the client terminal T1 via the communication network NW, and distributing this query (hereinafter referred to as a received query) among the query servers (query receiving units) 20A to 20C so that the processing load is spread evenly. The load balancer 11 may select any of the query servers 20A to 20C according to, for example, a round robin method.
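A round robin selection of this kind might look like the following minimal Python sketch. The class name `LoadBalancer`, the method `dispatch`, and the use of strings to stand for the query servers are assumptions made for illustration only.

```python
from itertools import cycle
from typing import Iterator, List

class LoadBalancer:
    """Hypothetical stand-in for load balancer 11."""

    def __init__(self, query_servers: List[str]) -> None:
        # cycle() repeats the server list endlessly, which gives round robin order.
        self._next: Iterator[str] = cycle(query_servers)

    def dispatch(self, query: str) -> str:
        """Return the query server chosen to handle this received query."""
        return next(self._next)

balancer = LoadBalancer(["20A", "20B", "20C"])
chosen = [balancer.dispatch(f"SELECT {i}") for i in range(4)]
```

The fourth query wraps around to the first query server, so each server receives an equal share of queries over time regardless of query content.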

The query servers 20A, 20B, and 20C include query analysis units 21A, 21B, and 21C, respectively. The query analysis units 21A to 21C have a function of analyzing and optimizing the received query distributed by the load balancer 11. The query analysis units 21A to 21C analyze the received query and, based on the analysis result, convert the received query into a parse tree format query optimized for a specific database structure. For example, the received query can be converted into an abstract syntax tree (AST) format query.

Each of the data servers 22_1 to 22_N includes a router 24 and a plurality of storage processing units 25_1 to 25_M. The router 24 has a function of controlling data transfer between any of the storage processing units 25_1 to 25_M. The data servers 22_1 to 22_N are connected to each other via a wired transmission line such as a LAN (Local Area Network) or a wireless transmission line. The router 24 in any data server 22_i has a function of performing data communication with the router 24 in another data server 22_j (i ≠ j).

The management server 30 has a management table 30T that defines the correspondence between the plurality of partial databases constituting the distributed database and the data servers 22_1 to 22_N. When one of the query servers 20A, 20B, and 20C transfers the analysis result of the received query to the management server 30, the management server 30 refers to the management table 30T based on the analysis result, determines the query supply destination from among the data servers 22_1 to 22_N, and notifies the query server of the result. The query server transmits the converted query to one or more of the data servers 22_1 to 22_N according to the notification from the management server 30.

Each router 24 has a routing table RTL that defines the correspondence between the storage processing units 25_1 to 25_M and the database tables respectively stored in these storage processing units 25_1 to 25_M. The router 24 refers to the routing table RTL and determines one of the storage processing units 25_1 to 25_M as the supply destination of the query received from the query servers 20A to 20C.
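The two-level lookup through the management table 30T and the routing table RTL can be illustrated with a short Python sketch. The table contents, the partial database names, and the function `resolve` are hypothetical; the description does not specify the layout of either table.

```python
from typing import Dict, List, Tuple

# Management table 30T (hypothetical contents): partial database -> data server.
MANAGEMENT_TABLE: Dict[str, str] = {
    "orders_part1": "22_1",
    "orders_part2": "22_2",
    "customers_part1": "22_1",
}

# Routing tables RTL, one per router 24 (hypothetical contents):
# partial database table -> storage processing unit inside that data server.
ROUTING_TABLES: Dict[str, Dict[str, str]] = {
    "22_1": {"orders_part1": "25_1", "customers_part1": "25_2"},
    "22_2": {"orders_part2": "25_1"},
}

def resolve(partial_dbs: List[str]) -> List[Tuple[str, str]]:
    """Return (data server, storage processing unit) for each partial database."""
    destinations = []
    for name in partial_dbs:
        server = MANAGEMENT_TABLE[name]      # lookup by the management server 30
        unit = ROUTING_TABLES[server][name]  # lookup by the router 24 of that server
        destinations.append((server, unit))
    return destinations

targets = resolve(["orders_part1", "orders_part2"])
```

Splitting the lookup in this way means the management server only needs to know which data server holds a partial database, while each router keeps the finer-grained mapping for its own storage processing units.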

FIG. 3 is a functional block diagram schematically showing the configuration of the storage processing unit 25_k. As illustrated in FIG. 3, the storage processing unit 25_k includes a queue unit 250, a data operation unit 251, and a storage device 255. The data operation unit 251 includes a query analysis unit 252, a transaction execution unit 253, and an internal query issuing unit 254. The storage device 255 has a plurality of storages, and has a controller and input/output ports (not shown) for controlling these storages.

The queue unit 250 has a function of temporarily holding a plurality of queries sequentially input from the router 24, and supplies the held queries to the data operation unit 251 in the order in which they were input, earliest first. In the data operation unit 251, the query analysis unit 252 analyzes the query supplied from the queue unit 250 and generates an execution plan. The transaction execution unit 253 executes a transaction according to this execution plan.

The transaction execution unit 253 issues a data acquisition request for the data set to the internal query issuing unit 254 when the data set necessary for executing the transaction is not stored in the partial database in the storage device 255. In response to this data acquisition request, the internal query issuing unit 254 has a function of generating an internal query and issuing a data transfer request including the internal query to the router 24 to acquire the data set. The function of the internal query issuing unit 254 will be described later. The transaction execution unit 253 executes a transaction using the data set acquired by the internal query issuing unit 254.
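As a rough sketch of how the queue unit, the data operation unit, and the storage device fit together, the following Python models a storage processing unit that serves queries first in, first out and falls back to a data transfer request when a data set is not held locally. The class `StorageProcessingUnit`, the key-value partial database, and the `request_transfer` callback standing in for the router 24 are all assumptions for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional

class StorageProcessingUnit:
    """Hypothetical model of a storage processing unit 25_k."""

    def __init__(self, partial_db: Dict[str, int],
                 request_transfer: Callable[[str], Optional[int]]) -> None:
        self.queue: Deque[str] = deque()          # queue unit 250
        self.partial_db = partial_db              # storage device 255
        self.request_transfer = request_transfer  # stand-in for the router 24

    def enqueue(self, key: str) -> None:
        self.queue.append(key)

    def process_next(self) -> Optional[int]:
        key = self.queue.popleft()                # earliest query first
        if key in self.partial_db:                # data set is held locally
            return self.partial_db[key]
        # Role of the internal query issuing unit 254: request the missing data set.
        return self.request_transfer(key)

remote = {"b": 2}  # data held by some other unit (hypothetical)
sp = StorageProcessingUnit({"a": 1}, lambda key: remote.get(key))
for key in ("a", "b", "c"):
    sp.enqueue(key)
results = [sp.process_next() for _ in range(3)]
```

Here `None` represents a data set that could not be obtained anywhere; a real unit would return a query result indicating failure instead.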

The data operation unit 251 of the storage processing unit 25_k may be realized by hardware such as a semiconductor integrated circuit, or may be realized by an application program or program code recorded on a recording medium such as a nonvolatile memory or an optical disk. Such a program or program code causes a real computer or a virtual computer having a processor such as a CPU to execute all or part of the processing of the functional blocks 252 to 254 of the data operation unit 251.

In addition, the storage device 255 can be configured from a recording medium such as a volatile memory or a non-volatile memory (for example, a semiconductor memory or a magnetic recording medium), together with a circuit and a control program for writing data to and reading data from the recording medium. The storage area of the storage constituting the storage device 255 may be configured in advance on a predetermined storage area of the recording medium, or may be configured on an appropriate storage area that is allocated during system operation.

The operation of the distributed database management system 10 having the above configuration will be described below.

FIG. 4 is a flowchart schematically showing a procedure of transaction processing by the data operation unit 251 of the storage processing unit 25_k. Referring to FIG. 4, in the data operation unit 251, the query analysis unit 252 analyzes the query given from the queue unit 250 (step S10). At this time, the query analysis unit 252 optimizes the query according to the structure of the partial database stored in the storage device 255 based on the analysis result, and generates an execution plan.

Thereafter, the transaction execution unit 253 determines whether a data set necessary for executing the transaction is missing from the partial database in the storage device 255 (step S11).

When it is determined that the data set necessary for executing the transaction is stored in the partial database in the storage device 255 (NO in step S11), the transaction execution unit 253 executes a transaction in accordance with the execution plan generated by the query analysis unit 252, thereby performing data operations such as data search, insertion, update, or deletion on the partial database (step S12). Here, a transaction means one unit of work including processing such as search and update of the database 41, and is processing that satisfies the so-called ACID properties: atomicity, consistency, isolation, and durability. When the transaction process ends normally (YES in step S13), the transaction is committed (step S14). Then, the transaction execution unit 253 transmits the transaction execution result (query result) to the router 24 (step S17).

On the other hand, when a transaction or system failure occurs and the transaction does not end normally (NO in step S13), the transaction execution unit 253 executes roll forward (step S15). That is, the transaction execution unit 253 checks the log information for the period from a periodically set checkpoint to the time of the failure. If there is an uncommitted transaction during this period, the transaction execution unit 253 reflects the execution result of the transaction in the partial database based on the log information. In addition, the transaction execution unit 253 returns the state of the partial database to the state before the processing of the uncommitted transaction was started, that is, rolls back (step S16). Thereafter, the transaction execution unit 253 transmits the transaction execution result (query result) to the query server 20A via the router 24 (step S17). The query server 20A transmits this query result to the client terminal T1 via the load balancer 11.
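The commit and rollback branches of FIG. 4 can be sketched in Python as below. This is a simplified illustration under stated assumptions: the partial database is an in-memory dictionary, rollback restores a snapshot taken before the transaction, and the log-based roll forward of step S15 is not modeled. The function name `run_transaction` and the helper `fail` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

def run_transaction(db: Dict[str, int],
                    operations: List[Callable[[Dict[str, int]], None]]) -> Tuple[bool, str]:
    """Apply operations as one unit: commit on success, roll back on any failure."""
    snapshot = dict(db)              # state before the transaction started
    try:
        for op in operations:        # step S12: execute the data operations
            op(db)
    except Exception as exc:         # NO in step S13: did not end normally
        db.clear()
        db.update(snapshot)          # step S16: roll back to the earlier state
        return False, f"rolled back: {exc}"
    return True, "committed"         # YES in step S13, then step S14

def fail(_: Dict[str, int]) -> None:
    raise ValueError("simulated failure")

db = {"x": 1}
ok = run_transaction(db, [lambda d: d.update(x=2)])
bad = run_transaction(db, [lambda d: d.update(x=99), fail])
```

After the second call the partial update to `x` is undone, which is the atomicity property: either every operation of the transaction takes effect or none does.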

On the other hand, when the transaction execution unit 253 determines in step S11 that the data set necessary for executing the transaction is not stored in the partial database in the storage device 255 (YES in step S11), it issues a data acquisition request for the data set to the internal query issuing unit 254. In response to this data acquisition request, the internal query issuing unit 254 generates an internal query (step S20), and issues a data transfer request for the data set to the router 24 (step S21). Here, the data transfer request includes the internal query. The internal query may be described in a database language that defines data operations such as search, insertion, update, or deletion of data in the database, or in a format that can be executed within the system (for example, a parse tree format such as an AST format, or a series of processing procedures consisting of microinstructions).

For example, when the internal query issuing unit 254 of the storage processing unit 25_1 issues a data transfer request (step S21), the router 24 transfers the data transfer request to the other storage processing units 25_2 to 25_M in the data server 22_1, or to the routers 24 of the other data servers 22_2 to 22_N. When the router 24 transfers the data transfer request to the other storage processing units 25_2 to 25_M of the data server 22_1, in each of the storage processing units 25_2 to 25_M, the data operation unit 251 responds to the data transfer request by executing transaction processing based on the internal query on the partial database that it manages, thereby performing a data operation (mainly a search operation).

FIG. 5 is a flowchart schematically illustrating the processing procedure of a data operation unit 251 that has received the data transfer request from the storage processing unit 25_1. Referring to FIG. 5, first, the query analysis unit 252 analyzes the internal query given from the queue unit 250 (step S30). At this time, the query analysis unit 252 optimizes the internal query according to the structure of the partial database stored in the storage device 255 based on the analysis result, and generates an execution plan.

Thereafter, the transaction execution unit 253 performs data operations on the partial database by executing a transaction according to the execution plan generated by the query analysis unit 252 (step S31). When the transaction process ends normally (YES in step S32), the transaction is committed (step S33).

The transaction execution unit 253 transmits the transaction execution result (query result) to the storage processing unit 25_1 via the router 24 (step S36). That is, when the data set has been successfully acquired from the storage device 255, the transaction execution unit 253 transfers the data set to the storage processing unit 25_1 via the router 24. On the other hand, when acquisition of the data set from the storage device 255 has failed, the data operation unit 251 notifies the storage processing unit 25_1 via the router 24 that acquisition of the data set has failed.

On the other hand, when a failure occurs in the transaction or the system and the transaction does not end normally (NO in step S32), the transaction execution unit 253 executes roll forward (step S34) and then executes rollback (step S35). Thereafter, the transaction execution unit 253 transmits the transaction execution result (query result) to the storage processing unit 25_1 via the router 24 (step S36).

Returning to the flowchart of FIG. 4, in the storage processing unit 25_1, when the internal query issuing unit 254 has successfully acquired the data set from any of the storage processing units 25_2 to 25_M (YES in step S22), the transaction execution unit 253 executes a transaction using the data set (step S12). Thereafter, the above steps S13 to S17 are executed.

On the other hand, in the storage processing unit 25_1, when the internal query issuing unit 254 fails to acquire the data set (NO in step S22), the transaction execution unit 253 transmits a query result indicating that execution of the data operation has failed to the query server 20A via the router 24. The query server 20A transmits this query result to the client terminal T1 via the load balancer 11.
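The request and response flow of FIG. 4 and FIG. 5 can be summarized in the following Python sketch. It is a simplified model under assumptions: data sets are looked up by name, the classes `Unit`, `Router`, and `QueryResult` are hypothetical, and the router tries the other local units first and the peer routers second, which is only one possible forwarding policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class QueryResult:
    ok: bool
    data: Optional[List[int]] = None
    error: Optional[str] = None

class Unit:
    """Hypothetical storage processing unit holding named data sets."""
    def __init__(self, name: str, partial_db: Dict[str, List[int]]) -> None:
        self.name = name
        self.partial_db = partial_db

    def handle_transfer_request(self, data_set: str) -> QueryResult:
        # FIG. 5: run the internal query against the local partial database.
        if data_set in self.partial_db:
            return QueryResult(True, self.partial_db[data_set])
        return QueryResult(False, error=f"{self.name}: {data_set} not found")

class Router:
    """Hypothetical router 24: forwards to local units, then to peer routers."""
    def __init__(self, units: List[Unit]) -> None:
        self.units = units
        self.peers: List["Router"] = []

    def transfer(self, data_set: str, origin: Optional[Unit] = None,
                 forward: bool = True) -> QueryResult:
        for unit in self.units:
            if unit is not origin:
                result = unit.handle_transfer_request(data_set)
                if result.ok:
                    return result
        if forward:  # peers do not forward again, which avoids request loops
            for peer in self.peers:
                result = peer.transfer(data_set, forward=False)
                if result.ok:
                    return result
        return QueryResult(False, error=f"{data_set} unavailable")

def execute(unit: Unit, router: Router, data_set: str) -> QueryResult:
    if data_set in unit.partial_db:                   # NO in step S11: held locally
        return QueryResult(True, unit.partial_db[data_set])
    fetched = router.transfer(data_set, origin=unit)  # steps S20 and S21
    if not fetched.ok:                                # NO in step S22
        return QueryResult(False, error=f"data operation failed: {fetched.error}")
    return QueryResult(True, fetched.data)            # YES in step S22, then step S12

sp_a = Unit("22_1/25_1", {"a": [1]})
sp_b = Unit("22_1/25_2", {"b": [2]})
sp_c = Unit("22_2/25_1", {"c": [3]})
router_1, router_2 = Router([sp_a, sp_b]), Router([sp_c])
router_1.peers.append(router_2)
```

Calling `execute(sp_a, router_1, "c")` reaches the unit in the second data server through the peer router, while a request for an unknown data set comes back as a failed query result rather than an exception.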

The query result is transmitted to the client terminal T1 via any one of the query servers 20A, 20B, and 20C. At this time, since the query server also transmits the query result to the management server 30, the management server 30 can update the management table 30T based on the query result.

Next, various communication sequences showing the operation of the distributed database management system 10 will be described.

FIG. 6 is a diagram schematically illustrating an example of a communication sequence. Referring to FIG. 6, first, when the query server 20A receives a query from the client terminal T1 via the load balancer 11, the query analysis unit 21A of the query server 20A analyzes the received query and, based on the analysis result, converts the received query into a parse tree format query optimized for a specific database structure. Next, the query analysis unit 21A determines the data servers 22_i and 22_j to which the query is to be transmitted based on the analysis result of the query. The query server 20A transmits the query to the data servers 22_i and 22_j.

In the data server 22_i, the data operation units 251 of the SPs (storage processing units) 25_m, ..., 25_n each analyze and optimize the query to generate an execution plan. Similarly, in the data server 22_j, the data operation units 251 of the SPs (storage processing units) 25_q, ..., 25_r each analyze and optimize the query to generate an execution plan. Here, when the query analysis unit 21A of the query server 20A has already executed query optimization according to the structure of the partial database managed by each data operation unit 251, the data operation unit 251 need not perform optimization.

Thereafter, in each of the SPs 25_m, ..., 25_n, 25_q, ..., 25_r, the transaction execution unit 253 executes a transaction according to the execution plan to perform the data operation, and transmits the execution result (query result) to the router 24. The router 24 of the data server 22_i aggregates the query results received from the SPs 25_m, ..., 25_n and transmits them to the query server 20A. Likewise, the router 24 of the data server 22_j aggregates the query results received from the SPs 25_q, ..., 25_r and transmits them to the query server 20A. The query server 20A aggregates the query results transmitted from the data servers 22_i and 22_j, and transmits the result to the client terminal T1.

As shown in FIG. 6, in the distributed database management system 10 of the present embodiment, the plurality of storage processing units 25_m, ..., 25_n, 25_q, ..., 25_r can execute, in parallel, data operations on the partial databases that they respectively manage.

For example, when a query for a data operation that inserts, deletes, or updates tuples (records) in a table of the distributed database arrives from the client terminal T1, the storage processing units 25_m, ..., 25_n, 25_q, ..., 25_r can cooperatively execute, in parallel, the data operation on the partial databases that they respectively manage for that table.

When a query for a selection operation on a table of the distributed database (an operation that extracts tuples matching a specific condition from the tuples constituting the table and generates a new table from the extracted tuples) arrives from the client terminal T1, the storage processing units 25_m, ..., 25_n, 25_q, ..., 25_r can cooperatively execute, in parallel, the data operation on the partial database tables that they respectively manage. The query server 20A can configure a new table in which these execution results (query results) are aggregated, and can transmit information on the new table to the client terminal T1. The routers 24 of the data servers 22_i and 22_j each have a function of aggregating a plurality of execution results (query results) and transmitting the aggregation result to the query server 20A. When the routers 24 of the data servers 22_i and 22_j aggregate the execution results and transmit the aggregation results to the query server 20A, the query server 20A can efficiently aggregate the query results by using the aggregation results received from the routers 24.
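The parallel selection with two-stage aggregation described above can be sketched in Python as follows. The sketch assumes in-memory lists of rows for the partial tables and uses a thread pool to stand in for storage processing units working in parallel; the function names and sample data are hypothetical, and the data servers are visited one after another here purely to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, int]

def select(partial_table: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Selection executed by one storage processing unit on its own partial table."""
    return [row for row in partial_table if predicate(row)]

def data_server_select(units: List[List[Row]],
                       predicate: Callable[[Row], bool]) -> List[Row]:
    """Router-level aggregation: run the units in parallel, then merge their results."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the input units.
        partial_results = list(pool.map(lambda t: select(t, predicate), units))
    return [row for result in partial_results for row in result]

def query_server_select(servers: List[List[List[Row]]],
                        predicate: Callable[[Row], bool]) -> List[Row]:
    """Query-server-level aggregation over the already aggregated per-server results."""
    merged: List[Row] = []
    for units in servers:
        merged.extend(data_server_select(units, predicate))
    return merged

# Hypothetical partial tables: two units in data server 22_i, two in 22_j.
server_i = [[{"id": 1, "v": 5}, {"id": 2, "v": 50}], [{"id": 3, "v": 70}]]
server_j = [[{"id": 4, "v": 8}], [{"id": 5, "v": 90}]]
new_table = query_server_select([server_i, server_j], lambda r: r["v"] >= 50)
```

Because each router merges the results of its own units, the query server receives one result per data server instead of one per storage processing unit, which is the efficiency gain the paragraph above refers to.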

Further, as shown in FIG. 3, since one partial database stored in the storage device 255 is allocated to each storage processing unit 25_k, locks (exclusive control) on the partial database can be eliminated as far as possible.

Therefore, the distributed database management system 10 can achieve high throughput.

Furthermore, since query optimization is executed by the query servers 20A, 20B, and 20C in the preceding stage of the distributed database management system 10, there is an advantage that the storage processing units 25_1 to 25_M in the subsequent stage need not necessarily execute query optimization. Each of the storage processing units 25_1 to 25_M has a function of optimizing the query according to the structure of the partial database managed by itself. If most of the storage processing units 25_1 to 25_M store partial databases having the same structure, the query servers 20A, 20B, and 20C in the preceding stage can collectively execute the optimization according to that common partial database structure.

Next, FIG. 7 is a diagram schematically showing another example of the communication sequence. First, when the query server 20A receives a query from the client terminal T1 via the load balancer 11, the query analysis unit 21A of the query server 20A analyzes the received query and, based on the analysis result, converts the received query into a parse tree format query optimized for a specific database structure. Next, the query analysis unit 21A determines the data servers 22_i and 22_j to which the query is to be transmitted based on the analysis result of the query. Then, the query server 20A transmits the query to the routers 24 of the data servers 22_i and 22_j.

In the data server 22_i, the data operation units 251 of the SPs (storage processing units) 25_m, ..., 25_n each analyze and optimize the query to generate an execution plan. Similarly, in the data server 22_j, the data operation units 251 of the SPs (storage processing units) 25_q, ..., 25_r each analyze and optimize the query to generate an execution plan. Here, when the query analysis unit 21A of the query server 20A has already executed query optimization according to the structure of the partial database managed by each data operation unit 251, the data operation unit 251 need not perform optimization.

Thereafter, in each of the SPs 25_m, ..., 25_q, ..., 25_r, the transaction execution unit 253 executes a transaction according to the execution plan to perform the data operation, and transmits the execution result (query result) to the router 24.

On the other hand, in the SP 25_n, the transaction execution unit 253 determines that the data set necessary for executing the transaction is not stored in the partial database in the storage device 255 (YES in step S11 in FIG. 4). Then, the transaction execution unit 253 issues a data acquisition request for the data set to the internal query issuing unit 254.

For example, when the transaction execution unit 253 is to execute a selection operation (a data operation that extracts tuples matching a specific condition and generates a new table from the extracted tuples) or a join operation (a data operation that creates a new table by joining multiple columns), but the tuples and columns required for the selection or join operation do not exist in the partial table that it manages, it issues a data acquisition request for the data set of these tuples and columns to the internal query issuing unit 254.

As illustrated in FIG. 7, the internal query issuing unit 254 of the SP 25_n issues an internal query in response to the data acquisition request, and transmits a data transfer request including the internal query to the SP 25_m via the router 24. In this case, the SP 25_m analyzes and optimizes the transferred internal query and executes the data operation. The SP 25_m can supply the data set obtained by the data operation as a query result to the SP 25_n via the router 24.

Thereafter, the transaction execution unit 253 of the SP 25_n executes a data operation using the data set acquired by the internal query issuing unit 254 and transmits the execution result (query result) to the router 24.

As shown in FIG. 8, the internal query issuing unit 254 of the SP 25_n may instead transmit the data transfer request including the internal query to the SP 25_q of the data server 22_j via the router 24 in response to the data acquisition request. In this case, the SP 25_q analyzes and optimizes the transferred internal query and executes the data operation. Then, the SP 25_q can supply the query result to the SP 25_n via the router 24.

Then, as shown in FIG. 7, the router 24 of the data server 22 _i aggregates the query results received from the SPs 25 _m ,..., 25 _n and transmits them to the query server 20A. Likewise, the router 24 of the data server 22 _j aggregates the query results received from the SPs 25 _q ,..., 25 _r and transmits them to the query server 20A. The query server 20A aggregates the query results transmitted from the data servers 22 _i and 22 _j, and transmits the result to the client terminal T1.
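The two-stage aggregation just described (per-server routers first, then the query server) can be illustrated with a small sketch. It assumes that aggregation simply concatenates result rows, which the text does not specify; the function name `aggregate` and the sample rows are hypothetical.

```python
def aggregate(result_sets):
    """Concatenate several query results (lists of rows) into one result."""
    merged = []
    for rows in result_sets:
        merged.extend(rows)
    return merged


# Results produced by the SPs of two data servers (hypothetical values).
server_i_results = [[("Store A", 100)], [("Store B", 250)]]
server_j_results = [[("Store C", 75)], []]

# Each router aggregates the results of the SPs in its own data server.
router_i_out = aggregate(server_i_results)
router_j_out = aggregate(server_j_results)

# The query server aggregates what the routers send back.
final_result = aggregate([router_i_out, router_j_out])
```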

As shown in FIG. 7 and FIG. 8, in the distributed database management system 10 of this embodiment, the storage processing unit 25 _n of the data server 22 _i can acquire a data set that it lacks for executing a data operation from another storage processing unit 25 _m (FIG. 7) or 25 _q (FIG. 8). Since the storage processing unit 25 _n executes the data operation using the acquired data set, distributed processing can be executed efficiently across all of the storage processing units 25 ₁ to 25 _M. Therefore, even when a data set is missing, the distributed database management system 10 can achieve high throughput.

FIG. 9 is a diagram schematically showing still another example of the communication sequence. In the communication sequence of FIG. 9, when the storage processing unit 25 _n lacks a data set needed to execute a data operation, the router 24 of the data server 22 _i transfers the data transfer request (internal query) to the storage processing unit 25 _m in the data server 22 _i and, at the same time, to the router 24 in another data server 22 _j. The router 24 in the data server 22 _j transfers the data transfer request (internal query) to the storage processing unit 25 _q according to the routing table RTL. At this time, the data transfer request may be transferred to a plurality of storage processing units 25 _q ,..., 25 _r. As illustrated in FIG. 9, the storage processing unit 25 _n acquires the data sets that are the query results from the storage processing units 25 _m and 25 _q, and executes the data operation using these data sets.

FIG. 10 is a diagram schematically showing still another example of the communication sequence. In the communication sequence of FIG. 10, when the storage processing unit 25 _n lacks a data set needed to execute a data operation, the router 24 of the data server 22 _i transfers the data transfer request (internal query) to the router 24 in the external data server 22 _j and, at the same time, to the router 24 in the external data server 22 _k. The router 24 in the data server 22 _j transfers the data transfer request (internal query) to the storage processing unit 25 _q according to the routing table RTL. In parallel, the router 24 in the data server 22 _k transfers the data transfer request (internal query) to the storage processing unit 25 _t according to the routing table RTL.

Thereafter, as shown in FIG. 10, the storage processing units 25 _q and 25 _t transmit the data sets that are their query results to the storage processing unit 25 _n in the data server 22 _i via the respective routers 24. The storage processing unit 25 _n acquires these data sets from the storage processing units 25 _q and 25 _t, and executes the data operation using them.
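The fan-out in FIGS. 9 and 10 depends on a routing table that tells a router where a data set is held. The layout of the routing table RTL is not given in the text, so the following is a minimal sketch under an assumed layout (data set name mapped to a list of holders); `ROUTING_TABLE`, `plan_transfers`, and the server and SP labels are hypothetical.

```python
# Hypothetical routing table: data set name -> list of (data server, SP) holders.
ROUTING_TABLE = {
    "orders": [("server_j", "SP_q"), ("server_k", "SP_t")],
    "stores": [("server_i", "SP_m")],
    "items": [("server_i", "SP_m"), ("server_j", "SP_q")],
}


def plan_transfers(requesting_server: str, dataset: str):
    """Split the holders of `dataset` into local SPs and remote routers to contact."""
    local, remote = [], {}
    for server, sp in ROUTING_TABLE.get(dataset, []):
        if server == requesting_server:
            local.append(sp)                          # same data server (FIG. 7 case)
        else:
            remote.setdefault(server, []).append(sp)  # forwarded to that server's router
    return local, remote


# FIG. 10-like case: both holders are in external data servers.
local_targets, remote_targets = plan_transfers("server_i", "orders")

# FIG. 9-like case: one local holder and one external holder.
mixed_local, mixed_remote = plan_transfers("server_i", "items")
```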

FIG. 7 shows a mode in which only one storage processing unit 25 _m in the data server 22 _i transmits a missing data set to the storage processing unit 25 _n, but the configuration is not limited to this mode. In the data server 22 _i, a plurality of storage processing units 25 _m ,..., 25 _u may transmit missing data sets to the storage processing unit 25 _n. In this case, the router 24 of the data server 22 _i has a function of aggregating the data sets transmitted from the plurality of storage processing units 25 _m ,..., 25 _u to form a new table and transmitting the data set of the new table to the storage processing unit 25 _n. As will be described later, the partial database can be composed of a group of entity data stored in the storage area of the storage device 255, a reference table, and a plurality of intermediate identifier tables (see FIGS. 14 to 15). When a new table is formed by aggregating data sets of this kind of partial database, entity data having the same value is not transferred redundantly, so the amount of data transferred within the same data server 22 _i can be reduced.

FIG. 8 shows a mode in which only one storage processing unit 25 _q in the data server 22 _j transmits a missing data set to the storage processing unit 25 _n via the router 24 of the data server 22 _i, but the configuration is not limited to this mode. In the data server 22 _j, a plurality of storage processing units 25 _q ,..., 25 _r may transmit missing data sets to the storage processing unit 25 _n via the routers 24 of the data servers 22 _j and 22 _i. In this case, the router 24 of the data server 22 _j has a function of aggregating the data sets transmitted from the plurality of storage processing units 25 _q ,..., 25 _r to form a new table and transmitting the data set of the new table to the storage processing unit 25 _n via the router 24 of the data server 22 _i. When the partial database shown in FIG. 14 is used, having the router 24 of the data server 22 _j aggregate the data sets of the partial databases reduces the amount of data transferred between the data servers 22 _j and 22 _i.

In the case of FIG. 9, a missing data set is transmitted from the storage processing unit 25 _m in the data server 22 _i to the storage processing unit 25 _n in the data server 22 _i via the router 24, and a missing data set is also transmitted from the storage processing unit 25 _q in the data server 22 _j via the router 24. The router 24 of the data server 22 _i has a function of aggregating these data sets to form a new table and transmitting the data set of the new table to the storage processing unit 25 _n. When the partial database shown in FIG. 14 is used, having the router 24 of the data server 22 _i aggregate the data sets of the partial databases reduces the amount of data transferred from that router 24 to the storage processing unit 25 _n. In the case of FIG. 10, the storage processing unit 25 _n of the data server 22 _i receives missing data sets from the storage processing units 25 _q and 25 _t in the two data servers 22 _j and 22 _k, respectively, via the router 24. In this case as well, when the partial database shown in FIG. 14 is used, having the router 24 of the data server 22 _i aggregate the data sets of the partial databases reduces the amount of data transferred from that router 24 to the storage processing unit 25 _n.

In addition, when a plurality of data sets are missing, the storage processing unit 25 _n may execute the data operation after acquiring all of the data sets, or it may, once only part of the data sets has been acquired, execute the data operation using that part. In the communication sequence of FIG. 9, the storage processing unit 25 _n executes the data operation after acquiring all of the data sets that are the query results from the storage processing units 25 _m and 25 _q. Instead, the storage processing unit 25 _n may execute a data operation using only the first data set immediately after acquiring it from the storage processing unit 25 _m, and then, after acquiring the second data set from the storage processing unit 25 _q, execute a data operation using the second data set.
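The difference between waiting for every data set and processing each one on arrival can be sketched as follows. The data operation is assumed to be a simple sum so that partial results can be combined; the function names and the sample values are hypothetical.

```python
def run_after_all(datasets):
    """Wait for every data set, then run the operation (here: a sum) once."""
    rows = [value for ds in datasets for value in ds]
    return sum(rows)


def run_incrementally(datasets):
    """Fold each data set into a running result as soon as it arrives."""
    total = 0
    partials = []
    for ds in datasets:          # `datasets` may be an iterator yielding on arrival
        total += sum(ds)
        partials.append(total)   # result available after each arrival
    return total, partials


first_from_sp_m = [10, 20]   # first data set (arrives from SP_m)
second_from_sp_q = [5]       # second data set (arrives from SP_q)

batch_total = run_after_all([first_from_sp_m, second_from_sp_q])
stream_total, partials = run_incrementally(iter([first_from_sp_m, second_from_sp_q]))
```

Both variants reach the same final value here; the incremental form only pays off when the operation can be decomposed over the individual data sets, which is an assumption of this sketch rather than something stated in the text.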

Next, a preferred example of the structure of the partial database constituting the distributed database will be described below.

FIG. 11 is a diagram schematically showing an example of the structure of the partial database. As shown in FIG. 11, this partial database structure includes an entity data group stored in the storage area DA0 in the storage apparatus 255, and a reference table (identifier table) RT0 stored in a storage area of the storage apparatus 255 different from the storage area DA0.

The reference table RT0 has five tuples defined in the row direction and five attribute fields TID, Val1, Val2, Val3, and Val4 defined in the column direction. In the first embodiment, for convenience of explanation, the number of tuples in the reference table RT0 is five. However, the number is not limited to this; the number of tuples can be set to several tens to several millions, for example. Likewise, the number of attribute fields TID, Val1, Val2, Val3, and Val4 is not limited to five.

Unique tuple identifiers (TID) R1, R2, R3, R4, and R5 are assigned to the five tuples of the reference table RT0, respectively. Fixed-length data identifiers VR11, VR12,..., VR43 are stored in the areas defined by these tuples and the attribute fields Val1, Val2, Val3, Val4 (the areas where the tuples and the attribute fields intersect). That is, in the areas corresponding to the tuple identifiers R1, R2, R3, R4, and R5, the attribute field Val1 includes data identifiers VR11, VR12, VR13, VR14, and VR15, the attribute field Val2 includes VR21, VR22, VR23, VR23, and VR24, the attribute field Val3 includes VR31, VR32, VR33, VR34, and VR35, and the attribute field Val4 includes VR41, VR41, VR41, VR42, and VR43, respectively.

The values of the data identifiers VR11 to VR43 can be calculated using a hash function. The hash function is an operator that outputs a fixed-length bit string for an input bit string of entity data. The output value (hash value) of this hash function may be used as the value of each of the data identifiers VR11 to VR43. The transaction execution unit 253 converts a search character string into a hash value, searches the reference table RT0 for a data identifier whose value matches the hash value, and retrieves the entity data corresponding to the found data identifier from the storage area DA0. Because the transaction execution unit 253 searches the reference table RT0, which contains only fixed-length data and no variable-length data, the character string can be searched for at high speed.
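A minimal sketch of this hash-based lookup is shown below. The text does not name a hash function or an identifier length, so the use of SHA-256 truncated to 8 bytes is purely an assumption, as are the names `data_identifier`, `reference_table`, `entity_store`, and `search`, and the sample value "Kanto".

```python
import hashlib


def data_identifier(value: str, size: int = 8) -> bytes:
    """Fixed-length identifier derived from variable-length entity data (assumed scheme)."""
    return hashlib.sha256(value.encode("utf-8")).digest()[:size]


# Variable-length entity data, keyed by identifier (plays the role of storage area DA0).
entity_store = {}
# Reference table: tuple identifier -> fixed-length identifiers, one per attribute field.
reference_table = {}

real_table = {
    "R1": ("Store A", "Kyushu"),
    "R2": ("Store B", "Kanto"),
    "R3": ("Store A", "Kanto"),
}
for tid, values in real_table.items():
    ids = tuple(data_identifier(v) for v in values)
    reference_table[tid] = ids
    for ident, value in zip(ids, values):
        entity_store[ident] = value


def search(text: str, column: int):
    """Return the tuple identifiers whose `column` holds `text`."""
    wanted = data_identifier(text)   # hash the search string once
    # Only fixed-length identifiers are compared; no variable-length strings are scanned.
    return [tid for tid, ids in reference_table.items() if ids[column] == wanted]


matches = search("Store A", 0)
```

A truncated hash can collide, so a real system would need the "substantially unique" property discussed in the text to hold for its chosen identifier length.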

As the names (attribute names) of the attribute fields Val1, Val2, Val3, Val4, for example, “Store name”, “Region”, “Sales”, “Year / month” can be set. The database structure shown in FIG. 11 can be generated from a real table that is a collection of entity data. FIG. 12 is a diagram schematically illustrating an example of the real table ST. By converting the entity data such as “Store A”, “Store B”, “Kyushu”, etc. in the real table ST of 5 rows and 4 columns into fixed-length values (that is, converting the value of each entity data into a hash value), the fixed-length data identifiers VR11, VR12,..., VR43 of the reference table RT0 can be generated.

The data identifiers VR11 to VR43 each have a value that substantially uniquely represents entity data in the storage area DA0. Therefore, the transaction execution unit 253 can search the data identifiers VR11 to VR43 and, based on the search result, access the variable-length entity data corresponding to the data identifiers VR11 to VR43. In this specification, “substantially unique” means that uniqueness is satisfied in data operations on the partial database.

FIG. 13 (A) and FIG. 13 (B) are diagrams showing a logical data structure constituting the partial database. The data structure shown in FIG. 13A has a header area at the beginning and an allocation management table at the end. In addition, an area for storing the entity data group is provided between the header area and the allocation management table.

FIG. 13B is a schematic diagram illustrating an example of a conversion table included in the header area. This conversion table defines the correspondence between the data identifiers VR11 to VR43 and the storage areas of the entity data that these data identifiers represent. As shown in FIG. 13B, the conversion table is provided with an area Fid in which the data identifiers VR11 to VR43 are stored and an area Fa in which position data A11 to A43 indicating the corresponding storage areas are stored.

As shown in FIG. 11, the storage area DA0 of the entity data D11 to D43 and the storage areas of the data identifiers VR11 to VR43, each of which uniquely represents one of the entity data D11 to D43, are completely separated from each other. This makes it possible to improve the efficiency of database update processing, the search speed, and portability.

For example, when part of the entity data group in the storage area DA0 is updated, added, or deleted, it suffices to update the reference table RT0 and the conversion table accordingly. The partial database is thus updated only to the minimum extent necessary as entity data is updated, added, or deleted. Therefore, even when the partial database is updated frequently, such updates can be executed efficiently and at high speed.

In the conversion table of FIG. 13B, duplication of data identifiers having the same value is eliminated (that is, the values of any two data identifiers in the conversion table are always different). By using this conversion table, entity data having the same value can be stored in the storage area DA0 without duplication. In other words, the entity data group constituting the partial database can be stored in the storage area DA0 in compressed form, so the storage area DA0 can be used efficiently.
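The effect of a duplicate-free conversion table can be illustrated with a small sketch in which identical entity data is written to the storage area only once. The class `EntityArea`, its methods, and the truncated SHA-256 identifier are assumptions made for illustration; hash collisions are ignored here.

```python
import hashlib


class EntityArea:
    """Sketch of a storage area plus a conversion table (identifier -> position)."""

    def __init__(self):
        self.area = bytearray()   # entity data stored back to back
        self.conversion = {}      # data identifier -> (offset, length)

    @staticmethod
    def identifier(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()[:8]

    def put(self, data: bytes) -> bytes:
        ident = self.identifier(data)
        if ident not in self.conversion:   # the same value is stored only once
            self.conversion[ident] = (len(self.area), len(data))
            self.area += data
        return ident

    def get(self, ident: bytes) -> bytes:
        offset, length = self.conversion[ident]
        return bytes(self.area[offset:offset + length])


da0 = EntityArea()
ids = [da0.put(v.encode()) for v in ["Kyushu", "Store A", "Kyushu", "Kyushu"]]
```

Four values are written but only two distinct ones occupy space, which is the compression effect the paragraph above describes.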

Next, another preferred example of the partial database structure will be described below.

FIG. 14 is a diagram schematically showing the structure of the partial database. As shown in FIG. 14, this database structure includes an entity data group stored in the storage area DA3 of the storage device 255, a reference table RT1 stored in a storage area different from the storage area DA3, and first to third intermediate identifier tables IT41, IT42, IT43.

FIG. 15A is a diagram showing a schematic configuration of the reference table RT1. The reference table RT1 has a plurality of tuples defined in the row direction, and four attribute fields TID, Col1Ref, Col2Ref, and Col3Ref defined in the column direction. For example, the number of tuples in the reference table RT1 can be set to several tens to several millions. The number of attribute fields TID, Col1Ref, Col2Ref, and Col3Ref is not limited to four.

Unique tuple identifiers (TID) R1, R2, R3, R4,... are assigned to the tuples of the reference table RT1, respectively. Fixed-length reference identifiers CRV11, CRV12,..., CRV31,... are stored in the areas defined by these tuples and the attribute fields Col1Ref, Col2Ref, Col3Ref. The values of the reference identifiers CRV11 to CRV31 can be calculated using the same kind of hash function as that used for the data identifiers of the first embodiment. That is, the output values of the hash function for the inputs of the data identifiers VR11 to VR31 may be used as the values of the reference identifiers CRV11 to CRV31, respectively.

FIGS. 15B to 15D are diagrams schematically showing the structures of the first to third intermediate identifier tables IT41, IT42 and IT43. The first intermediate identifier table IT41 has a plurality of tuples defined in the row direction and two attribute fields Col1 and Val defined in the column direction. The attribute field Col1 includes fixed-length reference identifiers CRV11, CRV12,..., and the attribute field Val includes fixed-length data identifiers VR11, VR12,....

The second intermediate identifier table IT42 has a plurality of tuples defined in the row direction and two attribute fields Col2 and Val defined in the column direction. The attribute field Col2 includes fixed-length reference identifiers CRV21, CRV22,..., and the attribute field Val includes fixed-length data identifiers VR21, VR22,....

The third intermediate identifier table IT43 has a plurality of tuples defined in the row direction and two attribute fields Col3 and Val defined in the column direction. The attribute field Col3 includes fixed-length reference identifiers CRV31, CRV32,..., and the attribute field Val includes fixed-length data identifiers VR31, VR32,....

Each of the first to third intermediate identifier tables IT41, IT42, IT43 contains no reference identifiers with duplicate values (that is, the values of any two reference identifiers in each intermediate identifier table are always different), and therefore has a data structure that eliminates redundancy. In other words, each of the intermediate identifier tables IT41, IT42, IT43 defines a one-to-one correspondence between reference identifiers and data identifiers in which no correspondence is duplicated. As shown in FIG. 15A, reference identifiers CRV12, CRV12, CRV11, CRV11,... are stored in the column of the attribute field Col1Ref of the reference table RT1. As shown in FIG. 15B, the intermediate identifier table IT41 corresponding to the attribute field Col1Ref defines the correspondence between these reference identifiers CRV12, CRV12, CRV11, CRV11,... and the data identifiers VR12, VR12, VR11, VR11,.... In the intermediate identifier table IT41, duplicate correspondences are excluded (for example, the correspondence between the reference identifier CRV12 and the data identifier VR12 is not defined more than once). Similarly, as shown in FIG. 15C and FIG. 15D, duplicate correspondences are excluded from the intermediate identifier table IT42 corresponding to the attribute field Col2Ref and from the intermediate identifier table IT43 corresponding to the attribute field Col3Ref.

The transaction execution unit 253 can search the reference identifiers CRV11 to CRV33 and the data identifiers VR11 to VR33, and use this search result to access variable-length entity data. Since the storage area DA3 has a conversion table similar to the conversion table shown in FIG. 13A, the transaction execution unit 253 can access the entity data based on the search result.
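The chain from a tuple to its entity data (reference identifier, then data identifier, then entity data) can be sketched with plain dictionaries. The identifier names follow FIG. 15 as described in the text, but the entity values "Store A" and "Store B", the dictionary layout, and the function `resolve` are assumptions for illustration.

```python
# Reference table RT1: tuple identifier -> reference identifier per column.
reference_table = {
    "R1": {"Col1Ref": "CRV12"},
    "R2": {"Col1Ref": "CRV12"},
    "R3": {"Col1Ref": "CRV11"},
    "R4": {"Col1Ref": "CRV11"},
}
# Intermediate identifier table IT41: reference identifier -> data identifier.
# Each correspondence appears exactly once, even though RT1 repeats the identifiers.
it41 = {"CRV11": "VR11", "CRV12": "VR12"}
# Conversion step: data identifier -> entity data (values are hypothetical).
entity_data = {"VR11": "Store A", "VR12": "Store B"}


def resolve(tid: str, column: str, intermediate: dict) -> str:
    """Follow reference identifier -> data identifier -> entity data."""
    ref_id = reference_table[tid][column]
    data_id = intermediate[ref_id]
    return entity_data[data_id]


column_values = [resolve(tid, "Col1Ref", it41) for tid in reference_table]
```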

As described above, each of the first to third intermediate identifier tables IT41, IT42, IT43 has a data structure that excludes redundancy. Accordingly, when the storage processing unit 25 _n of the data server 22 _i lacks a data set for executing a data operation and acquires it from the storage processing unit 25 _m (FIG. 7) or the storage processing unit 25 _q (FIG. 8), each of which has a partial database with the structure of FIG. 14, using the intermediate identifier tables IT41, IT42, IT43 makes it unnecessary to transfer data sets having the same value repeatedly. This has the advantage of reducing the amount of data transferred.

For example, if the storage processing unit 25 _m receives a data transfer request for the one-column data set of the attribute field Col1Ref of the reference table RT1 of FIG. 15(A), the storage processing unit 25 _m transmits the fixed-length reference identifiers CRV12, CRV12, CRV11, CRV11,... together with the correspondence between the reference identifiers CRV11, CRV12,... and the entity data D11, D12,.... In this case, although the values of the reference identifiers CRV12, CRV12, CRV11, CRV11,... overlap, each entity data is transmitted only once, so the transfer amount is small.
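The saving can be estimated with a rough sketch that compares sending every value in full with sending one fixed-length identifier per row plus each distinct value once. The 4-byte identifier width, the function names, and the sample store names are assumptions, and real wire formats would add their own overhead.

```python
def encode_column(values):
    """Return (reference ids per row, dictionary of id -> value with no duplicates)."""
    dictionary, ref_ids, id_of = {}, [], {}
    for value in values:
        if value not in id_of:
            id_of[value] = len(id_of)
            dictionary[id_of[value]] = value
        ref_ids.append(id_of[value])
    return ref_ids, dictionary


def transfer_size(ref_ids, dictionary, id_bytes=4):
    """Bytes sent: one fixed-length id per row plus each distinct value once."""
    rows = len(ref_ids) * id_bytes
    table = sum(id_bytes + len(v.encode()) for v in dictionary.values())
    return rows + table


column = ["Fukuoka Central Store"] * 3 + ["Kumamoto Station Store"] * 2
ref_ids, dictionary = encode_column(column)

encoded_bytes = transfer_size(ref_ids, dictionary)
raw_bytes = sum(len(v.encode()) for v in column)   # every value sent in full
```

The benefit grows with the number of repeated values in the column; for a column whose values are all distinct and short, the identifier overhead can outweigh the saving.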

The intermediate identifier tables IT41, IT42, IT43 are each configured in units of columns. Therefore, when the storage processing unit 25 _i executes a join operation (a data operation that joins a plurality of columns to generate a new table), the data transfer amount can be reduced even when a data set that is missing for the join operation is transferred from another storage processing unit 25 _j to the storage processing unit 25 _i.

All of the storage processing units 25 ₁ to 25 _M may use the same hash function for calculating the reference identifiers or the data identifiers, or they may use different hash functions. However, when a different hash function is used for each storage processing unit, the hash value of the data identifier or the reference identifier may differ between, for example, the storage processing units 25 _q and 25 _r even for entity data having the same value. As described above, the router 24 has a function of aggregating the data sets transferred from the plurality of storage processing units 25 _q and 25 _r to form a new table. When performing this aggregation, the router 24 also has a function of reconciling inconsistencies in the data identifiers and reference identifiers. FIG. 16 is a diagram for explaining this aggregation and adjustment function of the router 24.

As shown in FIG. 16, the storage processing units 25 _q and 25 _r of the data server 22 _j transmit data sets DSa and DSb, respectively, to the router 24 in response to a data transfer request from the storage processing unit 25 _n of the data server 22 _i. As shown in FIG. 16, one data set DSa is the data of the tables RTa, Ca1, and Ca2, and the other data set DSb is the data of the tables RTb, Cb1, and Cb2. The router 24 of the data server 22 _j aggregates the data sets DSa and DSb to form new tables RTd, Cd1 and Cd2, and forwards the data set DSd of the new tables RTd, Cd1 and Cd2 to the data server 22 _i.

The reference table RTa has the same structure as the reference table RT1 described above. The tables Ca1 and Ca2 are built from the intermediate identifier tables of the storage processing unit 25 _q. The table Ca1 defines a one-to-one correspondence between the reference identifiers CRV11, CRV12, and CRV13 and the entity data values “AA”, “AB”, and “AC”, and the table Ca2 defines a one-to-one correspondence between the reference identifier CRV21 and the entity data value “AD”. Similarly, the reference table RTb has the same structure as the reference table RT1 described above. The tables Cb1 and Cb2 are built from the intermediate identifier tables of the storage processing unit 25 _r. The table Cb1 defines a one-to-one correspondence between the reference identifiers CRV11 and CRV12 and the entity data values “BA” and “AA”, and the table Cb2 defines a one-to-one correspondence between the reference identifier CRV22 and the entity data value “AD”.

As shown in FIG. 16, the table Ca1 and the table Cb1 use different reference identifiers CRV11 and CRV12 for the same entity data value “AA”. Likewise, the table Ca2 and the table Cb2 use different reference identifiers CRV21 and CRV22 for the same entity data value “AD”. In such a case, when the router 24 aggregates the data sets DSa and DSb to form the reference table RTd and the tables Cd1 and Cd2, it assigns a single unique reference identifier CRV11 to the shared entity data value “AA” and a single unique reference identifier CRV21 to the shared entity data value “AD”. The inconsistency between reference identifiers is thereby eliminated.

More specifically, the following procedure can be adopted, for example. First, the router 24 checks the data sets DSa and DSb for inconsistent reference identifiers assigned to the same entity data value. If the check finds an inconsistency, the router 24 updates the reference identifiers of the tables RTb, Cb1, and Cb2 using the hash function used by one of the storage processing units 25 _q and 25 _r, namely the storage processing unit 25 _q. At this time, the router 24 may create a hash value conversion table and update the reference identifiers of the tables RTb, Cb1, and Cb2 according to that conversion table. The router 24 then aggregates the updated tables RTb, Cb1, Cb2 with the tables RTa, Ca1, Ca2 to form the new tables RTd, Cd1, Cd2. Thereafter, the tables RTb, Cb1, Cb2 and the tables RTa, Ca1, Ca2 are discarded.
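The reconciliation step can be sketched for the first column using the values of FIG. 16 as given in the text. Note that this sketch swaps in a simpler technique: instead of re-hashing with the hash function of the storage processing unit 25 _q, it builds the conversion (remap) table by comparing entity data values directly. The function `merge_column`, the rows of RTa and RTb, and the freshly allocated identifier "CRV14" for the value "BA" are all hypothetical.

```python
def merge_column(table_a, table_b, prefix):
    """Merge two reference-id -> value tables; return the merged table and a remap for B's ids."""
    merged = dict(table_a)
    id_of_value = {value: ref for ref, value in merged.items()}
    remap, next_no = {}, len(merged) + 1
    for ref_b, value in table_b.items():
        if value in id_of_value:          # same entity value: reuse A's identifier
            remap[ref_b] = id_of_value[value]
        else:                             # new value: allocate an unused identifier
            while f"{prefix}{next_no}" in merged:
                next_no += 1
            new_ref = f"{prefix}{next_no}"
            merged[new_ref] = value
            id_of_value[value] = new_ref
            remap[ref_b] = new_ref
    return merged, remap


ca1 = {"CRV11": "AA", "CRV12": "AB", "CRV13": "AC"}   # from SP 25_q
cb1 = {"CRV11": "BA", "CRV12": "AA"}                  # from SP 25_r
rta = [{"TID": "R1", "Col1Ref": "CRV11"}]             # hypothetical rows
rtb = [{"TID": "R2", "Col1Ref": "CRV12"}, {"TID": "R3", "Col1Ref": "CRV11"}]

cd1, remap = merge_column(ca1, cb1, "CRV1")
# Rewrite B's reference table with the reconciled identifiers, then append it to A's.
rtd = rta + [{**row, "Col1Ref": remap[row["Col1Ref"]]} for row in rtb]
```

After the merge, the value "AA" is referenced by CRV11 in every row of the new reference table, which corresponds to the outcome described for the tables RTd and Cd1.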

As described above, the embodiments of the present invention have been described with reference to the drawings. However, these are exemplifications of the present invention, and various configurations other than the above can be adopted. For example, the above-described embodiment has a configuration suitable for executing a transaction on a distributed database, but is not limited to this. As described above, a transaction is a process that satisfies the ACID characteristics, but the present invention can also be applied to a data operation when any of these ACID characteristics is not satisfied.

In the above embodiment, the distributed database management system 10 includes the three query servers 20A, 20B, and 20C as shown in FIG. 1, but is not limited thereto. Each of the data servers 22 ₁ to 22 _N has a plurality of storage processing units 25 ₁ to 25 _M, but the configuration is not limited to this; any one of the data servers 22 _i may have a single storage processing unit. The basic functions of the data servers 22 ₁ to 22 _N are the same, but the hardware configurations incorporated in the data servers 22 ₁ to 22 _N need not be the same.

Further, as described above, the router 24 has a function of aggregating a plurality of query results (data sets), but the router 24 may omit this aggregation in order to reduce processing time.

This application claims priority based on Japanese Patent Application No. 2009-040777 (filing date: February 24, 2009) filed with the Japan Patent Office, the entire disclosure of which is incorporated herein by reference.

Claims

1. A distributed database management system for performing data operations on a distributed database, comprising:
a query reception unit that receives a query; and
a plurality of storage processing units that cooperate to execute data operations on the distributed database based on the received query,
wherein each of the plurality of storage processing units includes:
a storage device storing one of a plurality of partial databases constituting the distributed database; and
a data operation unit that performs a data operation based on the query on the partial database stored in the storage device.
2. The distributed database management system according to claim 1, wherein
the data operation unit of a first storage processing unit among the plurality of storage processing units, when a data set necessary for executing a data operation based on the query is not stored in its own partial database, issues a data transfer request for the data set to one or more second storage processing units, different from the first storage processing unit, among the plurality of storage processing units, and
the data operation unit of the second storage processing unit, in response to the data transfer request, acquires the data set from its own partial database and transfers it to the first storage processing unit.
3. The distributed database management system according to claim 2, further comprising
a router that performs routing between the plurality of storage processing units and the query reception unit and controls data transfer between any of the plurality of storage processing units,
wherein the router aggregates the data sets transferred from a plurality of the second storage processing units to form a new table, and transfers the data set of the new table to the first storage processing unit.
4. The distributed database management system according to claim 2 or 3, wherein
the data operation unit of the first storage processing unit generates an internal query as the data transfer request, and
the data operation unit of the second storage processing unit acquires the data set by executing a data operation based on the internal query on its own partial database.
5. The distributed database management system according to claim 1, wherein the query is written in a database language that defines one or more data operations selected from search, insertion, update, and deletion of data in the database.
6. The distributed database management system according to claim 5, wherein the data operation unit includes:
a query analysis unit that analyzes an internal query; and
a transaction execution unit that performs the data operation by executing a transaction based on the analysis result of the query analysis unit.
7. The distributed database management system according to claim 6, wherein the query analysis unit performs optimization on the internal query according to the data structure of the partial database stored in the storage device.
8. The distributed database management system according to claim 1, wherein the query reception unit includes a query analysis unit that analyzes and optimizes the received query.
9. The distributed database management system according to any one of claims 1 to 8, wherein the partial database includes:
a plurality of entity data;
an identifier table that stores, in areas each defined by at least one tuple defined in a row direction and at least one attribute field defined in a column direction, fixed-length data identifiers each uniquely representing the entity data itself; and
a conversion table representing a correspondence between the plurality of data identifiers and position data indicating the storage area of each of the plurality of entity data.
10. The distributed database management system according to claim 9, wherein a storage area allocated to the identifier table and a storage area allocated to the entity data are different from each other.
11. The distributed database management system according to claim 9 or 10, wherein the value of the data identifier is an output value of a hash function that outputs a fixed-length bit string with respect to the input of the entity data.
12. The distributed database management system according to any one of claims 9 to 11, wherein
there are a plurality of the identifier tables,
the partial database further includes a reference table having a set of reference identifiers each uniquely representing a data identifier in the plurality of identifier tables, and
the data operation unit executes the data operation using the reference table and the identifier tables.
13. The distributed database management system according to claim 12, wherein each identifier table defines a one-to-one correspondence between the reference identifiers and the data identifiers so as to eliminate duplication of the correspondence.
14. A distributed database management method in a distributed database management system including a plurality of storage processing units that cooperatively execute data operations on a distributed database based on a query, each of the storage processing units including a storage device that stores one of a plurality of partial databases constituting the distributed database, the method comprising:
(a) in a first storage processing unit among the plurality of storage processing units, when a data set necessary for executing a data operation based on the query is not stored in the partial database, issuing a data transfer request for the data set to one or more second storage processing units, different from the first storage processing unit, among the plurality of storage processing units;
(b) in the second storage processing unit, acquiring the data set from the partial database in response to the data transfer request, and transferring the data set to the first storage processing unit; and
(c) in the first storage processing unit, executing the data operation using the data set transferred from the second storage processing unit.
15. The distributed database management method according to claim 14, wherein
in the step (a), an internal query is generated as the data transfer request, and
in the step (b), the data set is acquired by executing a data operation based on the internal query on the partial database.
16. The distributed database management method according to claim 15, further comprising a step of optimizing the internal query according to the data structure of the partial database stored in the storage device.
17. The distributed database management method according to any one of claims 14 to 16, further comprising a step of receiving the query and analyzing and optimizing the received query.
18. The distributed database management method according to any one of claims 14 to 17, wherein the partial database includes:
a plurality of entity data;
an identifier table that stores fixed-length data identifiers, each uniquely representing the entity data itself, in an area defined by at least one tuple defined in a row direction and at least one attribute field defined in a column direction; and
a conversion table representing a correspondence relationship between position data indicating a storage area of each of the plurality of entity data and the plurality of data identifiers.
19. The distributed database management method according to claim 18, wherein
there are a plurality of the identifier tables,
the partial database further includes a reference table having a set of reference identifiers each uniquely representing a data identifier in the plurality of identifier tables, and
the data operation is executed using the reference table and the identifier tables.
20. The distributed database management method according to claim 19, wherein each identifier table defines a one-to-one correspondence between the reference identifier and the data identifier so as to eliminate duplication of the correspondence.
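
As a non-limiting illustration of the claimed method, the following minimal Python sketch shows one way the data transfer steps (request, transfer, execution) might be realized, together with a hash-derived data identifier, an identifier table, and a conversion table. All class, method, and variable names are hypothetical and do not appear in the specification. SHA-256 is an assumed choice of hash function, in-process method calls stand in for transfer between storage processing units, and a simple row filter stands in for the data operation.

```python
import hashlib


def data_identifier(entity: str) -> str:
    """Fixed-length identifier computed from the entity data itself.

    SHA-256 is an assumption; the claims only require a hash function
    with fixed-length output.
    """
    return hashlib.sha256(entity.encode("utf-8")).hexdigest()


class StorageProcessingUnit:
    """Hypothetical node holding one partial database."""

    def __init__(self, name):
        self.name = name
        self.entity_store = []       # entity data, addressed by position
        self.conversion_table = {}   # data identifier -> position in entity_store
        self.identifier_tables = {}  # table name -> list of tuples of identifiers
        self.peers = []              # other storage processing units

    def insert(self, table, row):
        ids = []
        for value in row:
            ident = data_identifier(value)
            if ident not in self.conversion_table:  # store each entity once
                self.conversion_table[ident] = len(self.entity_store)
                self.entity_store.append(value)
            ids.append(ident)
        self.identifier_tables.setdefault(table, []).append(tuple(ids))

    def scan(self, table):
        """Resolve identifiers back to entity data via the conversion table."""
        rows = []
        for ids in self.identifier_tables.get(table, []):
            rows.append(
                tuple(self.entity_store[self.conversion_table[i]] for i in ids)
            )
        return rows

    def handle_transfer_request(self, table):
        """Second unit: acquire the data set locally and return it."""
        return self.scan(table)

    def execute(self, table, predicate):
        rows = self.scan(table)
        if table not in self.identifier_tables:
            # First unit: the data set is not stored locally, so request it.
            for peer in self.peers:
                rows.extend(peer.handle_transfer_request(table))
        # First unit: run the data operation on the available data set.
        return [row for row in rows if predicate(row)]


# Usage: the "users" table lives only on the second unit.
first = StorageProcessingUnit("first")
second = StorageProcessingUnit("second")
first.peers = [second]
second.insert("users", ("1", "alice"))
second.insert("users", ("2", "bob"))
result = first.execute("users", lambda row: row[1] == "bob")
```

In this sketch the identifier table holds only fixed-length identifiers, while the entity data is kept once in a separate store and located through the conversion table, so repeated values do not consume additional entity storage.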