WO2014000825A1

WO2014000825A1 - A method for maintaining one or more table values for a number of tables in a database system

Info

Publication number: WO2014000825A1
Application number: PCT/EP2012/062770
Authority: WO
Inventors: Ulrich Arndt; Lars HAMANN
Original assignee: DATA2KNOWLEDGE GmbH
Current assignee: DATA2KNOWLEDGE GmbH
Priority date: 2012-06-29
Filing date: 2012-06-29
Publication date: 2014-01-03
Anticipated expiration: 2014-12-29

Description

A method for maintaining one or more table values for a number of tables in a database system

Description

The invention refers to a method for maintaining one or more table values for a number of tables in a database system.

Information in database systems is stored based on tables comprising data entries for a plurality of rows and at least one column. It happens quite often that data is replicated in different instances of one database system or in different database systems. Hence, there is a need to compare the replicated data in an easy and efficient manner.

Document US 5,819,268 describes a method and system for comparing tables of a database where a predetermined function is applied to each row in a table. The resulting functional values of the rows are combined by the XOR operator resulting in a table value. The contents of two tables can be compared in an easy way by comparing the corresponding table values. Document US 5,819,268 does not disclose how a table value can be efficiently updated in case that changes are performed in the corresponding table. It is an object of the invention to provide a method for maintaining one or more table values for a number of tables in a database system, in which table values representing the content of the tables are updated in an easy and efficient way.

This object is solved by the method according to claim 1 and the database system according to claim 13.

The method of the invention maintains one or more table values for a number of ta- bles in a database system. Each table of the number of tables comprises data entries for a plurality of rows and a number of columns. A table value for each table is stored in the system, said table value being a concatenation of functional values for each row by an operator being associative, commutative and having an inverse. The functional values are the result of applying a predetermined function to row data, said row data being based on the data entries of the columns for the corresponding row. In a preferred embodiment, the method is used for a relational database system. Particularly, the method of the invention is applied to an Oracle database system or the IBM database system db2. Moreover, the method may also be applied to open source database systems, like mySQL. However, the method of the invention may also be used in any other database system.

The table value pre-stored for a respective table is designed such that tables which have the same data entries - irrespective of the sorting of the rows - have the same table value. Furthermore, table entries referring to tables with at least one different data entry shall have in all cases or in most cases different table values. To do so, the predetermined function used for calculating the functional values for each row shall be substantially collision free. Preferably, a hash function, particularly SHA-1 or MD5, and/or an encryption function, particularly AES256 or DES, are used as predetermined functions. In order to update a table value, the following steps are performed in the method of the invention: i) In case that a row is added to a respective table, a functional value for the added row is determined by applying the predetermined function to the row data of the added row, where the functional value for the added row is concatenated with the table value by the operator. ii) In case that a row is deleted from a respective table, the functional value for the deleted row (before deletion) is concatenated with the table value by the inverse of the operator. iii) In case that a row is changed in a respective table, a functional value for the changed row is determined by applying the predetermined function to the row data of the changed row, where the functional value for the row before change is concatenated with the table value by the inverse of the operator and where the functional value for the changed row is concatenated with the table value by the operator.

In the method of the invention, there is no particular order in which the operations of different steps i) to iii) or within one of the steps i) to iii) are to be performed. It is only essential that the operations are performed for the table values, irrespective whether any intermediate steps are performed which change the table values. E.g. in step iii), after concatenating the functional value for the row before change with the table value by the inverse of the operator, other operations referring to steps i) and ii) or operations for other changed rows may be performed before the functional value for the changed row is concatenated with the table value by the operator.

In a particularly preferred embodiment, the functional value for each row in a respective table is the bit sequence, where the operator is the XOR operator combining corresponding bits of two bit sequences and/or where the operator is the addition modu- lo (2 ^{1 eng} ) operator combining two bit sequences, where bitlength refers to the bitlength of a bit sequence. The XOR operator is an easy and straightforward implementation of an operator being associative, commutative and having an inverse. Particularly, the inverse of the XOR operator is the XOR operator itself.

In another embodiment of the invention, the row data to which the predetermined function is applied comprises the data entries of each row. However, each data entry in a row may also be converted in a predetermined and preferably standardised data format, where the row data comprises the converted data entries. By using such a predetermined data format, it is possible to compare tables in different data formats, e.g. tables from different database systems. In one embodiment, the data entries represent numeric values and/or dates (according to ISO) and/or time stamps (according to ISO) which are converted in corresponding strings and/or the data entries represent bytes which are converted in a base64 encoding. E.g., the data entries may be integers which are converted in corresponding number strings. Particularly, it may happen that integers which are usually stored as 4 byte values have a sign information which is stored differently in tables of different database systems. E.g., the sign is stored in the first bit of the integer in one system and in the last bit of the integer in the other system. By converting those integers in number strings, a universal transformation of the data content is achieved such that the contents of the tables of different database systems can be compared based on the table values.

In another embodiment of the invention, the functional value for the deleted row in step ii) and/or the functional value for the row before change in step iii) are deter- mined in the respective steps ii) and/or iii) by applying the predetermined function to the row data of the deleted row and/or the row before change. However, it is also possible that the functional value for the deleted row and/or the functional value for the row before change in the respective steps ii) and/or iii) are pre-stored in the database system. Hence, the method can use those pre-stored functional values in steps ii) and/or iii) and needs not determine those values. In another embodiment of the invention, the number of tables includes one or more tables which are partial tables of a bigger table, where a partial table includes a subset of rows and/or columns of the bigger table. Hence, for each partial table, steps i) to iii) of the inventive method are performed. In one embodiment, each partial table is stored in a different logical or physical memory partition of the database system. Preferably, a partial table only includes data entries referring to so-called core content of the bigger table. Core content refers to the real logical information contained in a table and differs from the technical content and metadata included in the table. Technical content belongs to the table maintenance process. An example of technical content is a timestamp field which stores the information when a row was last modified by a process. Metadata refers to generic information with respect to the table, e.g. to the table layout (columns, column data types, valid values, etc.). The table value updated in the method of the invention may be stored in the metadata of the table.

In another embodiment of the invention, the bigger table for which one or more partial tables exist is distributed over several subentities in the database system, each subentity including a partial table of the bigger table, each partial table comprising a subset of rows of the bigger table and all partial tables comprising the same columns. Hence, the method of the invention may be used in a parallel database system including several subentities. A subentity refers to a separate computing unit processing a part of the data of the database system.

In order to provide a table value for a bigger table distributed over several subenti- ties, an overall table value for the bigger table is calculated in a preferred embodiment of the invention by concatenating the table values for the partial tables by the operator. This concatenation is preferably performed by a central processing unit in the database system. However, the concatenation may also be done by the subentities in parallel. Besides the above method, the invention also refers to a database system including one or more table values for a number of tables, where each table of the number of tables comprises data entries for a plurality of rows and a number of columns, where a table value is stored for each table in the database system, said table value being a concatenation of functional values for each row by an operator being associative, commutative and having an inverse, said functional values being the result of applying a predetermined function to row data, said row data being based on the data entries in the corresponding row. The database system includes a computing means configured to calculate an updated table value for each table based on steps i) to iii) according to the method of the invention. In a preferred embodiment, the database system is configured to perform one or more of the above-described preferred variants of the method according to the invention.

Moreover, the invention refers to a computer program product directly loadable into the internal memory of a digital computer comprising software code portions for performing the method of the invention or one or more preferred embodiments of this method when the product is run on a computer.

Furthermore, the invention refers to a computer program for controlling a computer to perform the method of the invention or one or more preferred embodiments of this method.

Embodiments of the invention will now be described in detail with respect to the accompanying drawings wherein:

Fig. 1 shows a flow chart illustrating the steps performed in one embodiment of the invention;

Fig. 2 shows another flow chart illustrating the concatenation of table hashes to a final table hash according to an embodiment of the invention; Fig. 3 shows the calculation of a table hash for an example table; and

Fig. 4 shows an update of the table hash of Fig. 3 in case that a row is deleted from the table.

The method of the invention updates table values for corresponding tables in a database system when changes are performed in the tables. In the embodiment described in the following, this table value which is also referred to as table hash is a concatenation of hash values calculated for each row in the table by the XOR operator. However, instead or additional to the XOR operator, also addition modulo (2^bltlength) may be used as the operator, where bitlength refers to the bitlength of a hash value calculated for a row. The hash values for each row are determined based on a predetermined hash function. The calculation of the table value being the input of the inventive method will be described later on in detail with respect to Fig. 3.

In the flow chart of Fig. 1, the update of the table value for one specific table in a database system is illustrated. In step S 1 , it is checked, if a new row not included in the table beforehand has been added to the table. If so, the row hash value for this new row is calculated by applying the above mentioned hash function to the data entries in the corresponding row (step S2). Thereafter, the new row hash value is added to the table hash in step S3. Here and in the following, the operation "add" refers to the concatenation of the corresponding row hash value with the table hash by the XOR operator. As a result, an updated table hash is obtained for the table with the added row.

In case that no new row has been added in step SI, it is checked in step S4 whether a row update is performed. The term "row update" means that an already existing row has been changed. If so, the old row hash value of the row before the update is calculated in step S5. Thereafter, the old row hash value is substracted from the table hash in step S6. Here and in the following, the operation "substract" means that the corresponding row hash value is concatenated with the table hash by the inverse of the XOR operator. As commonly known, the inverse of the XOR operator is the XOR operator itself. Thereafter, in step S7, the updated row hash value, i.e. the row hash value for the changed row, is calculated by applying the hash function to the data entries of the changed row. Eventually, this updated row hash value is added to the table hash in step S8 resulting in an updated table hash.

In case that the change in the table is not a row update, step S4 proceeds with step S9 where it is determined that the change is a deletion of a row. In step S10, the row hash value of the deleted row is calculated before the delete is performed by applying the hash function to the data entries of the deleted row. Thereafter, in step Sl l, the row hash value for the deleted row is substracted from the table hash resulting in an updated table hash.

As illustrated in Fig. 1, an easy maintenance of the table hash for a table in a data- base system is achieved. The corresponding updated table hashes may then be used in order to compare the tables of different database systems. I.e., in case that the table hashes of two tables are identical, the data inside the tables correspond to each other. A correspondence between the data of two tables means that both tables include the same number of rows with the same data entries in the corresponding columns, irre- spective of the sorting of the rows. Several use cases will be described later on referring to the processing of the table values updated according to the invention.

Fig. 2 describes the calculation of a table hash in a parallel database system including several separate subentities forming the lowest level of parallel instances. According to Fig. 2, the system comprises subentities 1, 2, ..., n, where each subentity includes a partial table of an overall table, each partial table comprising a subset of rows of the overall table. Each subentity processes its partial table separately based on the method as shown in Fig. 1. I.e., each subentity has pre-calculated table hashes which are updated in case of changes in the corresponding partial tables. In order to calcu- late the table hash of the overall table distributed over the subentities, the pre- calculated table hashes of the subentities are requested by a central processing unit being responsible for all subentities in the database system (step S101). As a consequence, each subentity 1, 2, ... , n provides its pre-calculated table hash to the central processing unit as indicated by steps SI 02, SI 03 and SI 04. The processing unit adds all local table hashes to the final table hash by concatenating the corresponding local table hashes by the XOR operator (step 105). Eventually, in step 106, a final table hash is returned which may then be used to compare the table with corresponding tables of other database systems.

Fig. 3 illustrates the calculation of the initial table hash which is updated according to the embodiment as described above. As an example, Fig. 3 shows a simple table T stored in a database system including twelve rows R0, Rl, Rl l as well as three columns CI, C2 and C3. For clarity reasons, only some of the rows are indicated by respective reference numerals. This is also the case for the functional values FVl, FV2 etc. mentioned later on. Column CI refers to a calendar date indicated by the string "calendar date" in row R0. Column C2 refers to a day of the year indicated by the string "day of year" in row R0. Column C3 refers to a week of the year indicated by the string "week of year" in row R0. In the embodiment described herein, the core information included in rows Rl to Rl 1 are used for calculating a corresponding table value in the form of a table hash. This core information refers to the actual logi- cal content belonging to the table which usually forms the basis when comparing two tables. The metadata included in row R0 are in most cases not relevant when comparing tables and, thus, are not involved in the calculation of the table hash.

As indicated by arrow Al in Fig. 3, for each of the rows Rl to Rl 1 , a corresponding functional value FVl, FV2, FV11 is calculated. This functional value is a sequence of bits for each row where only a part of the sequence is shown for the respective row. In the embodiment described herein, each sequence refers to a hash value derived by applying the hash function SHA-1 to all data entries in the corresponding row. Based on all functional values FVl to FVl 1 , a table value in the form of an overall hash value HV is calculated as indicated by arrow A2. To do so, each column of corresponding bits in the array of functional values is concatenated by the XOR operator. The XOR operator applied to a column of bits has the value 1, if the number of bit values "1" is odd in the column. Otherwise, the result of the XOR operator is "0". As a result of the XOR operation, the final hash value HV is obtained for which only a partial section of corresponding bits is shown.

Fig. 4 illustrates the update performed for the hash value HV in case that row R3 is deleted from the table T. To do so, the hash value HV is combined with the functional value FV3 by the XOR operator as indicated by arrow A3. As a result, the updated table hash HV is obtained as indicated by arrow A4. This table hash corresponds to the updated table T'. This correspondence is indicated by arrow A5. In the table T', the row R3 has been deleted. The rest of the table is identical to the table T as shown in Fig. 3.

According to Fig. 4, an update based on the deletion of a row in the table T is shown. When performing an addition of a row, the corresponding functional value based on the hash function is calculated for the row to be added and combined with the hash value HV by the XOR operator. In case that a change in a row is performed, the functional value of the row before change is combined with the hash value HV by the inverse XOR operator being identical to the XOR operator. The hash value obtained by this operation is thereafter combined by the XOR operator with the functional value of the row after change which is determined by applying the hash function to the data entries of the row.

As described in the foregoing, the invention provides an easy and straightforward mechanism in order to update table values for corresponding tables when rows are inserted, deleted or updated. The table values may be used in different scenarios of a database system. Particularly, the table content of two tables from different physical instances of one database may be compared by comparing the corresponding table values of the tables. Furthermore, end-to-end processes between different database systems, e.g. OLTP processes (OLTP = Online Transaction Processing) and data warehouse processes, may be checked by comparing the corresponding table values. In such a use case, the data entries of the tables are converted in a common data format for both systems before calculating the table values to allow system independent comparisons. In another use case, system initializations may be checked by comparing table values with expected values. Furthermore, end-to-end processes from files to database content may be checked by comparing the table value derived for a file with a corresponding table value in the database system. Moreover, the table values can be used for archive optimization. To do so, table values are pre-calculated on a partition level. With each archive, the table values are stored in a separate table and a history is built up for the partition table values. A new archive job first checks for which partition the table value has changed since the last archive and only for those partitions the data will be archived. This guarantees that all data is archived with a minimum of data storage and network traffic. The pre- calculated table values of a partition may also be used to compare data of two systems and decide if a backup of partitions in the secondary system is needed or not. Identical table values indicate that one backup is sufficient for both systems. As a consequence, the backup volume can be reduced. Furthermore, the table values stored with an archive can be used in order to validate the recovery of the archive.

Claims

Patent Claims

1. A method for maintaining one or more table values for a number of tables (T) in a database system, where each table (T) of the number of tables (T) comprises data entries for a plurality of rows (Rl, Rl l) and a number of columns (CI, C2, C3), where a table value (HV) is stored for each table (T), said table value (HV) being a concatenation of functional values (FVl, FV11) for each row (Rl, ..., Rl 1) of the table (T) by an operator being associative, commutative and having an inverse, said functional values (FVl, FV11) being the result of applying a predetermined function to row data, said row data being based on the data entries in the corresponding row (Rl, Rl l), wherein an updated table value (HV) is calculated for each table (T) as follows:

i) in case that a row (Rl, ..., Rl 1) is added to a respective table (T), a functional value (FVl , ... , FVl 1) for the added row (Rl , ... , Rl 1) is determined by applying the predetermined function to the row data of the added row (Rl, Rl l), where the functional value (FVl, FV11) for the added row (Rl, Rl l) is concatenated with the table value (HV) by the operator;

ii) in case that a row (Rl, Rl l) is deleted from a respective table (T), the functional value (FVl, FV11) for the deleted row (Rl, R2, Rl l) is concatenated with the table value (HV) by the inverse of the operator;

iii) in case that a row (Rl, ..., Rl 1) is changed in a respective table (T), a functional value (FVl, FV11) for the changed row (Rl, Rl l) is determined by applying the predetermined function to the row data of the changed row (Rl, ..., R11), where the functional value (FVl, ..., FVl 1) for the row (Rl, Rl l) before change is concatenated with the table value (HV) by the inverse of the operator and where the functional value (FVl, ... , FVl 1) for the changed row (Rl , ... , Rl 1) is concatenated with the table value (HV) by the operator.

2. The method according to claim 1, wherein the predetermined function is a hash function, particularly SHA-1 or MD5, and/or an encryption function, particularly AES256 or DES.

3. The method according to claim 1 or 2, wherein the functional value (FV1, FV11) for each row (Rl, Rl l) is a bit sequence, where the operator is the XOR operator combining corresponding bits of two bit sequences and/or where the operator is the addition modulo (2^bltlength) operator combining two bit sequences, where bitlength refers to the bitlength of a bit sequence.

4. The method according to claim 1 or 2, wherein the row data comprises the data entries of each row (Rl , ... , Rl 1).

5. The method according to one of the preceding claims, wherein each data entry in a row (Rl , Rl l) is converted in a predetermined data format, where the row data comprises the converted data entries.

6. The method according to claim 5, wherein the data entries represent numeric values and/or dates and/or time stamps which are converted in corresponding strings and/or wherein the data entries represent bytes which are converted in a base64 encoding.

7. The method according to one of the preceding claims, wherein the functional value (FV1, FV11) for the deleted row (Rl, Rl l) in step ii) and/or the functional value (FV1, FV11) for the row (Rl, Rl l) before change in step iii) are determined in the respective steps ii) and/or iii) by applying the predetermined function to the row data of the deleted row (Rl , ... , Rl 1) and/or the row (Rl, ..., Rl 1) before change.

8. The method according to one of the preceding claims, wherein the functional value (FV1, FV11) for the deleted row (Rl, Rl l) in step ii) and/or the functional value (FV1, FV11) for the row (Rl, Rl l) before change in step iii) are pre-stored in the database system.

9. The method according to one of the preceding claims, wherein the number of tables (T) includes one or more tables (T) which are partial tables of a bigger table, a partial table including a subset of rows (Rl, Rl l) and/or columns (CI , C2, C3) of the bigger table, where each partial table is preferably stored in a different logical or physical memory partition in the database system.

10. The method according to claim 9, wherein a partial table only includes data entries referring to the core content of the bigger table.

11. The method according to claim 9 or 10, wherein the bigger table is distributed over several subentities in the database system, each subentity including a par- tial table of the bigger table, each partial table comprising a subset of rows of the bigger table and all partial tables comprising the same columns.

12. The method according to claim 1 1, wherein an overall table value for the bigger table is calculated by concatenating the table values (HV) for the partial tables by the operator.

13. Database system including one or more table values for a number of tables (T), where each table (T) of the number of tables (T) comprises data entries for a plurality of rows (Rl , ... , Rl 1 ) and a number of columns (C 1 , C2, C3), where a table value (HV) is stored for each table (T), said table value (HV) being a concatenation of functional values (FV1, ..., FV11) for each row (Rl, ..., Rl 1) by an operator being associative, commutative and having an inverse, said functional values (FV1, FV11) being the result of applying a predetermined function to row data, said row data being based on the data entries in the corre- sponding row (Rl , ... , Rl 1), wherein the database system includes a computing means configured to calculate an updated table value (HV) for each table (T) as follows:

i) in case that a row (Rl, ..., Rl 1) is added to a respective table (T), a functional value (FV1 , ... , FV11) for the added row (Rl , ... , Rl 1) is determined by applying the predetermined function to the row data of the added row (Rl, Rl l), where the functional value (FV1, FV11) for the added row (Rl, Rl l) is concatenated with the table value (FFV) by the operator;

ii) in case that a row (Rl, Rl l) is deleted from a respective table (T), the functional value (FV1, FV11) for the deleted row (Rl, R2, Rl l) is concatenated with the table value (FFV) by the inverse of the operator;

iii) in case that a row (Rl, ..., Rl 1) is changed in a respective table (T), a functional value (FV1, FV11) for the changed row (Rl, Rl l) is determined by applying the predetermined function to the row data of the changed row (Rl, ..., R11), where the functional value (FV1, ..., FV11) for the row (Rl, Rl l) before change is concatenated with the table value (HV) by the inverse of the operator and where the functional value (FV1, ... , FV11) for the changed row (Rl , ... , Rl 1) is concatenated with the table value (HV) by the operator.

The system according to claim 13, wherein the computing means is arranged to perform a method according to one of claims 2 to 12.

Computer program product directly loadable into the internal memory of a digital computer, comprising software code portions for performing the method according to one of claims 1 to 12 when said product is run on a computer.

A computer program for controlling a computer to perform a method according to one of claims 1 to 12.