Detailed Description
It should be understood by those skilled in the art that the embodiments described below are only some embodiments of the present invention, not all embodiments; these embodiments are intended to explain the technical principles of the present invention and are not intended to limit its scope. All other embodiments obtained by a person skilled in the art without inventive effort, based on the embodiments provided by the present invention, shall still fall within the scope of protection of the present invention.
It should be noted that the logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered as an ordered listing of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them.
Before the embodiments of the present application are described in detail, the database used in these embodiments is first described, so as to better understand the database environment on which the embodiments depend. This database environment not only supports the storage and manipulation of large objects, but also provides enough flexibility to perform complex rewriting tasks.
The database of this embodiment is built with a data base table, which may be a user base table created according to the creation statement. The data base table has at least one large object column, and at least a portion of the large object data items in the large object column are partitioned into a plurality of storage slices for storage. The data items in the large object column may be BLOBs, CLOBs, NCLOBs, or other types of variable-length binary data, character data, external data, and the like. The number of large object columns may be set according to business requirements, for example one column, two columns, or more.
In a large object schema preset in the database, each large object column or each large object column partition is provided with a storage management table space, and each storage management table space stores an object and storage slice table corresponding to that large object column or large object column partition, the object and storage slice table being used to store the description information and data of the storage slices.
In one example, the step of creating the data base table includes obtaining a creation instruction and generating the data base table according to the creation instruction, generating a column object identifier for each large object column or each large object column partition of the data base table, and naming the object and storage slice tables according to the column object identifiers to obtain the storage table object identifiers. The creation instruction may use SQL (Structured Query Language). For example, assuming the simple case in which the user creates a business table user_biz_data containing two large object columns col_x and col_y (and other regular data columns col_a and col_b) and the user has not partitioned the large object columns, the create statement may be:
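(The following is a minimal, hypothetical sketch of such a create statement; the data types of the regular columns col_a and col_b, and the exact DDL syntax, are assumptions for illustration only.)

-- Hypothetical create statement for the business table (a sketch, not the definitive syntax).
-- col_x and col_y are declared as BLOB/CLOB; through the Domain mechanism described below,
-- the database may actually back them with the proprietary KBLOB/KCLOB base types.
CREATE TABLE user_biz_data (
    col_a INTEGER,       -- ordinary data column (assumed type)
    col_b VARCHAR(100),  -- ordinary data column (assumed type)
    col_x BLOB,          -- large object column, backed by KBLOB
    col_y CLOB           -- large object column, backed by KCLOB
);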
KLOB may be a large object data type defined by the database of this embodiment, which includes the proprietary base types KBLOB, KCLOB and KNCLOB, corresponding to the conventional BLOB, CLOB and NCLOB types, respectively. In the database of this embodiment, KBLOB, KCLOB and KNCLOB can be set as one of the alternative implementations behind BLOB, CLOB and NCLOB using the existing Domain type mechanism. This means that when the user designates the BLOB, CLOB or NCLOB type at the time of creating a table or defining a column, the proprietary base type KBLOB, KCLOB or KNCLOB of the present embodiment may actually be selected and used instead.
A business table (base table) user_biz_data is thus created for the user, containing two large object columns col_x and col_y that are not partitioned. The present solution automatically generates UUIDs (Universally Unique Identifiers, used here as the column object identifiers) for these two large object columns.
The large object schema preset for the database may be a dedicated schema (a Schema defines the organization and structure of database objects) created in the database, which in some embodiments may be named sys_klob_schema. For each large object column in the user base table and each of its column partitions, the database system builds a set of tables and indexes in this schema to store and manage the large objects; the objects themselves are saved in these tables as small storage slices (Chunks). It should be noted that a large object column may be partitioned as needed, and a large object column partition may be regarded as a part of the large object column, each large object column partition having its own object and storage slice table. Without partitioning, each large object column has one object and storage slice table.
For example, sys_klob_schema may set up a matching object and storage slice table for each of the two non-partitioned large object columns col_x and col_y, named tab_<colx_uuid> and tab_<coly_uuid>, respectively. The object and storage slice tables tab_<colx_uuid> and tab_<coly_uuid> each generate globally unique identifiers from randomly generated UUIDs. The globally unique identifier may take the format of a prefix (e.g., the tab_ prefix above) combined with a UUID, so as to satisfy the naming requirements for database object names. The object record in the object and storage slice table is the first storage slice record of the object. In this embodiment, a UUID identifies a large object column or a large object column partition, and the data table, indexes, and sequences may each combine their own prefix with the UUID to generate their respective names.
The description information of the object and storage slice tables is recorded in a large object column metadata table preset among the database system tables. To manage large objects, the database system builds a set of dedicated system tables for storing the correspondence between the large object columns in user tables and these database objects, as well as the storage parameters unique to the large object columns. The large object column metadata table may be named sys_klob_column and is one of the set of system tables configured specifically for large objects by the database system.
The large object column metadata table sys_klob_column may also record an object identifier of the base table (used to identify the data base table, which may be named rel_oid and is used when looking up metadata of the data base table) and an identification of the large object column (which may be the column number of the large object column in the base table, used to locate the large object column; it may be named attr_no and, in cooperation with rel_oid, uniquely determines one column of the database, which is used when looking up table column metadata). The description information of the object and storage slice table recorded in the large object column metadata table includes a storage table object identifier (which may be named tab_oid, that is, the OID of the object and storage slice table; without partitioning, each large object column has a corresponding tab_oid, and if the large object column is partitioned, each large object column partition has a corresponding tab_oid).
The large object control block is used to replace the position corresponding to a large object data item in the data base table, and at least the description information of the large object data item and the data item object identifier are recorded in the large object control block. That is, in the final data base table, the location of a large object data item holds a control block, which may take different data structures according to the size of the large object data item and contains at least the description information of the large object data item and the data item object identifier. The description information of the large object data item describes the type, various flag bits, version and the like of the large object data item. The data item object identifier is a globally unique, fixed identifier of the object; it is an identifier of the UUID type, randomly generated when the object is created, and remains unchanged throughout the life cycle of the object.
The set of system tables configured by the database system specifically for large objects may further include a storage parameter history table (which may be named sys_klob_store_params) and a deduplication task history table (which may be named sys_klob_dedup_tasks). In addition, the system metadata table among the database system tables (which may be named pg_attribute) also records the other basic attributes of the large object column, consistent with other conventional data types.
In some embodiments, the storage table object identifier tab_oid of the object and storage slice table and the storage parameters are recorded in a storage parameter history table sys_klob_store_params preset among the database system tables, and the storage parameters may include a storage parameter version number, storage options, encryption parameters, a task identifier, and the like. The data of the storage parameter history table sys_klob_store_params can be used when storage options such as compression or encryption of a large object are modified or when existing data needs to be rewritten.
In some embodiments, the storage table object identifier tab_oid of the object and storage slice table and the deduplication task information are recorded in a deduplication task history table sys_klob_dedup_tasks preset among the database system tables, where the deduplication task information includes a deduplication task serial number and a deduplication task identifier, which are used to determine the execution state of the deduplication task.
The original system tables of the database also store the metadata of the tables and indexes themselves that are used to store large objects; this is basically consistent with the metadata records of other common data types and is not described further herein.
Each storage management table space also stores a primary key index of the object and the storage slice table. The primary key index is used for searching the large object data item and/or the storage slice in the object and storage slice table, and the large object column metadata table also records the information of the primary key index. The primary key index may generally be in the form of a hash index. The searching efficiency is improved by establishing an index.
A reference column is also provided in the object and storage slice table. The reference column is used to record the reference relationship of a storage slice with other storage slices and/or the reference relationship of a large object data item with other large object data items. By using these reference relationships, repeated storage of identical large object data items and/or storage slices can be avoided, saving storage space. Further, each storage management table space may also store a reference column index of the object and storage slice table. The reference column index is used to query the reference relationships in the object and storage slice table, and the large object column metadata table also records the information of the reference column index. The efficiency of searching reference relationships is improved by establishing the reference column index.
Rewriting data and deduplicating large objects when storage options are modified takes a great deal of execution time. In order to minimize the impact on the user's daily business, these operations are generally performed progressively in the background. To ensure that they execute correctly, in some embodiments a data rewriting task sequence (store_seq_<col/part uuid>) and a deduplication task sequence (dedup_seq_<col/part uuid>) may be created for the object and storage slice table. The data rewriting task sequence is used to record the rewriting process of the object and storage slice table, and the deduplication task sequence is used to record its deduplication process, so that the two sequences respectively track the data rewriting and deduplication tasks of the large object when storage options are modified.
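For illustration, the following is a minimal sketch, assuming PostgreSQL-style syntax, of the two task sequences that might be created for the non-partitioned column col_x; the schema name follows the sys_klob_schema naming used in this description, and the UUID suffix is an abbreviated, illustrative placeholder.

-- Hypothetical per-column task sequences (sketch only; names and suffix are illustrative).
CREATE SCHEMA IF NOT EXISTS sys_klob_schema;
CREATE SEQUENCE sys_klob_schema.store_seq_c0ffee00aa11bb22;  -- data rewriting task sequence for col_x
CREATE SEQUENCE sys_klob_schema.dedup_seq_c0ffee00aa11bb22;  -- deduplication task sequence for col_x

-- Obtaining the next task number when a new rewriting task is started:
SELECT nextval('sys_klob_schema.store_seq_c0ffee00aa11bb22');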
In this embodiment, in order to achieve efficient storage and management of large objects, particularly when processing TB-level large objects, a storage environment design based on tables and indexes is adopted. This approach avoids complex transformation of the database's underlying storage implementation mechanism and, in a large object schema preset in the database, sets up a storage management table space for each large object column or each large object column partition.
The description continues with the example of creating the business table user_biz_data, which includes two large object columns col_x and col_y. FIG. 1 is a schematic diagram of a table structure in a data table with large object columns, according to one embodiment of the invention.
When a user creates the data base table (business table) user_biz_data, the database system generates UUIDs (colx_uuid and coly_uuid) for the col_x and col_y large object columns it contains, respectively. When generating the corresponding database objects, each UUID is combined with a prefix, on the one hand because the text form of some UUIDs begins with a digit and would not conform to the database object naming rules, and on the other hand because a large object column or column partition uses a set of database objects for storage, including tables, indexes and sequences; a sequence is a database object used to generate increasing sequence values, typically to provide unique identifiers for records in a table. In this scheme, sequence numbers are generated separately for the rewriting task and the deduplication task, mainly to identify the execution order of the tasks. Since these sequence numbers are generated within a particular column or column partition, and since the number of times these two types of tasks are performed is typically not large, the likelihood of the sequence values wrapping around (i.e., restarting after the sequence reaches its upper limit) is small; this ensures that the sequence numbers are unique within those ranges. These sequence numbers are not globally unique identifications like UUIDs: a UUID is used to generate a globally unique identifier with a very low collision probability, whereas an identifier generated by a sequence is valid only within a particular context or range and may no longer be unique outside it. If col_x and col_y are partitioned, each partition has its own UUID. In addition, the ordinary data type columns col_a and col_b are stored according to the original storage logic.
The structure of the whole database comprises a system metadata schema (pg_catalog), an application schema, and the large object schema sys_klob_schema, where the system metadata schema holds the database metadata, while the application schema and the large object schema sys_klob_schema mainly hold user business data.
In the system tables, a relation attribute table (also referred to as the system metadata table, which may be named pg_attribute) holds the general data types of the database and the general basic attributes of data tables (including the basic attributes of large object columns, consistent with other general data types), while the large object column metadata table sys_klob_column, the storage parameter history table sys_klob_store_params, and the deduplication task history table sys_klob_dedup_tasks hold the storage attributes specific to large objects. The large object column metadata table sys_klob_column records the OIDs of the tables and indexes created, together with the storage attributes specific to the large object column. The base table OID (rel_oid) and large object column number (attr_no) fields in this table are used to associate it with the original system tables of the database, indicating which column of which data base table each record describes.
In order to track the execution state of long-running tasks, such as rewriting existing data when the storage options of a large object column are modified, and to ensure recovery from interrupted tasks, the storage parameter history table sys_klob_store_params and the deduplication task history table sys_klob_dedup_tasks record the history of the storage parameters and of the deduplication task parameters, respectively.
The large object schema sys_klob_schema includes a col_x data storage table space and a col_y data storage table space, and if a large object column has partitions, each partition has its own data storage table space. Taking the col_x data storage table space as an example, in addition to storing the data of the col_x object and storage slice table, indexes (which may be hash indexes) idx_<colx_uuid> and ref_<colx_uuid> are built on the primary key and the reference column of the object and storage slice table, respectively.
The col_x object and storage slice table may use the tab_ prefix plus colx_uuid as its name and stores object records and storage slice records, both of which use randomly generated UUIDs as globally unique identifications. An object record is essentially the first storage slice (Chunk) record of the object. The primary key index idx_<colx_uuid> of tab_<colx_uuid> can be used to quickly find objects or storage slices (Chunks). Where object deduplication is supported, allowing objects to reference other objects and Chunks to reference other Chunks, the reference column index ref_<colx_uuid> of tab_<colx_uuid> may be used to speed up query operations on reference relationships.
In the base table user_biz_data, the location of a large object data item will be replaced by a large object control block, which contains the actual storage location of the large object data item. In some embodiments, if the data size of a large object data item is small, the data item may also, at the user's option, be stored directly in the large object control block to optimize performance.
FIG. 2 is a schematic diagram of the system tables in a data table with large object columns, according to one embodiment of the invention. The system tables related to large object columns mainly comprise four tables: the large object column metadata table sys_klob_column, the relation attribute table pg_attribute, the storage parameter history table sys_klob_store_params, and the large object column deduplication task history table sys_klob_dedup_tasks.
The relation attribute table pg_attribute is used to record table column metadata; as with columns of other conventional data types, the basic attributes of large object columns are also stored in this table. The data base table OID rel_oid and the column number attr_no of the large object column in the base table together locate the position of the large object column in the data base table.
The large object column metadata table sys_klob_column is used for storing the attributes unique to a large object column and its column partitions, and is associated with the relation attribute table pg_attribute through the two fields rel_oid (the data base table OID) and attr_no (the column number of the large object column in the base table). When the large object column is not partitioned, each record in the relation attribute table pg_attribute has a corresponding record in the large object column metadata table sys_klob_column. When the large object column is partitioned, each partition also has a corresponding record in the large object column metadata table sys_klob_column.
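As a sketch of this association, and assuming that pg_attribute keeps its usual attrelid and attnum columns for the table OID and column number, the two tables might be joined as follows (schema qualification of sys_klob_column is omitted; the query is illustrative only):

-- List the large object columns of the business table together with their storage table OIDs.
SELECT a.attname, k.tab_oid
FROM pg_attribute AS a
JOIN sys_klob_column AS k
  ON k.rel_oid = a.attrelid
 AND k.attr_no = a.attnum
WHERE a.attrelid = 'user_biz_data'::regclass;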
Specifically, the large object column metadata table sys_klob_column may include the following attributes:
the data base table OID rel_oid, used when looking up metadata related to the base table;
Column number attr_no of the large object column in the base table, which, in cooperation with rel_oid, uniquely determines one table column in the database and is used when looking up table column metadata;
Object and storage slice table OID tab_oid, the OID of the object and storage slice table of the large object; without partitioning, each large object column corresponds to one such table, and with partitioning, each partition corresponds to one such table;
Primary key index OID, the OID of the index (which may be a hash index) on the primary key column uint_id of the object and storage slice table (idx_<colx_uuid> in the example above); without partitioning, each large object column corresponds to one primary key index, and with partitioning, each partition corresponds to one such index;
Reference column index OID, the OID of the index (hash index) on the referenced object/storage slice identification column ref_id of the object and storage slice table (ref_<colx_uuid> in the example above); without partitioning, each large object column corresponds to one such index, and with partitioning, each partition corresponds to one such index;
Table space of the large object column: the large object column can be placed in a table space different from that of other data, or each column partition can be placed in a different table space; where the user requires it, an independent table space can be designated for each large object column (without partitioning) or for each large object column partition (with partitioning);
Storage parameter version number sequence OID store_seq_oid, the OID of the storage parameter version number sequence store_seq_<col/part uuid> of the large object column or large object column partition; the integers generated by this sequence identify the versions of the large object storage parameters and their order;
Deduplication task sequence number sequence OID dedup_seq_oid, the OID of the deduplication task sequence number sequence dedup_seq_<col/part uuid> of the large object column or large object column partition; the integers generated by this sequence identify the versions of the large object deduplication task parameters and their order, and each large object column (without partitioning) or large object column partition (with partitioning) corresponds to one sequence object;
Storage slice physical size chunk_size, recording the size of the storage slices (Chunks) of the large object; the Chunk size is not limited by the size of the underlying storage page and can be chosen from 2K to 1G, and it is set independently for each large object column (without partitioning) or each large object column partition (with partitioning) to meet the requirements of different large data columns;
Current storage options lob_flags, which may be a 32-bit string covering storage options for storage location, compression, encryption, and the like; the storage parameter history table sys_klob_store_params also records the storage options of every version in full, so lob_flags is maintained here as a redundant field in order to accelerate certain operations;
Current storage parameter version number store_ver, the latest sequence number generated by the storage parameter version number sequence of the large object column/partition, used together with tab_oid to identify which record in the system table sys_klob_store_params the current storage parameters correspond to;
Current deduplication task number dedup_ord, the latest sequence number generated by the deduplication task sequence number sequence of the large object column or large object column partition, used together with tab_oid to identify which record in the large object column deduplication task history table sys_klob_dedup_tasks the current deduplication task corresponds to;
Partition policy partstrat, recording the partition policy, which may be h for hash partitioning, l for list partitioning, or r for range partitioning; this attribute has a value only in rows whose object and storage slice table is a partitioned table;
Number of columns partnatts in the partition key, with a value only in rows whose object and storage slice table is a partitioned table;
Default partition partdefid, the OID of the pg_class entry for the default partition of the partitioned table, which is zero if the table has no default partition; with a value only in rows whose object and storage slice table is a partitioned table;
Partition key column number vector partattrs, an array of partnatts values in length indicating which columns are part of the partition key. For example, the value 1 3 indicates that the first and third table columns make up the partition key. A zero in this array indicates that the corresponding partition key column is an expression rather than a simple column reference. This attribute has a value only in rows whose object and storage slice table is a partitioned table;
Partition key operator class OID vector partclass: for each column in the partition key, this field contains the OID of the operator class to be used; with a value only in rows whose object and storage slice table is a partitioned table;
Partition key collation OID vector partcollation: for each column in the partition key, this field contains the OID of the collation to be used for partitioning, or zero if the column is not of a collatable data type; with a value only in rows whose object and storage slice table is a partitioned table;
Partition key expression tree list partexprs, the expression trees (in nodeToString() representation) for partition key columns that are not simple column references; this is a list with one element for each zero entry in partattrs, and it is empty if all partition key columns are simple column references; with a value only in rows whose object and storage slice table is a partitioned table;
Internal representation relpartbound of a single partition boundary, with a value only in the rows corresponding to column partitions.
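Taken together, the following is a minimal DDL sketch of this metadata table, assuming PostgreSQL-style types. The field names pk_idx_oid, ref_idx_oid and tbs_oid are hypothetical, since the description above does not name those fields, and generic array and text types stand in for PostgreSQL-specific vector and node-tree types.

-- Hypothetical sketch of the large object column metadata table (some field names assumed).
CREATE TABLE sys_klob_column (
    rel_oid       oid,          -- OID of the data base table
    attr_no       smallint,     -- column number of the large object column in the base table
    tab_oid       oid,          -- OID of the object and storage slice table
    pk_idx_oid    oid,          -- OID of the primary key (hash) index (hypothetical field name)
    ref_idx_oid   oid,          -- OID of the reference column (hash) index (hypothetical field name)
    tbs_oid       oid,          -- OID of a dedicated table space, if designated (hypothetical field name)
    store_seq_oid oid,          -- OID of the storage parameter version number sequence
    dedup_seq_oid oid,          -- OID of the deduplication task sequence number sequence
    chunk_size    bigint,       -- physical size of a storage slice, selectable from 2K to 1G
    lob_flags     bit(32),      -- current storage options (redundant copy)
    store_ver     bigint,       -- current storage parameter version number
    dedup_ord     bigint,       -- current deduplication task number
    partstrat     char(1),      -- partition policy: h (hash), l (list), r (range)
    partnatts     smallint,     -- number of columns in the partition key
    partdefid     oid,          -- OID of the default partition, or 0
    partattrs     smallint[],   -- partition key column numbers
    partclass     oid[],        -- operator class OIDs for the partition key columns
    partcollation oid[],        -- collation OIDs for the partition key columns
    partexprs     text,         -- expression trees for non-simple partition key columns
    relpartbound  text          -- internal representation of a single partition boundary
);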
The large object column specific storage parameter history table sys_klob_store_params is used for storing all historical versions of the storage parameters specific to each large object column in the database, and is associated with records in the sys_klob_column table through the tab_oid column. Such historical data is used when existing data needs to be rewritten in order to modify storage options, such as compression or encryption, of large objects.
Specifically, the large object column specific storage parameter history table sys_klob_store_params may include the following attributes:
Object and storage slice table OID tab_oid;
Storage parameter version number store_ver, generated from the storage parameter version number sequence store_seq_<col/part uuid> corresponding to the above tab_oid;
Storage options lob_flags, which may be used to identify the out-of-row storage type (table storage, file storage, OSS (Object Storage Service) storage, etc.), whether in-row storage is allowed, whether the data is encrypted, the compression level (uncompressed, low, medium, or high compression), and the encryption algorithm identification (the type of encryption algorithm); each of these items occupies one or more flag bits of lob_flags, and in some embodiments lob_flags may be a 32-bit string formed by concatenating the above flag bits;
Key expiration time key_expr, the expiration time of the key used to encrypt the large object column, which may be expressed in UTC (Coordinated Universal Time);
Encryption key enc_key, the key used to encrypt the large object column, stored in ciphertext form and encrypted with a higher-level key;
Encryption/compression task identifier enc_task_id, the identifier of the encryption/compression task of the large object column; it may be a UUID-type identifier, randomly generated when the encryption/compression task of the large object column is created and registered in a hash table in memory, and if the recorded identifier cannot be found in the encryption/compression task registration table currently in memory, the process executing the encryption/compression task has exited.
The tab_oid and store_ver fields of the large object column specific storage parameter history table sys_klob_store_params together uniquely identify one storage parameter version of a large object column or large object column partition in the database.
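A minimal sketch of this history table is given below, again assuming PostgreSQL-style types; the timestamp and binary types are assumptions chosen for illustration.

-- Hypothetical sketch of the storage parameter history table.
CREATE TABLE sys_klob_store_params (
    tab_oid     oid,          -- OID of the object and storage slice table
    store_ver   bigint,       -- storage parameter version number from store_seq_<col/part uuid>
    lob_flags   bit(32),      -- storage options of this version
    key_expr    timestamptz,  -- expiration time (UTC) of the encryption key
    enc_key     bytea,        -- encryption key, stored as ciphertext under a higher-level key
    enc_task_id uuid,         -- identifier of the encryption/compression task
    PRIMARY KEY (tab_oid, store_ver)  -- together they identify one storage parameter version
);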
The large object column deduplication task history table sys_klob_dedup_tasks stores the parameters of every deduplication task of every large object column in the database, and is associated with records in the sys_klob_column table through the tab_oid column so that the execution state of deduplication tasks can be determined.
Specifically, the large object column deduplication task history table sys_klob_dedup_tasks may include the following attributes:
Object and storage slice table OID tab_oid, with the same meaning as tab_oid in the other system tables;
Deduplication task sequence number dedup_ord, generated from the deduplication task sequence number sequence dedup_seq_<col/part uuid> of the large object column or large object column partition corresponding to the above tab_oid;
Unique identifier dedup_task_id of the deduplication task process group, a UUID-type identifier that is randomly generated when a deduplication task of a large object column is created and registered in a hash table in memory; if the recorded identifier cannot be found in the deduplication task registration table currently in memory, the process executing the deduplication task has exited.
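A minimal sketch of this history table, under the same assumptions, might be:

-- Hypothetical sketch of the deduplication task history table.
CREATE TABLE sys_klob_dedup_tasks (
    tab_oid       oid,     -- OID of the object and storage slice table
    dedup_ord     bigint,  -- deduplication task sequence number from dedup_seq_<col/part uuid>
    dedup_task_id uuid,    -- unique identifier of the deduplication task process group
    PRIMARY KEY (tab_oid, dedup_ord)
);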
The dedicated system tables established above record the description information of the object and storage slice tables. They store the correspondence between the large object columns in user tables and the database objects used to store the large objects, together with the storage parameters unique to each large object column, making data management more efficient and better matched to the storage management characteristics of large objects.
In the solution of the present embodiment, a large object control block is used in the data base table in place of the corresponding large object data item. The large object control block may take either an in-row large object storage structure or an out-of-row large object storage structure.
FIG. 3 is a schematic diagram of an alternative large object control block in a data table with large object columns in accordance with one embodiment of the invention. FIG. 3 illustrates an out-of-row large object storage structure. The large object control block varattrib_4b includes a header varattrib_4b.va_header and data varattrib_4b.va_data, where the data varattrib_4b.va_data includes the description information of the large object data item (the LOB object control block descriptor) and the data item object identifier (the LOB object fixed unique identification).
The description information of the large object data item may describe the control block type (in-row or out-of-row), a flag bit indicating whether there is data in the row, the control block format version number, and the like. These items may occupy one or more flag bits of the description information, and in some embodiments the description information of the large object data item may be a 32-bit string formed by concatenating the corresponding flag bits.
The data item object identifier is an identification of the UUID type, which is randomly generated at the time of object creation, and remains unchanged throughout the life of the object.
The out-of-row large object storage structure comprises only the LOB object control block descriptor and the LOB object fixed unique identification. These two fields remain unchanged throughout the life cycle of the object, so the data base table does not need to be updated when the content of the large object is updated.
FIG. 4 is a schematic diagram of an alternative large object control block in a data table with large object columns in accordance with one embodiment of the invention. FIG. 4 illustrates an in-row large object storage structure.
The in-row large object storage structure is used for large object data items with a small data volume (e.g., within 32K) and stores the large object data in the control block (the maximum size of the large object data is within a set range, e.g., up to 32K). In addition to the LOB object control block descriptor and the LOB object fixed unique identification, the data varattrib_4b.va_data further includes the LOB object storage options and the LOB data. The LOB object storage options may be a bit string and may have the same format as the lob_flags field in the system tables sys_klob_column and sys_klob_store_params. The LOB data contains the data of the large object.
By distinguishing between the in-row large object storage structure and the out-of-row large object storage structure, the storage requirements of large object data items of different sizes are satisfied, and only data exceeding a certain size (e.g., 32K) is stored outside the row.
The object and storage slice table is the table that actually stores the large object data. During database operation, each large object column or column partition is paired with an object and storage slice table created for it. The present embodiment optimizes and improves the object and storage slice table, which brings obvious advantages and differs markedly from the existing PostgreSQL design.
In the PostgreSQL database scheme, the large object storage table pg_largeobject is a system table used only to store slice data, containing three fields: loid, pageno and data. loid is the unique identifier of a large object, of type OID (a 4-byte unsigned integer), and is unique only within one database. pageno identifies the page number (starting from 0) of the chunk data within the large object and is a 4-byte integer. data is the chunk data, whose volume does not exceed 1/4 of the size of a database storage page; for the common 8K page setting, this is no more than 2K.
In the PostgreSQL database, the total capacity of the large objects in the whole database is limited by the upper limit of the data volume of a single table, and the total number of objects in the whole database is limited by the value range of OID. If a data type with a larger value range, such as int8, is used in the user business table, the number of available large object identifiers may not be sufficient. Moreover, an identifier of the OID type can only guarantee uniqueness within one database; if a large object is to be imported into another database, the identifier must be reassigned, which is disadvantageous for tracking the large object. Each row of pg_largeobject holds the data of one page of one large object, starting at the byte offset (pageno x LOBLKSIZE) within the object. Sparse storage is allowed: pages may be missing and may be shorter than LOBLKSIZE bytes (even if they are not the last page), and missing regions of a large object are read as zeros. In other words, this scheme does not support insertion or deletion operations in the middle of a large object, but only appending and replacing content, whose behavior differs from editing operations as they are generally understood.
FIG. 5 is a schematic diagram of an object and storage slice table in a data table with large object columns, according to one embodiment of the invention. In the solution of this embodiment, the records stored in the object and storage slice table are of two types. The first type is the object record, which contains the object metadata and the data within the first 32K (this threshold may be set to other values), used to accelerate the operation of some functions. For objects larger than 32K (again, other values may be set), the object record also contains a storage slice index block for finding which storage slices hold the data written by the user. The second type is the storage slice record, which contains the storage slice metadata and the data above 32K. An object of substantial size (above 32K) thus consists of one object record and several storage slice records. The size of a storage slice may be freely chosen within 1G for each large object column or column partition: a smaller storage slice size is suitable if the average size of the objects in the column is small, and a larger storage slice size is suitable if the average size is large. For very large objects, for example at the TB level, the largest storage slice size (1 GB) may be used.
Specifically, the object and storage slice table may include the following attributes: the object fixed unique identification uint_id, the partition key part_key used internally by the system, the storage parameter version number store_ver, the referenced object identification ref_id, the reference count ref_count, the object deduplication feature value dedup_feat, the first-32K data prefetch of the object, and the additional data field ext_data.
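Before each of these attributes is described in detail below, the following minimal DDL sketch shows what such a table might look like for the non-partitioned column col_x, assuming PostgreSQL-style types; the abbreviated UUID suffix and the type choices are assumptions, and in this sketch the hash indexes play the role of the primary key index and reference column index described earlier.

-- Hypothetical sketch of an object and storage slice table and its indexes.
CREATE TABLE sys_klob_schema.tab_c0ffee00aa11bb22 (
    uint_id    uuid,              -- fixed unique identification of the object or storage slice
    part_key   bigint,            -- partition key used internally (may be NULL)
    store_ver  bigint,            -- storage parameter version number (NULL in storage slice records)
    ref_id     uuid,              -- identification of the referenced object or storage slice, if any
    ref_count  bigint DEFAULT 0,  -- number of times this object or slice is referenced
    dedup_feat bytea,             -- deduplication feature value (NULL if deduplication never enabled)
    prefetch   bytea,             -- data within the first 32K of the object (object records only)
    ext_data   bytea              -- storage slice index block (object record) or slice data (slice records)
);
-- Hash indexes on the unique identification column and the reference column.
CREATE INDEX idx_c0ffee00aa11bb22 ON sys_klob_schema.tab_c0ffee00aa11bb22 USING hash (uint_id);
CREATE INDEX ref_c0ffee00aa11bb22 ON sys_klob_schema.tab_c0ffee00aa11bb22 USING hash (ref_id);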
The object fixed unique identification uint_id is a unique identification of the UUID type; it serves as the object identification in object records and as the storage slice identification in storage slice records. An identification of the UUID type can be regarded as unique under any circumstances, so even if an object is imported into another database the identification can remain unchanged without conflict.
The part_key is the partition key used internally by the system and may be a signed integer. When the user chooses other columns of the base table as the partition key for the large object column, the original partition setting made by the user is translated into an internal list partition form containing this internally used partition key; when the user chooses hash partitioning based on the object/storage slice identification, this field may be NULL.
Store_ver is the storage parameter version number of the object and is NULL in storage slice records; this field can therefore also be used to distinguish object records from storage slice records.
Ref_id: in an object record, if this field has a value it is the identification (UUID) of the referenced object, indicating that the object is identical to the referenced object; in a storage slice record, if this field has a value it is the identification (UUID) of the referenced storage slice, indicating that the storage slice is identical to another storage slice. When this field is not NULL, both the prefetch and ext_data fields of the record are NULL. In addition, a reference relationship between objects exists only when one object is identical to another object in its entirety; if only some of the storage slices are identical, only reference relationships between storage slices exist, and there is no reference relationship between the objects.
Ref_count represents, in an object record, the number of times the object is referenced by other objects and, in a storage slice record, the number of times the storage slice is referenced by other storage slices. If the object or storage slice is not referenced, this field is 0.
Dedup_feat is, in a storage slice record, the deduplication feature value of the storage slice itself and, in an object record, the deduplication feature value of the whole object. It may be a block of binary data whose length is determined by the deduplication feature value algorithm. If the deduplication function has never been enabled on the corresponding column or column partition, this field is NULL (the initial state). For a small object stored outside the row, if a DML operation changes the object content, then at the time of the DML operation: 1. if deduplication is not currently enabled, the field is set to NULL; 2. if deduplication is currently enabled, a new object feature value is generated and filled in. For a large object stored outside the row, if a DML operation changes the content of a subordinate slice, then at the time of the DML operation: 1. if deduplication is not currently enabled, the field is set to NULL; 2. if deduplication is currently enabled, a new slice feature value is generated and filled in. When a deduplication task is executed, feature values are generated for small objects and large object slices that have none, and the field is filled in.
Prefetch holds the data within the first 32K of the object, stored according to the object storage parameter settings and possibly compressed and encrypted; this field is always NULL in storage slice records.
In an object record, the additional data field contains a storage slice index block, which records the arrangement order of all storage slices of the object, the logical length of the original data, and their unique identifications. The logical length here means a length calculated in units of the original data: bytes for BLOBs and characters for CLOBs/NCLOBs. In a storage slice record, this field contains the data of the portion of the large object above 32K. For example, in the record of the first storage slice of a large object, the additional data field may store the index records of all storage slices of the large object, that is, the storage slice index block, while in the other storage slices of the large object the additional data field stores the data of the corresponding slice. In other words, only the first storage slice of the large object stores the storage slice index block in its additional data field ext_data (its own data being stored in the prefetch field), and the other storage slices store their own data contents in ext_data. Whether a slice index block or slice data is stored, it is stored according to the object storage parameter settings and may be compressed and encrypted. Under this storage structure, the storage slice content modification mode of the present scheme is clearly different from that of the prior-art PostgreSQL database; it can efficiently support insertion and deletion operations in the middle of a large object, with the same semantics as string insertion and deletion operations.
In the object record (i.e., the first data row of the object), the prefetch field is used to store the first slice of the object, that is, a limited-length piece of data at the head of the object, which in some scenarios can be read and written directly like a conventional field without going through the file-style API; its length limit is accordingly lower and cannot be set as large as that of subsequent slices. The ext_data field is used to store the storage slice index block. In a slice record (i.e., the data rows where the second and subsequent slices are located), the prefetch field is not used and is set to NULL, and the ext_data field is used to store the data content of the slice.
Through the above database design, when a large object is split into a plurality of storage slices, the globally unique identification of the large object is first determined from the data base table (i.e., the user_biz_data table). The first storage slice of the large object is then found in the object and storage slice table according to this globally unique identification (the globally unique identification of the large object is the slice unique identification of its first storage slice). Next, the globally unique identification of the target storage slice containing the sought data is determined through the storage slice index block recorded in the first storage slice. The target storage slice is then found in the object and storage slice table through its globally unique identification, and finally the specific data of the target storage slice is obtained from the ext_data column of the target storage slice.
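Expressed as SQL, the two lookups involved might look as follows (a sketch only; the table name and UUID literals are illustrative, and decoding of the storage slice index block is performed by the database engine rather than by SQL):

-- Step 1: fetch the object record; its prefetch holds the first 32K of data and its
-- ext_data holds the storage slice index block.
SELECT store_ver, prefetch, ext_data
FROM sys_klob_schema.tab_c0ffee00aa11bb22
WHERE uint_id = '7d9f1a2b-3c4d-4e5f-8a6b-0123456789ab';  -- identifier from the large object control block

-- Step 2: after the target slice identifier has been determined from the index block,
-- fetch the target storage slice.
SELECT ext_data
FROM sys_klob_schema.tab_c0ffee00aa11bb22
WHERE uint_id = '1f2e3d4c-5b6a-4e98-9c8d-aedcba987654';  -- target slice identifier from the index block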
In addition, when the data of a target storage slice needs to be modified, the corresponding target storage slice can likewise be located by the above method and the modification performed, after which the content of the storage slice index block in the additional data field of the first storage slice is updated synchronously.
The storage slice index block is a binary data block containing the positioning information of all storage slices of an object. It comprises the following fields in order: a storage slice index block format descriptor, the number of storage slices, a storage slice length array, and a storage slice identification array.
The storage slice index block format descriptor may be a signed integer used as a bit string, some bits of which (e.g., the lower 8 bits) hold the storage slice index block format version number, while the others may be reserved.
The number of storage slices is the total number of storage slices, i.e., the length of the arrays that follow, and is of signed integer type. If this field is 0, the object does not contain any storage slices.
The length of the storage slice length array equals the total number of storage slices; its elements are signed integers, arranged in the order of the slices within the object.
The length of the storage slice identification array equals the total number of storage slices; its elements are UUIDs, arranged in the order of the slices within the object.
As can be seen from FIG. 5, in the storage slice index block the metadata of the storage slices (length and identification) are stored not row by row but column by column, i.e., all storage slice length values are stored consecutively first, followed by all storage slice identification values. Since large objects are typically updated infrequently, there will be many repeated values among the storage slice lengths, which helps to increase the compression ratio.
In the scheme of the present embodiment, metadata (identification, length, etc.) of the storage slices are stored separately from the data, the actual length of the storage slices is variable, and the order of the storage slices in the large object is recorded by the storage slice index block. Therefore, when data is inserted or deleted in the middle of a large object, only the storage slices related to the inserted or deleted data need to be rewritten, and other storage slices are not moved, so that the data volume needing to be rewritten during updating is greatly reduced, and the performance is improved.
In contrast, the PostgreSQL database uses fixed-length storage slices in which the storage slice metadata and data are stored in one record and the storage slice sequence number indicates its position within the object; if data is inserted into or deleted from the middle, all storage slices after the insertion/deletion point must be rewritten, which is far less efficient than the scheme of the present embodiment.
By adding the referenced object/storage slice fields uint_id, ref_id and ref_count, the large object storage of this embodiment supports a deduplication operation, i.e., the elimination of duplicate objects or storage slices. FIG. 6 is a schematic diagram of storage slices referencing one another in a data table having large object columns, according to one embodiment of the invention. For example, if for a certain large object data item A some of the storage slices are identical to storage slices in another large object data item B, those storage slices remain relatively independent and have their own unique identifications, but they do not store the actual data; instead they record the identification of the corresponding storage slices of object data item B. For example, when storage slice 102 is the same as storage slice 201 and storage slices 104 and 105 are the same as storage slice 204, storage slice 102 stores the identification of storage slice 201, and storage slices 104 and 105 store the identification of storage slice 204. Storage slices 201 and 204 each record a reference count.
Similarly, if a large object data item D is identical to another large object data item C, the large object data item D may store the identity of the large object data item C without storing the data itself, and the large object data item C records the reference count. Of course, this applies only to large object data items with a volume of 32K or less; for larger data items, because of the cost of determining whether they are duplicated as a whole and the lower probability of an exact match, deduplicating at the granularity of the storage slice is the better choice.
Recording the reference count avoids data errors when DML operations are executed to modify the data.
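As an illustration, a deduplication task that finds storage slice 102 to be identical to storage slice 201 might perform updates along the following lines (a hedged sketch; the table name and UUID literals are illustrative):

BEGIN;
-- Point slice 102 at slice 201 and drop its own data.
UPDATE sys_klob_schema.tab_c0ffee00aa11bb22
SET ref_id = '20100000-0000-4000-8000-000000000201',     -- identifier of slice 201 (illustrative)
    ext_data = NULL
WHERE uint_id = '10200000-0000-4000-8000-000000000102';  -- identifier of slice 102 (illustrative)
-- Record the additional reference on slice 201.
UPDATE sys_klob_schema.tab_c0ffee00aa11bb22
SET ref_count = ref_count + 1
WHERE uint_id = '20100000-0000-4000-8000-000000000201';
COMMIT;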
The storage mode of the large object in the storage structure can be divided into three types, namely a small object stored in a row, a small object stored outside the row and a large object stored outside the row.
The metadata and data body of a small object stored in the row are stored in the data base table row in the form of an in-row control block. Small objects stored in the row have a compact structure and occupy little space.
For a small object stored outside the row, part of the metadata is stored in the data base table row, and the remaining metadata and the data body are stored in the object and storage slice table in the form of an out-of-row object control block and an object record. In addition to the control block and the object data itself, small objects stored outside the row contain fields supporting other characteristics such as partitioning, deduplication and progressive background tasks.
For a large object stored outside the row, part of the metadata is stored in the row, the remaining metadata, the first 32K of data and the storage slice index block are stored in one object record, and the data above 32K is stored in a group of storage slice records. A large object stored outside the row therefore comprises one object record and a plurality of storage slice records. The object record and the storage slice records are stored in the same table and use the same primary key column, and the object record serves as the entry point for accessing the large object data.
It should be noted that, although a large object stored outside the row also has the fields supporting partitioning, deduplication, progressive background tasks and other characteristics that a small object has, the differences are as follows: 1. in the object record of a large object, ref_id, ref_count and dedup_feat are not used, because the large object is not deduplicated as a whole but only at the slice level; 2. in the slice records of a large object, store_ver is not used, because all the contents of a single object use uniform storage parameters, in line with users' general expectations, and prefetch is also not used, because when a user reads or writes a specified fragment within a large object there is no need to quickly obtain a small piece of data at the head of the slice.
As mentioned above, in existing database systems, because large objects may occupy a large amount of storage space, rewriting operations typically involve a large amount of data reading, processing and writing, which tends to be time consuming. When a large object rewriting operation is performed, it is often necessary to lock the associated data rows or tables to prevent the data from being modified by other transactions during the rewriting. However, long-held locks block access by other transactions, reducing the concurrency performance of the database and adversely affecting daily business.
In order to solve the above-described problems, the present invention provides a large object rewriting method of a database, fig. 7 is a flowchart of a large object rewriting method of a database according to an embodiment of the present invention, and the large object rewriting method of a database includes at least the following steps S701 to S703 as shown in fig. 7.
In step S701, an event is acquired that overwrites data in a large object column in the database, the large object column storing a large object data item of a large object data type.
In step S702, a rewriting subtask queue and a rewriting process group are established, the rewriting subtask queue is used for storing a plurality of rewriting subtasks divided based on rewriting events, and the rewriting process group includes a plurality of rewriting processes running in parallel.
In step S703, the rewriting subtasks in the rewriting subtask queue are allocated to the respective rewriting processes of the rewriting process group, and the rewriting subtasks allocated thereto are processed by each of the rewriting processes.
Through task division and parallel processing, the large object rewriting method of the database described above can reduce locking time, lower the load on the database system, and meet the optimization needs of daily business; it significantly improves the performance and stability of the database system and provides an effective solution for maintaining and managing databases in actual business scenarios.
In some embodiments, to ensure that the rewriting task for the entire large object column is executed successfully and completely and that correct feedback is given after all related rewriting subtasks are completed, after each rewriting process independently processes the rewriting subtasks assigned to it, the completion status of all rewriting processes may be collected, and when all rewriting processes have completed, the rewriting task for the large object column is reported as complete.
In a specific example, the session process that monitors the entire rewriting task may divide the rewriting task into a number of rewriting subtasks, and the divided rewriting subtasks are executed in parallel by a group of rewriting processes under the monitoring of the session process. After each rewriting subtask is executed, its execution result is reported to the monitoring session process through a rewriting subtask completion notification queue. After all rewriting subtasks have been executed, the monitoring session process displays the execution result to the user.
In some embodiments, the step of obtaining an event for overwriting data in a large object column in the database may be to monitor a modification instruction for the large object column, where the modification instruction may be from a user operation, an application request, or an automation script, and the modification instruction includes a modification of a data type, a modification of a storage parameter, and the like, and when the modification instruction is monitored, the event for overwriting data in the large object column is triggered.
Because different data types occupy different amounts of storage space in the database and are represented differently, after a data type is converted the database management system typically needs to modify the stored data to fit the new data type. When the user modifies the data type of a column, for example changes it from int to long or double, and the system detects the modification operation, a rewriting event is triggered and the data in the table needs to be rewritten.
In some embodiments, modifying storage parameters (e.g., compression, encryption, table space, etc.) also means that the storage format or location of the data changes, which also triggers overwriting of the data.
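As a purely illustrative, non-limiting example, the following Python sketch shows how a monitored modification instruction might be classified and turned into a rewriting event; the instruction fields and the REWRITE_TRIGGERS set are assumptions made for this sketch and are not part of the database described here.

    REWRITE_TRIGGERS = {"MODIFY_DATA_TYPE", "MODIFY_STORAGE_PARAMETER"}

    def on_modification_instruction(instruction):
        # A data-type change (e.g. int -> long) or a storage-parameter change
        # (compression, encryption, table space, ...) triggers a rewriting event.
        if instruction["kind"] in REWRITE_TRIGGERS:
            return {"type": "REWRITE_EVENT", "column": instruction["column"]}
        return None                              # other instructions do not trigger rewriting

    print(on_modification_instruction(
        {"kind": "MODIFY_STORAGE_PARAMETER", "column": "col_x"}))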
Fig. 8 is a flowchart of creating a rewriting subtask queue in the large object rewriting method of a database according to one embodiment of the present invention. As shown in fig. 8, creating the rewriting subtask queue includes at least the following steps S801 to S803.
In step S801, an empty queue is initialized in the memory as a rewriting subtask queue.
In this step, the system creates a new, empty queue in its memory space for storing the plurality of rewriting subtasks divided subsequently. Initializing the queue prepares for the work that follows, ensuring that the queue is in a clean and ready state when tasks are added.
In step S802, the rewriting task of the large object column is divided into a plurality of rewriting subtasks according to the partition condition of the large object column.
In this step, the system splits the overall rewrite task into multiple smaller, more manageable subtasks based on the partitioning of the large object columns to process the subtasks in parallel, thereby increasing overall processing speed and efficiency. Each subtask may correspond to a data overwrite job for one or more partitions of a large object column.
In step S803, a plurality of rewriting subtasks are sequentially added to the rewriting subtask queue.
In this step, the system adds the subtasks to the previously initialized rewriting subtask queue in a certain order (for example, in partition order, data size order, or another logical order). The queue thus forms a list of tasks to be processed, and subsequent processing logic can fetch tasks from the queue for execution.
Thus, by creating a queue containing multiple rewriting subtasks, a basis is provided for subsequent task scheduling and execution. This divide-and-conquer approach is very effective when handling large amounts of data and can improve the throughput and response speed of the system. Meanwhile, managing tasks through a queue makes it easy to implement functions such as priority scheduling and concurrent processing of tasks, further improving the flexibility and scalability of the system.
In one example, the step of dividing the rewriting task of the large object column into a plurality of rewriting subtasks according to the partition condition of the large object column may be to allocate one rewriting subtask for each large object column that is not partitioned, and one rewriting subtask for each column partition of each large object column that is partitioned.
In particular, the system may determine whether a large object column is partitioned by examining metadata information of the database to see the configuration and properties of each large object column. For each large object column that is not partitioned, the system may assign a single rewriting subtask that is responsible for the data rewriting work of that entire column; since there is no partition, no further splitting is needed. For each partitioned large object column, the system may further analyze its partition condition, including the number of partitions, the size of each partition, the data distribution, and so on. Based on this analysis, the system can allocate a separate rewriting subtask for each partition, ensuring that the data rewriting work of each partition proceeds independently and can be processed in parallel.
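A combined sketch of steps S801 to S803 together with the partition rule described above is given below, purely for illustration: one subtask per unpartitioned large object column, and one subtask per partition of each partitioned column. The column descriptions, dictionary fields, and function names are all hypothetical.

    from collections import deque

    def divide_rewrite_task(large_object_columns):
        # S802: split the overall task according to the partition condition.
        subtasks = []
        for col in large_object_columns:
            partitions = col.get("partitions") or []
            if not partitions:
                subtasks.append({"column": col["name"], "partition": None})
            else:
                for part in partitions:
                    subtasks.append({"column": col["name"], "partition": part})
        return subtasks

    def build_subtask_queue(large_object_columns):
        queue = deque()                                    # S801: empty in-memory queue
        for subtask in divide_rewrite_task(large_object_columns):
            queue.append(subtask)                          # S803: enqueue in order
        return queue

    cols = [{"name": "col_x", "partitions": ["p1", "p2"]},
            {"name": "col_y", "partitions": []}]
    print(list(build_subtask_queue(cols)))                 # three subtasks: p1, p2, and col_y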
Of course, the above examples are merely illustrative for explaining the concept of how the rewrite sub-task is partitioned according to the partition situation of the large object column. In practical applications, a database administrator or developer may adjust and optimize the task partitioning strategy according to specific circumstances to ensure the efficiency and reliability of the rewrite operation.
Fig. 9 is a flowchart of creating a rewrite process group in a large object rewrite method of a database according to one embodiment of the invention, and as shown in fig. 9, creating a rewrite process group may include the following steps S901 to S904.
In step S901, the queue length of the rewriting subtask queue and the parallelism parameter of the database rewriting task are obtained. The queue length of the rewriting subtask queue represents the number of rewriting subtasks waiting to be processed in the current queue, and the parallelism parameter of the database rewriting task represents the maximum number of rewriting processes the database system allows to run simultaneously.
In step S902, it is determined whether the queue length of the rewriting subtask queue is greater than the parallelism parameter of the database rewriting task; if yes, step S903 is executed, and if not, step S904 is executed.
In step S903, a rewriting process group whose number of processes equals the parallelism parameter value of the database rewriting task is created. It will be appreciated that when the queue length of the rewriting subtask queue is greater than the parallelism parameter of the database rewriting task, there are more rewriting subtasks to process than the system resources (e.g., CPU cores or threads) can handle simultaneously. In this case the system creates a group of rewriting processes equal in number to the parallelism parameter value, which ensures that the system does not attempt to create too many processes and exhaust system resources.
In step S904, a rewriting process group whose number of processes equals the queue length of the rewriting subtask queue is created. It will be appreciated that when the queue length of the rewriting subtask queue is less than or equal to the parallelism parameter of the database rewriting task, the system has sufficient resources to process all rewriting subtasks simultaneously. The system therefore creates a number of rewriting processes equal to the queue length, which ensures that all rewriting subtasks can be processed in time without delays caused by an insufficient number of processes.
In summary, in this way the system is able to dynamically adjust the number of rewriting processes to accommodate rewriting tasks and workloads of different sizes, which helps to improve the response speed and overall performance of the system.
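As a non-limiting illustration of steps S901 to S904, the following Python sketch chooses the number of rewriting processes as the smaller of the queue length and the parallelism parameter; the names build_process_group and idle_worker are assumptions made for this sketch.

    from multiprocessing import Process

    def idle_worker():
        pass                                   # placeholder for the real rewriting process body

    def build_process_group(queue_length, parallelism):
        # S902-S904: use the parallelism parameter when the queue is longer than it,
        # otherwise one process per queued subtask is enough.
        n = parallelism if queue_length > parallelism else queue_length
        return [Process(target=idle_worker) for _ in range(n)]

    print(len(build_process_group(queue_length=10, parallelism=4)))   # -> 4
    print(len(build_process_group(queue_length=2, parallelism=4)))    # -> 2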
In the solution of this embodiment, the object types of large object data items may be divided according to the storage manner: objects stored inside table rows are defined as in-row objects, for example the small objects stored in rows described above, and objects stored outside table rows are defined as out-of-row objects, for example the small objects stored outside rows and the large objects stored outside rows described above.
The data sizes of small objects stored in a row and small objects stored outside a row are smaller than a preset threshold, while the data size of a large object stored outside a row is larger than the preset threshold, and such a large object is divided into a plurality of storage slices for storage. The preset threshold may be 32K, but may of course be another value, which is not limited in this embodiment.
The metadata and data body of an in-row object are stored in the data table row of the database; for an out-of-row object, part of the metadata is stored in the data table row, and the remaining metadata and the data body are stored outside the data table row.
For in-row objects, because all of their information is stored together in the data table row, the rewriting process can quickly locate and modify them. Therefore, when a rewriting process handles a rewriting subtask assigned to it and the subtask involves in-row objects, the rewriting process can rewrite the in-row objects directly in the data table row. Because no additional I/O operations are needed to read or write data at other storage locations, this direct access and modification approach can significantly improve rewriting efficiency.
When processing an in-row object, the rewriting process first acquires the object data of the in-row object, then restores the acquired object data to obtain the original object data, and finally rewrites the original object data according to the given storage parameters.
In the solution of this embodiment, referring to figs. 3 and 4, two types of control blocks are designed for efficient access to large objects of widely differing sizes. The control block replaces the large object itself in the data base table row. For a small object (less than or equal to 32K), if the user allows in-row storage, the small object can be stored directly in a control block, which simplifies the read/write flow and improves read/write speed. For a large object, the control block holds only the version, attributes, and UUID of the large object, which are used to direct the database to the large object stored outside the row.
In a specific example, the rewriting process may obtain the object data from the control block of an in-row object, decrypt and decompress the obtained object data according to the current storage parameters to obtain the original object data, and finally compress and encrypt the original object data according to the given storage parameters.
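A minimal sketch of this in-row rewriting flow is given below: the data held in a control block is restored according to the current storage parameters and then re-stored according to the given parameters. Here zlib merely stands in for whatever compression codec the database actually uses, encryption is reduced to placeholder functions, and all names and dictionary fields are assumptions made for the sketch.

    import zlib

    def decrypt(data, key):
        return data          # placeholder; a real implementation would decrypt here

    def encrypt(data, key):
        return data          # placeholder; a real implementation would encrypt here

    def rewrite_in_row_object(control_block, current_params, new_params):
        data = control_block["data"]                     # object data held in the control block
        if current_params.get("encrypted"):
            data = decrypt(data, current_params.get("key"))
        if current_params.get("compressed"):
            data = zlib.decompress(data)                 # original object data recovered
        if new_params.get("compressed"):
            data = zlib.compress(data)                   # re-store per the given parameters
        if new_params.get("encrypted"):
            data = encrypt(data, new_params.get("key"))
        control_block["data"] = data                     # write back into the data table row
        return control_block

    block = {"data": zlib.compress(b"hello large object")}
    print(rewrite_in_row_object(block,
                                {"compressed": True, "encrypted": False},
                                {"compressed": False, "encrypted": False})["data"])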
For out-of-row objects, since the data body is stored outside the data table row, the rewriting process cannot modify the object directly. Therefore, when a rewriting process handles an assigned rewriting subtask that involves out-of-row objects, the rewriting process may first write the object identifiers of the out-of-row objects into an out-of-row object temporary table pre-created in the database, then re-scan the out-of-row object temporary table and rewrite the scanned out-of-row objects. This processing mode allows the rewriting process to operate on out-of-row objects flexibly, avoids traversing the entire database, and improves rewriting efficiency.
When the out-of-row object is a small object, the step of rewriting the scanned large object data item may be to obtain the object data in the object record of the large object data item, restore the obtained object data to obtain the original object data, rewrite the original object data according to the given storage parameters, and update the storage parameters and the object record of the large object data item.
In a specific example, the rewriting process may read the first 32K of data of the small object, that is, the prefetch field in the object record, then decrypt and decompress the object data in the prefetch field according to the current storage parameters to obtain the original object data, then compress and encrypt the original object data according to the given storage parameters, and finally update the storage parameters and the object record of the small object.
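The temporary-table-driven flow for small out-of-row objects might look roughly like the following sketch, in which the temporary table is modeled as a plain list of UUIDs, zlib again stands in for the real codec, and every name other than the prefetch and store_ver fields mentioned in this description is hypothetical.

    import uuid
    import zlib

    def restore(data, params):
        return zlib.decompress(data) if params.get("compressed") else data

    def store(data, params):
        return zlib.compress(data) if params.get("compressed") else data

    def rewrite_small_out_of_row_objects(object_records, current_params, new_params):
        temp_table = [rec["uuid"] for rec in object_records.values()]   # out-of-row object temporary table
        for oid in temp_table:                                          # re-scan the temporary table
            rec = object_records[oid]
            original = restore(rec["prefetch"], current_params)         # first 32K of the object
            rec["prefetch"] = store(original, new_params)
            rec["store_ver"] = new_params["store_ver"]                  # update the storage parameter version
        return object_records

    oid = str(uuid.uuid4())
    records = {oid: {"uuid": oid, "prefetch": b"small object data", "store_ver": 1}}
    print(rewrite_small_out_of_row_objects(records, {"compressed": False},
                                           {"compressed": True, "store_ver": 2}))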
When the out-of-row object is a large object, the step of rewriting the scanned large object data item may be to obtain the slice data in the slice records of all storage slices of the large object, restore the obtained slice data slice by slice to obtain the original slice data, rewrite the original slice data according to the given storage parameters, and finally update the slice records of the storage slices.
In a specific example, the overwriting process may read the ext_data field in the slice record of the stored slice, then decrypt and decompress the slice data in the ext_data field according to the current storage parameter to obtain the original slice data, then overwrite the original slice data according to the given storage parameter, and finally update the slice record of the stored slice.
After the slice records of the rewritten storage slices have been updated, the object data in the object record of the large object can be acquired, the acquired object data is restored to obtain the original object data, the original object data is rewritten according to the given storage parameters, and the storage parameters and the object record of the large object are updated.
In a specific example, after updating the slice record of the rewritten storage slice, the rewriting process reads the first 32K data of the large object, that is, the prefetch field in the object record, then decrypts and decompresses the object data in the prefetch field according to the current storage parameter to obtain the original object data, then compresses and encrypts the original object data according to the given storage parameter, and finally updates the storage parameter and the object record of the large object.
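The ordering described above, namely rewriting every storage slice record first and the object record last, is illustrated by the following non-limiting sketch; zlib stands in for the configured codec, encryption is omitted, and the record structures are assumptions made only for the sketch.

    import zlib

    def recode(data, old_params, new_params):
        if old_params.get("compressed"):
            data = zlib.decompress(data)         # restore the original data
        if new_params.get("compressed"):
            data = zlib.compress(data)           # re-store per the given parameters
        return data

    def rewrite_large_out_of_row_object(object_record, slice_records, old_p, new_p):
        for slice_rec in slice_records:          # first: every storage slice record
            slice_rec["ext_data"] = recode(slice_rec["ext_data"], old_p, new_p)
        # last: the object record itself (its prefetch field holds the first 32K)
        object_record["prefetch"] = recode(object_record["prefetch"], old_p, new_p)
        object_record["store_ver"] = new_p["store_ver"]
        return object_record, slice_records

    obj = {"prefetch": b"first 32K ...", "store_ver": 1}
    slices = [{"ext_data": b"slice-1"}, {"ext_data": b"slice-2"}]
    print(rewrite_large_out_of_row_object(obj, slices,
                                          {"compressed": False},
                                          {"compressed": True, "store_ver": 2}))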
In the solution of this embodiment, the ext_data field of the object record contains a storage slice index block, which records the arrangement order, the original logical data length, and the unique identifier of every storage slice of the large object data item. The logical length referred to here is a length calculated in units of the original data: bytes for a BLOB and characters for a CLOB/NCLOB. In the slice record of a storage slice, the ext_data field contains the portion of the large object beyond the first 32K. Both the storage slice index block and the slice data are stored according to the object's storage parameter settings and may be compressed and encrypted. That is, for a large object stored outside the row, the index block, the data in the object record, and the data in the slice records are all compressed and encrypted using the storage parameters of the large object.
In some embodiments, the step of acquiring slice data in slice records of all storage slices of the large object data item may be to acquire a storage slice index block of the large object data item, restore the acquired storage slice index block to obtain an original index block, and acquire slice data in slice records of all storage slices of the large object data item according to the original index block.
In the solution of this embodiment, the object records and the storage slice records are placed in the same object and storage slice table, and UUIDs are uniformly used as the primary key, so both objects and storage slices can be looked up quickly using a single primary key index. An object record, much like a special storage slice, is the entry point for accessing large object data. When data beyond the first 32K needs to be accessed, the UUIDs of the storage slices to be accessed can be found through the storage slice index block stored in the ext_data field of the object record, and the storage slice records can then be looked up using the primary key index on the table. Since the whole process requires only equality queries and no range queries, a Hash index, which is more efficient than a B-Tree index, can be used.
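The equality-only lookup path can be sketched as follows: the object and storage slice table is modeled as a Python dict to mimic a hash lookup by the UUID primary key. The ext_data and prefetch field names follow the description above, while index_block, put, and read_beyond_32k are hypothetical names introduced only for this sketch.

    import uuid

    table = {}                                   # object and storage slice table, keyed by UUID

    def put(record):
        table[record["uuid"]] = record
        return record["uuid"]

    slice_ids = [put({"uuid": str(uuid.uuid4()), "ext_data": b"slice %d" % i})
                 for i in range(3)]
    object_id = put({"uuid": str(uuid.uuid4()),
                     "prefetch": b"first 32K",
                     "index_block": slice_ids})  # ordered UUIDs of all storage slices

    def read_beyond_32k(object_uuid):
        obj = table[object_uuid]                 # equality lookup by the primary key
        return b"".join(table[sid]["ext_data"] for sid in obj["index_block"])

    print(read_beyond_32k(object_id))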
Overall, the process of rewriting a large object according to given storage parameters can be roughly divided into three phases: a preparation phase, an in-row object rewriting phase, and an out-of-row object rewriting phase, wherein:
1. The preparation phase queries and caches the storage parameters of all versions of the large object column or column partition, so that the parameters actually used to store each object can be looked up in subsequent operations.
2. The in-row object rewriting phase scans the data base table according to the condition of the given large object column or column partition, rewrites the in-row objects, and along the way records the object identifiers of out-of-row objects in the out-of-row object temporary table. The operations of this phase can be executed in one large transaction.
3. The out-of-row object rewriting phase sequentially scans the out-of-row object temporary table and rewrites the object data one object at a time. Each object starts an independent transaction: all storage slice records are rewritten first, then the object record is rewritten, and then the transaction is committed. After all out-of-row objects have been processed, the out-of-row object temporary table is deleted. A combined sketch of these three phases is given below.
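The sketch below summarizes the three phases in compact form; transactions and database access are only indicated by comments, and all names and data structures are assumptions made for illustration.

    def rewrite_column(rows, param_history, new_params):
        # Phase 1: preparation - cache all historical storage parameter versions,
        # to be consulted when each object's stored data is restored.
        params_by_ver = {p["store_ver"]: p for p in param_history}

        # Phase 2: in-row objects - scan the base table, rewrite in-row objects in one
        # large transaction, and record the identifiers of out-of-row objects on the way.
        out_of_row_temp = []
        for row in rows:
            if row["in_row"]:
                row["data"] = ("rewritten", new_params["store_ver"])   # stand-in for the real rewrite
            else:
                out_of_row_temp.append(row["uuid"])
        # ... commit the large transaction here ...

        # Phase 3: out-of-row objects - one independent transaction per object:
        # rewrite all slice records, then the object record, then commit.
        for oid in out_of_row_temp:
            pass    # begin txn; rewrite slices; rewrite object record (using params_by_ver); commit
        # finally, the out-of-row object temporary table would be dropped
        return rows

    rows = [{"uuid": "a", "in_row": True, "data": None},
            {"uuid": "b", "in_row": False, "data": None}]
    print(rewrite_column(rows, [{"store_ver": 1}], {"store_ver": 2}))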
For new objects inserted by the user during execution of the above task, the latest storage parameters are used for storage, so no further rewriting is required.
The large object rewriting method of this embodiment supports rewriting of existing data after the storage parameters (including encryption and compression) of a large object column or column partition are modified, and also supports data deduplication in units of a large object column or column partition. The task on each large object column or column partition is executed as an independent transaction per object at the Read Committed transaction isolation level. In the object and storage slice table, each object record contains a field store_ver holding the current storage parameter version, which indicates the storage parameters used for that object's data. The storage parameter history table stores, for each large object column or column partition, every storage parameter value set since its creation. Referring to fig. 2, the storage parameter history table has an encryption & compression task identification field enc_task_id of UUID type, which identifies the data rewriting task corresponding to that version of the storage parameters; it is randomly generated when the large object column encryption and compression task is created and is registered in a hash table in memory. The tab_oid and store_ver fields of the table together uniquely identify one storage parameter version of a large object column or column partition in the database. If the encryption and compression task identified by a record cannot be found in the current in-memory hash table, this indicates that the process executing that task has exited.
In the solution of this embodiment, the rewriting task is executed as a low-priority background task and is split into small transactions. Even if the database is shut down during execution, or the user re-initiates the task with different parameter settings, the data can still be processed according to the latest settings because the historical parameters of the tasks are recorded.
In addition, where the database supports concurrent access by multiple users, the execution state of background progressive tasks needs to be detected to avoid task conflicts. For this purpose, when a background progressive task is created, a UUID is randomly generated as its task identifier. The task identifiers of all background progressive tasks being executed are registered in a Hash table in memory; the key of the Hash table is the task identifier and the value is the attribute information of the task. The task identifier is also recorded in the large-object-specific storage parameter history table (for data rewriting tasks) or in a deduplication task history table (for deduplication tasks). After the task has executed normally or has been actively cancelled by the user, its entry is deleted from the Hash table.
Taking a data rewriting task as an example, when the system reads object data and finds that its storage parameters are not the latest version, it can locate the corresponding record in the large-object-specific storage parameter history table according to the OID of the storage table and the storage parameter version number, read the ID of the data rewriting task in that record, and check it against the Hash table of data rewriting tasks in memory. If the task ID is found in the Hash table, the relevant task process is still executing; otherwise the relevant task has stopped, possibly because of an unexpected shutdown, or because it was cancelled midway after the user specified new storage parameters. If, during execution, a rewriting task finds that another rewriting task using a different storage parameter version is executing on the same large object column or column partition, the storage parameter version numbers are compared: the rewriting process with the smaller storage parameter version number exits actively, and the rewriting process with the larger storage parameter version number continues to execute.
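A minimal sketch of this conflict-avoidance logic follows: running tasks are registered in an in-memory hash table keyed by a randomly generated UUID, a missing entry means the task process has exited or been cancelled, and version comparison decides which rewriting process yields. All names in the sketch are hypothetical.

    import uuid

    running_tasks = {}                              # in-memory Hash table: task id -> task attributes

    def register_task(column, store_ver):
        task_id = str(uuid.uuid4())                 # randomly generated task identification
        running_tasks[task_id] = {"column": column, "store_ver": store_ver}
        return task_id

    def task_still_running(task_id):
        return task_id in running_tasks             # missing entry: the task process has exited or was cancelled

    def should_exit(my_store_ver, other_store_ver):
        # The rewriting process using the smaller storage parameter version exits actively.
        return my_store_ver < other_store_ver

    t1 = register_task("col_x", store_ver=3)
    t2 = register_task("col_x", store_ver=4)
    print(task_still_running(t1), task_still_running(t2), should_exit(3, 4))    # -> True True True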
The task management strategy combines the uniqueness of UUID, the quick query capability of the Hash table and the logical judgment of version number comparison, and provides an efficient and reliable task conflict avoidance mechanism for the multi-user concurrent access environment of the database. This helps to ensure the stability and performance of the database system and to meet the demands of daily business.
The flowcharts provided in this embodiment are not intended to indicate that the operations of the method must be performed in any particular order, or that all of the operations of the method are included in every case. Furthermore, the method may include additional operations. Additional variations may be made to the above-described methods within the scope of the technical ideas provided by the methods of this embodiment.
It is to be understood that in some embodiments, portions may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
The present embodiment also provides a computer-readable storage medium 20 and a computer device 30. Fig. 10 is a schematic diagram of a computer-readable storage medium 20 according to one embodiment of the present invention, and fig. 11 is a schematic diagram of a computer device 30 according to one embodiment of the present invention.
A computer-readable storage medium 20 has stored thereon the above-mentioned computer program 21, which, when executed by a processor 32, implements the steps of the large object rewriting method of a database of any of the above embodiments. The computer device 30 may include a memory 31, a processor 32, and a computer program 21 stored in the memory 31 and executable on the processor 32.
The computer program 21 for carrying out operations of the present invention may be assembly instructions, instruction set architecture (Instruction Set Architecture, ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, configuration data for an integrated circuit, or source or object code written in any combination of one or more programming languages, including procedural programming languages. The computer program 21 may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (Local Area Network, LAN) or a wide area network (Wide Area Network, WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, in order to carry out aspects of the invention, electronic circuitry, including, for example, programmable logic circuitry, a field-programmable gate array (Field-Programmable Gate Array, FPGA), or a programmable logic array (Programmable Logic Array, PLA), may be personalized by using state information of the computer-readable program instructions to execute those instructions.
For the purposes of this description of embodiments, a computer-readable storage medium 20 is a tangible apparatus that can retain and store a computer program 21, which can be any means that can contain, store, communicate, propagate, or transport the program 21 for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable storage medium 20 include a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, and any suitable combination of the foregoing.
The computer device 30 may be, for example, a server, a desktop computer, a notebook computer, a tablet computer, or a smartphone. In some examples, computer device 30 may be a cloud computing node. The computer device 30 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer device 30 may be implemented in a distributed cloud computing environment where remote processing devices coupled via a communications network perform tasks. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
The computer device 30 may comprise a processor 32 adapted to execute stored instructions, a memory 31 providing temporary storage for the operation of the instructions during operation. Processor 32 may be a single-core processor, a multi-core processor, a computing cluster, or any number of other configurations. Memory 31 may include Random Access Memory (RAM), read only memory, flash memory, or any other suitable storage system.
The computer device 30 may also include a network adapter/interface and an input/output (I/O) interface. The I/O interface allows data to be input and output with external devices that may be connected to the computer device. A network adapter/interface may provide communications between computer devices and a network, typically shown as a communications network.
By now it should be appreciated by those skilled in the art that while a number of exemplary embodiments of the invention have been shown and described herein in detail, many other variations or modifications of the invention consistent with the principles of the invention may be directly ascertained or inferred from the present disclosure without departing from the spirit and scope of the invention. Accordingly, the scope of the present invention should be understood and deemed to cover all such other variations or modifications.