
WO2022266889A1 - Methods for the self-sufficient maintenance of database object storage - Google Patents


Info

Publication number
WO2022266889A1
Authority
WO
WIPO (PCT)
Prior art keywords
cleanup
cooperative
statistics
blocks
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2021/101862
Other languages
French (fr)
Inventor
Ronen Grosman
Sherman Lau
Jienan YAO
Kristian LEJAO
Jonathan Qicheng HU
Kevin Gu
Vipul Sharma
Zijie ZHANG
Qiang Li
Qi Wu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/CN2021/101862
Publication of WO2022266889A1

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Definitions

  • the present disclosure pertains to the field of storage maintenance methods and in particular to storage maintenance methods that reduce the overall storage footprint of an online transactional processing (OLTP) database system.
  • OLTP online transactional processing
  • One or more of these methods includes an approach to database cleanup that balances reducing the amount of redundant data in a database with the impact of removing this redundant data.
  • Database management includes removing (cleaning) data that is no longer valid.
  • An aspect of the present disclosure provides a method for database management. This method can be executed by a processor. The method includes receiving a user operation. The method further includes evaluating a free space map (FSM) to determine if there are sufficient blocks with free space to execute the received user operation. The method further includes performing a cooperative cleanup when the FSM indicates there are insufficient blocks with free space. The method further includes executing the received user operation.
  • FSM free space map
  • a technical benefit is that if the method can perform cooperative cleanup to remove dead records, the space freed by removing these dead records can be recycled and the record does not need to be extended.
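The claimed flow (receive a user operation, consult the FSM, perform cooperative cleanup when blocks with free space are insufficient, then execute the operation) can be sketched as follows. The disclosure contains no code, so every class and function name below is an illustrative assumption, not an identifier from the patent.

```python
class FreeSpaceMap:
    """Tracks how much free space each block of a relation has (illustrative)."""
    def __init__(self, free_per_block):
        self.free_per_block = dict(free_per_block)  # block id -> free bytes

    def blocks_with_space(self, needed):
        return [b for b, free in self.free_per_block.items() if free >= needed]


def cooperative_cleanup(fsm, dead_blocks, block_size):
    """Prune blocks occupied only by dead records and mark them free in the FSM."""
    for block in dead_blocks:
        fsm.free_per_block[block] = block_size  # the whole block is recycled


def execute_user_operation(fsm, needed, dead_blocks, block_size):
    """Insert path: reuse free space if available, else clean up, else extend."""
    if not fsm.blocks_with_space(needed):
        # The FSM reports insufficient blocks: attempt cooperative cleanup first.
        cooperative_cleanup(fsm, dead_blocks, block_size)
    candidates = fsm.blocks_with_space(needed)
    if candidates:
        target = candidates[0]
        fsm.free_per_block[target] -= needed  # store the new record data here
        return ("stored", target)
    return ("extend", None)  # relation extension only as the last resort
```

The point of the ordering is that recycled space is tried before the relation is extended.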
  • in some embodiments, performing the cooperative cleanup further comprises evaluating statistics.
  • a technical benefit is that cooperative cleanup is adaptive because it is based on evaluating statistics.
  • evaluating statistics comprises evaluating a dead tuple percentage statistic.
  • a technical benefit is that cooperative cleanup is performed based on an evaluation of the dead tuple percentage statistic.
  • cooperative cleanup further comprises scanning one or more candidate blocks for cleanup.
  • a technical benefit is that blocks that are selected for evaluation based on evaluation of statistics are then scanned to determine if they are candidates for cleanup.
  • cooperative cleanup further comprises evaluating statistics to determine which of the one or more candidate blocks for cleanup are scanned.
  • a technical benefit is that candidate blocks for cleanup are scanned to determine if they are candidates for cleanup.
  • evaluating statistics comprises evaluating a prune success rate statistic.
  • a technical benefit is that the number of candidate blocks for pruning is based on the previous number of successfully pruned blocks.
  • evaluating statistics further includes balancing which candidate blocks for cleanup are scanned with minimizing the effect of scanning candidate blocks for cleanup.
  • a technical benefit is that performance is not overly affected by cooperative cleanup.
  • evaluating statistics further comprises locking candidate blocks for cleanup before scanning the candidate blocks for cleanup.
  • a technical benefit of locking is that it prevents two or more threads from accessing the blocks storing the record at the same time.
  • cooperative cleanup further comprises evaluating a transaction identifier and array size.
  • a technical benefit is that the candidate block start index is determined by evaluating the transaction identifier and array size.
  • cooperative cleanup further comprises eliminating concurrent pruning of the candidate blocks for cleanup by temporarily invalidating a first candidate block of the candidate blocks for cleanup.
  • evaluating statistics further comprises evaluating the prune success rate statistic using a total number of scanned blocks and a total number of pruned blocks.
  • a technical benefit of cooperative cleanup’s evaluation of the dead tuple percentage statistic and a prune success rate statistic is that this evaluation makes cooperative cleanup adaptive.
  • cooperative cleanup further comprises marking successfully cleaned candidate blocks for cleanup in the FSM.
  • a technical benefit is that marked successfully cleaned candidate blocks can be recycled.
  • cooperative cleanup further comprises marking successfully cleaned candidate blocks for cleanup in the FSM when an update probability is less than a random value.
  • a technical benefit is that this random value can be used to control the tradeoff between space utilization and performance.
  • cooperative cleanup further comprises determining the update probability using free space of a relation and a block size and a fill factor and a user defined probability.
  • technical benefits are constant cleaning instead of spiky performance and integrated space maintenance versus asynchronous or user-driven maintenance.
  • cooperative cleanup further comprises unlocking the locked index before executing the received user operation.
  • a technical benefit is that the unlocked index can be recycled and can later be a candidate for pruning by another relation.
  • cooperative cleanup further comprises one or more candidate blocks for cleanup including index records, data records, or both.
  • a technical benefit is that the lifetime of an index record can be determined so that it doesn’t remain after the associated data record has been cleaned.
  • cooperative cleanup further comprises use of a potential empty block queue and an available block queue when cleaning index records.
  • a technical benefit is that potentially empty block queue can track potentially empty blocks and these blocks can then be moved from this queue into the available block queue for reuse during a new block request.
  • cooperative cleanup further comprises receiving the user operation including an insert, delete, or update operation.
  • a technical benefit is that cooperative cleanup can be performed when an insert, a delete, or an update operation is requested.
  • a further aspect of the present disclosure is a device that includes a processor and non-transient memory.
  • the processor executes instructions to implement a statistics cache (SC) for database management.
  • SC statistics cache
  • the SC includes a new statistics cache stored in non-transitory memory.
  • the SC also includes a lightweight thread that loads new statistics into the new statistics cache.
  • the SC also includes an active statistics cache stored in the non-transitory memory and cooperative cleanup uses statistics read from the active statistics cache to manage the database.
  • a technical benefit is that reading statistics from the active statistics cache, instead of accessing data directly from the disk files used to store the statistics, limits the interaction with those files and also limits the impact on performance.
  • the lightweight thread loads the new statistics into the new statistics cache every configurable time period.
  • a technical benefit is that the frequency of the lightweight thread loading these statistics can be configured to reduce context switching time during thread execution.
  • the active statistics cache includes new statistics copied from the new statistics cache after loading the new statistics into the new statistics cache completes.
  • a technical benefit is that input/output operations can be reduced and this also stops the statistics collector process from being flooded with information requests.
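The double-buffered statistics cache described above can be sketched with a background loader thread. The names and structure below are illustrative assumptions; the disclosure does not specify an implementation.

```python
import threading

class StatisticsCache:
    """Sketch of the double-buffered statistics cache: a lightweight thread
    periodically loads fresh statistics into a new cache, and the active cache
    is replaced only after a load completes, so backend threads never read a
    half-written snapshot."""

    def __init__(self, read_statistics_file, refresh_seconds=1.0):
        self._read = read_statistics_file   # callable returning a statistics dict
        self._refresh = refresh_seconds     # configurable load period
        self._active = {}                   # statistics read by backend threads
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        while not self._stop.is_set():
            new_cache = self._read()        # load into the "new" cache
            with self._lock:                # swap in only once loading completes
                self._active = dict(new_cache)
            self._stop.wait(self._refresh)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def get(self, name, default=None):
        with self._lock:                    # readers see a complete snapshot
            return self._active.get(name, default)
```

Backend threads only ever call `get`, so the cost of refreshing statistics is isolated in the lightweight loader thread.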
  • FIG. 1 illustrates a flow chart representation of a prior art database cleanup method.
  • FIG. 2 illustrates a flow chart of a database cleanup method according to an embodiment of this disclosure.
  • FIG. 3 illustrates a lightweight thread used to load statistics into the new statistics cache according to an embodiment of this disclosure.
  • FIG. 4 illustrates a record according to an embodiment of this disclosure.
  • FIG. 5 illustrates a cooperative cleanup method according to an embodiment of this disclosure.
  • FIG. 6 illustrates a flow chart for returning cleaned blocks to the free space map according to an embodiment of this disclosure.
  • FIG. 7 is a graph illustrating the database size difference between prior art database cleanup and a database cleanup method according to an embodiment of this disclosure.
  • FIG. 8 illustrates another database cleanup method according to an embodiment of this disclosure.
  • FIG. 9 is a block diagram of an electronic device 952 within a computing and communications environment 950 that may be used for implementing devices and methods in accordance with representative embodiments of the present disclosure.
  • Various embodiments of this disclosure use methods to remove database data that can no longer be accessed. This data can be removed in a way that reduces the impact of the data removal on database performance. This disclosure provides these methods.
  • OLTP is a category of data processing that is used by transaction oriented database management tasks.
  • OLTP database processing consists of a large number of transactions requested by a large number of users.
  • Database data can be organized into a table of columns and rows. Each row can represent a set of related data and the row can be commonly referred to as a record or tuple by those skilled in the art.
  • a relation can include one or more records.
  • a block is a fixed amount of memory that can store record data.
  • page can be used to refer to an in memory representation of a block.
  • Relations can be allocated one or more blocks to store record data and a free space map (FSM) can be read to evaluate the amount of available space in the relation. This available space can be referred to as free space by those skilled in the art.
  • FSM free space map
  • a computer process can use one or more user operations to insert, update, or delete data in one or more database records.
  • the user operation can include the data and also the address of the record where the data is to be updated or amended.
  • the user operation can include the address of the record where the data is to be deleted.
  • the database records can be stored by an electronic device and the electronic device can include one or more processors and non-transitory memory.
  • the user operation can be received by the electronic device and the electronic device’s processor can process the user operation by executing instructions that can be stored in the electronic device’s non-transitory memory. When executed, these instructions can instruct the processor on how to access the addressed record and how to insert, update or delete the desired record’s user data.
  • User operations can include insert, update, and delete operations and these operations are used to insert new data into a record, update data in a record, or delete record data. Before performing these operations, the FSM can be accessed to evaluate if a relation has enough free space to hold the data to be inserted or updated. Enough free space can be referred to as one or more sufficient blocks to store the data.
  • When the relation has insufficient blocks to store the data to be inserted or updated, unless the relation includes one or more records that can no longer be accessed and can be cleaned up, one or more additional blocks are allocated to the relation. Records that can no longer be accessed are known as dead records by those skilled in the art and those skilled in the art can also refer to the percentage of dead records in relation to the total number of records as a dead record percentage or a dead tuple percentage.
  • the allocation of additional blocks to the relation can be referred to as relation extension and cleaning up a relation can refer to pruning blocks allocated to the one or more dead records in the relation.
  • Cooperative cleanup is relation cleanup that can reuse pruned blocks to store inserted or updated record data. Those skilled in the art will understand that pruning can mean physically deleting records from the database.
  • Cleaning database records can require the use of locking as a form of concurrency control.
  • Concurrency control by locking a database record is achieved by preventing two or more threads from accessing the blocks storing the record at the same time. The record is unlocked and can be accessed by the next user operation when the user operation that caused the database record to be locked completes.
  • One or more methods included in this disclosure can use cooperative cleanup to reduce the impact of cleanup time on performance by balancing the number of database records cleaned with the impact of cleaning time. These methods can also reduce locking duration, increase system predictability, promote reuse before extending the record, and promote concurrency. These methods can be used, provided that they are performed when the dead tuple percentage exceeds a predefined threshold, during insert, delete, or update operations or during a sequential table scan.
  • Methods included in this disclosure can collect statistics related to successful and unsuccessful attempts to clean up each block. Cooperative cleanup can adaptively select blocks for cleaning, and also ensure exhaustive and distributed cleaning, based on these successful and unsuccessful cleanup statistics.
  • Cooperative cleanup can be performed when the entire table has more dead records than a predefined threshold. Therefore, determining if cooperative cleanup can be performed can include evaluating statistics such as the number of dead records (the dead tuple percentage statistic). It should be appreciated that the dead tuple percentage statistic can be read from a statistics cache.
  • When cooperative cleanup is performed, an array of candidate blocks for cleanup can be created. These candidate blocks for cleanup may or may not have any dead records.
  • cooperative cleanup can include further evaluating statistics such as the prune success rate statistic to determine the number of blocks that may be scanned. As a result, not all candidate blocks in this array may be scanned to determine if they can be cleaned. It should be appreciated that the prune success rate statistic can be read from a statistics cache.
  • Cooperative cleanup then scans the first candidate block included in the array and attempts to prune the page of the candidate block. This page can be reused if it can be pruned and has enough space to store new data. Otherwise, the next candidate block in the array can be scanned.
  • a person skilled in the art can appreciate that cooperative cleanup’s evaluation of the dead tuple percentage statistic and a prune success rate statistic can make cooperative cleanup adaptive.
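The scan-and-prune loop described above (scan the first candidate, attempt to prune its page, reuse it if enough space is freed, otherwise move to the next candidate) can be sketched as follows; `try_prune` and all other names are illustrative assumptions, not identifiers from the patent.

```python
def scan_candidates(candidate_blocks, try_prune, needed_space, max_scans):
    """Scan candidate blocks in order; return the first block whose page, after
    pruning, has enough free space to store the new data, or None if cleanup
    fails and the relation must be extended instead. try_prune is assumed to
    return the free bytes on the page after pruning, or None if pruning failed."""
    for block in candidate_blocks[:max_scans]:   # not all candidates are scanned
        freed = try_prune(block)
        if freed is not None and freed >= needed_space:
            return block                         # this page can be reused
    return None
```

The `max_scans` bound is where the prune success rate statistic enters: it limits how much of the candidate array is examined.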
  • FIG. 1 illustrates a prior art method used to insert data into a database relation.
  • the insert method 110 begins by reading FSM 120 to evaluate if the current relation has sufficient blocks with free space to hold the data to be inserted. When there are sufficient blocks, inserted data is stored using free space 130. However, the relation is extended 140 if the relation has insufficient blocks to store the data to be inserted.
  • FIG. 1 also illustrates how a background autovacuum daemon 115 process used by a PostgreSQL database can clean up space occupied by dead tuples, and as a result mark the freed space in the FSM.
  • FIG. 2 illustrates a flow chart of a database cleanup method according to an embodiment of this disclosure.
  • FIG. 2 illustrates using cooperative cleanup 210 when insert 110, as a non-limiting example of a user operation, is received and the relation has insufficient blocks to store the data to be inserted.
  • insert 110 can begin by evaluating FSM 120 to determine if the relation has sufficient blocks to store the data to be inserted. If the relation has sufficient blocks, free record space 130 can be used.
  • cooperative cleanup 210 can be used to attempt cleanup based on an evaluation of the dead tuple percentage statistic and a determination that the relation has insufficient blocks. If cooperative cleanup is performed, an array of candidate blocks for cleaning can be created.
  • the prune success rate statistic can be evaluated before scanning these candidate blocks for cleanup to determine which candidate blocks included in the array are to be scanned.
  • the one or more cleaned candidate blocks can be used to store the record data.
  • one or more cleaned candidate blocks can be marked in the FSM so that they can be recycled and used to store data inserted by future user insert or update operations.
  • the relation can be extended 140 when cooperative cleanup 210 fails.
  • FIG. 3 illustrates an embodiment of this disclosure that can include a statistics cache GlobalMemoryContext 310 that can be stored in non-transient memory.
  • GlobalMemoryContext 310 can include two statistics caches: new statistics cache 330 and active statistics cache 320. These two caches can be included to allow statistics to be updated without locking the locations containing statistics read at the same time by backend threads 340a, 340b, 340c.
  • FIG. 3 also illustrates lightweight thread 350.
  • Light weight thread 350 can be used to update the contents of new statistics cache 330 with statistics read from statistics file 360.
  • a lightweight thread is known by those skilled in the art as a computer process that can be used to share resource information with other threads. This sharing can be done to reduce context switching time during thread execution.
  • This embodiment can include the ability to configure the frequency, or time period, of when lightweight thread 350 updates new statistics cache 330. It should be appreciated that while differences between new and active statistics cache are shown, in some embodiments these caches can be combined.
  • new statistics cache 330 can be copied into active statistics cache 320 so that cooperative cleanup backend threads 340a, 340b, and 340c can access the most up to date statistics.
  • This configuration of a new statistics cache 330 and an active statistics cache 320 can be more efficient than accessing the statistics file directly because direct access can require a large number of input/output operations and can also flood a statistics collector process (not shown) with information requests.
  • the insert operation can be a non-limiting example of a user operation that can use cooperative cleanup, and therefore also the evaluation of statistics read from active statistics cache 320, to clean up blocks allocated to a relation.
  • the dead tuple percentage statistic and also the previously collected prune success rate statistic can be evaluated to determine if the relation has sufficient candidate blocks that can be cleaned, scanned and then used to store one or more records to be inserted.
  • Cooperative cleanup can create an array of start blocks per relation to determine candidate blocks for cleanup.
  • the size of this array is known by those skilled in the art as the starting block array size (or more simply as the array size).
  • the operation can be assigned a unique identifier that is known by those skilled in the art as a transaction identifier.
  • a candidate block start index related to these start blocks can be evaluated to determine the candidate block’s starting point. As a non-limiting example, if the transaction identifier is mapped to index 3, scanning can begin at block 2. Also, pruning of block 2 can be attempted.
  • block 2 can be marked in the FSM and the transaction identifier mapped to index 3 can be updated to the next block.
  • the next block can be block 22 due to the array size being 20 and also because the next element at index 3 would be (20 + value at the current index) .
  • the next transaction identifier mapped to index 3 can attempt to prune block 22 and won’t repeat block 2.
  • the candidate block start index can be determined by evaluating the transaction identifier and the starting block array size and can be given by:
  • candidate block start index = transaction identifier mod starting block array size (Equation (1))
  • the one or more candidate blocks can be locked and scanned to determine if they can be cleaned.
  • start block array can be indexed by start block [1234 % 20]. Since the result of 1234 % 20 is 14, start block [14] can begin at the fourteenth element of the start block array.
  • the fourteenth element of the start block array in this non-limiting example is block 13.
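The index arithmetic above can be stated directly. Equation (1) is not reproduced in this text, so the modulo form below is inferred from the worked example (transaction identifier 1234, array size 20, index 14); the function names are illustrative assumptions.

```python
def candidate_block_start_index(transaction_id, array_size):
    # Map a transaction identifier onto an element of the start block array.
    return transaction_id % array_size

def advance_start_block(current_block, array_size):
    # After a mapped block is pruned, the entry advances by the array size so
    # the next transaction at this index does not repeat the same block.
    return current_block + array_size
```

With the example values from the text, `candidate_block_start_index(1234, 20)` is 14, and a pruned block 2 advances to block 22 when the array size is 20.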
  • One or more relations can undergo cooperative cleanup at the same time.
  • Each relation undergoing cooperative cleanup can have its own array of start blocks because relations do not share block numbers. As a result there will be no conflict across relations.
  • candidate blocks locked by the first cooperative cleanup cannot be accessed by one or more other cooperative cleanups operating on the same relation. Locking candidate blocks to restrict access has the effect of temporarily invalidating these locked candidate blocks.
  • temporarily invalidating candidate blocks can eliminate concurrent pruning.
  • Candidate blocks can be locked one at a time and then scanned adaptively to determine if the candidate block can be cleaned using the previously collected prune success rate statistic.
  • a person skilled in the art will understand “which of the locked candidate blocks can be scanned” can mean the number of candidate blocks in the array that can be scanned.
  • Candidate blocks can be adaptively evaluated for scanning because the prune success rate statistic can be evaluated using a total number of scanned blocks and a total number of pruned blocks which can be given by:
  • prune success rate = (total number of pruned blocks / total number of scanned blocks) × 100% (Equation (2))
  • Adaptive evaluation can occur because, as Equation (2) shows, as the number of pruned blocks increases, more candidate blocks can be scanned. Also, fewer candidate blocks can be scanned as the number of pruned blocks decreases. As a non-limiting example, if the total number of scanned blocks is 100 (scanned blocks statistic is 100) and the total number of pruned blocks is 50 (pruned blocks statistic is 50), the prune success rate statistic is 50%. In another non-limiting example, the total number of scanned blocks statistic can again be 100 but the total number of pruned blocks can be 75 (pruned blocks statistic is 75), so the prune success rate statistic is 75%. As a result, the number of candidate blocks that are scanned can be adaptive because in the first example the number of candidate blocks scanned is 50% whereas the number of candidate blocks scanned for the second example is 75%.
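Equation (2) and its use for sizing the scan can be sketched as follows; the helper names are illustrative assumptions, and the zero-history fallback is an assumption of this sketch rather than a rule from the patent.

```python
def prune_success_rate(scanned_blocks, pruned_blocks):
    # Equation (2): the fraction of scanned blocks that were successfully pruned.
    if scanned_blocks == 0:
        return 1.0   # assumption: with no history, scan the full candidate array
    return pruned_blocks / scanned_blocks

def blocks_to_scan(candidate_count, scanned_blocks, pruned_blocks):
    # A higher historical success rate permits scanning more of the candidate
    # array; a lower rate scans fewer blocks, bounding the time spent pruning.
    rate = prune_success_rate(scanned_blocks, pruned_blocks)
    return max(1, round(candidate_count * rate))
```

With the examples from the text, 50 pruned of 100 scanned gives a 50% rate, and 75 pruned of 100 scanned gives a 75% rate, so more of the candidate array is scanned in the second case.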
  • An array of start blocks for a relation can be created and can include block 420.
  • a second array of start blocks can be created for the relation and can include block 460.
  • Backend thread 1 410 can be related to a first cooperative cleanup of the relation.
  • Block 420 can be determined to be the first candidate block for cooperative cleanup as a result of evaluating a candidate block start index related to this relation.
  • candidate block 420 can be locked and then scanned to determine if candidate block 420 can be cleaned.
  • the next candidate block for this relation can be determined to be block 430.
  • Candidate block 430 can be locked and then scanned to determine if it can be cleaned.
  • the next candidate block can be block 440 and this block can be locked and then scanned to determine if it can be cleaned.
  • Backend thread 2 450 can be related to a second cooperative cleanup of the relation.
  • Block 460 can be determined to be the first candidate block for this second cooperative cleanup as a result of evaluating a candidate block start index related to this relation.
  • candidate block 460 can be locked and then scanned to determine if candidate block 460 can be cleaned.
  • the next candidate block for this record can be determined to be block 470.
  • Candidate block 470 can be locked and then scanned to determine if it can be cleaned.
  • the next candidate block can be block 480 which can be locked and then scanned to determine if it can be cleaned.
  • the next candidate block can be block 490 which can be locked and then scanned to determine if it can be cleaned. The cooperative cleanup of this record ends after candidate block 490 has been scanned.
  • FIG. 5 illustrates the cooperative cleanup process.
  • backend transaction identifier 530, which is associated with a record being inserted, deleted, or updated (as a non-limiting example, with transaction identifier 1234), accesses GlobalMemoryContext 310 to access StartBlockMap 512 to generate a start block array.
  • FIG. 5 also illustrates start block array’s twenty elements which can include element 0 560, element 1 562, element 2 564, element 3 566, element 4 568, element 18 570, and element 19 572.
  • FIG. 5 also illustrates the memory blocks used to store records as block 0 540, block 1 542, block 2 544, block 3 546, block 4 548, block 20 550, block 21 552, block 22 554, block 23 556, and block 24 558.
  • FIG. 5 illustrates the candidate block start index determined by evaluating Equation (1) that results in the first candidate block for cooperative cleanup being determined to be mapped to start block [4] (which is also element 3 566) .
  • Start block [4] can be locked and temporarily invalidated.
  • Element 3 566 points to the first candidate block for cleanup, memory block 3 546.
  • the candidate block start index can be evaluated to determine which of the one or more candidate blocks for cleanup can be scanned by balancing the number of scanned candidate blocks with minimizing the effect of scanning these candidate blocks.
  • the effect of scanning that is minimized can include the time spent pruning blocks.
  • FIG. 5 illustrates, the evaluation of the candidate block start index results in block 3 546 being scanned by backend thread 596.
  • Backend transaction identifier 530 can also access GlobalMemoryContext 310 to access Active Statistics Cache 320 when status 340 indicates that statistics are available.
  • the prune success rate statistic can be read from Active Statistics Cache 320.
  • FIG. 5 illustrates that memory block 23 556 is the last block scanned as a result of the evaluation of the dead records statistic and the pruned block statistic.
  • FIG. 5 also illustrates that after block 23 556 is scanned and cleaned, it can be unlocked before being marked in the FSM 594. The received user operation can be executed once the candidate block is unlocked.
  • FIG. 5 also illustrates a second backend thread 592 related to the cooperative cleanup of a relation.
  • This second backend thread 592 is also mapped to element 3 566 but for this second backend thread, the candidate block for cleanup is memory block 23 556 plus the size of the start block array.
  • cooperative cleanup can be triggered when a record’s space consumption approaches 90%.
  • cooperative cleanup can be integrated in a user operation so that cooperative cleanup can be automatically performed along with the operation.
  • the integrated cooperative cleanup can free more space than the space required by the user operation. This integrated cooperative cleanup can result in reduced locking duration and increased system predictability.
  • updating the FSM as a result of this automatic cooperative cleanup performed every time an integrated operation is executed can be expensive in terms of the effects on performance relative to the amount of space freed. For example, the amount of freed space may be insignificant when there are only a few memory blocks with redundant record data.
  • FIG. 6 illustrates a method that can reduce the frequency of updating the FSM by using a probabilistic approach.
  • This probabilistic approach can update one branch of the FSM tree from the bottom up.
  • Although this method can result in the FSM marking memory blocks with outdated record data, and there can therefore be inaccuracy in the FSM tree (because marking should result in successfully cleaned candidate blocks being recycled), this inaccuracy can be tolerated when faster performance is desired.
  • the method illustrated by FIG. 6 begins by determining if space has been freed as a result of pruning 610, where pruning can be a form of block cleaning. If cooperative cleanup pruning has freed space from one or more candidate blocks for cleanup, the update probability (P update) 630 can be given by Equation (3), which evaluates the free space of the relation, the block size, the fill factor, and a user defined probability.
  • the next step is to draw a random number (rand) 640.
  • Rand can be compared to 100% − P update 650 and the cleaned candidate blocks can be returned to the FSM when the update probability is less than rand 660. Otherwise, pruning is considered to have completed 620.
  • Equation (3) can represent the reserved free space when the page pruning threshold is reached.
  • P update will be approximately 1 when no free space is reclaimed and greater than 1 when a larger space is freed by cleaning.
  • the FSM tree update can occur whenever P update is greater than 1, that is, whenever the free space reclaimed by pruning is large.
  • the heuristic probability value can be used to control the tradeoff between space utilization and performance.
  • the technical benefits of this embodiment can include constant cleaning instead of spiky performance.
  • Another benefit is integrated space maintenance versus asynchronous or user-driven maintenance.
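The probabilistic FSM update can be sketched as below. Equation (3) is not reproduced in this text, so the P_update formula here is an assumption built only from the stated inputs (relation free space, block size, fill factor, user-defined probability) and the stated behavior (P update is about 1 when no space is reclaimed and exceeds 1 when a larger space is freed); all names are illustrative.

```python
import random

def maybe_update_fsm(freed_space, block_size, fill_factor, user_probability,
                     rng=random.random):
    """Decide probabilistically whether to propagate freed space to the FSM.
    The formula for p_update below is an assumed stand-in for Equation (3)."""
    reserved = block_size * (1.0 - fill_factor)   # assumed reserved page space
    p_update = user_probability * (1.0 + freed_space / max(reserved, 1.0))
    if p_update >= 1.0:
        return True          # a large reclaim always propagates to the FSM
    return rng() < p_update  # otherwise update with probability p_update
```

The disclosure compares the drawn value against 100% − P update; for a uniform random draw that event has probability P update, so this sketch uses the equivalent form in which a larger P update makes an FSM update more likely.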
  • FIG. 7 illustrates that space is better utilized when embodiments of this disclosure are used than when prior art background cleaning methods with comparable performance are used.
  • one or more candidate blocks can include index records, data records, or both.
  • index records can point to data records and data records can include record data.
  • The lifetime of an index record can be determined so that it doesn’t remain after the associated data record has been cleaned. This lifetime can be determined using minimum and maximum transaction identifiers (transaction IDs) included in each record. Since these minimum and maximum transaction IDs are the same for both the index record and the data record, when the data record’s transaction ID is no longer used, the index record with the same transaction ID can also be cleaned.
  • An embodiment can begin when structured query language (SQL) statements 810 perform either a delete index record 820 or an insert index record 830 operation.
  • When delete index record 820 is performed, update active record count 840 can be performed.
  • This embodiment can use two queues, potential empty block queue 860 and available block queue 870 when candidate blocks for cleanup include index records.
  • When insert index record 830 is performed, either update active record count 840 or page split 850 can be performed.
  • When page split 850 is performed, available block queue 870 can be accessed.
  • Potentially empty block queue 860 can track potentially empty blocks when update active record count 840 is performed. These blocks can then be recycled from this queue into the available block queue 870.
  • Available block queue 870 can be searched for an available block. The index space is extended when no free blocks associated with the relation are available.
  • This embodiment can include that inline cleanup of index records can be performed as a part of a regular user operation including insert, update, or delete.
  • Another technical advantage can be consistent recycling and reuse of empty pages independent of the target page.
  • FIG. 9 is a block diagram of an electronic device (ED) 952 illustrated within a computing and communications environment 950 that may be used for implementing the devices and methods disclosed herein.
  • the electronic device 952 typically includes a processor 954, such as a central processing unit (CPU) , and may further include specialized processors such as a graphics processing unit (GPU) or other such processor, a memory 956, a network interface 958 and a bus 960 to connect the components of ED 952.
  • ED 952 may optionally also include components such as a mass storage device 962, a video adapter 964, and an I/O interface 968 (shown in dashed lines) .
  • the memory 956 may comprise any type of non-transitory system memory, readable by the processor 954, such as static random-access memory (SRAM) , dynamic random-access memory (DRAM) , synchronous DRAM (SDRAM) , read-only memory (ROM) , or a combination thereof.
  • the memory 956 may include more than one type of memory, such as ROM for use at boot-up, and DRAM for program and data storage for use while executing programs.
  • the bus 960 may be one or more of any type of several bus architectures including a memory bus or memory controller, a peripheral bus, or a video bus.
  • the electronic device 952 may also include one or more network interfaces 958, which may include at least one of a wired network interface and a wireless network interface. As illustrated in FIG. 9, network interface 958 may include a wired network interface to connect to a network 974, and also may include a radio access network interface 972 for connecting to other devices over a radio link. The network interfaces 958 allow the electronic device 952 to communicate with remote entities such as those connected to network 974.
  • the mass storage 962 may comprise any type of non-transitory storage device configured to store data, programs, and other information and to make the data, programs, and other information accessible via the bus 960.
  • the mass storage 962 may comprise, for example, one or more of a solid-state drive, hard disk drive, a magnetic disk drive, or an optical disk drive.
  • mass storage 962 may be remote to the electronic device 952 and accessible through use of a network interface such as interface 958.
  • mass storage 962 is distinct from memory 956 where it is included and may generally perform storage tasks compatible with higher latency, but may generally provide lesser or no volatility.
  • mass storage 962 may be integrated with a heterogeneous memory 956.
  • electronic device 952 may be a standalone device, while in other embodiments electronic device 952 may be resident within a data center.
  • A data center is a collection of computing resources (typically in the form of servers) that can be used as a collective computing and storage resource. Within a data center, a plurality of servers can be connected together to provide a computing resource pool upon which virtualized entities can be instantiated. Data centers can be interconnected with each other to form networks consisting of pools of computing and storage resources connected to each other by connectivity resources.
  • the connectivity resources may take the form of physical connections such as ethernet or optical communications links, and in some instances may include wireless communication channels as well.
  • the links can be combined together using any of a number of techniques including the formation of link aggregation groups (LAGs) .
  • any or all of the computing, storage and connectivity resources can be divided between different sub-networks, in some cases in the form of a resource slice. If the resources across a number of connected data centers or other collection of nodes are sliced, different network slices can be created.

Abstract

Method and apparatus for reducing the impact of database storage maintenance is provided. Statistics are evaluated to determine when dead database records are cleaned by balancing cleaning with minimizing the effect of cleaning on performance. The statistics evaluated are the dead tuple percentage and the prune success rate.

Description

METHODS FOR THE SELF-SUFFICIENT MAINTENANCE OF DATABASE OBJECT STORAGE
CROSS-REFERENCE TO RELATED APPLICATIONS
This is the first application filed for the present invention.
FIELD OF THE INVENTION
The present disclosure pertains to the field of storage maintenance methods and in particular to storage maintenance methods that reduce the overall storage footprint of an online transactional processing (OLTP) database system.
BACKGROUND
In current OLTP multi-version concurrency control (MVCC) databases, data is maintained for as long as it can be accessed by one or more active transactions. However, data is removed as soon as it cannot be accessed to reduce redundant data in the database. This removal process is known in the art as “cleanup” . A downside of cleanup is that it reduces performance. As a result, it is difficult to manage databases so that cleanup is performed enough to reduce redundant data while minimizing the impact of cleanup on performance.
Accordingly, there is a need for one or more methods that at least partially addresses one or more limitations of the prior art. One or more of these methods includes an approach to database cleanup that balances reducing the amount of redundant data in a database with the impact of removing this redundant data.
The foregoing background information is provided to reveal information believed by the applicant to be of possible relevance to the present disclosure. No admission is necessarily intended, nor should be construed, that any of the preceding information constitutes prior art against the present disclosure.
SUMMARY
Database management includes removing (cleaning) data that is no longer valid. An aspect of the present disclosure provides a method for database management. This method can be executed by a processor. The method includes receiving a user operation. The method further includes evaluating a free space map (FSM) to determine if there are sufficient blocks with free space to execute the received user operation. The method further includes performing a cooperative cleanup when the FSM indicates there are insufficient blocks with free space. The method further includes executing the received user operation. A technical benefit is that if the method can perform cooperative cleanup to remove dead records, the space freed by removing these dead records can be recycled and the relation does not need to be extended.
In some embodiments, performing the cooperative cleanup further comprises evaluating statistics. A technical benefit is that cooperative cleanup is adaptive because it is based on evaluating statistics.
In some embodiments, evaluating statistics comprises evaluating a dead tuple percentage statistic. A technical benefit is that cooperative cleanup is performed based on an evaluation of the dead tuple percentage statistic.
In some embodiments, cooperative cleanup further comprises scanning one or more candidate blocks for cleanup. A technical benefit is that blocks that are selected for evaluation based on evaluation of statistics are then scanned to determine if they are candidates for cleanup.
In some embodiments, cooperative cleanup further comprises evaluating statistics to determine which of the one or more candidate blocks for cleanup are scanned. A technical benefit is that candidate blocks for cleanup are scanned to determine if they are candidates for cleanup.
In some embodiments, evaluating statistics comprises evaluating a prune success rate statistic. A technical benefit is that the number of candidate blocks for pruning is based on the previous number of successfully pruned blocks.
In some embodiments, evaluating statistics further includes balancing which candidate blocks for cleanup are scanned with minimizing the effect of scanning candidate blocks for cleanup. A technical benefit is that performance is not overly affected by cooperative cleanup.
In some embodiments, evaluating statistics further comprises locking candidate blocks for cleanup before scanning the candidate blocks for cleanup. A technical benefit of locking is that it prevents two or more threads from accessing the blocks storing the record at the same time.
In some embodiments, cooperative cleanup further comprises evaluating a transaction identifier and array size. A technical benefit is that the candidate block start index is determined by evaluating the transaction identifier and array size.
In some embodiments, cooperative cleanup further comprises eliminating concurrent pruning of the candidate blocks for cleanup by temporarily invalidating a first candidate block of the candidate blocks for cleanup. A technical benefit is that this ensures there is no conflict when multiple cooperative cleanups attempt to clean the same relation.
In some embodiments, evaluating statistics further comprises evaluating the prune success rate statistic using a total number of scanned blocks and a total number of pruned blocks. A technical benefit of cooperative cleanup’s evaluation of the dead tuple percentage statistic and a prune success rate statistic is that this evaluation makes cooperative cleanup adaptive.
In some embodiments, cooperative cleanup further comprises marking successfully cleaned candidate blocks for cleanup in the FSM. A technical benefit is that marked successfully cleaned candidate blocks can be recycled.
In some embodiments, cooperative cleanup further comprises marking successfully cleaned candidate blocks for cleanup in the FSM when an update probability is less than a random value. A technical benefit is that this random value can be used to control the tradeoff between space utilization and performance.
In some embodiments, cooperative cleanup further comprises determining the update probability using free space of a relation and a block size and a fill factor and a user defined probability. Technical benefits are constant cleaning instead of spiky performance and integrated space maintenance rather than asynchronous or user-driven maintenance.
In some embodiments, cooperative cleanup further comprises unlocking the locked index before executing the received user operation. A technical benefit is that the unlocked index can be recycled and can later be a candidate for pruning by another relation.
In some embodiments, cooperative cleanup further comprises one or more candidate blocks for cleanup including index records, data records, or both. A technical benefit is that lifetime of an index record can be determined so that it doesn’ t remain after the associated data record has been cleaned.
In some embodiments, cooperative cleanup further comprises use of a potential empty block queue and an available block queue when cleaning index records. A technical benefit is that potentially empty block queue can track potentially empty blocks and these blocks can then be moved from this queue into the available block queue for reuse during a new block request.
In some embodiments, cooperative cleanup further comprises receiving the user operation including an insert, delete, or update operation. A technical benefit is that cooperative cleanup can be performed when an insert, a delete, and an update operation is requested.
A further aspect of the present disclosure is a device that includes a processor and non-transient memory. The processor executes instructions to implement a statistics cache (SC) for database management. The SC includes a new statistics cache stored in non-transitory memory. The SC also includes a lightweight thread that loads new statistics into the new statistics cache. The SC also includes an active statistics cache stored in the non-transitory memory and cooperative cleanup uses statistics read from the active statistics cache to manage the database. A technical benefit is that statistics are read from the active statistics cache and as a result limit the interaction, and also limit the impact on performance, by reading data placed in the active statistics cache instead of accessing data directly from disk files used to store the statistics.
In some embodiments, the lightweight thread loads the new statistics into the new statistics cache every configurable time period. A technical benefit is that the  frequency of the lightweight thread loading these statistics can be configured to reduce context switching time during thread execution.
In some embodiments, the active statistics cache includes new statistics copied from the new statistics cache after loading the new statistics into the new statistics cache completes. A technical benefit is that input/output operations can be reduced and this also stops the statistics collector process from being flooded with information requests.
BRIEF DESCRIPTION OF THE FIGURES
Further features and advantages of the present disclosure will be apparent from the following detailed description, taken in combination with the appended drawings, in which:
FIG. 1 illustrates a flow chart representation of a prior art database cleanup method.
FIG. 2 illustrates a flow chart of a database cleanup method according to an embodiment of this disclosure.
FIG. 3 illustrates a lightweight thread used to load statistics into the new statistics cache according to an embodiment of this disclosure.
FIG. 4 illustrates a record according to an embodiment of this disclosure.
FIG. 5 illustrates a cooperative cleanup method according to an embodiment of this disclosure.
FIG. 6 illustrates a flow chart for returning cleaned blocks to the free space map according to an embodiment of this disclosure.
FIG. 7 is a graph illustrating the database size difference between prior art database cleanup and a database cleanup method according to an embodiment of this disclosure.
FIG. 8 illustrates another database cleanup method according to an embodiment of this disclosure.
FIG. 9 is a block diagram of an electronic device 952 within a computing and communications environment 950 that may be used for implementing devices and methods in accordance with representative embodiments of the present disclosure.
Throughout the appended drawings, like features are identified by like reference numerals.
DETAILED DESCRIPTION
Various embodiments of this disclosure use methods to remove database data that can no longer be accessed. This data can be removed in a way that reduces the impact of the data removal on database performance. This disclosure provides these methods.
OLTP is a category of data processing that is used by transaction oriented database management tasks. OLTP database processing consists of a large number of transactions requested by a large number of users.
Database data can be organized into a table of columns and rows. Each row can represent a set of related data and the row can be commonly referred to as a record or tuple by those skilled in the art. A relation can include one or more records.
A block is a fixed amount of memory that can store record data. Those skilled in the art will understand that the term page can be used to refer to an in memory representation of a block. Relations can be allocated one or more blocks to store record data and a free space map (FSM) can be read to evaluate the amount of available space in the relation. This available space can be referred to as free space by those skilled in the art.
Users of a database can access and modify database information using user operations. As a non-limiting example, a computer process can use one or more user operations to insert, update, or delete data in one or more database records. In the case of an insert or update user operation, the user operation can include the data and also the address of the record where the data is to be updated or amended. In the case of a delete user operation, the user operation can include the address of the record where the data is to be deleted. The database records, as a non-limiting example, can be stored by an electronic device and the electronic device can include one or more processors and non-transitory memory. The user operation can be received by the electronic device and the electronic device’s processor can process the user operation by executing instructions that can be stored in the electronic device’s non-transitory memory. When executed, these instructions can instruct the processor on how to access the addressed record and how to insert, update or delete the desired record’s user data. User operations can include insert, update, and delete operations and these operations are used to insert new data into a record, update data in a record, or delete record data. Before performing these operations, the FSM can be accessed to evaluate if a relation has enough free space to hold the data to be inserted or updated. Enough free space can be referred to as one or more sufficient blocks to store the data. When the relation has insufficient blocks to store the data to be inserted or updated, unless the relation includes one or more records that can no longer be accessed and can be cleaned up, one or more additional blocks are allocated to the relation.
Records that can no longer be accessed are known as dead records by those skilled in the art and those skilled in the art can also refer to the percentage of dead records in relation to the total number of records as a dead record percentage or a dead tuple percentage. The allocation of additional blocks to the relation can be referred to as relation extension and cleaning up a relation can refer to pruning blocks allocated to the one or more dead records in the relation. Cooperative cleanup is relation cleanup that can reuse pruned blocks to store inserted or updated record data. Those skilled in the art will understand that pruning can mean physically deleting records from the database.
Cleaning database records can require the use of locking as a form of concurrency control. Concurrency control by locking a database record is achieved by preventing two or more threads from accessing the blocks storing the record at the same time. The record is unlocked and can be accessed by the next user operation when the user operation that caused the database record to be locked completes.
One or more methods included in this disclosure can use cooperative cleanup to reduce the impact of cleanup time on performance by balancing the number of database records cleaned with the impact of cleaning time. These methods can also reduce locking duration, increase system predictability, promote reuse before extending the relation, and promote concurrency. These methods can be used, provided that they are performed when the dead tuple percentage exceeds a predefined threshold, during insert, delete, or update operations or during a sequential table scan.
Methods included in this disclosure can collect statistics related to successful and unsuccessful attempts to cleanup each block. Cooperative cleanup can adaptively select blocks for cleaning, and also ensure exhaustive and distributed cleaning, based on these successful and unsuccessful cleanup statistics.
Cooperative cleanup can be performed when the entire table has more dead records than a predefined threshold. Therefore, determining if cooperative cleanup can be performed can include evaluating statistics such as the number of dead records (the dead tuple percentage statistic) . It should be appreciated that the dead tuple percentage statistic can be read from a statistics cache. When cooperative cleanup is performed, an array of candidate blocks for cleanup can be created. These candidate blocks for cleanup may or may not have any dead records. However, cooperative cleanup can include further evaluating statistics such as the prune success rate statistic to determine the number of blocks that may be scanned. As a result, not all candidate blocks in this array may be scanned to determine if they can be cleaned. It should be appreciated that the prune success rate statistic can be read from a statistics cache. Cooperative cleanup then scans the first candidate block included in the array and attempts to prune the page of the candidate block. This page can be reused if it can be pruned and has enough space to store new data. Otherwise, the next candidate block in the array can be scanned. A person skilled in the art can appreciate that cooperative cleanup’s evaluation of the dead tuple percentage statistic and a prune success rate statistic can make cooperative cleanup adaptive.
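The threshold check described above can be sketched as follows. This is a minimal, hypothetical illustration; the 20%threshold value and the dictionary keys are assumptions for the example, not identifiers from this disclosure:

```python
DEAD_TUPLE_THRESHOLD = 0.20  # assumed predefined threshold (20% dead tuples)

def should_cooperatively_clean(stats):
    # Cooperative cleanup runs only when the entire table's dead tuple
    # percentage, read from the statistics cache, exceeds the threshold.
    return stats["dead_tuple_percentage"] > DEAD_TUPLE_THRESHOLD
```

As a usage example, a relation whose statistics cache reports a 30%dead tuple percentage would qualify for cooperative cleanup under the assumed 20%threshold, while one reporting 10%would not.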
FIG. 1 illustrates a prior art method used to insert data into a database relation. The insert method 110 begins by reading FSM 120 to evaluate if the current relation has sufficient blocks with free space to hold the data to be inserted. When there are sufficient blocks, inserted data is stored using free space 130. However, the relation is extended 140 if the record has insufficient blocks to store the data to be inserted.
FIG. 1 also illustrates how a background autovacuum daemon 115 process used by a PostgreSQL database can clean up space occupied by dead tuples and mark the freed space in the FSM.
FIG. 2 illustrates a flow chart of a database cleanup method according to an embodiment of this disclosure. FIG. 2 illustrates using cooperative cleanup 210 when insert 110, as a non-limiting example of a user operation, is received and the relation has insufficient blocks to store the data to be inserted. As FIG. 2 illustrates, insert 110 can begin by evaluating FSM 120 to determine if the relation has sufficient blocks to store the data to be inserted. If the relation has sufficient blocks, free record space 130 can be used. However, cooperative cleanup 210 can be used to attempt cleanup based on an evaluation of the dead tuple percentage statistic and a determination that the relation has insufficient blocks. If cooperative cleanup is performed, an array of candidate blocks for cleaning can be created. The prune success rate statistic can be evaluated before scanning these candidate blocks for cleanup to determine which candidate blocks included in the array are to be scanned. When cooperative cleanup is successful, the one or more cleaned candidate blocks can be used to store the record data. Alternatively, one or more cleaned candidate blocks can be marked in the FSM so that they can be recycled and used to store data inserted by future user insert or update operations. However, the relation can be extended 140 when cooperative cleanup 210 fails.
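The insert path described for FIG. 2 can be sketched as follows: free space is tried first, then cooperative cleanup, and the relation is extended only as a last resort. The Relation class and its block model are illustrative assumptions, not the disclosed implementation:

```python
class Relation:
    """Hypothetical model of a relation's blocks for the FIG. 2 flow."""

    def __init__(self, block_size=4):
        # Each block tracks how many live and dead records it holds.
        self.blocks = []
        self.block_size = block_size

    def _has_room(self, blk):
        # FSM-style check: room exists only when live + dead < capacity.
        return blk["live"] + blk["dead"] < self.block_size

    def insert(self, n=1):
        # 1) Look for a block with free space (the FSM lookup).
        blk = next((b for b in self.blocks if self._has_room(b)), None)
        # 2) Otherwise attempt cooperative cleanup to reuse a pruned block.
        if blk is None:
            blk = self._cooperative_cleanup()
        # 3) Only if cleanup fails is the relation extended.
        if blk is None:
            blk = {"live": 0, "dead": 0}
            self.blocks.append(blk)
        blk["live"] += n

    def _cooperative_cleanup(self):
        for b in self.blocks:
            if b["dead"]:
                b["dead"] = 0          # prune the dead records
                if self._has_room(b):
                    return b           # reuse the cleaned block
        return None
```

In this sketch, inserting into a full relation that contains dead records reuses a cleaned block instead of appending a new one, mirroring the "cleanup before extend" ordering of FIG. 2.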
Those skilled in the art will understand that caches can be used to limit the interaction, and therefore limit the impact on performance, by reading data placed in one or more caches instead of accessing data directly from disk files that can be used to store the statistics. FIG. 3 illustrates an embodiment of this disclosure that can include a statistics cache GlobalMemoryContext 310 that can be stored in non-transient memory. GlobalMemoryContext 310 can include two statistics caches – new statistics cache 330 and active statistics cache 320. These two caches can be included to allow statistics to be updated without locking the locations containing statistics read at the same time by backend threads 340a, 340b, 340c.
FIG. 3 also illustrates lightweight thread 350. Lightweight thread 350 can be used to update the contents of new statistics cache 330 with statistics read from statistics file 360. A lightweight thread is known by those skilled in the art as a computer process that can be used to share resource information with other threads. This sharing can be done to reduce context switching time during thread execution. This embodiment can include the ability to configure the frequency, or time period, of when lightweight thread 350 updates new statistics cache 330. It should be appreciated that while differences between new and active statistics cache are shown, in some embodiments these caches can be combined.
Once all statistics included in statistics file 360 have been read into new statistics cache 330, the statistics included in new statistics cache 330 can be copied into active statistics cache 320 so that cooperative cleanup backend threads 340a, 340b, and 340c can access the most up to date statistics. This configuration of a new statistics cache 330 and an active statistics cache 320 can be more efficient than accessing the statistics file directly because direct access can require a large number of input/output operations and can also flood a statistics collector process (not shown) with information requests.
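The two-cache design described above can be sketched with a periodic refresher thread that fills a new cache and then publishes it as the active cache read by backend threads. The class name, the period parameter, and load_statistics (standing in for reading the statistics file) are assumed interfaces for illustration:

```python
import threading

class StatisticsCache:
    """Sketch of the new/active statistics cache pair of FIG. 3."""

    def __init__(self, load_statistics, period_seconds=1.0):
        self._load = load_statistics
        self._period = period_seconds          # configurable time period
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._active = self._load()            # initial fill of the active cache
        self._thread = threading.Thread(target=self._refresh, daemon=True)
        self._thread.start()

    def _refresh(self):
        # Lightweight refresher: fills the "new" cache, then publishes it
        # as the "active" cache once loading completes.
        while not self._stop.wait(self._period):
            new_cache = self._load()
            with self._lock:
                self._active = new_cache

    def read(self):
        # Backend threads read the active cache instead of the disk file.
        with self._lock:
            return dict(self._active)

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Because readers only ever touch the published active cache, statistics can be refreshed without blocking backend threads, matching the motivation given for the two-cache layout.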
The insert operation can be a non-limiting example of a user operation that can use cooperative cleanup, and therefore also the evaluation of statistics read from active statistics cache 320, to clean up blocks allocated to a relation. Before executing an insert operation, the dead tuple percentage statistic and also the previously collected prune success rate statistic can be evaluated to determine if the relation has sufficient candidate blocks that can be cleaned, scanned and then used to store one or more records to be inserted.
Cooperative cleanup can create an array of start blocks per relation to determine candidate blocks for cleanup. The size of this array is known by those skilled in the art as the starting block array size (or more simply as the array size) . When a user operation is requested, the operation can be assigned a unique identifier that is known by those skilled in the art as a transaction identifier. A candidate block start index related to these start blocks can be evaluated to determine the candidate block’s starting point. As a non-limiting example, if the transaction identifier is mapped to index 3, scanning can begin at block 2. Also, pruning of block 2 can be attempted. If block 2 is successfully pruned and also contains enough space to hold the data to be inserted, block 2 can be marked in the FSM and the transaction identifier mapped to index 3 can be updated to the next block. Furthering this non-limiting example where the start block array also includes 20 entries, the next block can be block 22 due to the array size being 20 and also because the next element at index 3 would be (20 + value at the current index) . As a result, the next transaction identifier mapped to index 3 can attempt to prune block 22 and won’t repeat block 2. The candidate block start index can be determined by evaluating the transaction identifier and the starting block array size and can be given by:
candidate block start index = transaction identifier % array size    (1)
where: % = modulo operator
In the above described non-limiting example, a start block array of 20 entries can be initialized to start block = [0, 1, 2, 3, 4, 5, 6, …, 19] . After the candidate blocks have been determined, the one or more candidate blocks can be locked and scanned to determine if they can be cleaned.
Further to the previous non-limiting example, when the transaction identifier is 1234, the start block array can be indexed by start block [1234 % 20] . Since the result of 1234 % 20 is 14, scanning can begin at start block [14] , the fourteenth element of the start block array. The fourteenth element of the start block array in this non-limiting example is block 13.
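Equation (1) and the skip-ahead rule can be sketched as follows. Note one assumption in this illustration: Python lists are 0-indexed, so with the initialization shown, slot i initially holds block i, whereas the example above counts elements 1-based; the function and variable names are illustrative:

```python
ARRAY_SIZE = 20
# Illustrative initialization of the per-relation start block array.
start_block = list(range(ARRAY_SIZE))  # [0, 1, 2, ..., 19]

def candidate_start(transaction_id):
    # Equation (1): the transaction ID selects a slot in the start block
    # array; the slot's current value is the first block to try pruning.
    index = transaction_id % ARRAY_SIZE
    return index, start_block[index]

def advance(index):
    # After a successful prune, the slot skips ahead by the array size so
    # the next transaction mapped to this slot does not rescan the block.
    start_block[index] += ARRAY_SIZE
```

For transaction identifier 1234, 1234 % 20 selects slot 14; after a successful prune, advancing that slot by the array size points the next transaction mapped there at a block 20 positions further on, which is the skip pattern described above.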
One or more relations can undergo cooperative cleanup at the same time. Each relation undergoing cooperative cleanup can have its own array of start blocks because relations do not share block numbers. As a result there will be no conflict across relations. However, to ensure there is no conflict when multiple cooperative cleanups clean the same relation by trying to prune the same candidate block (concurrent pruning) , candidate blocks locked by the first cooperative cleanup cannot be accessed by one or more other cooperative cleanups operating on the same relation. Locking candidate blocks to restrict access has the effect of temporarily invalidating these locked candidate blocks.
Those skilled in the art will appreciate that temporarily invalidating candidate blocks can eliminate concurrent pruning.
Candidate blocks can be locked one at a time and then scanned adaptively to determine if the candidate block can be cleaned using the previously collected prune success rate statistic. A person skilled in the art will understand “which of the locked candidate blocks can be scanned” can mean the number of candidate blocks in the array that can be scanned. Candidate blocks can be adaptively evaluated for scanning because the prune success rate statistic can be evaluated using a total number of scanned blocks and a total number of pruned blocks which can be given by:
prune success rate = (total number of pruned blocks /total number of scanned blocks) × 100%    (2)
Adaptive evaluation can occur because as Equation (2) shows, as the number of pruned blocks increases, more candidate blocks can be scanned. Also, fewer candidate blocks can be scanned as the number of pruned blocks decreases. As a non-limiting example, if the total number of scanned blocks is 100 (scanned blocks statistic is 100) and the total number of pruned blocks is 50 (pruned blocks statistic is 50) , the prune success rate statistic is 50%. In another non-limiting example, the total number of scanned blocks statistic can again be 100 but the total number of pruned blocks can be 75 (pruned blocks statistic is 75) , so the prune success rate statistic is 75%. As a result, the number of candidate blocks that are scanned can be adaptive because in the first example the number of candidate blocks scanned is 50% whereas the number of candidate blocks scanned for the second example is 75%.
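The adaptive scan count of Equation (2) can be sketched as follows; the function names and the zero-history fallback are illustrative assumptions:

```python
def prune_success_rate(scanned_blocks, pruned_blocks):
    # Equation (2): fraction of previously scanned blocks that were
    # successfully pruned (returned here as a ratio, not a percentage).
    if scanned_blocks == 0:
        return 1.0  # assumption: with no history, scan freely
    return pruned_blocks / scanned_blocks

def blocks_to_scan(candidate_count, scanned_blocks, pruned_blocks):
    # Adaptive scan count: scan a share of the candidate array in
    # proportion to the historical prune success rate.
    rate = prune_success_rate(scanned_blocks, pruned_blocks)
    return max(1, round(candidate_count * rate))
```

With the two examples above (100 scanned blocks against 50 and then 75 pruned blocks), a 20-entry candidate array would be scanned 10 and 15 blocks deep respectively, so a history of successful pruning directly widens the scan.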
The determination of candidate blocks and scanning is illustrated by FIG. 4. An array of start blocks for a relation can be created and can include block 420. A second array of start blocks can be created for the relation and can include block 460. Backend thread 1 410 can be related to a first cooperative cleanup of the relation. Block 420 can be determined to be the first candidate block for cooperative cleanup as a result of evaluating a candidate block start index related to this relation. As a result, candidate block 420 can be locked and then scanned to determine if candidate block 420 can be cleaned. The next candidate block for this relation can be determined to be block 430. Candidate block 430 can be locked and then scanned to determine if it can be cleaned. The next candidate block can be block 440 and this block can be locked and then scanned to determine if it can be cleaned. Cooperative cleanup ends after candidate block 440 has been scanned. Backend thread 2 450 can be related to a second cooperative cleanup of the relation. Block 460 can be determined to be the first candidate block for cooperative cleanup as a result of evaluating a candidate block start index related to this record. As a result of this evaluation, candidate block 460 can be locked and then scanned to determine if candidate block 460 can be cleaned. The next candidate block for this record can be determined to be block 470. Candidate block 470 can be locked and then scanned to determine if it can be cleaned. The next candidate block can be block 480 which can be locked and then scanned to determine if it can be cleaned. The next candidate block can be block 490 which can be locked and then scanned to determine if it can be cleaned.
The cooperative cleanup of this relation ends after candidate block 490 has been scanned.
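A minimal sketch of one backend thread's pass over its candidate blocks, as in FIG. 4, might look like the following; the block representation and lock flags are hypothetical stand-ins for the actual storage structures, not names from the disclosure.

```python
def cooperative_cleanup(candidates, blocks):
    """Lock, scan, and (where possible) clean each candidate block in turn."""
    cleaned = []
    for block_no in candidates:
        blk = blocks[block_no]
        blk["locked"] = True           # lock the candidate block before scanning
        if blk["dead_records"] > 0:    # scan: does the block hold dead records?
            blk["dead_records"] = 0    # prune (clean) the block
            cleaned.append(block_no)
        blk["locked"] = False          # unlock before the user operation proceeds
    return cleaned

# Thread 1 scans its three candidates from FIG. 4: blocks 420, 430, 440.
blocks = {420: {"locked": False, "dead_records": 1},
          430: {"locked": False, "dead_records": 0},
          440: {"locked": False, "dead_records": 2}}
print(cooperative_cleanup([420, 430, 440], blocks))  # [420, 440]
```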
FIG. 5 illustrates the cooperative cleanup process. When cooperative cleanup is performed and the FSM indicates that there are insufficient blocks 120, backend transaction identifier 530, which is associated with a record being inserted, deleted, or updated (as a non-limiting example, with transaction identifier 1234) , accesses GlobalMemoryContext 310 to access StartBlockMap 512 to generate a start block array.
FIG. 5 also illustrates start block array’s twenty elements which can include element 0 560, element 1 562, element 2 564, element 3 566, element 4 568, element 18 570, and element 19 572. FIG. 5 also illustrates the memory blocks used to store records as block 0 540, block 1 542, block 2 544, block 3 546, block 4 548, block 20 550, block 21 552, block 22 554, block 23 556, and block 24 558.
As a non-limiting example, FIG. 5 illustrates the candidate block start index determined by evaluating Equation (1) that results in the first candidate block for cooperative cleanup being determined to be mapped to start block [4] (the fourth element of the array, element 3 566) . Start block [4] can be locked and temporarily invalidated. Element 3 566 points to the first candidate block for cleanup, memory block 3 546.
The candidate block start index can be evaluated to determine which of the one or more candidate blocks for cleanup can be scanned by balancing the number of scanned candidate blocks with minimizing the effect of scanning these candidate blocks. The effect of scanning that is minimized can include the time spent pruning blocks. As FIG. 5 illustrates, the evaluation of the candidate block start index results in block 3 546 being scanned by backend thread 596.
Backend transaction identifier 530 can also access GlobalMemoryContext 310 to access Active Statistics Cache 320 when status 340 indicates that statistics are available. The prune success rate statistic can be read from Active Statistics Cache 320. FIG. 5 illustrates that memory block 23 556 is the last block scanned as a result of the evaluation of the dead records statistic and the pruned block statistic. FIG. 5 also illustrates that after block 23 556 is scanned and cleaned, it can be unlocked before being marked in the FSM 594. The received user operation can be executed once the candidate block is unlocked.
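One way to sketch the two-cache arrangement implied here and in the statistics cache claims, where a lightweight thread loads a new statistics cache and promotes it to the active cache that cooperative cleanup reads, is shown below. The class and method names are assumptions, not taken from the disclosure.

```python
import threading

class StatisticsCache:
    """Double-buffered statistics cache: load into a new cache, then swap."""
    def __init__(self, collect):
        self._collect = collect       # callable that gathers fresh statistics
        self._active = {}             # active statistics cache read by cleanup
        self._lock = threading.Lock()

    def refresh_once(self):
        new_cache = dict(self._collect())  # load the new statistics cache
        with self._lock:
            self._active = new_cache       # promote once loading completes

    def read(self, name, default=None):
        with self._lock:
            return self._active.get(name, default)

sc = StatisticsCache(lambda: {"prune_success_rate": 0.75})
sc.refresh_once()
print(sc.read("prune_success_rate"))  # 0.75
```

In a real system `refresh_once` would run on a timer in the lightweight thread (the configurable period of claim 20); the swap under a lock keeps readers from ever seeing a half-loaded cache.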
FIG. 5 also illustrates a second backend thread 592 related to the cooperative cleanup of a relation. This second backend thread 592 is also mapped to element 3 566, but for this second backend thread, the candidate block for cleanup is memory block 23 556, that is, memory block 3 546 offset by the size of the start block array.
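The mapping from a backend transaction to its first candidate block can be sketched as follows. Equation (1) is not reproduced in this text, so the modulo form below is an assumption, chosen to be consistent with the claim that cooperative cleanup evaluates a transaction identifier and array size, and with FIG. 5, where a second thread mapped to the same element starts one array-length further into the relation.

```python
ARRAY_SIZE = 20  # FIG. 5 shows a twenty-element start block array

def first_candidate_block(transaction_id, cleanup_pass=0):
    """Map a transaction to a start block element, then to a candidate block."""
    element = transaction_id % ARRAY_SIZE      # assumed form of Equation (1)
    # A later cleanup mapped to the same element skips ahead by the array size.
    return element + cleanup_pass * ARRAY_SIZE

print(first_candidate_block(23))                  # 3  (element 3 -> block 3)
print(first_candidate_block(23, cleanup_pass=1))  # 23 (block 3 + array size)
```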
In an embodiment, cooperative cleanup can be triggered when a record’s space consumption approaches 90%.
In an embodiment, cooperative cleanup can be integrated in a user operation so that cooperative cleanup can be automatically performed along with the operation. In some embodiments, the integrated cooperative cleanup can free more space than the space required by the user operation. This integrated cooperative cleanup can result in reduced locking duration and increased system predictability. However, updating the FSM as a result of this automatic cooperative cleanup performed every time an integrated operation is executed can be expensive in terms of the effects on performance relative to the amount of space freed. For example, the amount of freed space may be insignificant when there are only a few memory blocks with redundant record data.
FIG. 6 illustrates a method that can reduce the frequency of updating the FSM by using a probabilistic approach. This probabilistic approach can update one branch of the FSM tree from the bottom up. However, since this method can result in the FSM marking memory blocks with outdated record data, there can be inaccuracy in the FSM tree because marking should result in successfully cleaned candidate blocks being recycled. This inaccuracy can be tolerated when faster performance is desired.
The method illustrated by FIG. 6 begins by determining if space has been freed as a result of pruning 610 where pruning can be a form of block cleaning. If cooperative  cleanup pruning has freed space from one or more candidate blocks for cleanup, the update probability (P update) 630 can be given by:
Figure PCTCN2021101862-appb-000002
where P is a user-defined probability.
The next step is to draw a random number (rand) 640. Rand can be compared to 100%-P update 650 and the cleaned candidate blocks can be returned to the FSM when the update probability is less than rand 660. Otherwise, pruning is considered to have completed 620.
The denominator of Equation (3) can represent the reserved free space when the page pruning threshold is reached. P update will be approximately 1 when no free space is reclaimed and greater than 1 when a larger space is freed by cleaning. As a result, the FSM tree update can occur whenever the free space reclaimed by pruning is greater than 1. The heuristic probability value can be used to control the tradeoff between space utilization and performance.
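The probabilistic update of FIG. 6 can be sketched as follows. Because the exact form of Equation (3) is not restated in the text, the ratio below is an assumption built from the description: the denominator is the reserved free space when the page pruning threshold is reached, and the FSM is updated when a random draw exceeds 100% minus P update. The block size, fill factor, and function names are all illustrative.

```python
import random

BLOCK_SIZE = 8192   # assumed block size in bytes
FILL_FACTOR = 0.9   # assumed fill factor at the page pruning threshold

def maybe_update_fsm(freed_bytes, p_user, rng=random.random):
    """Return True when the FSM tree should be updated after pruning."""
    if freed_bytes == 0:
        return False                           # nothing freed: pruning complete
    reserved = BLOCK_SIZE * (1 - FILL_FACTOR)  # reserved free space (denominator)
    p_update = p_user * freed_bytes / reserved
    # Draw a random number and update only when it exceeds 100% - P_update,
    # so freeing more space makes an FSM update more likely.
    return rng() > 1.0 - p_update

print(maybe_update_fsm(0, 0.5))                      # False
print(maybe_update_fsm(8192, 1.0, rng=lambda: 0.5))  # True (p_update >= 1)
```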
The technical benefits of this embodiment can include consistent cleaning performance instead of spiky performance. Another benefit is integrated space maintenance rather than asynchronous or user-driven maintenance.
FIG. 7 illustrates that space is better utilized when embodiments of this disclosure are used than when prior art background cleaning methods with comparable performance are used.
In an embodiment, one or more candidate blocks can include index records, data records, or both. It should be appreciated that index records can point to data records and data records can include record data. The lifetime of an index record can be determined so that it does not remain after the associated data record has been cleaned. This lifetime can be determined using minimum and maximum transaction identifiers (transaction IDs) included in each record. Since these minimum and maximum transaction IDs are the same for both the index record and the data record, when the data record's transaction ID is no longer used, the index record with the same transaction ID can also be cleaned.
As illustrated by FIG. 8, an embodiment can begin when structured query language (SQL) statements 810 perform either a delete index record 820 or an insert index record 830. When a delete index record 820 is performed, update active record count 840 can be performed. This embodiment can use two queues, potential empty block queue 860 and available block queue 870, when candidate blocks for cleanup include index records. When an insert index record 830 is performed, either update active block count 840 or page split 850 can be performed. When page split 850 is performed, available block queue 870 can be accessed. Potential empty block queue 860 can track potentially empty blocks when update active record count 840 is performed. These blocks can then be recycled from this queue into the available block queue 870. For new block requests, available block queue 870 can be searched for an available block. The index space is extended when one or more free blocks associated with the relation are not available.
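The two-queue recycling scheme of FIG. 8 can be sketched as below; the class, attribute, and method names are assumptions, and the active-count bookkeeping is a simplified stand-in for the actual index structures.

```python
from collections import deque

class IndexBlockRecycler:
    """Two-queue recycling for index blocks, as sketched in FIG. 8."""
    def __init__(self):
        self.active_count = {}          # block -> active index record count
        self.potential_empty = deque()  # potential empty block queue
        self.available = deque()        # available block queue

    def delete_index_record(self, block):
        self.active_count[block] -= 1           # update active record count
        if self.active_count[block] == 0:
            self.potential_empty.append(block)  # track potentially empty block

    def recycle(self):
        # Move blocks that are still empty into the available block queue.
        while self.potential_empty:
            block = self.potential_empty.popleft()
            if self.active_count.get(block, 0) == 0:
                self.available.append(block)

    def request_block(self, extend_index_space):
        # Reuse an available block; extend the index space only when none exist.
        if self.available:
            return self.available.popleft()
        return extend_index_space()

r = IndexBlockRecycler()
r.active_count[7] = 1
r.delete_index_record(7)  # block 7 becomes potentially empty
r.recycle()               # ... and is recycled into the available queue
print(r.request_block(lambda: "extended"))  # 7
print(r.request_block(lambda: "extended"))  # extended
```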
The technical advantages of this embodiment are that neither a dedicated backend process nor a dedicated user command is needed to perform cleanup. In this embodiment, inline cleanup of index records can be performed as part of a regular user operation, including an insert, update, or delete. Another technical advantage can be consistent recycling and reuse of empty pages independent of the target page.
Figure 9 is a block diagram of an electronic device (ED) 952 illustrated within a computing and communications environment 950 that may be used for implementing the devices and methods disclosed herein. The electronic device 952 typically includes a processor 954, such as a central processing unit (CPU) , and may further include specialized processors such as a graphics processing unit (GPU) or other such processor, a memory 956, a network interface 958 and a bus 960 to connect the components of ED 952. ED 952 may optionally also include components such as a mass storage device 962, a video adapter 964, and an I/O interface 968 (shown in dashed lines) .
The memory 956 may comprise any type of non-transitory system memory, readable by the processor 954, such as static random-access memory (SRAM) , dynamic  random-access memory (DRAM) , synchronous DRAM (SDRAM) , read-only memory (ROM) , or a combination thereof. In an embodiment, the memory 956 may include more than one type of memory, such as ROM for use at boot-up, and DRAM for program and data storage for use while executing programs. The bus 960 may be one or more of any type of several bus architectures including a memory bus or memory controller, a peripheral bus, or a video bus.
The electronic device 952 may also include one or more network interfaces 958, which may include at least one of a wired network interface and a wireless network interface. As illustrated in FIG. 9, network interface 958 may include a wired network interface to connect to a network 974, and also may include a radio access network interface 972 for connecting to other devices over a radio link. The network interfaces 958 allow the electronic device 952 to communicate with remote entities such as those connected to network 974.
The mass storage 962 may comprise any type of non-transitory storage device configured to store data, programs, and other information and to make the data, programs, and other information accessible via the bus 960. The mass storage 962 may comprise, for example, one or more of a solid-state drive, hard disk drive, a magnetic disk drive, or an optical disk drive. In some embodiments, mass storage 962 may be remote to the electronic device 952 and accessible through use of a network interface such as interface 958. In the illustrated embodiment, mass storage 962 is distinct from memory 956 and may generally perform storage tasks compatible with higher latency, but may generally provide lower or no volatility. In some embodiments, mass storage 962 may be integrated with a heterogeneous memory 956.
In some embodiments, electronic device 952 may be a standalone device, while in other embodiments electronic device 952 may be resident within a data center. A data center, as will be understood in the art, is a collection of computing resources (typically in the form of servers) that can be used as a collective computing and storage resource. Within a data center, a plurality of servers can be connected together to provide a computing resource pool upon which virtualized entities can be instantiated. Data centers can be interconnected with each other to form networks consisting of pools of computing and storage resources connected to each other by connectivity resources. The connectivity resources may take the form of physical connections such as ethernet or optical communications links, and in some instances may include wireless communication channels as well. If two different data centers are connected by a plurality of different communication channels, the links can be combined together using any of a number of techniques including the formation of link aggregation groups (LAGs) . It should be understood that any or all of the computing, storage and connectivity resources (along with other resources within the network) can be divided between different sub-networks, in some cases in the form of a resource slice. If the resources across a number of connected data centers or other collection of nodes are sliced, different network slices can be created.
Although the present disclosure has been described with reference to specific features and embodiments thereof, it is evident that various modifications and combinations can be made thereto without departing from the disclosure. The specification and drawings are, accordingly, to be regarded simply as an illustration of the invention as defined by the appended claims, and are contemplated to cover any and all modifications, variations, combinations or equivalents that fall within the scope of the present disclosure.

Claims (21)

  1. A method for database management, the method executed by a processor, the method comprising:
    receiving a user operation;
    evaluating a free space map (FSM) to determine there are sufficient blocks with free space to execute the user operation;
    performing a cooperative cleanup when the FSM indicates there are insufficient blocks with free space; and
    executing the received user operation.
  2. The method of claim 1 wherein performing the cooperative cleanup further comprises evaluating statistics.
  3. The method of claim 2 wherein evaluating statistics comprises evaluating a dead tuple percentage statistic.
  4. The method of claim 2 wherein cooperative cleanup further comprises scanning one or more candidate blocks for cleanup.
  5. The method of any one of claims 1-4 wherein cooperative cleanup further comprises evaluating statistics to determine which of the one or more candidate blocks for cleanup are scanned.
  6. The method of any one of claims 1-5 wherein evaluating statistics comprises evaluating a prune success rate statistic.
  7. The method of any one of claims 1-6 wherein evaluating statistics further comprises balancing which of the one or more candidate blocks for cleanup are scanned with minimizing the effect of scanning the one or more candidate blocks for cleanup.
  8. The method of claim 2 wherein cooperative cleanup further comprises locking the candidate block of the one or more candidate blocks for cleanup before scanning the one or more candidate blocks for cleanup.
  9. The method of claim 2 wherein cooperative cleanup further comprises evaluating a transaction identifier and array size.
  10. The method of claim 2 wherein cooperative cleanup further comprises eliminating concurrent pruning of the one or more candidate blocks for cleanup by temporarily invalidating a first candidate block of the one or more candidate blocks for cleanup.
  11. The method of any one of claims 1-6 wherein evaluating statistics further comprises evaluating the prune success rate statistic using a total number of scanned blocks and a total number of pruned blocks.
  12. The method of claim 2 wherein cooperative cleanup further comprises marking successfully cleaned one or more candidate blocks for cleanup in the FSM.
  13. The method of claim 12 wherein cooperative cleanup further comprises marking successfully cleaned one or more candidate blocks for cleanup in the FSM when an update probability is less than a random value.
  14. The method of claim 13 wherein cooperative cleanup further comprises determining the update probability using free space of a relation and a block size and a fill factor and a user defined probability.
  15. The method of claim 8 wherein cooperative cleanup further comprises unlocking the locked candidate block before executing the received user operation.
  16. The method of claim 2 wherein cooperative cleanup further comprises one or more candidate blocks for cleanup including one of:
    index records;
    data records; and
    index records and data records.
  17. The method of claim 16 wherein cooperative cleanup further comprises use of a potential empty block queue and an available block queue when cleaning index records.
  18. The method of claim 1 wherein cooperative cleanup further comprises receiving the user operation including one of an insert, a delete, and an update operation.
  19. A device including a processor and non-transitory memory storing instructions which, when executed by the processor, implement a statistics cache (SC) for database management, the SC comprising:
    a new statistics cache stored in the non-transitory memory;
    a lightweight thread that loads new statistics into the new statistics cache; and
    an active statistics cache that is stored in the non-transitory memory wherein a cooperative cleanup uses statistics read from the active statistics cache to manage the database.
  20. The device of claim 19 wherein the lightweight thread loads the new statistics into the new statistics cache every configurable time period.
  21. The SC of claim 20 wherein the active statistics cache includes the new statistics copied from the new statistics cache when loading the new statistics into the new statistics cache completes.
PCT/CN2021/101862 2021-06-23 2021-06-23 Methods for the self-sufficient maintenance of database object storage Ceased WO2022266889A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/101862 WO2022266889A1 (en) 2021-06-23 2021-06-23 Methods for the self-sufficient maintenance of database object storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/101862 WO2022266889A1 (en) 2021-06-23 2021-06-23 Methods for the self-sufficient maintenance of database object storage

Publications (1)

Publication Number Publication Date
WO2022266889A1 true WO2022266889A1 (en) 2022-12-29

Family

ID=84545023

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/101862 Ceased WO2022266889A1 (en) 2021-06-23 2021-06-23 Methods for the self-sufficient maintenance of database object storage

Country Status (1)

Country Link
WO (1) WO2022266889A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12430320B2 (en) 2023-05-05 2025-09-30 Google Llc Adaptive auto garbage collector in MVCC database system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2249080A1 (en) * 1998-09-25 2000-03-25 Nelson Hop Hing Method for efficiently searching free space in a relational database
US20080275840A1 (en) * 2007-05-03 2008-11-06 Louis Burger Selective database statistics recollection
US20140317048A1 (en) * 2013-04-22 2014-10-23 Sap Ag Enhanced transactional cache
US20190294602A1 (en) * 2017-01-09 2019-09-26 Tencent Technology (Shenzhen) Company Limited Data scrubbing method and apparatus, and computer readable storage medium


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21946389

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21946389

Country of ref document: EP

Kind code of ref document: A1