CN112328592A - Data storage method, electronic device and computer readable storage medium - Google Patents
- Publication number
- CN112328592A (application number CN202011104762.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- partition
- transferred
- time
- merging
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2477—Temporal data queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/174—Form filling; Merging
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the application provide a data storage method, an electronic device, and a computer-readable storage medium, and relate to the field of computer technologies. When there is a requirement to store data to be dumped, the method obtains the time information of the latest table partition of a data table in the database, compares it with the time information of the data to be dumped, and compares the remaining storage space of the latest table partition with the data amount of the data to be dumped, thereby determining whether the data to be dumped meets the condition for being stored in the latest table partition. If the condition is met, the data to be dumped is stored in the latest table partition; if it is not met, a new table partition is created in the data table and the data to be dumped is stored in the newly created table partition. For data newly arriving at a data table, the method decides whether to place it in a new table partition or in an existing one, so that the data stored in each table partition is as close as possible to a reasonable partition size, which reduces wasted system I/O to the greatest extent.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data storage method, an electronic device, and a computer-readable storage medium.
Background
With the development of Internet technology and the maturing of application software, an innovative software application model, Software-as-a-Service (SaaS), has emerged. A third-party vendor can obtain the corresponding software service by purchasing a SaaS service. In existing SaaS services, the data analysis model provided by the SaaS service provider is preset, the algorithms of the analysis indexes are also preset, and a plurality of third-party vendors share the same set of data resources for data analysis. These data resources aggregate the data to be analyzed of the plurality of third-party vendors.
As shown in FIG. 1, a plurality of third-party vendors operate a plurality of application programs (APPs), APP1 to APPi. The data of these APPs are all stored in a plurality of databases, Db1 to Dbj, designated by the SaaS service provider. There are generally two ways to store the data of multiple APPs into the data tables of multiple databases. In the first, a hash operation is performed on the identification information APP_ID of the APP, a modulo operation is performed on the hash result to find the corresponding database, and the APP data is then stored in a data table in that database. In the second, a mapping between the identification information APP_ID of the APP and the identification information Database_ID of the database is established in advance, and the APP data is then stored in the corresponding data table in the database according to the mapping. Analysis services provided by SaaS are generally based on a timeline, so a data table is often partitioned by time. Because the data volume of each APP cannot be estimated, the partition size cannot be estimated when the data table is partitioned. If the data table contains too many small table partitions, a large amount of resources is consumed establishing input/output (I/O) when the data of those table partitions is later read. Existing analysis services generally require responses within seconds, and partitioning the data table by time alone cannot keep the data stored in each table partition close to a reasonable size, which reduces the response speed of the analysis service.
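The two routing approaches described above can be sketched as follows. This is a minimal illustration only: the function names, the choice of SHA-256 as the hash, and the example identifiers are assumptions and do not come from the application.

```python
import hashlib


def route_by_hash(app_id: str, num_databases: int) -> int:
    """First approach: hash APP_ID, then take the result modulo the database count.

    SHA-256 is an arbitrary choice here; the source does not name a hash function.
    """
    digest = hashlib.sha256(app_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_databases


def route_by_mapping(app_id: str, mapping: dict) -> str:
    """Second approach: look up a pre-established APP_ID -> Database_ID mapping."""
    return mapping[app_id]


# Hypothetical identifiers, for illustration only.
db_index = route_by_hash("app-001", num_databases=4)
db_id = route_by_mapping("app-001", {"app-001": "Db1", "app-002": "Db2"})
```

Neither approach takes the data volume of an APP into account, which is why the partition size inside each data table remains hard to predict.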
Disclosure of Invention
In view of the above, there is a need for a data storage method, an electronic device and a computer-readable storage medium, which can make the data stored in the table partition approach a reasonable size and reduce the waste of system I/O to the maximum extent.
A first aspect of the embodiments of the present application discloses a data storage method applied to a database, where the database includes at least one data table, and the data storage method includes: in response to a requirement to store data to be dumped, determining the data table in the database that corresponds to the data to be dumped; acquiring the time information and remaining storage space of the latest table partition of the data table, and the time information and data amount of the data to be dumped; and, if the time information of the latest table partition does not match the time information of the data to be dumped and/or the remaining storage space of the latest table partition is smaller than the data amount of the data to be dumped, creating a new table partition in the data table and storing the data to be dumped in the newly created table partition.
With this technical solution, each batch of extracted data to be dumped is stored in a newly created table partition when the time information of the latest table partition does not match the time information of the data to be dumped and/or the remaining storage space of the latest table partition is smaller than the data amount of the data to be dumped. As a result, the data stored in each table partition is as close as possible to a reasonable partition size, and the time information of the stored data matches the time information of the table partition, which reduces wasted system I/O to the greatest extent and facilitates subsequent data analysis.
In a possible implementation manner, the data to be dumped includes data attributes, each data table in the database includes a table attribute, the table attribute is used to indicate a data attribute of data that can be stored in the data table, and the determining a data table in the database corresponding to the data to be dumped includes: and determining a data table corresponding to the data to be transferred according to the data attribute of the data to be transferred and the table attribute of each data table in the database.
With this technical solution, the data table corresponding to each batch of extracted data to be dumped can be determined, which facilitates the subsequent decision on whether to store the data in a newly created table partition or in an existing table partition of that data table.
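As a rough sketch of this lookup, the table attribute of each data table can be compared with the data attribute of the incoming data. The attribute values ("login", "browse") and table names are hypothetical, and exact equality is only one possible matching rule.

```python
def find_data_table(data_attribute, table_attributes):
    """Return the name of the first data table whose table attribute equals the
    data attribute of the data to be dumped, or None if no table accepts it."""
    for table_name, table_attribute in table_attributes.items():
        if table_attribute == data_attribute:
            return table_name
    return None


# Hypothetical tables keyed by name, each declaring the data attribute it stores.
tables = {"login_table": "login", "browse_table": "browse"}
target = find_data_table("browse", tables)
```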
In one possible implementation, the data to be dumped is stored in a first server, and the database is deployed in a second server, and the method further includes: and extracting the data to be transferred and stored from the first server by using a preset data extraction tool, and sending the data attribute of the data to be transferred and stored to the second server.
By adopting the technical scheme, the data stored by the first server can be transferred to the second server, so that the second server can provide the specified service function conveniently.
In one possible implementation, the method further includes: setting a time granularity of a table partition of the data table according to a Service Level Agreement (SLA) of the second server; and setting the partition size of the table partition of the data table according to the hardware configuration information of the second server, wherein the hardware configuration information at least comprises disk read/write (I/O) performance.
With this technical solution, the time granularity of the table partitions can be set based on the SLA of the analysis service provided by the second server, and the table partitions can be given a reasonable partition size according to the hardware resources of the second server.
In one possible implementation, the method further includes: and if the time information of the latest table partition and the time information of the data to be transferred do not belong to the same time granularity, determining that the time information of the latest table partition is not matched with the time information of the data to be transferred.
By adopting the technical scheme, whether the time information of the latest table partition is matched with the time information of the data to be transferred can be judged based on the time granularity of the table partition.
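One way to express the "same time granularity" test is to map each timestamp to a bucket index and compare the indices. The sketch below assumes timestamps in epoch seconds and fixed-width buckets aligned to the epoch; a natural day in a local time zone would need calendar-aware handling, which the source does not detail.

```python
SECONDS_PER_DAY = 86_400


def same_granularity(partition_start: int, data_start: int, granularity: int) -> bool:
    """True if both timestamps fall into the same fixed-width time bucket.

    Assumption: times are epoch seconds and buckets are aligned to the epoch.
    """
    return partition_start // granularity == data_start // granularity


# Last second of day 0 still matches a partition that starts at second 0;
# the first second of day 1 does not.
within_day = same_granularity(0, 86_399, SECONDS_PER_DAY)
next_day = same_granularity(0, 86_400, SECONDS_PER_DAY)
```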
In a possible implementation manner, the time information of the data to be transferred includes a start time and an end time, the data amount of the data to be transferred, which is extracted from the first server by the preset data extraction tool each time, is smaller than the partition size of the table partition, and the start time and the end time of the data to be transferred belong to the same time granularity.
With this technical solution, the situation in which the data to be dumped can be stored neither in a newly created table partition nor in an existing table partition of the data table can be avoided.
In one possible implementation, the method further includes: and if the time information of the latest table partition is matched with the time information of the data to be transferred and stored and the residual storage space of the latest table partition is larger than or equal to the data volume of the data to be transferred and stored, storing the data to be transferred and stored in the latest table partition.
With this technical solution, it can be determined whether each batch of extracted data to be dumped is placed in a newly created table partition or in an existing table partition, so that the data stored in each table partition is as close as possible to a reasonable partition size, which reduces wasted system I/O to the greatest extent.
In a possible implementation manner, the time information of the to-be-transferred data and the latest table partition both include a start time and an end time, and after the to-be-transferred data is stored in the latest table partition, the method further includes: and if the ending time of the data to be transferred is greater than the ending time of the latest table partition, updating the ending time of the latest table partition into the ending time of the data to be transferred.
By adopting the technical scheme, the end time of the latest table partition can be updated according to the end time of the data to be transferred.
In a possible implementation manner, the obtaining time information of a latest table partition of the data table includes: and acquiring the starting time and the ending time of the latest table partition of the data table by using a preset database query statement.
By adopting the technical scheme, the time information of the latest table partition of the data table can be acquired by utilizing the database query statement.
In one possible implementation, creating a new table partition in the data table includes: creating the new table partition in the data table based on the time information of the data to be dumped.
By adopting the technical scheme, the new table partition can be created in the data table based on the time information of the data to be transferred, so that the time information of the table partition corresponds to the time information of the data to be transferred.
In one possible implementation manner, the data storage method further includes: determining n table partitions capable of performing partition merging in the data table according to a preset partition merging rule, wherein the n table partitions are sequentially arranged according to a time sequence, and the preset partition merging rule defines a merged partition threshold; sequentially performing partition merging attempts on m adjacent table partitions taking each table partition as a starting table partition, wherein m and n are positive integers, and m is smaller than n; if the sum of the partition sizes of the m adjacent table partitions is less than or equal to the partition threshold, merging the m adjacent table partitions into a new table partition, and continuing to perform partition merging attempts on the m adjacent table partitions taking the new table partition as a starting table partition; and if the sum of the partition sizes of the m adjacent table partitions is larger than the partition threshold value, abandoning the partition combination of the m adjacent table partitions.
By adopting the technical scheme, the table partitions can be combined into the table partitions more suitable for analysis or query service, the hardware resource occupancy rate in response to the analysis or query service can be reduced, and the response time of the analysis or query service is shortened.
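The merging procedure can be illustrated with a sliding window over partition sizes listed in time order. This is a sketch under assumptions: partitions are represented only by their sizes, `m` is at least 2 (merging a single partition would be a no-op), and the function name is hypothetical.

```python
def merge_partitions(sizes, threshold, m=2):
    """Greedily merge m adjacent partitions when their total size is within the threshold.

    `sizes` lists partition sizes in time order. After a successful merge the
    new partition becomes the starting partition of the next attempt; after a
    failed attempt the window moves on to the next starting partition.
    """
    if m < 2:
        raise ValueError("m must be at least 2 for a merge to change anything")
    merged = list(sizes)
    i = 0
    while i + m <= len(merged):
        window_total = sum(merged[i:i + m])
        if window_total <= threshold:
            merged[i:i + m] = [window_total]  # replace the window with one partition
        else:
            i += 1                            # abandon this window, try the next start
    return merged


result = merge_partitions([100, 150, 200, 400, 50, 60], threshold=500)
# result == [450, 450, 60]
```

With a threshold of 500, the first three partitions collapse into one of size 450, the next two into another of size 450, and the last partition stays alone because 450 + 60 exceeds the threshold.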
In a possible implementation manner, the preset partition merging rule further defines merging time information, and the determining n table partitions capable of performing partition merging in the data table according to the preset partition merging rule includes: and screening the table partitions of which the time information falls into the merging time information from the data table.
By adopting the technical scheme, the table partitions which can be subjected to partition combination can be screened from the designated data table.
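A minimal sketch of this screening step is shown below, assuming the merging time information is a closed interval and each table partition is represented as a dictionary with hypothetical `start`, `end`, and `size` keys.

```python
def select_mergeable(partitions, merge_start, merge_end):
    """Keep the partitions whose time range lies inside the merging time window,
    sorted by start time so they can be merged in chronological order."""
    selected = [
        p for p in partitions
        if merge_start <= p["start"] and p["end"] <= merge_end
    ]
    return sorted(selected, key=lambda p: p["start"])


candidates = select_mergeable(
    [
        {"start": 200, "end": 299, "size": 80},
        {"start": 0, "end": 99, "size": 120},
        {"start": 300, "end": 450, "size": 60},  # ends outside the window
    ],
    merge_start=0,
    merge_end=400,
)
```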
In one possible implementation, the database is deployed at a second server, and the method further includes: and setting the merging time information according to the SLA of the second server.
By adopting the technical scheme, the table partition size can be dynamically adjusted based on the time granularity of the service provided by the server.
A second aspect of the embodiments of the present application discloses a data storage method, which is applied to a database, where the database includes at least one data table, and the data storage method includes: determining n table partitions capable of performing partition merging in the data table according to a preset partition merging rule, wherein the n table partitions are sequentially arranged according to a time sequence, and the preset partition merging rule defines a merged partition threshold; sequentially performing partition merging attempts on m adjacent table partitions taking each table partition as a starting table partition, wherein m and n are positive integers, and m is smaller than n; if the sum of the partition sizes of the m adjacent table partitions is less than or equal to the partition threshold, merging the m adjacent table partitions into a new table partition, and continuing to perform partition merging attempts on the m adjacent table partitions taking the new table partition as a starting table partition; and if the sum of the partition sizes of the m adjacent table partitions is larger than the partition threshold value, abandoning the partition combination of the m adjacent table partitions.
By adopting the technical scheme, the table partitions can be combined into the table partitions more suitable for analysis or query service, the hardware resource occupancy rate in response to the analysis or query service can be reduced, and the response time of the analysis or query service is shortened.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, which includes computer instructions, and when the computer instructions are executed on an electronic device, the electronic device is caused to execute the data storage method according to the first aspect or the second aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor and a memory, where the memory is configured to store instructions, and the processor is configured to call the instructions in the memory, so that the electronic device executes the data storage method according to the first aspect or the second aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to execute the data storage method according to the first aspect or the second aspect.
A sixth aspect provides an apparatus having functionality to implement the behavior of the first electronic device in the method provided in the first or second aspect. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.
It should be understood that the computer-readable storage medium of the third aspect, the electronic device of the fourth aspect, the computer program product of the fifth aspect, and the apparatus of the sixth aspect all correspond to the method of the first aspect or the second aspect, and therefore, the beneficial effects achieved by the apparatus can refer to the beneficial effects in the corresponding methods provided above, and are not described herein again.
Drawings
Fig. 1 is a schematic view of a scenario of data storage of multiple APP data in the prior art.
Fig. 2 is an application scenario diagram of a data storage method according to an embodiment of the present application.
Fig. 3 is an interaction diagram illustrating data exchange between a first server and a second server according to an embodiment of the present application.
Fig. 4 is a schematic view of a scenario in which a table partition is created in a data table according to an embodiment of the present application.
Fig. 5 is a schematic flowchart of a data storage method according to an embodiment of the present application.
Fig. 6 is an interaction diagram illustrating data exchange between a data table management apparatus and an analysis database according to an embodiment of the present application.
Fig. 7 is a schematic view of a scenario for performing table partition merging in a data table according to an embodiment of the present application.
Fig. 8 is a schematic flowchart of a data storage method according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of a possible second server according to an embodiment of the present disclosure.
Detailed Description
In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The terms "first," "second," "third," "fourth," and the like in the description, the claims, and the drawings of the present application, if any, are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order.
An application scenario diagram of the data storage method provided by the embodiment of the present application is exemplarily described below with reference to fig. 2.
The first storage area 100 is a storage area of the first server 200, the second storage area 110 is a storage area of the second server 210, and the first server 200 and the second server 210 can communicate through a network.
The first storage area 100 is used for storing source data, which may refer to data to be analyzed. For example, the source data may be user data and login data collected by a plurality of third-party vendors from the applications (APPs) they have developed, or user data and browsing data collected by a plurality of third-party vendors from the websites they have developed.
The first storage area 100 may be a storage area on a server owned by a third party manufacturer, or may be a storage area on another server (intermediary server) not owned by the third party manufacturer, and the server of the third party manufacturer may upload the recorded operation data to the intermediary server periodically or aperiodically.
The source data stored in the first storage area 100 can be imported to the second storage area 110 through a data storage tool. A database (a database is a collection of data tables for storing data), such as the analysis database 12, is built on the second storage area 110. The source data stored in the first storage area 100 may be stored in the analysis database 12, and the data of the analysis database 12 may be used to provide data analysis services. For example, the second storage area 110 is the data storage area of an analysis platform. APP data collected by a third-party vendor is imported into the analysis database of the analysis platform, and an operator of the third-party vendor can perform big data analysis on that APP data on the analysis platform, in order to adjust market strategy according to the analysis result or to optimize the APP and improve user experience. The analysis platform may be implemented based on a Relational Online Analytical Processing (ROLAP) engine or on other data analysis engines, which is not limited herein.
Optionally, in some embodiments of the present application, the data storage tool may be an Extract Transform and Load (ETL) tool, or other storage tools/plug-ins that can implement the process of transferring the source data stored in the first storage area 100 to the second storage area 110.
Optionally, in some embodiments of the present application, the first storage area 100 and the second storage area 110 may be hard disks of a server. The database constructed on the second storage area 110 may be a database that satisfies other requirements (non-analysis service) instead of the analysis database.
The second server 210 is further provided with a data table management device 11 and an ETL tool 13. The data table management device 11 may perform partition management on the data tables in the analysis database 12, such as creating a new table partition in a data table or merging existing table partitions. The ETL tool 13 may import the data stored in the first storage area 100 into the analysis database 12.
When the ETL tool 13 obtains, from the first storage area 100, source data to be stored in a data table of the analysis database 12, the ETL tool 13 communicates with the data table management device 11 to obtain a determination of whether a new table partition needs to be added to the data table. If the data table management device 11 determines that a new table partition needs to be added, the ETL tool 13 imports the obtained source data into the new table partition in the data table. If the data table management device 11 determines that a new table partition does not need to be added, the ETL tool 13 imports the obtained source data into an existing table partition in the data table.
In some embodiments of the present application, the analysis database 12 may include a plurality of data tables, each of which may correspond to different business function data, such as the analysis database 12 includes a login data table and a browse data table. The login data table is used for storing login data, and the browsing data table is used for storing browsing data. The data table management device 11 is used for performing partition management on the data table, and a plurality of table partitions can be added below the data table according to the specified time granularity. For example, a table partition may be added to the data table by hour, day, or week. The time granularity may be set and adjusted according to actual requirements, for example, according to a Service Level Agreement (SLA) between the analysis platform and a third-party vendor.
Optionally, in some embodiments of the present application, the data table management apparatus 11 may be disposed in the ETL tool 13, or may be a software framework independent of the ETL tool 13. The data table management apparatus 11 and the ETL tool 13 may be stored in the second storage area 110, or may be stored in another storage area on the second server 210.
Optionally, in some embodiments of the present application, the first storage area 100 and the second storage area 110 may also be not storage areas on a server, but storage areas of other cloud storage platforms, distributed storage systems, or computer devices (e.g., desktop computers, industrial computers) with data storage functions.
Fig. 3 is a schematic diagram illustrating data interaction between the first server 200 and the second server 210 according to an embodiment of the present application.
30. The time information of the table partitions in the analysis database 12 is set according to the data analysis period of the analysis service provided by the second server 210, and the partition size of the table partitions in the analysis database 12 is set according to the hardware information of the second server 210.
In some embodiments of the present application, the second storage area 110 is provided with an analysis database 12, and the analysis database 12 includes a plurality of data tables, each of which may be divided into a plurality of table partitions according to actual needs. Each table partition has attributes such as time information and partition size information. The time information of a table partition includes a start time and an end time and may be set according to the SLA of the second server 210; the content of the SLA may include the data analysis period of the provided analysis service, the data query period of the query service, and the like. For example, if the data analysis period of the analysis service provided by the second server 210 is a natural day, the start time of the table partition may be set to the time when the natural day starts, and the end time of the table partition may be set to the time when the natural day ends.
Further, the partition size of the table partition may be set according to the disk I/O performance, the memory performance, and the processor performance of the second server 210. For example, the partition size rule may be: given a reference size S and a jitter rate m%, the partition size of a table partition may be any value from S(1 - m%) to S(1 + m%). Suppose the reference size is determined to be 500 MB based on the disk I/O performance, the memory performance, and the processor performance of the second server 210, and the jitter rate is 10%; the table partition size may then be any value between 450 MB and 550 MB. Assuming that the analysis period provided by the second server 210 is a natural day, the data table is partitioned by natural day, as far as the partition size allows, to store the source data of the first storage area 100.
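The size rule can be checked with a short calculation. The function name is an assumption; the reference size and jitter rate are the example values from the paragraph above.

```python
def partition_size_range(reference_size, jitter_percent):
    """Return the (lower, upper) bounds S(1 - m%) and S(1 + m%) of the partition size."""
    lower = reference_size * (100 - jitter_percent) / 100
    upper = reference_size * (100 + jitter_percent) / 100
    return lower, upper


bounds = partition_size_range(500, 10)
# bounds == (450.0, 550.0)
```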
31. The ETL tool 13 extracts the data to be dumped from the first storage area 100.
In some embodiments of the present application, the data to be dumped is the part of the source data currently extracted from the first storage area 100 by the ETL tool 13. The attribute information of the data to be dumped may include a data start time, a data end time, and a data size. The data size of the data to be dumped extracted each time by the ETL tool 13 needs to be smaller than the size of the table partition, and the time span of the data to be dumped also needs to be smaller than the time granularity of the table partition. For example, if the size of the table partition is 500 MB, the data amount extracted each time by the ETL tool 13 is less than 500 MB; that is, each extraction may have a fixed data amount such as 50 MB, 100 MB, or 150 MB.
In some embodiments of the present application, the data size of the data to be transferred extracted each time by the ETL tool 13 may also be different, but the data size of the data to be transferred extracted each time by the ETL tool 13 needs to be smaller than the size of the table partition.
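The source states the two constraints on each extracted batch but not how the ETL tool forms its batches. The sketch below shows one possible batching strategy that satisfies both constraints; the record layout `(timestamp, size)`, the function name, and the assumption that records arrive in time order are all hypothetical.

```python
def make_batches(records, partition_size, granularity):
    """Group (timestamp, size) records, assumed sorted by time, into batches that
    are each strictly smaller than the partition size and lie in one time bucket."""
    batches, current, current_size, current_bucket = [], [], 0, None
    for timestamp, size in records:
        if size >= partition_size:
            raise ValueError("a single record must be smaller than the partition size")
        bucket = timestamp // granularity
        crosses_bucket = bucket != current_bucket
        too_large = current_size + size >= partition_size
        if current and (crosses_bucket or too_large):
            batches.append(current)          # close the batch before it breaks a constraint
            current, current_size = [], 0
        current.append((timestamp, size))
        current_size += size
        current_bucket = bucket
    if current:
        batches.append(current)
    return batches


batches = make_batches(
    [(0, 40), (10, 40), (20, 40), (86_400, 10)],
    partition_size=100,
    granularity=86_400,
)
```

Here the third record starts a new batch because adding it would reach the partition size, and the fourth starts another because it belongs to the next day.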
If the time granularity of the table partition is one natural day, the data to be transferred and stored extracted each time by the ETL tool 13 needs to be data within the same natural day. In some embodiments of the present application, the analysis database 12 includes a plurality of data tables. After the data to be transferred and stored are extracted, the data table corresponding to the data to be transferred and stored and the time information of the latest table partition of the data table are determined. The most recent table partition of a data table is the most recently created table partition of the data table. The table structure of the data table may be as shown in table 1 below:
TABLE 1
| Field | Type | Field definition |
| index | string | Database primary key |
| table_name | string | Data table name |
| partition_id | int | ID of the table partition |
| partition_starttime | long | Table partition start time |
| partition_endtime | long | Table partition end time |
| partition_size | long | Table partition size |
| partition_current | boolean | Whether this is the latest table partition |
For example, the field index of table 1 has a length of 36, the field table_name has a length of 1024, and the field partition_id has a length of 36.
In some embodiments, the analysis database 12 may create a plurality of data tables in advance according to different business function requirements, each data table being used to store the data of one business function, and the ETL tool 13 or the data table management device 11 may determine which business function the data to be dumped belongs to, so as to determine the data table corresponding to the data to be dumped. Once the data table corresponding to the data to be dumped is determined, the time information of the latest table partition of the data table can be acquired through a database query statement. For example, if the database query language is Structured Query Language (SQL), the following SQL statement may be used to query the specified data table and obtain the partition start time and partition end time of its latest table partition: "SELECT * FROM Table_Partition WHERE table_name='%table_name' AND partition_current=true". The database query language is not limited to SQL and may be another data query language.
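The lookup of the latest table partition can be sketched with Python's built-in sqlite3 module. This is a sketch under stated assumptions, not the database used in the application: the metadata table is named Table_Partition and follows table 1, SQLite is chosen only because it ships with Python, partition_id is stored as text, partition_current is stored as 0/1 because SQLite has no boolean type, and the helper name latest_partition_window is hypothetical.

```python
import sqlite3

# Metadata table following table 1. Storing partition_current as 0/1 and
# partition_id as text are assumptions of this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Table_Partition (
    "index"             TEXT PRIMARY KEY,
    table_name          TEXT NOT NULL,
    partition_id        TEXT NOT NULL,
    partition_starttime INTEGER NOT NULL,
    partition_endtime   INTEGER NOT NULL,
    partition_size      INTEGER NOT NULL,
    partition_current   INTEGER NOT NULL
)
"""


def latest_partition_window(conn, table_name):
    """Return (start, end) of the latest partition of table_name, or None if there is none."""
    return conn.execute(
        "SELECT partition_starttime, partition_endtime FROM Table_Partition "
        "WHERE table_name = ? AND partition_current = 1",
        (table_name,),
    ).fetchone()
```

Bound parameters (the ? placeholders) are used instead of string substitution so that a table name cannot alter the query text.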
For example, the table partition size is 500M, the ETL tool 13 extracts 80M data to be dumped from the first storage area 100 at a time, and the data table corresponding to the data to be dumped is the data table Tb1 in the analysis database 12, that is, the data table Tb1 of the analysis database 12 that is dumped to the second storage area 110.
32. The ETL tool 13 transmits attribute information of data to be dumped to the data table management apparatus 11.
For example, the ETL tool 13 may transmit information such as a data start time, a data end time, a data size, etc. of the data to be dumped to the data table management apparatus 11.
33. The data table management means 11 determines whether a new table partition needs to be created for the data to be dumped.
In some embodiments, the source data in the first storage area 100 is transferred to the second storage area 110 sequentially, in order of data generation time. The data table management apparatus 11 may compare the start time and end time of the data to be transferred with the start time and end time of the latest table partition of the data table to determine whether the data to be transferred falls within the time period of the latest table partition, and thereby determine whether a new table partition needs to be created for the data to be transferred. For example, the start time of the data to be transferred is 2019-1-18,08:00:00 and its end time is 2019-1-18,15:00:00, and the data table is partitioned by natural day. If the start time of the latest table partition of the data table is 2019-1-18,00:00:00 and its end time is 2019-1-18,23:59:59, it can be judged that the data to be transferred falls into the time period of the latest table partition; if the start time of the latest table partition is 2019-1-17,00:00:00 and its end time is 2019-1-17,23:59:59, it can be judged that the data to be transferred does not fall into the time period of the latest table partition.
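The time comparison in this example can be sketched as a simple interval containment test. The function name falls_within and the use of Python datetime values are assumptions made for illustration.

```python
from datetime import datetime


def falls_within(data_start: datetime, data_end: datetime,
                 part_start: datetime, part_end: datetime) -> bool:
    """True when the data interval lies entirely inside the partition interval."""
    return part_start <= data_start and data_end <= part_end
```

For the values in the text, data spanning 08:00:00 to 15:00:00 on 2019-1-18 is contained in a partition covering all of 2019-1-18 and is not contained in one covering 2019-1-17.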
If the data to be transferred does not fall into the time period of the latest table partition, the data table management device 11 may directly determine that the data to be transferred does not satisfy the storage condition for storing the data to be transferred into the latest table partition, and at this time, a new table partition may be created according to the time information of the data to be transferred. For example, a new table partition may be created in the data table by the following SQL language:
UPDATE Table_Partition SET partition_current=false WHERE partition_current=true AND table_name='%table_name';
INSERT INTO Table_Partition VALUES('uuid',table_name,'uuid',$data_starttime,$data_endtime,$data_size,true);
After the table partition has been created, the table partition information in the data table can be updated through the following SQL statement:
UPDATE Table_Partition SET partition_size=partition_size+%data_size WHERE table_name='%table_name' AND partition_current=true;
When the data to be transferred falls into the time period of the latest table partition, it is further judged whether the data amount of the data to be transferred is less than or equal to the remaining storage space of the latest table partition. If it is, the latest table partition has enough space to store the data, and the ETL tool 13 may store the data to be transferred into the latest table partition without creating a new table partition. If the data amount is larger than the remaining storage space of the latest table partition, the latest table partition does not have enough space, and a new table partition may be created according to the time information of the data to be transferred. For example, a new table partition may be created in the data table through the following SQL statements:
UPDATE Table_Partition SET partition_current=false, partition_endtime=%data_starttime-1 WHERE partition_current=true AND table_name='%table_name';
INSERT INTO Table_Partition VALUES('uuid',table_name,'uuid',$data_starttime,$data_endtime,$data_size,true);
After the table partition has been created, the table partition information in the data table can be updated through the following SQL statement:
UPDATE Table_Partition SET partition_size=partition_size+%data_size WHERE table_name='%table_name' AND partition_current=true;
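The roll-over to a new table partition when the latest one is full can be sketched as one transaction using Python's sqlite3 module. The sketch assumes a metadata table named Table_Partition laid out as in table 1; SQLite, the 0/1 encoding of partition_current, the text-valued partition_id, and the helper name roll_partition are assumptions made for illustration and are not part of the application.

```python
import sqlite3
import uuid

# Metadata table following table 1 (0/1 stands in for the boolean column).
SCHEMA = """
CREATE TABLE IF NOT EXISTS Table_Partition (
    "index"             TEXT PRIMARY KEY,
    table_name          TEXT NOT NULL,
    partition_id        TEXT NOT NULL,
    partition_starttime INTEGER NOT NULL,
    partition_endtime   INTEGER NOT NULL,
    partition_size      INTEGER NOT NULL,
    partition_current   INTEGER NOT NULL
)
"""


def roll_partition(conn, table_name, data_start, data_end, data_size):
    """Close the current latest partition and register a new latest partition.

    The old partition's end time becomes data_start - 1, as in the UPDATE
    statement in the text, and the new row is flagged as the latest partition.
    Returns the generated partition ID.
    """
    new_id = str(uuid.uuid4())
    with conn:  # commits both statements together, rolls back on error
        conn.execute(
            "UPDATE Table_Partition "
            "SET partition_current = 0, partition_endtime = ? "
            "WHERE partition_current = 1 AND table_name = ?",
            (data_start - 1, table_name),
        )
        conn.execute(
            "INSERT INTO Table_Partition VALUES (?, ?, ?, ?, ?, ?, 1)",
            (str(uuid.uuid4()), table_name, new_id, data_start, data_end, data_size),
        )
    return new_id
```

Running both statements inside one transaction avoids a window in which a data table has no latest partition or two latest partitions.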
34. The data table management apparatus 11 transmits the result of whether a table partition was created to the ETL tool 13. In some embodiments, this result lets the ETL tool 13 store the data to be dumped into the newly created table partition of the data table Tb1 when a new table partition has been created for the data, and into the latest table partition of the data table Tb1 when no new table partition has been created.
35. The ETL tool 13 stores the data to be transferred to the second storage area 110.
If the data table management device 11 creates a new table partition in the data table Tb1 for the to-be-dumped data, the ETL tool 13 stores the to-be-dumped data into the newly created table partition; if the data table management device 11 does not create a new table partition in the data table Tb1 for the to-be-dumped data, the ETL tool 13 stores the to-be-dumped data to the latest table partition in the data table Tb1.
For example, as shown in fig. 4, in the data table Tb1 the size of the table partition is 500M, the data query granularity of the query service is one natural day, and the table partitions are divided by natural day (24 hours). The data d1 to be transferred has a start time of 2019-1-18,01:00:00, an end time of 2019-1-18,01:59:59, and a data amount of 100M. The table partition TP_m is the most recently created table partition in the data table; its start time is 2019-1-18,00:00:00, its end time is 2019-1-18,00:59:59, and its remaining storage space is 80M. By comparison, it can be determined that the time information of the data d1 falls into the time period of the table partition TP_m (TP_m and d1 belong to the same natural day), but the data amount of d1 exceeds the remaining storage space of TP_m, so d1 cannot be stored into TP_m, and a new table partition TP_(m+1) is created. Like the data d1, the table partition TP_(m+1) has a start time of 2019-1-18,01:00:00 and an end time of 2019-1-18,01:59:59, and its storage space is 500M. Because TP_(m+1) is a newly created table partition, its remaining storage space is also 500M, and it belongs to the same natural day as d1 (i.e., the time information of d1 also falls into the time period of TP_(m+1)), so the ETL tool 13 can store the data d1 into the table partition TP_(m+1). It will be appreciated that the table partitions TP_m and TP_(m+1) correspond to the same natural day (2019-1-18).
For another example, the size of the table partition is 500M, the data query granularity of the query service is one natural day, and the table partitions are divided by natural day (24 hours). The data d1 to be transferred has a start time of 2019-1-18,01:00:00, an end time of 2019-1-18,01:59:59, and a data amount of 100M. The table partition TP_m is the most recently created table partition in the data table; its start time is 2019-1-18,00:00:00, its end time is 2019-1-18,00:59:59, and its remaining storage space is 120M. By comparison, it can be determined that the time information of the data d1 falls into the time period of the table partition TP_m and that the data amount of d1 does not exceed the remaining storage space of TP_m, so the ETL tool 13 may store the data d1 into the table partition TP_m. The time information of TP_m is then changed to: start time 2019-1-18,00:00:00 and end time 2019-1-18,01:59:59.
For another example, the size of the table partition is 500M, and the table partitions are divided by natural day. The data d1 to be transferred has a start time of 2019-1-18,08:00:00, an end time of 2019-1-18,15:00:00, and a data amount of 100M. The table partition TP_m is the most recently created table partition in the data table; its start time is 2019-1-17,00:00:00, its end time is 2019-1-17,23:59:59, and its remaining storage space is 80M. By comparison, it can be determined that the time information of the data d1 does not fall into the time period of the table partition TP_m, so d1 cannot be stored into TP_m, and a new table partition TP_(m+1) is created. Since the data d1 belongs to 2019-1-18, the start time of TP_(m+1) is 2019-1-18,00:00:00, its end time is 2019-1-18,23:59:59, and its storage space is 500M. Because TP_(m+1) is a newly created table partition, its remaining storage space is also 500M, and the data amount of d1 does not exceed that remaining storage space, so the ETL tool 13 can store the data d1 into the table partition TP_(m+1).
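The three examples above can be condensed into one decision function. This is a minimal sketch under stated assumptions: the names Chunk, Partition, and needs_new_partition are hypothetical, sizes are in megabytes, and the table is partitioned by natural day. Because the recorded end time of the latest partition in the first two examples (00:59:59) only covers data already stored, the sketch tests the time condition by comparing natural days rather than the recorded start and end times; that reading of the examples is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Chunk:
    """Attribute information of the data to be transferred."""
    start: datetime
    end: datetime
    size_mb: int


@dataclass
class Partition:
    """Time information and free space of the latest table partition."""
    start: datetime
    end: datetime
    remaining_mb: int


def needs_new_partition(chunk: Chunk, latest: Partition) -> bool:
    """Return True when the chunk cannot be stored in the latest partition."""
    day = latest.start.date()
    same_day = chunk.start.date() == day and chunk.end.date() == day
    if not same_day:
        return True  # time condition fails, as in the third example
    # Time condition holds; the chunk must also fit into the remaining space.
    return chunk.size_mb > latest.remaining_mb
```

With 80M remaining on the same day the function reports that a new partition is needed, with 120M remaining it does not, and with the latest partition on the previous day it does regardless of free space.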
In some embodiments of the present application, the message body M1 sent by the ETL tool 13 to the data table management apparatus 11:
in response to the message body M1, the data table management apparatus 11 can generate a message body M2:
The data table management device 11 determines whether a new table partition needs to be created in the data table based on the start time, the end time, and the data amount of the data to be dumped, together with the start time, the end time, and the remaining storage space of the table partition, and returns the determination result to the ETL tool 13.
In this way, the table partition is set to a reasonable partition size based on the SLA of the analysis service provided by the second server 210 and the hardware resources of the second server 210 itself, and it is determined whether to place the data to be transferred and stored in the newly created table partition or the existing table partition for each time of extraction, so that the data stored in the table partition is as close to the reasonable partition size as possible, thereby reducing waste of system I/O to the maximum extent and improving stability of the analysis service.
Referring to fig. 5, a data storage method provided in the embodiment of the present application is applied to the second server 210. The second server 210 is provided with an analysis database 12, where the analysis database 12 includes a plurality of data tables, and each data table may store different service function data correspondingly. In this embodiment, the data storage method includes:
500. Set the time information of the table partitions of the data table based on the data analysis period of the analysis service provided by the second server 210, and set the partition size of the table partitions based on the hardware information of the second server 210.
502. Acquire the attribute information of the data to be dumped that the ETL tool 13 extracted from the first storage area 100.
504. Determine the data table corresponding to the data to be dumped and the time information of the latest table partition of that data table.
506. Judge whether the data to be dumped meets the storage condition for storing it into the latest table partition.
508. If the data to be dumped meets the storage condition, store it into the latest table partition.
510. If the data to be dumped does not meet the storage condition, create a new table partition according to the attribute information of the data to be dumped, and store the data into the newly created table partition.
In some embodiments, the data stored in the analysis database 12 is generally time-sensitive: data from half a year ago may be analyzed or queried in the month dimension, data from three months ago may be analyzed or queried in the week dimension, and data from one month ago may be analyzed or queried in the day dimension. Dynamically adjusting the size of the table partitions based on the time granularity of the services provided by the analysis database 12, and merging table partitions into table partitions better suited to the analysis or query services, may reduce the hardware resources occupied in responding to those services and shorten their response time.
Fig. 6 is a schematic diagram illustrating data interaction between the data table management apparatus 11 and the analysis database 12 according to an embodiment of the present application. The analysis database 12 is provided with a partition merging function.
60. The data table management means 11 acquires time information of a table partition of a specified data table in the analysis database 12.
In some embodiments of the present application, the designated data table is a data table in the analysis database 12 that needs to be partition-merged, and the designated data table may be determined according to an actual merging requirement. The time information of the table partition includes a start time and an end time.
61. The data table management device 11 determines the table partition capable of partition merging in the specified data table according to a preset partition merging rule.
In some embodiments of the present application, the data table management apparatus 11 may dynamically merge table partitions according to the time granularity of the analysis or query service, and may optimize disk I/O performance, processor occupancy, and the like in response to the analysis or query service. The preset partition merging rules can be set according to actual requirements, so that the table partitions are close to reasonable sizes in a partition merging mode as much as possible. For example, a preset partition merging rule may be set in conjunction with a service scenario. For data with longer storage time, the time granularity of the provided analysis or query service is often coarser, for example, data with storage time three months ago, and the query time granularity is measured in months. For data with newer storage time, the time granularity of the provided analysis or query service is often finer, for example, data with storage time within one month, and the query time granularity is measured in days.
The parameters that the preset partition merging rule can define include: the ID of the data table that needs partition merging, the time span of the table partitions to be merged, the size of the new table partition produced by merging, and the number of table partitions to be merged in each attempt. For example, the data table Tb1 is partitioned by natural day between 2018-1-1 and 2018-1-31 with a table partition size between 450M and 550M; the preset partition merging rule is to merge the table partitions of the data table Tb1 that belong to 2018-1-1 to 2018-1-31, the partition threshold of the new table partition is 2G, and each attempt merges two adjacent table partitions. Based on these parameters, the table partitions whose time belongs to 2018-1-1 to 2018-1-31 can be scanned from the data table Tb1 according to the preset partition merging rule. For example, the table partitions meeting the preset partition merging rule can be scanned in the data table Tb1 through the following SQL statement: SELECT * FROM Table_Partition WHERE table_name='%table_name' ORDER BY partition_starttime ASC.
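One possible way to represent such a rule and to select the candidate table partitions is sketched below. The class names MergeRule and PartitionInfo, the field names, and the function select_candidates are hypothetical; the sketch filters by time span in memory, whereas the SQL statement in the text only filters by table name and sorts by start time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class PartitionInfo:
    partition_id: str
    start: datetime
    end: datetime
    size_mb: int


@dataclass(frozen=True)
class MergeRule:
    table_name: str                  # data table that needs partition merging
    span_start: datetime             # time span of the partitions to merge
    span_end: datetime
    threshold_mb: int                # partition threshold of the new table partition
    partitions_per_attempt: int = 2  # partitions merged in each attempt


def select_candidates(rule: MergeRule, partitions: List[PartitionInfo]) -> List[PartitionInfo]:
    """Keep the partitions inside the rule's time span, ordered by start time."""
    in_span = [p for p in partitions
               if rule.span_start <= p.start and p.end <= rule.span_end]
    return sorted(in_span, key=lambda p: p.start)
```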
62. The data table management apparatus 11 transmits the partition merge result of the specified data table to the analysis database 12.
When the data table management device 11 has determined the table partitions that can be partition-merged in the specified data table in the analysis database 12, it goes on to determine the merging result of those table partitions. For example, scanning the data table Tb1 yields a plurality of table partitions meeting the preset partition merging rule: table partition TP_1, table partition TP_2, table partition TP_3, table partition TP_4, ..., table partition TP_n. The table partitions TP_1 to TP_n may be sorted in chronological order. In the following, the merging result is determined by taking the merging of two table partitions at a time as an example; however, merging is not limited to two table partitions, and three adjacent table partitions, or more than three, may be merged at a time.
When partition merging is carried out, merging proceeds in the order of the table partitions. First, it is judged whether the sum of the partition sizes of table partition TP_1 and table partition TP_2 is greater than the partition threshold of the new table partition; this can be done by adding the partition size of TP_1 to the partition size of TP_2 and comparing the sum with the partition threshold. If the sum is less than or equal to the partition threshold of the new table partition, TP_1 and TP_2 are merged to obtain a first merged table partition TP_11. For example, the content of TP_2 may be written into TP_1, TP_1 renamed to the first merged table partition TP_11, and TP_2 deleted; or a new table partition may be created and named the first merged table partition TP_11, the content of TP_1 and TP_2 written into TP_11, and TP_1 and TP_2 deleted.
In some embodiments, if the sum of the partition sizes of table partition TP_1 and table partition TP_2 is less than or equal to the partition threshold of the new table partition, TP_1 and TP_2 can be merged to obtain the first merged table partition TP_11. For example, TP_1 and TP_2 may be merged through the following SQL statements:
UPDATE Table_Partition SET partition_endtime=%TP2_partition_endtime WHERE table_name='%table_name' AND partition_id='%TP1_partition_id';
DELETE FROM Table_Partition WHERE partition_id='%TP2_partition_id';
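The metadata side of such a merge can be sketched with Python's sqlite3 module, assuming a Table_Partition table with the columns of table 1 already exists. The helper name merge_partition_rows is hypothetical. Unlike the two statements above, the sketch also adds the absorbed partition's size to the surviving row; that extra step is an assumption of this sketch and is not stated in the text.

```python
import sqlite3


def merge_partition_rows(conn, table_name, keep_id, drop_id):
    """Extend partition keep_id to cover partition drop_id, then remove drop_id's row."""
    with conn:  # one transaction: either both rows change or neither does
        row = conn.execute(
            "SELECT partition_endtime, partition_size FROM Table_Partition "
            "WHERE table_name = ? AND partition_id = ?",
            (table_name, drop_id),
        ).fetchone()
        if row is None:
            raise KeyError(drop_id)
        end_time, size = row
        conn.execute(
            "UPDATE Table_Partition "
            "SET partition_endtime = ?, partition_size = partition_size + ? "
            "WHERE table_name = ? AND partition_id = ?",
            (end_time, size, table_name, keep_id),
        )
        conn.execute(
            "DELETE FROM Table_Partition WHERE table_name = ? AND partition_id = ?",
            (table_name, drop_id),
        )
```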
After table partition TP_1 and table partition TP_2 have been merged, it is then judged whether the sum of the partition sizes of the first merged table partition TP_11 and table partition TP_3 is greater than the partition threshold of the new table partition. If the sum is less than or equal to the partition threshold, TP_3 is merged into the first merged table partition TP_11. If the sum is greater than the partition threshold, it is then judged whether the sum of the partition sizes of table partition TP_3 and table partition TP_4 is greater than the partition threshold, and so on until the partition merging judgment has been made for table partition TP_n.
If the sum of the partition sizes of table partition TP_1 and table partition TP_2 is greater than the partition threshold of the new table partition, it is then judged whether the sum of the partition sizes of table partition TP_2 and table partition TP_3 is greater than the partition threshold, and so on until the partition merging judgment has been made for table partition TP_n.
After the partition merging judgment has been completed for table partitions TP_1 to TP_n, the data table management device 11 may send the partition merging result to the analysis database 12.
63. The analysis database 12 performs partition merging on the specified data table based on the partition merging result. For example, the partition merging result received by the analysis database 12 includes: [table partition TP_1 and table partition TP_2 are merged to obtain a first merged table partition TP_11; table partition TP_3, table partition TP_4, and table partition TP_5 are merged to obtain a second merged table partition TP_12; table partition TP_6]. The analysis database 12 merges table partition TP_1 and table partition TP_2 in the specified data table to obtain a new table partition whose ID is TP_11, and also merges table partition TP_3, table partition TP_4, and table partition TP_5 in the specified data table to obtain another new table partition whose ID is TP_12.
In some embodiments, taking an HTTP RESTful service as an example, the URL address of the request is http://pm_ip:pm_port/partitions/merge, and the data table management apparatus 11 sends a message body M3 to the analysis database 12:
If the analysis database 12 returns status code 200 for message body M3, the first table partition and the second table partition were merged successfully.
As shown in FIG. 7, assume that the partition threshold TH of the new table partition is 800M and that scanning the data table Tb1 yields the following table partitions meeting the preset partition merging rule: table partition TP_1, with a partition size of 300M; table partition TP_2, with a partition size of 400M; table partition TP_3, with a partition size of 300M; table partition TP_4, with a partition size of 200M; table partition TP_5, with a partition size of 300M; and table partition TP_6, with a partition size of 400M. First, table partitions TP_1 and TP_2 are taken. Since 300M + 400M < 800M, TP_1 and TP_2 can be merged, and merging them gives a first merged table partition TP_11 with a partition size of 700M. Next, table partition TP_3 is taken. Since 700M + 300M > 800M, the first merged table partition TP_11 and TP_3 cannot be merged, so merging TP_11 with TP_3 is abandoned. Next, table partition TP_4 is taken. Since 300M + 200M < 800M, TP_3 and TP_4 can be merged, and merging them gives a second merged table partition TP_12 with a partition size of 500M. Next, table partition TP_5 is taken. Since 500M + 300M = 800M, which does not exceed the threshold, TP_5 can also be merged into the second merged table partition TP_12, whose partition size then changes from 500M to 800M. Next, table partition TP_6 is taken. Since 800M + 400M > 800M, the second merged table partition TP_12 and TP_6 cannot be merged, so merging TP_12 with TP_6 is abandoned. The final partition merging result is: [table partition TP_1 and table partition TP_2 are merged to obtain a first merged table partition TP_11; table partition TP_3, table partition TP_4, and table partition TP_5 are merged to obtain a second merged table partition TP_12; table partition TP_6].
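The walk-through above amounts to a greedy, in-order grouping of adjacent table partitions under a size threshold. A minimal sketch follows; the function name plan_merges and the representation of a table partition as a (name, size in M) pair are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def plan_merges(partitions: Sequence[Tuple[str, int]], threshold: int) -> List[List[str]]:
    """Group chronologically ordered partitions so that no group exceeds the threshold.

    A partition joins the current group while the combined size stays less than
    or equal to the threshold; otherwise the current group is closed and the
    partition starts a new group.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in partitions:
        if current and current_size + size <= threshold:
            current.append(name)
            current_size += size
        else:
            if current:
                groups.append(current)
            current = [name]
            current_size = size
    if current:
        groups.append(current)
    return groups
```

Applied to the six partition sizes of the example with a threshold of 800, the function returns three groups: TP_1 with TP_2, then TP_3 with TP_4 and TP_5, then TP_6 alone. A group with a single member means that the partition is left unmerged.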
Referring to fig. 8, a data storage method provided in the embodiment of the present application is applied to the second server 210. In this embodiment, the data storage method includes:
800. Scan the designated data table in the analysis database 12 to obtain a plurality of table partitions TP_1 to TP_n meeting the preset partition merging rule, the table partitions TP_1 to TP_n being sorted in chronological order.
802. Judge whether the sum of the partition sizes of the adjacent table partitions TP_1 and TP_2 is greater than the partition threshold of the new table partition.
804. If the sum of the partition sizes of table partition TP_1 and table partition TP_2 is less than or equal to the partition threshold of the new table partition, merge TP_1 and TP_2 to obtain a first merged table partition TP_11.
806. Judge whether the sum of the partition sizes of the first merged table partition TP_11 and table partition TP_3 is greater than the partition threshold of the new table partition.
808. If the sum of the partition sizes of the first merged table partition TP_11 and table partition TP_3 is less than or equal to the partition threshold of the new table partition, merge TP_3 into the first merged table partition TP_11.
810. If the sum of the partition sizes of the first merged table partition TP_11 and table partition TP_3 is greater than the partition threshold of the new table partition, go on to judge whether the sum of the partition sizes of the adjacent table partitions TP_3 and TP_4 is greater than the partition threshold, and so on until the partition merging judgment has been made for table partition TP_n.
812. If the sum of the partition sizes of table partition TP_1 and table partition TP_2 is greater than the partition threshold of the new table partition, go on to judge whether the sum of the partition sizes of the adjacent table partitions TP_2 and TP_3 is greater than the partition threshold, and so on until the partition merging judgment has been made for table partition TP_n.
814. Output the partition merging result for the table partitions TP_1 to TP_n to the analysis database 12.
Referring to fig. 9, a hardware structure diagram of the second server 210 according to an embodiment of the present disclosure is provided. As shown in fig. 9, the second server 210 may include a processor 2001, a memory 2002, and a communication bus 2003. The memory 2002 is used to store one or more computer programs 2004. One or more computer programs 2004 are configured for execution by the processor 2001. The one or more computer programs 2004 include instructions that may be used to be executed to implement the data storage method performed in the second server 210.
It is to be understood that the illustrated structure of the present embodiment does not constitute a specific limitation to the second server 210. In other embodiments, the second server 210 may include more or fewer components than illustrated, or combine certain components, or split certain components, or a different arrangement of components.
A memory may also be provided in the processor 2001 for storing instructions and data. In some embodiments, the memory in the processor 2001 is a cache memory. The memory may hold instructions or data that the processor 2001 has just used or uses cyclically. If the processor 2001 needs to use the instruction or data again, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 2001, thereby increasing the efficiency of the system.
In some embodiments, the processor 2001 may include one or more interfaces. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and/or a USB interface, etc.
In some embodiments, the memory 2002 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device. The present embodiment also provides a computer storage medium, in which computer instructions are stored, and when the computer instructions are run on an electronic device, the electronic device is caused to execute the above related method steps to implement the data storage method in the above embodiments.
The embodiment further provides a chip, which is electrically connected with the electronic device and controls the electronic device to execute the relevant method steps to realize the data storage method in the embodiment. The present embodiment also provides a computer program product, which when running on a computer, causes the computer to execute the relevant steps described above, so as to implement the data storage method in the above embodiments.
In addition, embodiments of the present application also provide an apparatus, which may specifically be a chip, a component or a module, and may include a processor and a memory connected to each other; the memory is used for storing computer-executable instructions, and when the apparatus runs, the processor may execute the computer-executable instructions stored in the memory, so that the chip executes the data storage method in the above method embodiments.
The second server, the computer storage medium, the computer program product, or the chip provided in this embodiment are all configured to execute the corresponding method provided above, so that the beneficial effects achieved by the second server, the computer storage medium, the computer program product, or the chip may refer to the beneficial effects in the corresponding method provided above, and are not described herein again.
Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules or units is only a logical division of functions, and other divisions are possible in an actual implementation; for instance, a plurality of units or components may be combined or integrated into another device, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units, that is, may be located in one place, or may be distributed to a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for enabling a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application.
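As an illustration only, the partition-selection step of the data storage method can be sketched in Python as follows. The sketch assumes a time granularity of one calendar day, measures sizes in abstract units, and assumes every batch is smaller than the partition size; the names `TablePartition`, `Batch`, `same_granularity`, and `store` are hypothetical and do not come from the application. A new table partition is created when the time information of the latest table partition does not match that of the data to be dumped, or when its remaining storage space is smaller than the data volume; otherwise the data is stored in the latest table partition and the end time of that partition is extended if necessary.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TablePartition:
    start: datetime
    end: datetime
    capacity: int      # hypothetical partition size, in abstract units
    used: int = 0

    @property
    def remaining(self) -> int:
        return self.capacity - self.used


@dataclass
class Batch:
    """Data to be dumped: a time range plus a data volume."""
    start: datetime
    end: datetime
    size: int


def same_granularity(a: datetime, b: datetime) -> bool:
    # Assumption: the time granularity is one calendar day.
    return a.date() == b.date()


def store(partitions: List[TablePartition], batch: Batch,
          partition_size: int) -> TablePartition:
    """Store a batch in the latest partition or in a newly created one."""
    latest = partitions[-1] if partitions else None
    if (latest is None
            or not same_granularity(latest.start, batch.start)
            or latest.remaining < batch.size):
        # Time mismatch and/or insufficient space: create a new partition
        # from the time information of the data to be dumped.
        latest = TablePartition(batch.start, batch.end, partition_size)
        partitions.append(latest)
    latest.used += batch.size
    if batch.end > latest.end:
        # Extend the partition's end time to cover the stored data.
        latest.end = batch.end
    return latest
```

In a real database the partition list would be obtained with a query against the table's partition metadata rather than kept in memory; the in-memory list here only keeps the sketch self-contained.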
Claims (16)
1. A data storage method applied to a database, characterized in that the database comprises at least one data table, and the data storage method comprises:
in response to a storage requirement for data to be dumped, determining a data table in the database corresponding to the data to be dumped;
acquiring time information and remaining storage space of a latest table partition of the data table, and time information and a data volume of the data to be dumped; and
if the time information of the latest table partition does not match the time information of the data to be dumped and/or the remaining storage space of the latest table partition is smaller than the data volume of the data to be dumped, creating a new table partition in the data table and storing the data to be dumped in the newly created table partition.
2. The data storage method of claim 1, wherein the data to be dumped comprises data attributes, each data table in the database comprises table attributes, the table attributes are used for indicating the data attributes of the data which can be stored in the data table, and the determining the data table in the database corresponding to the data to be dumped comprises:
determining the data table corresponding to the data to be dumped according to the data attribute of the data to be dumped and the table attribute of each data table in the database.
3. The data storage method of claim 2, wherein the data to be dumped is stored at a first server and the database is deployed at a second server, the method further comprising:
extracting the data to be dumped from the first server by using a preset data extraction tool, and sending the data attribute of the data to be dumped to the second server.
4. The data storage method of claim 3, wherein the method further comprises:
setting a time granularity of a table partition of the data table according to a Service Level Agreement (SLA) of the second server;
and setting the partition size of the table partition of the data table according to the hardware configuration information of the second server, wherein the hardware configuration information at least comprises disk read/write (I/O) performance.
5. The data storage method of claim 4, wherein the method further comprises:
if the time information of the latest table partition and the time information of the data to be dumped do not belong to the same time granularity, determining that the time information of the latest table partition does not match the time information of the data to be dumped.
6. The data storage method according to claim 4, wherein the time information of the data to be dumped comprises a start time and an end time, the data volume of the data to be dumped that the preset data extraction tool extracts from the first server each time is smaller than the partition size of the table partition, and the start time and the end time of the data to be dumped belong to the same time granularity.
7. The data storage method of claim 1, wherein the method further comprises:
if the time information of the latest table partition matches the time information of the data to be dumped and the remaining storage space of the latest table partition is greater than or equal to the data volume of the data to be dumped, storing the data to be dumped in the latest table partition.
8. The data storage method according to claim 7, wherein the time information of the data to be dumped and the time information of the latest table partition each comprise a start time and an end time, and after the storing of the data to be dumped in the latest table partition, the method further comprises:
if the end time of the data to be dumped is later than the end time of the latest table partition, updating the end time of the latest table partition to the end time of the data to be dumped.
9. The data storage method of claim 1, wherein said obtaining time information for a most recent table partition of said data table comprises:
acquiring the start time and the end time of the latest table partition of the data table by using a preset database query statement.
10. The data storage method of claim 1, wherein creating a new table partition in the data table comprises:
creating the new table partition in the data table based on the time information of the data to be dumped.
11. A data storage method according to any one of claims 1 to 10, further comprising:
determining, according to a preset partition merging rule, n table partitions in the data table that are eligible for partition merging, wherein the n table partitions are arranged in chronological order, and the preset partition merging rule defines a partition threshold for merging;
sequentially attempting partition merging on m adjacent table partitions starting from each table partition, wherein m and n are positive integers, and m is smaller than n;
if the sum of the partition sizes of the m adjacent table partitions is less than or equal to the partition threshold, merging the m adjacent table partitions into a new table partition, and continuing to attempt partition merging on m adjacent table partitions with the new table partition as the starting table partition; and
if the sum of the partition sizes of the m adjacent table partitions is greater than the partition threshold, abandoning the partition merging of the m adjacent table partitions.
12. The data storage method of claim 11, wherein the preset partition merging rule further defines merging time information, and the determining n table partitions in the data table that are eligible for partition merging according to the preset partition merging rule comprises:
selecting, from the data table, n table partitions whose time information falls within the merging time information.
13. The data storage method of claim 12, wherein the database is deployed at a second server, the method further comprising:
setting the merging time information according to the SLA of the second server.
14. A data storage method applied to a database, characterized in that the database comprises at least one data table, and the data storage method comprises:
determining, according to a preset partition merging rule, n table partitions in the data table that are eligible for partition merging, wherein the n table partitions are arranged in chronological order, and the preset partition merging rule defines a partition threshold for merging;
sequentially attempting partition merging on m adjacent table partitions starting from each table partition, wherein m and n are positive integers, and m is smaller than n;
if the sum of the partition sizes of the m adjacent table partitions is less than or equal to the partition threshold, merging the m adjacent table partitions into a new table partition, and continuing to attempt partition merging on m adjacent table partitions with the new table partition as the starting table partition; and
if the sum of the partition sizes of the m adjacent table partitions is greater than the partition threshold, abandoning the partition merging of the m adjacent table partitions.
15. A computer-readable storage medium storing computer instructions that, when executed on an electronic device, cause the electronic device to perform the data storage method of any one of claims 1 to 14.
16. An electronic device, comprising a processor and a memory, the memory configured to store instructions, the processor configured to invoke the instructions in the memory such that the electronic device performs the data storage method of any one of claims 1 to 14.
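As an illustration only, the partition merging recited in claims 11 and 14 can be sketched in Python as follows. The sketch represents each eligible table partition only by its size, listed in chronological order; the function name `merge_partitions` and the default window of m = 2 are assumptions made for the example and are not taken from the application. The sketch also requires m to be at least 2, because merging a single partition with itself would never terminate.

```python
from typing import List


def merge_partitions(sizes: List[int], threshold: int, m: int = 2) -> List[int]:
    """Greedily merge windows of m adjacent partitions, in time order.

    sizes     -- partition sizes of the n eligible partitions, oldest first
    threshold -- partition threshold defined by the merging rule
    m         -- number of adjacent partitions tried per merge attempt
    """
    if m < 2:
        raise ValueError("m must be at least 2 for a merge to make progress")
    result = list(sizes)
    i = 0
    while i + m <= len(result):
        total = sum(result[i:i + m])
        if total <= threshold:
            # Merge the window into one new partition and retry with the
            # new partition as the starting partition (i is not advanced).
            result[i:i + m] = [total]
        else:
            # Abandon this window and move on to the next starting partition.
            i += 1
    return result
```

For example, with sizes `[10, 20, 30, 50, 5, 5]` and a threshold of 60, the first three partitions are merged step by step into one partition of size 60, the attempt to add the partition of size 50 is abandoned, and the remaining three partitions are merged into a second partition of size 60.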
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011104762.8A CN112328592B (en) | 2020-10-15 | 2020-10-15 | Data storage method, electronic device, and computer-readable storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112328592A (en) | 2021-02-05 |
| CN112328592B (en) | 2024-11-29 |
Family
ID=74313725
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011104762.8A Active CN112328592B (en) | 2020-10-15 | 2020-10-15 | Data storage method, electronic device, and computer-readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112328592B (en) |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101216821A (en) * | 2007-01-05 | 2008-07-09 | 中兴通讯股份有限公司 | Data acquisition system storage management method |
| CN105095393A (en) * | 2015-06-30 | 2015-11-25 | 努比亚技术有限公司 | Method and device for data storage |
| CN109542961A (en) * | 2018-10-19 | 2019-03-29 | 中国平安财产保险股份有限公司 | Date storage method, device, computer equipment and storage medium |
| CN110727685A (en) * | 2019-10-09 | 2020-01-24 | 苏州浪潮智能科技有限公司 | A data compression method, device and storage medium based on Cassandra database |
Non-Patent Citations (1)
| Title |
|---|
| Chu Yan: "Management and Use of Oracle8 Database Partitions", Gansu Science and Technology, vol. 20, no. 5, 31 May 2004 (2004-05-31), pages 59 - 61 * |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11537613B1 (en) | 2021-10-29 | 2022-12-27 | Snowflake Inc. | Merge small file consolidation |
| US11593306B1 (en) * | 2021-10-29 | 2023-02-28 | Snowflake Inc. | File defragmentation service |
| CN116701382A (en) * | 2023-08-03 | 2023-09-05 | 成都数默科技有限公司 | Automatic efficient data rollback method based on clickhouse database |
| CN116701382B (en) * | 2023-08-03 | 2023-10-20 | 成都数默科技有限公司 | Automatic efficient data rollback method based on clickhouse database |
| CN118747173A (en) * | 2024-09-04 | 2024-10-08 | 卓望数码技术(深圳)有限公司 | A business data processing method and device |
| CN119357199A (en) * | 2024-12-25 | 2025-01-24 | 浙江大华技术股份有限公司 | Database partition table management method, electronic device and computer readable storage medium |
| CN119357199B (en) * | 2024-12-25 | 2025-04-11 | 浙江大华技术股份有限公司 | Database partition table management method, electronic device and computer-readable storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN112328592B (en) | 2024-11-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112328592B (en) | | Data storage method, electronic device, and computer-readable storage medium |
| CN108287708B (en) | | Data processing method and device, server and computer readable storage medium |
| CN108536745B (en) | | Shell-based data table extraction method, terminal, equipment and storage medium |
| US20200019881A1 (en) | | Feature processing method and feature processing system for machine learning |
| CN111339073A (en) | | Real-time data processing method and device, electronic equipment and readable storage medium |
| CN112699142A (en) | | Cold and hot data processing method and device, electronic equipment and storage medium |
| CN112506486A (en) | | Search system establishing method and device, electronic equipment and readable storage medium |
| CN115878027B (en) | | A storage object processing method, device, terminal and storage medium |
| CN113779426A (en) | | Data storage method and device, terminal equipment and storage medium |
| CN107391402A (en) | | A kind of data operating method, device and a kind of data operation card |
| CN111445319A (en) | | Voucher generation method and device, computer equipment and storage medium |
| CN117632860A (en) | | Method and device for merging small files based on Flink engine and electronic equipment |
| CN110222046B (en) | | List data processing method, device, server and storage medium |
| CN116821493A (en) | | Message push method, device, computer equipment and storage medium |
| CN115408546A (en) | | Time sequence data management method, device, equipment and storage medium |
| CN107609038B (en) | | Data cleaning method and device |
| CN112231292B (en) | | File processing method, device, storage medium and computer equipment |
| CN111651531A (en) | | Data import method, device, device and computer storage medium |
| CN114493642A (en) | | User portrait label generation method and device, computing device and storage medium |
| CN111459411B (en) | | Data migration method, device, equipment and storage medium |
| CN116755660A (en) | | Methods, devices, computer equipment and storage media for determining the resources required for the project |
| CN115904238A (en) | | Storage method and device based on data integration, computer equipment and storage medium |
| CN112632266B (en) | | Data writing method and device, computer equipment and readable storage medium |
| CN113626439A (en) | | A data processing method, device, data processing equipment and storage medium |
| US20120233224A1 (en) | | Data processing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||