Data storage method and device
Technical Field
The invention relates to the field of financial technology (Fintech), in particular to a data storage method and device.
Background
With the development of computer technology, more and more technologies (such as distributed architecture, cloud computing, and big data) are being applied in the financial field, and the traditional financial industry is gradually shifting to financial technology; big data technology is no exception. However, the safety and real-time requirements of the financial and payment industries also place higher demands on big data technology.
With respect to data storage, most enterprises choose to run their distributed databases on X86 architecture servers, where an X86 architecture server is a server using an Intel processor or another processor chip compatible with the X86 instruction set.
With the adoption of ARM architecture servers, enterprises apply them to data storage in order to avoid risks; an ARM architecture server is a server using a processor chip based on the ARM instruction set.
When a distributed database runs across these two different types of servers, it must be considered that switching the distributed database to the ARM architecture server requires online migration of the data, and such online migration consumes a long time.
In summary, the prior art cannot provide a distributed storage system that realizes fast storage of data across different types of servers.
Disclosure of Invention
The invention provides a data storage method and a data storage device, which are used for solving the problem that a distributed storage system cannot be provided to realize the rapid storage of data among different types of servers.
In a first aspect, an embodiment of the present invention provides a data storage method, where the method is applicable to a distributed storage system including at least two different types of servers; the method comprises the following steps: determining each data fragment corresponding to the data to be stored; for each data fragment, determining a leading server used for storing the data fragment from the servers of the first type, and determining a following server used for storing the data fragment from the servers of the second type; storing the data fragments to the leading server and synchronizing to the following server; the first type of server and the second type of server are any two of the at least two different types of servers.
Based on the scheme, the server of the first type is determined to be used as the leading server, the server of the second type is determined to be used as the following server, and therefore all data corresponding to the data to be stored can be stored to the leading server in a slicing mode and can be synchronized to the following server, and the data to be stored can be efficiently stored in the servers of the two different types.
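The flow of the first aspect can be sketched in a few lines of Python. The sketch below is illustrative only and is not part of the claimed method: the server names, the round-robin placement, and the in-memory storage dict are all assumptions made for the example.

```python
from itertools import cycle

def assign_shards(shards, first_type_servers, second_type_servers):
    """Assign each data fragment a leading server of the first type and a
    following server of the second type (round-robin, for illustration)."""
    leaders = cycle(first_type_servers)
    followers = cycle(second_type_servers)
    return {shard: {"leader": next(leaders), "follower": next(followers)}
            for shard in shards}

def store(shard, data, placement, storage):
    """Write the data fragment to its leading server, then synchronize
    a copy to its following server."""
    p = placement[shard]
    storage.setdefault(p["leader"], {})[shard] = data    # write to leader
    storage.setdefault(p["follower"], {})[shard] = data  # synchronized copy

placement = assign_shards(["shard_1", "shard_2"],
                          ["x86_A", "x86_B"],   # first type (e.g. X86)
                          ["arm_A", "arm_B"])   # second type (e.g. ARM)
storage = {}
store("shard_1", "clients 1-10000", placement, storage)
```

The round-robin also illustrates the later constraint that the leading servers of different data fragments are not all identical.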
As a possible implementation method, the determining, for each data fragment, of a leading server for storing the data fragment from the servers of the first type and of a following server for storing the data fragment from the servers of the second type satisfies: the leading servers corresponding to the data fragments of the data to be stored are not all identical; and/or the following servers corresponding to the data fragments of the data to be stored are not all identical, and at least two following servers corresponding to the same data fragment are of different types.
Based on the scheme, the data fragments obtained by reasonably segmenting the data to be stored are not all stored on the same leading server, which helps avoid the heavy pressure caused by writing all the data fragments to one leading server simultaneously; moreover, since the data fragments are distributed over different leading servers, when one leading server is abnormal, the data fragments held by the other leading servers can still be obtained. Likewise, the data fragments are not all stored on the same following server, which helps avoid the heavy pressure caused by writing all the data fragments to one following server simultaneously.
As a possible implementation method, if it is determined that the servers of the first type are all in a non-working state, then for each data fragment, a leading server for storing the data fragment is determined from the servers of the second type.
Based on the scheme, since the servers of the second type hold synchronized copies of each data fragment stored on the servers of the first type, when it is determined that the servers of the first type are abnormal and cannot respond to a data query request, a leading server for storing the data fragment is determined from the servers of the second type, which enables a server of the second type to respond to the corresponding data query request.
As a possible implementation method, a data query request is received, where the data query request is used for acquiring at least one data fragment; if the leading server corresponding to the data fragment is in a non-working state, the data fragment is acquired from a first following server corresponding to the data fragment; the type of the first following server is the same as the type of the leading server corresponding to the data fragment.
Based on the scheme, for a data query request for acquiring a certain data fragment, when a leading server corresponding to the data fragment is abnormal and cannot respond to the data query request, the data fragment can be quickly acquired from a first following server corresponding to the data fragment; wherein the first follower server is of the same type as the leader server.
As a possible implementation method, if the first following servers are all in a non-working state, obtaining the data fragments from second following servers corresponding to the data fragments; the type of the second following server is different from the type of the leading server corresponding to the data fragment.
Based on the scheme, for a data query request for acquiring a certain data fragment, if the data query request cannot be responded due to the fact that the leading server and the first following server corresponding to the data fragment are abnormal, the data fragment can be quickly acquired from the second following server corresponding to the data fragment; wherein the second follower server is of a different type than the leader server.
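The query fallback order described above — the leading server, then the same-type first following server, then the different-type second following server — can be sketched as follows. The placement layout, server names, and liveness map are hypothetical, invented for the example.

```python
def read_shard(shard, placement, alive, storage):
    """Return the data fragment from the first live replica, trying the
    leading server, then the same-type first following server, then the
    different-type second following server, in that order."""
    p = placement[shard]
    for server in (p["leader"], p["first_follower"], p["second_follower"]):
        if alive.get(server, False):
            return storage[server][shard]
    raise RuntimeError(f"no live replica holds {shard}")

placement = {"shard_1": {"leader": "x86_A",
                         "first_follower": "x86_B",    # same type as leader
                         "second_follower": "arm_A"}}  # different type
# Each replica tags its copy so the test can see which server answered.
storage = {s: {"shard_1": f"copy@{s}"} for s in ("x86_A", "x86_B", "arm_A")}

# Leader down: the same-type first following server answers.
alive = {"x86_A": False, "x86_B": True, "arm_A": True}
```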
As a possible implementation method, the distributed storage system includes M servers of a first type and N servers of a second type; the performance of the M first type of servers on data access matches the performance of the N second type of servers on data access.
Based on the scheme, the distributed storage system comprises M first-type servers and N second-type servers, and the performance of the M first-type servers on data access is matched with the performance of the N second-type servers on data access, so that the data fragments can be reasonably stored on the two different types of servers in the later period.
As a possible implementation method, the N second type servers are determined by: determining the number X of transactions per second to be met by the distributed storage system, and determining the number Y of the second type of servers required by the distributed storage system under the condition of meeting the number X of transactions per second through a correlation coefficient P (X, Y); wherein P (X, Y) is determined by: acquiring the transaction processing number X per second when any one historical service request is processed by a first type of server, replaying the historical service request through a second type of server, and determining the number Y of the second type of server; determining a correlation coefficient P (X, Y) between said number of transactions processed per second (X) and said number of servers of said second type (Y) by equation (1):
formula (1):
P(X,Y) = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}
where n is the number of samples, X_i and Y_i are the i-th observations of the variables X and Y at replay, \bar{X} is the sample mean of X, and \bar{Y} is the sample mean of Y.
Based on the scheme, the number of the second type of servers matched with the performance of the first type of servers is accurately determined in a formula calculation mode according to historical service requests.
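The variable descriptions around formula (1) — n samples, paired observations at replay, and sample means of X and Y — match the standard Pearson sample correlation coefficient, so a minimal sketch of the computation might look like this (the replay observations below are invented for illustration):

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient P(X, Y), as in formula (1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical replay data: transactions per second (X) handled when the
# second-type server count (Y) was observed.
tps = [1000, 2000, 3000, 4000]
servers = [2, 4, 5, 8]
p = pearson(tps, servers)
```

A P(X, Y) close to 1 would indicate that the required second-type server count scales nearly linearly with the target transactions per second.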
In a second aspect, embodiments of the present invention provide a data storage apparatus, which is suitable for a distributed storage system including at least two different types of servers; the device includes: the determining unit is used for determining each data fragment corresponding to the data to be stored; the processing unit is used for determining a leading server used for storing the data fragments from the servers of the first type and determining a following server used for storing the data fragments from the servers of the second type aiming at each data fragment; storing the data fragments to the leading server and synchronizing to the following server; the first type of server and the second type of server are any two of the at least two different types of servers.
Based on the scheme, the server of the first type is determined to be used as the leading server, the server of the second type is determined to be used as the following server, and therefore all data corresponding to the data to be stored can be stored to the leading server in a slicing mode and can be synchronized to the following server, and the data to be stored can be efficiently stored in the servers of the two different types.
As a possible implementation method, the leading servers corresponding to the data segments of the data to be stored are not completely the same; and/or following servers corresponding to the data fragments of the data to be stored are not identical, and at least two following servers corresponding to the same data fragment are not identical in type.
Based on the scheme, the data fragments obtained by reasonably segmenting the data to be stored are not all stored on the same leading server, which helps avoid the heavy pressure caused by writing all the data fragments to one leading server simultaneously; moreover, since the data fragments are distributed over different leading servers, when one leading server is abnormal, the data fragments held by the other leading servers can still be obtained. Likewise, the data fragments are not all stored on the same following server, which helps avoid the heavy pressure caused by writing all the data fragments to one following server simultaneously.
As a possible implementation method, the processing unit is further configured to determine, for each data fragment, a leading server for storing the data fragment from the servers of the second type if it is determined that the servers of the first type are all in a non-working state.
Based on the scheme, since the servers of the second type hold synchronized copies of each data fragment stored on the servers of the first type, when it is determined that the servers of the first type are abnormal and cannot respond to a data query request, a leading server for storing the data fragment is determined from the servers of the second type, which enables a server of the second type to respond to the corresponding data query request.
As a possible implementation method, the processing unit is further configured to receive a data query request, where the data query request is used for acquiring at least one data fragment, and to acquire, if the leading server corresponding to the data fragment is in a non-working state, the data fragment from a first following server corresponding to the data fragment; the type of the first following server is the same as the type of the leading server corresponding to the data fragment.
Based on the scheme, for a data query request for acquiring a certain data fragment, when a leading server corresponding to the data fragment is abnormal and cannot respond to the data query request, the data fragment can be quickly acquired from a first following server corresponding to the data fragment; wherein the first follower server is of the same type as the leader server.
As a possible implementation method, the processing unit is further configured to acquire the data fragment from a second following server corresponding to the data fragment if the first following servers are all in a non-working state; the type of the second following server is different from the type of the leading server corresponding to the data fragment.
Based on the scheme, for a data query request for acquiring a certain data fragment, if the data query request cannot be responded due to the fact that the leading server and the first following server corresponding to the data fragment are abnormal, the data fragment can be quickly acquired from the second following server corresponding to the data fragment; wherein the second follower server is of a different type than the leader server.
As a possible implementation manner, the distributed storage system includes M servers of a first type and N servers of a second type; the performance of the M first type of servers on data access matches the performance of the N second type of servers on data access.
Based on the scheme, the distributed storage system comprises M first-type servers and N second-type servers, and the performance of the M first-type servers on data access is matched with the performance of the N second-type servers on data access, so that the data fragments can be reasonably stored on the two different types of servers in the later period.
As a possible implementation manner, the N second type servers are determined by the following method, including: determining the number X of transactions per second to be met by the distributed storage system, and determining the number Y of the second type of servers required by the distributed storage system under the condition of meeting the number X of transactions per second through a correlation coefficient P (X, Y); wherein P (X, Y) is determined by: acquiring the transaction processing number X per second when any one historical service request is processed by a first type of server, replaying the historical service request through a second type of server, and determining the number Y of the second type of server; determining a correlation coefficient P (X, Y) between said number of transactions processed per second (X) and said number of servers of said second type (Y) by equation (1):
formula (1):
P(X,Y) = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}
where n is the number of samples, X_i and Y_i are the i-th observations of the variables X and Y at replay, \bar{X} is the sample mean of X, and \bar{Y} is the sample mean of Y.
Based on the scheme, the number of the second type of servers matched with the performance of the first type of servers is accurately determined in a formula calculation mode according to historical service requests.
In a third aspect, an embodiment of the present invention provides a computing device, including:
a memory for storing program instructions;
a processor for calling program instructions stored in said memory to perform a method according to any of the first aspects in accordance with the obtained program.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method according to any one of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
Fig. 1 is a data storage method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of two different types of servers storing data according to an embodiment of the present invention;
fig. 3 is a data storage device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In some enterprises, as the amount of data continues to increase, higher demands are placed on data storage. A common solution is to use a distributed database to store mass data. A distributed database typically uses smaller computer systems, each of which can be placed at a separate site and may hold a complete or partial copy of a DBMS (Database Management System) together with its own local database; many computers located at different sites are interconnected via a network to form a large database that is complete and logically centralized as a whole, yet physically distributed.
Consider a large enterprise that uses a distributed database to record the collected client update information of the clients related to the enterprise, where the number of such clients is 30000. Each client can be given its own serial number according to personal attribute information such as the client's identification number and mobile phone number, thereby generating the 30000 client numbers 1, 2, 3, … through 30000.
For how to record the client update information of the clients with these 30000 client numbers in the distributed database, the means adopted in the prior art may be to store the client update information of the 30000 clients on at least three X86 architecture servers. Assume that three X86 architecture servers are selected to store the client update information; for convenience of description, they are referred to as X86 architecture server _1, X86 architecture server _2, and X86 architecture server _3. X86 architecture server _1 can be used to write the client update information of clients No. 1-10000, which can be synchronized to X86 architecture server _2 and X86 architecture server _3 at the same time for the purpose of data backup; X86 architecture server _2 is used to write the client update information of clients No. 10001-20000, which can likewise be synchronized to X86 architecture server _1 and X86 architecture server _3; and X86 architecture server _3 is used to write the client update information of clients No. 20001-30000, which can likewise be synchronized to X86 architecture server _1 and X86 architecture server _2.
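The range-based assignment in the prior-art example above can be summarized as a simple mapping from client number to the server that writes its update information. The server identifiers below are hypothetical shorthand for X86 architecture servers _1 to _3; only the 10000-client range split comes from the text.

```python
SHARD_SIZE = 10000
WRITERS = ["x86_server_1", "x86_server_2", "x86_server_3"]

def writer_for_customer(number):
    """Map client numbers 1-30000 to the X86 server that writes their
    update information (1-10000 -> _1, 10001-20000 -> _2, ...)."""
    return WRITERS[(number - 1) // SHARD_SIZE]
```

Each range is then synchronized to the other two servers for backup, as described above.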
Regarding the X86 architecture server in the prior art, there is a risk of supply restriction and monopoly, which is specifically represented as follows: X86 architecture chips are typically sold under the X86 intellectual property held by Intel, presenting a certain supply and monopoly risk. Based on actual research on servers different from the X86 architecture, the data storage required by the distributed database can also be realized by ARM architecture servers, which do not carry this monopoly risk; ARM architecture servers can generally be provided by multiple vendors, and thus there is no risk of monopoly.
As with the previous example, when X86 architecture server _1, X86 architecture server _2, and X86 architecture server _3 are at risk of monopoly, the client update information existing on them can be transferred to ARM architecture servers. However, the following problem may occur during the data transfer process: if the distributed database needs to run on an ARM architecture server, it can only run there using a binary installation package whose source code has been compiled for the ARM version; and if the distributed database is switched over in a migration mode, the data needs to be migrated online, which consumes a long time.
Based on the problems existing in the prior art, the embodiment of the invention now provides the following solutions:
as shown in fig. 1, a data storage method provided for an embodiment of the present invention is applicable to a distributed storage system including at least two different types of servers; the method comprises the following steps:
step 101, determining each data fragment corresponding to the data to be stored.
Step 102, aiming at each data fragment, determining a leading server used for storing the data fragment from a first type of server, and determining a following server used for storing the data fragment from a second type of server; storing the data fragments to the leading server and synchronizing to the following server; the first type of server and the second type of server are any two of the at least two different types of servers.
Based on the scheme, the server of the first type is determined to be used as the leading server, the server of the second type is determined to be used as the following server, and therefore all data corresponding to the data to be stored can be stored to the leading server in a slicing mode and can be synchronized to the following server, and the data to be stored can be efficiently stored in the servers of the two different types.
The data storage method is described below using two different types of servers, namely X86 architecture servers and ARM architecture servers; of course, the data storage method provided in the embodiment of the present invention may also be applicable to servers of other architecture types, and the present invention is not limited thereto.
In the step 101, each data slice corresponding to the data to be stored is determined.
For example, the data to be stored is the aforementioned client update information of 30000 clients, and the data fragments are the client update information of clients No. 1-10000, the client update information of clients No. 10001-20000, and the client update information of clients No. 20001-30000.
In step 102, for each data fragment, determining a leading server from the servers of the first type as a server for storing the data fragment, and determining a following server from the servers of the second type as a server for storing the data fragment; storing the data fragments to the leading server and synchronizing to the following server; the first type of server and the second type of server are any two of the at least two different types of servers.
For the data fragment consisting of the client update information of clients No. 1-10000, the leading server storing the data fragment needs to be determined first. Suppose two X86 architecture servers and two ARM architecture servers are available for selection; in consideration of data storage reliability, the two X86 architecture servers can be used as the servers of the first type, and the two ARM architecture servers as the servers of the second type. The two X86 architecture servers are referred to as X86 architecture server _A and X86 architecture server _B, and the two ARM architecture servers as ARM architecture server _A and ARM architecture server _B.
Next, X86 architecture server _A is determined from the servers of the first type as the leading server storing the client update information of clients No. 1-10000; further, ARM architecture server _A is determined from the servers of the second type as a following server storing the client update information of clients No. 1-10000.
It should be noted that, in the embodiment of the present invention, X86 architecture server _B may also be used as the leading server storing the client update information of clients No. 1-10000, and ARM architecture server _B may also be used as the following server storing that client update information; the present invention is not limited thereto.
Finally, the client update information of clients No. 1-10000 is stored to the leading server, i.e., written to X86 architecture server _A; for the purpose of data backup, at the same time or at some future point, the client update information of clients No. 1-10000 can be synchronized to the following server, i.e., written to ARM architecture server _A.
It should be noted that, for the purpose of data backup, at the same time or at some future point, the client update information of clients No. 1-10000 also needs to be synchronized to X86 architecture server _B, which is of the same type as the leading server (X86 architecture server _A).
As a possible implementation method, the determining, for each data fragment, of a leading server for storing the data fragment from the servers of the first type and of a following server for storing the data fragment from the servers of the second type satisfies: the leading servers corresponding to the data fragments of the data to be stored are not all identical; and/or the following servers corresponding to the data fragments of the data to be stored are not all identical, and at least two following servers corresponding to the same data fragment are of different types.
By way of example, an example of how to store the client update information for clients with data shards 10001- > 20000 is described herein.
For the data fragment consisting of the client update information of clients No. 10001-20000, the leading server storing the data fragment also needs to be determined first. In the foregoing example, X86 architecture server _A of the first type has already been taken as the leading server storing the client update information of clients No. 1-10000; considering the pressure caused by over-concentrating data on the same leading server, as a preferred way, X86 architecture server _B may be determined as the leading server storing the client update information of clients No. 10001-20000; further, ARM architecture server _B is determined from the servers of the second type as the following server storing the client update information of clients No. 10001-20000.
Further, the client update information of clients No. 10001-20000 is stored to the leading server, i.e., written to X86 architecture server _B; for the purpose of data backup, at the same time or at some future point, the client update information of clients No. 10001-20000 can be synchronized to the following server, i.e., written to ARM architecture server _B.
It should be noted that, for the purpose of data backup, at the same time or at some future point, the client update information of clients No. 10001-20000 also needs to be synchronized to X86 architecture server _A, which is of the same type as the leading server (X86 architecture server _B).
Fig. 2 is a schematic diagram of two different types of servers storing data according to an embodiment of the present invention. X86_A is short for X86 architecture server _A; similarly, X86_B is short for X86 architecture server _B, ARM_A for ARM architecture server _A, and ARM_B for ARM architecture server _B. X86_A, as the leading server, writes and stores the client update information of clients No. 1-10000, and X86_B and ARM_A, as following servers, also write and store that client update information at the same time or at some future point; similarly, X86_B, as the leading server, writes and stores the client update information of clients No. 10001-20000, and X86_A and ARM_B, as following servers, also write and store that client update information at the same time or at some future point.
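The layout of FIG. 2 can be recorded as a small placement table, which also makes it easy to check the property used in the failover discussion that follows: every data fragment keeps one following server of the leading server's type and one of the other type. The dictionary below is a sketch for illustration, not an interface of the invention.

```python
FIG2_PLACEMENT = {
    "clients_1_10000":     {"leader": "X86_A", "followers": ["X86_B", "ARM_A"]},
    "clients_10001_20000": {"leader": "X86_B", "followers": ["X86_A", "ARM_B"]},
}

def server_type(name):
    """Extract the architecture type ("X86" or "ARM") from a server name."""
    return name.split("_")[0]

def followers_span_both_types(placement):
    """True if each fragment's followers cover both architecture types, so
    a copy survives the failure of either architecture."""
    return all(
        {server_type(f) for f in entry["followers"]} == {"X86", "ARM"}
        for entry in placement.values()
    )
```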
As a possible implementation method, if it is determined that the servers of the first type are all in the non-working state, for each data slice, determining, from the servers of the second type, a server that is a leading server for storing the data slice.
In the distributed storage system formed by the four servers X86 architecture server _A, X86 architecture server _B, ARM architecture server _A, and ARM architecture server _B: X86 architecture server _A stores the data fragment of clients No. 1-10000 as the leading server and stores the data fragment of clients No. 10001-20000 as a following server; X86 architecture server _B stores the data fragment of clients No. 10001-20000 as the leading server and stores the data fragment of clients No. 1-10000 as a following server; ARM architecture server _A stores the data fragment of clients No. 1-10000 as a following server; and ARM architecture server _B stores the data fragment of clients No. 10001-20000 as a following server.
For the client update information of clients No. 1-10000, when the leading server (X86 architecture server _A) is in a normal state, the client update information of clients No. 1-10000 can be written to X86 architecture server _A, and at the same time or at some future point it is written to the two following servers, namely X86 architecture server _B and ARM architecture server _A;
when the X86 architecture server _ A is confirmed to be abnormal and the X86 architecture server _ B is in a normal state, the X86 architecture server _ B can be selected as a new leading server, the client update information of the client with the number of data fragmentation 1-10000 can be written in the X86 architecture server _ B, and the client update information of the client with the number of data fragmentation 1-10000 can be respectively written in two following servers at a certain time point in the future, wherein one server is the existing ARM architecture server _ A, and the other server can be the ARM architecture server which is newly put into use; (Note that the ARM architecture server is newly used instead of the X86 architecture server because the risk of monopolizing the X86 architecture server is considered, so that the ARM architecture server is directly started to be used as a data storage server after the X86 architecture server _ A is abnormal.)
When it is confirmed that both the X86 architecture server _A and the X86 architecture server _B are abnormal, that is, it is determined that all servers of the first type are in a non-working state, the ARM architecture server _A may be selected as the new leading server; that is, the ARM architecture server _A is converted from a following server into the leading server in the current state. The client update information of the data shards of clients No. 1-10000 can then be written into the ARM architecture server _A and, at a certain future time point, written respectively into two following servers, which may both be newly deployed ARM architecture servers, or may be the existing ARM architecture server _B plus a newly deployed ARM architecture server; the present invention is not limited in this respect.
For the client update information of the data shards of clients No. 10001-20000, when the leading server (the X86 architecture server _B) is in a normal state, the client update information can be written into the X86 architecture server _B and, simultaneously or at a certain future time point, written respectively into the two following servers, namely the X86 architecture server _A and the ARM architecture server _B;
when it is determined that the X86 architecture server _B is abnormal and the X86 architecture server _A is in a normal state, the X86 architecture server _A may be selected as the new leading server; the client update information of the data shards of clients No. 10001-20000 can then be written into the X86 architecture server _A and, at a certain future time point, written respectively into two following servers, one being the existing ARM architecture server _B, and the other possibly being a newly deployed ARM architecture server. (A newly deployed ARM architecture server is used here instead of an X86 architecture server because of the risk of relying exclusively on the X86 architecture; therefore, once the X86 architecture server _B becomes abnormal, an ARM architecture server is directly brought into use as a data storage server.)
When it is confirmed that both the X86 architecture server _B and the X86 architecture server _A are abnormal, that is, it is determined that all servers of the first type are in a non-working state, the ARM architecture server _B may be selected as the new leading server; that is, the ARM architecture server _B is converted from a following server into the leading server in the current state. The client update information of the data shards of clients No. 10001-20000 can then be written into the ARM architecture server _B and, at a certain future time point, written respectively into two following servers, which may both be newly deployed ARM architecture servers, or may be the existing ARM architecture server _A plus a newly deployed ARM architecture server; the present invention is not limited in this respect.
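The failover order walked through above (prefer a healthy server of the leader's own type; promote a second-type follower only when all first-type servers are down) can be sketched as follows. This is a simplified illustration with hypothetical names, not the full election protocol:

```python
def elect_leader(current_leader, followers, is_alive):
    """Failover order sketched from the example above: keep a healthy
    leader; otherwise promote a healthy follower of the first (X86) type;
    only when all first-type servers are down, promote a second-type
    (ARM) follower."""
    if is_alive(current_leader):
        return current_leader
    for wanted_type in ("x86", "arm"):
        for follower in followers:
            if follower["type"] == wanted_type and is_alive(follower):
                return follower
    raise RuntimeError("no healthy server available for this shard")

# Example: the X86 leader fails while an X86 follower is still healthy.
alive = {"x86_A": False, "x86_B": True, "arm_A": True}
leader = {"name": "x86_A", "type": "x86"}
followers = [{"name": "x86_B", "type": "x86"},
             {"name": "arm_A", "type": "arm"}]
new_leader = elect_leader(leader, followers, lambda s: alive[s["name"]])
# new_leader is x86_B here; were x86_B also down, arm_A would be promoted.
```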
As a possible implementation method, a data query request is received; the data query request is used for acquiring at least one data fragment; if the leading server corresponding to the data fragment is in a non-working state, acquiring the data fragment from a first following server corresponding to the data fragment; the type of the first following server is the same as the type of the leading server corresponding to the data shards.
Continuing the previous example, when a data query request is received, if the client update information of client No. 9999 is to be queried, the leading server storing the client information of client No. 9999, namely the X86 architecture server _A, is accessed first. When the X86 architecture server _A is in a normal state, the client update information of client No. 9999 can be acquired directly from it; when the X86 architecture server _A is abnormal, that is, in a non-working state, the client update information of client No. 9999 is acquired from the X86 architecture server _B among the two following servers (the X86 architecture server _B and the ARM architecture server _A). That is, the X86 architecture server _B is the first following server, and it is of the same type as the leading server (the X86 architecture server _A).
As a possible implementation method, if the first following servers are all in a non-working state, obtaining the data fragments from second following servers corresponding to the data fragments; the type of the second following server is different from the type of the leading server corresponding to the data fragment.
Continuing the previous example, when a data query request is received, if the client update information of client No. 19999 is to be queried, the leading server storing the client information of client No. 19999, namely the X86 architecture server _B, is accessed first. When the X86 architecture server _B is in a normal state, the client update information of client No. 19999 can be acquired directly from it; when the X86 architecture server _B is abnormal, that is, in a non-working state, the client update information of client No. 19999 is acquired from the X86 architecture server _A among the two following servers (the X86 architecture server _A and the ARM architecture server _B), that is, the X86 architecture server _A is the first following server and is of the same type as the leading server (the X86 architecture server _B); further, when the X86 architecture server _A is also abnormal, that is, in a non-working state, the client update information of client No. 19999 is acquired from the only remaining following server (the ARM architecture server _B), that is, the ARM architecture server _B is the second following server and is of a different type from the leading server (the X86 architecture server _B).
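The read fallback above (leading server, then the first following server of the same type, then the second following server of a different type) can be sketched as below, again with hypothetical names and a stub fetch function:

```python
def read_client_info(leader, followers, is_alive, fetch):
    """Query fallback sketched from the example above: try the leading
    server, then followers of the leader's own type (first following
    server), then followers of a different type (second following server)."""
    same_type = [f for f in followers if f["type"] == leader["type"]]
    other_type = [f for f in followers if f["type"] != leader["type"]]
    for server in [leader] + same_type + other_type:
        if is_alive(server):
            return fetch(server)
    raise RuntimeError("data shard unavailable on all replicas")

# Example: both X86 servers are down, so the read falls back to arm_B.
alive = {"x86_B": False, "x86_A": False, "arm_B": True}
leader = {"name": "x86_B", "type": "x86"}
followers = [{"name": "x86_A", "type": "x86"},
             {"name": "arm_B", "type": "arm"}]
source = read_client_info(leader, followers,
                          lambda s: alive[s["name"]],
                          lambda s: s["name"])  # fetch stub: server name
```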
As a possible implementation method, the distributed storage system includes M servers of a first type and N servers of a second type; the performance of the M servers of the first type on data access matches the performance of the N servers of the second type on data access.
When deploying a distributed database across two different types of servers, it is important to match the performance of the two types, given that the performance of different server types is generally not equal. With respect to the foregoing example, it is necessary to determine how many ARM architecture servers achieve a performance match with a given number of X86 architecture servers.
The distributed storage system in the embodiment of the present invention includes M servers of the first type and N servers of the second type. To determine the value of N for which the servers of the second type (ARM architecture servers) are equivalent in performance to the M servers of the first type (X86 architecture servers), one approach is to find a performance equivalence ratio through actual stress testing in a specific scenario: with the CPU utilization of the X86 architecture server confirmed to be within 80%, the maximum TPS it can provide is recorded as TPS_MAX and the minimum expected TPS as TPS_MIN; the ARM architecture server is then stress-tested, and the number of ARM architecture servers needed to reach TPS_MAX and the number needed to reach TPS_MIN are confirmed.
TPS (Transactions Per Second), that is, the number of transactions processed per second by a server, is generally used to describe the processing capability of a system.
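The stress-test approach above amounts to dividing the TPS level measured on the X86 fleet by the TPS one ARM architecture server sustains. A minimal sketch, with hypothetical figures standing in for real stress-test measurements:

```python
import math

def arm_servers_needed(x86_tps, tps_per_arm_server):
    """Estimate how many ARM architecture servers match a TPS level
    measured on the X86 fleet at <=80% CPU utilization. The figures used
    below are hypothetical; real values come from the stress tests
    described above."""
    return math.ceil(x86_tps / tps_per_arm_server)

# e.g. TPS_MAX = 12000 measured on X86, one ARM server sustains 1500 TPS:
arm_for_tps_max = arm_servers_needed(12000, 1500)  # -> 8 servers
```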
As a possible implementation manner, the N servers of the second type are determined by the following method: determining the number X of transactions per second to be met by the distributed storage system, and determining, through a correlation coefficient P(X, Y), the number Y of servers of the second type required by the distributed storage system to meet the number X of transactions per second; wherein P(X, Y) is determined by: acquiring the number X of transactions processed per second when any one historical service request is processed by a server of the first type, replaying the historical service request through servers of the second type, and determining the number Y of servers of the second type; and determining the correlation coefficient P(X, Y) between the number X of transactions processed per second and the number Y of servers of the second type by formula (1):
formula (1):

P(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \cdot \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}

where n is the number of samples, X_i and Y_i are the i-th observations of the variables X and Y at replay, \bar{X} is the sample mean of X, and \bar{Y} is the sample mean of Y.
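Formula (1) is the standard Pearson correlation coefficient and can be computed directly over the replay samples. A minimal sketch (the sample values below are hypothetical, in the spirit of the replay example that follows):

```python
import math

def pearson(xs, ys):
    """Correlation coefficient P(X, Y) computed per formula (1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy replay samples (hypothetical):
tps = [200, 340, 260, 410]   # X_i: transactions per second per flow
arm_count = [2, 3, 2, 4]     # Y_i: ARM servers needed at replay
coefficient = pearson(tps, arm_count)  # strongly positive (close to 1)
```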
The distributed storage system in the embodiment of the present invention includes M servers of the first type and N servers of the second type. The value of N for which the servers of the second type (ARM architecture servers) are equivalent in performance to the M servers of the first type (X86 architecture servers) may also be determined by performing data mining on a large number of samples and carrying out correlation analysis, specifically as follows:
For example, for all historical service request traffic, the traffic requested of the X86 architecture servers is replayed on the ARM architecture servers by means of TCPCOPY, and the correlation between the TPS and the number of ARM architecture servers is found; this correlation is expressed using the statistical method of the Pearson correlation coefficient.
Assuming that the historical service request traffic processed by the X86 architecture servers comprises 10000000 flows in total, any one of these flows, denoted traffic_1, may have processed 600 transactions in 3s, that is, 200 transactions per second; traffic_1 is replayed on ARM architecture servers by means of TCPCOPY, and it is found that 2 ARM architecture servers can complete the replay of traffic_1 in 3s, that is, the number of ARM architecture servers is determined to be 2. As another example, another of the 10000000 flows, denoted traffic_2, may have processed 680 transactions in 2s, that is, 340 transactions per second; traffic_2 is replayed on ARM architecture servers by means of TCPCOPY, and it is found that 3 ARM architecture servers can complete the replay of traffic_2 in 2s, that is, the number of ARM architecture servers is determined to be 3. The replay of the remaining historical service request traffic proceeds analogously to traffic_1 and traffic_2 and is not described here again.
Assuming that traffic replay is performed on all 10000000 flows of historical service request traffic, the relationship between the number X of transactions processed per second and the number Y of ARM architecture servers is determined for each flow during replay: for the aforementioned traffic_1, X is 200 and Y is 2; for the aforementioned traffic_2, X is 340 and Y is 3; the relationship for each remaining flow is obtained in the same way as for traffic_1 and traffic_2 and is not described here again.
By substituting the values of X (transactions processed per second) and Y (number of ARM architecture servers) obtained by replaying the 10000000 flows of historical service request traffic into formula (1), the correlation coefficient P(X, Y) between X and Y can be obtained; in formula (1), n is the number of samples, X_i and Y_i are the i-th observations of the variables X and Y at replay, \bar{X} is the sample mean of X, and \bar{Y} is the sample mean of Y. That is, in the embodiment of the present invention, the value of n is 10000000, X_i is the i-th observation of the number X of transactions processed per second during replay, and Y_i is the i-th observation of the number Y of ARM architecture servers during replay: for example, for traffic_1, X_i is 200 and Y_i is 2; for traffic_2, X_i is 340 and Y_i is 3.
Assuming that, by replaying the 10000000 flows of historical service request traffic on the ARM architecture servers, the correlation coefficient P(X, Y) between the number X of transactions processed per second and the number Y of ARM architecture servers is calculated by formula (1) to be 0.8, then, when the number X of transactions per second to be met by the distributed storage system is determined, say X is set to 2000, the number Y of ARM architecture servers required to process 2000 transactions per second can be rapidly determined according to P(X, Y), the information of the historical service request traffic, and calculation by formula (1).
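Once a high P(X, Y) confirms a near-linear relation between TPS and the server count, one simple way to obtain Y for a target X is a least-squares fit over the replay samples. This estimator is an assumption on our part (the embodiment does not fix the calculation), and the sample values are hypothetical:

```python
import math

def fit_line(xs, ys):
    """Least-squares fit Y ~ a*X + b over the replay samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

# Hypothetical replay samples (X = TPS, Y = ARM servers needed):
tps = [200, 340, 260, 410]
arm_count = [2, 3, 2, 4]
a, b = fit_line(tps, arm_count)
servers_for_2000_tps = math.ceil(a * 2000 + b)  # estimate Y at X = 2000
```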
Based on the same conception, the embodiment of the invention also provides a data storage device, which is suitable for a distributed storage system comprising at least two different types of servers; as shown in fig. 2, the apparatus includes:
the determining unit 301 is configured to determine each data slice corresponding to the data to be stored.
A processing unit 302, configured to: for each data fragment, determine, from the servers of the first type, a leading server for storing the data fragment, and determine, from the servers of the second type, a following server for storing the data fragment; store the data fragment to the leading server and synchronize it to the following server; the first type of server and the second type of server are any two of the at least two different types of servers.
Further, for the device, the leading servers corresponding to the data fragments of the data to be stored are not completely the same; and/or the following servers corresponding to the data fragments of the data to be stored are not completely the same, and at least two following servers corresponding to the same data fragment are of different types.
Further, for the apparatus, the processing unit 302 is further configured to determine, for each data slice, a server of the second type as a leading server for storing the data slice if it is determined that the servers of the first type are all in a non-working state.
Further, for the apparatus, the processing unit 302 is further configured to receive a data query request; the data query request is used for acquiring at least one data fragment; if the leading server corresponding to the data fragment is in a non-working state, acquiring the data fragment from a first following server corresponding to the data fragment; the type of the first following server is the same as the type of the leading server corresponding to the data shards.
Further, for the apparatus, the processing unit 302 is further configured to, if the first following servers are all in a non-working state, obtain the data fragment from a second following server corresponding to the data fragment; the type of the second following server is different from the type of the leading server corresponding to the data fragment.
Further, for the apparatus, the distributed storage system includes M servers of a first type and N servers of a second type; the performance of the M servers of the first type on data access matches the performance of the N servers of the second type on data access.
Further, for the apparatus, the N servers of the second type are determined by: determining the number X of transactions per second to be met by the distributed storage system, and determining, through a correlation coefficient P(X, Y), the number Y of servers of the second type required by the distributed storage system to meet the number X of transactions per second; wherein P(X, Y) is determined by: acquiring the number X of transactions processed per second when any one historical service request is processed by a server of the first type, replaying the historical service request through servers of the second type, and determining the number Y of servers of the second type; and determining the correlation coefficient P(X, Y) between the number X of transactions processed per second and the number Y of servers of the second type by formula (1):
formula (1):

P(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \cdot \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}

where n is the number of samples, X_i and Y_i are the i-th observations of the variables X and Y at replay, \bar{X} is the sample mean of X, and \bar{Y} is the sample mean of Y.
Embodiments of the present invention provide a computing device, which may specifically be a desktop computer, a portable computer, a smart phone, a tablet computer, a personal digital assistant (PDA), and the like. The computing device may include a central processing unit (CPU), memory, and input/output devices; the input devices may include a keyboard, a mouse, a touch screen, and the like, and the output devices may include a display device such as a liquid crystal display (LCD) or a cathode ray tube (CRT).
Memory, which may include Read Only Memory (ROM) and Random Access Memory (RAM), provides the processor with program instructions and data stored in the memory. In embodiments of the present invention, the memory may be used to store program instructions for a data storage method;
and the processor is used for calling the program instructions stored in the memory and executing the data storage method according to the obtained program.
An embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions for causing a computer to execute a data storage method.
It should be apparent to those skilled in the art that embodiments of the present invention may be provided as a method or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.