
CN110636122A - Distributed storage method, server, system, electronic device and storage medium - Google Patents

Distributed storage method, server, system, electronic device and storage medium

Info

Publication number
CN110636122A
Authority
CN
China
Prior art keywords
cluster
storage
file
weight coefficient
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910857800.8A
Other languages
Chinese (zh)
Inventor
郭杨勇
王建
周英能
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongchang (hangzhou) Information Technology Co Ltd
China Mobile Communications Group Co Ltd
Original Assignee
Zhongchang (hangzhou) Information Technology Co Ltd
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongchang (hangzhou) Information Technology Co Ltd, China Mobile Communications Group Co Ltd
Priority to CN201910857800.8A
Publication of CN110636122A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1044 Group management mechanisms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention relate to the field of computer technology and disclose a distributed storage method, server, system, electronic device, and storage medium. In the invention, a write request for a file is received; the weight coefficient of each cluster in a storage cluster array is obtained, where the weight coefficient is generated from the cluster's current storage resources; a hash algorithm is used, according to the weight coefficients of the clusters, to determine the target cluster for writing the file; and the file is written into the target cluster. As a result, when storage capacity must be expanded, the drop in storage cluster read/write performance caused by data redistribution is avoided without affecting file upload and download efficiency, which improves the user's experience of reading and writing cloud files. Because the overall architecture uses the cluster as the unit of capacity expansion, it also improves the efficiency with which operations staff later maintain the cloud storage system.

Description

Distributed storage method, server, system, electronic device and storage medium

Technical field

Embodiments of the present invention relate to the field of computer technology, and in particular to a distributed storage method, server, system, electronic device, and storage medium.

Background

Cloud storage is a new concept extended and developed from cloud computing. It is a system in which a large number of storage devices of different types in a network are brought together by application software, through cluster applications, network technology, or distributed file systems, to work cooperatively and jointly provide data storage and service access to the outside, ensuring data security and saving storage space. Common distributed storage systems today include the Google File System (GFS), the Lustre parallel distributed file system, the Ceph distributed file system, and the GlusterFS network file system. Among them, the open-source Ceph is a reliable, scalable, unified, distributed storage solution; driven in particular by the open-source cloud computing management platform OpenStack, Ceph was sought after by Internet companies as soon as it entered the industry.

Resource scheduling methods for Ceph distributed clusters in the prior art are mainly designed for a single Ceph distributed cluster. They schedule resources by trading space for time, for example by merging writes on hard disk drives (HDD) and adding solid-state drive (SSD) caches. For instance, the read/write performance of a single Ceph cluster is optimized by adjusting its strong-consistency scheme for reads and writes, and by adjusting the number of replicas and the algorithm for selecting distributed nodes.

However, in the course of realizing the present invention, the inventors found that the prior art addresses only the read/write performance experienced by users of a single Ceph distributed cluster. It does not consider how to expand resources, while preserving that read/write performance, when the storage resources of a single Ceph distributed cluster run low.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a distributed storage method, server, system, electronic device, and storage medium such that, when storage capacity must be expanded, the drop in storage cluster read/write performance caused by expanding cluster resources is avoided without affecting file upload and download efficiency, improving the user's experience of reading and writing cloud files. Because the overall architecture uses the cluster as the unit of capacity expansion, it also improves the efficiency with which operations staff later maintain the cloud storage system.

To solve the above technical problem, an embodiment of the present invention provides a distributed storage method, including: receiving a write request for a file; obtaining the weight coefficient of each cluster in a storage cluster array, where the weight coefficient is generated from the cluster's current storage resources; determining, according to the weight coefficients of the clusters, the target cluster for writing the file; and writing the file into the target cluster.

An embodiment of the present invention also provides a server, including: a request receiving module, configured to receive a read or write request for a file; a calculation module, configured to obtain the weight coefficient of each cluster in the storage cluster array and, according to the weight coefficients, use a hash algorithm to determine the target cluster for writing the file; and a writing module, configured to write the file into the target cluster.

Compared with the prior art, embodiments of the present invention form a storage cluster array from multiple distributed storage clusters and set a weight coefficient for each cluster according to its current storage resources, so that files are distributed to the clusters at random with probabilities given by the weight coefficients. Storage resources can thus be expanded or scheduled without affecting the user's read/write performance, and operations staff can maintain the whole distributed storage system more efficiently.

In addition, the values of the weight coefficient include a default weight coefficient. The default weight coefficient is generated in real time by a database management system according to the cluster's current remaining storage capacity or the network speed of the machine room where the cluster is located, and is positively correlated with that remaining storage capacity or network speed. In this way, the weight coefficient can be generated not only from the cluster's current remaining storage capacity; the file write policy can also be adjusted to account for the effect of machine-room network speed on the read/write speed of cloud storage files, giving users a better experience.

In addition, the weight coefficient also includes a user weight coefficient, which is set by the user according to actual conditions and stored in the database management system. When the weight coefficients of the clusters in the storage cluster array are obtained, the user weight coefficient is obtained preferentially. In this way, operations staff can customize the distribution of the files to be stored according to actual conditions, which makes the use of storage space more flexible and able to handle more scenarios.

In addition, the storage cluster array includes a primary cluster array and a standby cluster array; each cluster in the primary cluster array corresponds to a cluster in the standby cluster array; and the target cluster is a cluster in the primary cluster array. When the target cluster fails, the file is written into the corresponding cluster in the standby cluster array. In this way, the reliability of the whole storage system is enhanced, and the system can run stably and reliably even when an accident such as a machine-room power outage occurs.
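The primary/standby fallback described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the writer callables, the use of `ConnectionError` to signal a failed cluster, and the function name `write_with_failover` are all assumptions made for the example.

```python
def write_with_failover(file_id, data, target, primary, standby):
    """Write to the target primary cluster, or to its standby counterpart on failure.

    `primary` and `standby` map cluster ID -> writer callable. These callables
    are hypothetical stand-ins for real cluster clients; in this sketch a failed
    cluster is assumed to raise ConnectionError.
    """
    try:
        primary[target](file_id, data)
        return ("primary", target)
    except ConnectionError:
        # The standby cluster with the same ID mirrors the failed primary cluster.
        standby[target](file_id, data)
        return ("standby", target)
```

The return value records which array actually received the file, which a caller would need in order to find the file again later.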

In addition, index information consisting of the identifier of the file and the identifier of the target cluster is generated and stored in an index cluster, where the index cluster is built on high-speed storage media. Building an index cluster on high-speed media greatly improves the server's response speed when users read cloud storage files and significantly improves the user experience.
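As a rough illustration of the index information, the sketch below keeps a mapping from file identifier to target cluster identifier. The class name `FileIndex` and the in-memory dictionary are assumptions for the example; the source only says that the index lives in a cluster built on high-speed storage media.

```python
class FileIndex:
    """In-memory stand-in for the index cluster (a real one would use fast storage)."""

    def __init__(self):
        self._cluster_of = {}

    def record(self, file_id, cluster_id):
        """Store the index entry (file ID, target cluster ID) after a write."""
        self._cluster_of[file_id] = cluster_id

    def locate(self, file_id):
        """Return the cluster holding the file, or None if it was never indexed."""
        return self._cluster_of.get(file_id)


index = FileIndex()
index.record("file-42", 3)
print(index.locate("file-42"))
```

On a read request, a lookup of this kind would tell the server which cluster to contact.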

Description of drawings

One or more embodiments are illustrated by way of example in the corresponding figures of the accompanying drawings; these illustrations do not limit the embodiments.

FIG. 1 is a flowchart of the distributed storage method according to the first embodiment of the present invention;

FIG. 2 is a flowchart of the distributed storage method according to the second embodiment of the present invention;

FIG. 3 is a flowchart of the distributed storage method according to the third embodiment of the present invention;

FIG. 4 is a structural diagram of the storage cluster array according to the third embodiment of the present invention;

FIG. 5 is a structural diagram of the server according to the fourth embodiment of the present invention;

FIG. 6 is a structural diagram of the distributed storage system according to the fifth embodiment of the present invention;

FIG. 7 is a structural diagram of the electronic device according to the sixth embodiment of the present invention.

Detailed description

To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. A person of ordinary skill in the art will understand that many technical details are given in the embodiments so that the reader can better understand the present application. However, the technical solutions claimed in this application can be realized even without these technical details and with various changes and modifications based on the following embodiments. The embodiments below are divided for convenience of description and should not limit the specific implementation of the present invention; the embodiments may be combined with and refer to one another where they do not conflict.

The first embodiment of the present invention relates to a distributed storage method applied on a server. In this embodiment, a write request for a file is received; the weight coefficient of each cluster in the storage cluster array is obtained, where the weight coefficient is generated from the cluster's current storage resources; the target cluster for writing the file is determined according to the weight coefficients of the clusters; and the file is written into the target cluster. Multiple distributed storage clusters form a storage cluster array, and a weight coefficient is set for each cluster according to its current storage resources, so that files are distributed to the clusters at random with probabilities given by the weight coefficients. Storage resources can thus be expanded or scheduled without affecting the user's read/write performance, and operations staff can maintain the whole distributed storage system more efficiently.

The implementation details of the distributed storage method of this embodiment are described below. They are provided only to aid understanding and are not required to implement the solution.

The distributed storage method of this embodiment is shown in FIG. 1 and specifically includes:

Step 101: receive a write request for a file.

Specifically, the user first sends a file upload request through a client to the cluster's HTTP (Hypertext Transfer Protocol) proxy server, establishing a communication connection with the proxy server. The request received by the proxy server includes the ID of the file. The client is one of Ceph's native clients for various programming languages.

Step 102: obtain the weight coefficient of each cluster in the storage cluster array.

In this embodiment, each cluster is built on the Ceph distributed file system, and the storage cluster array is an array made up of multiple Ceph distributed file system clusters. When the storage space of the cluster array needs to be expanded, it is expanded in units of one cluster.

Specifically, so that files are written into the distributed storage clusters sensibly and the clusters' storage resources are used fully and evenly, each cluster has a weight coefficient generated from the state of its own current storage resources. These storage resources include the cluster's current remaining storage space and the current network speed of the server room where the cluster is located.

In a specific application, the current storage resource information of each cluster is stored in MySQL, a small relational database management system, which updates the data dynamically and in real time and generates each cluster's current weight coefficient from that information. After receiving a file request, the proxy server immediately requests the current weight coefficients of the clusters from MySQL and caches them on the server.

An example:

Cluster ID    Current remaining storage capacity β    Weight coefficient α
1             20%                                     0.4
2             20%                                     0.4
3             10%                                     0.2

As shown in the table above, a storage cluster array includes three Ceph clusters with equal total storage space, but their current remaining storage capacities differ. When operations staff want the files subsequently written into the storage cluster array to even out space usage across the clusters, a simple formula can be set for the process by which MySQL generates each cluster's current weight coefficient:

[Formula: Figure BDA0002198769660000041]
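The formula itself is an image that is not available in this text. One rule that reproduces the values in the table above, assumed here purely for illustration and not confirmed by the source, divides each cluster's remaining capacity by the total remaining capacity. The function name `default_weights` is hypothetical.

```python
def default_weights(remaining):
    """Map cluster ID -> remaining capacity to cluster ID -> weight coefficient.

    Assumption: weight = remaining / sum(remaining). The source formula is an
    image that is not available; this is only one rule that fits the table.
    """
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no remaining capacity in any cluster")
    return {cid: cap / total for cid, cap in remaining.items()}


# Remaining capacity in percent, as in the table above.
weights = default_weights({1: 20, 2: 20, 3: 10})
print(weights)
```

Under this assumed rule the three clusters receive weights 0.4, 0.4, and 0.2, matching the table.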

Based on the above example, when the storage space of the cluster array needs to be expanded, a new cluster, "cluster 4", is added. Its remaining storage capacity is 100%, so its weight coefficient is:

[Formula: Figure BDA0002198769660000042]

In another example, when the total storage space of the clusters in a storage cluster array is not equal and operations staff still want written files to be evenly distributed across the clusters, the calculation can use each cluster's current remaining capacity in absolute terms, rather than the ratio of remaining to total storage capacity used in the example above.

In another example, the multiple Ceph clusters of this embodiment are deployed in different machine rooms, which enhances the reliability of the whole storage system and avoids sudden drops in file read/write performance caused by physical factors. For example, when network congestion occurs in the machine room where a cluster is located, that is, when the cluster's data read/write performance drops, MySQL can also lower the cluster's weight coefficient according to other rules or formulas set by the user. This preserves good file read/write performance for users and also reduces the maintenance burden on operations staff.

Step 103: according to the weight coefficients of the clusters, use a random distribution algorithm to determine the target cluster for writing the file.

Step 104: write the file into the target cluster.

In this embodiment, the random distribution algorithm is a hash algorithm. A hash table is set up in advance containing two kinds of data, hash values and nodes, with each hash value corresponding to one node. Continuing the earlier example, each cluster corresponds to a different number of nodes, and the number of nodes is proportional to the cluster's weight coefficient. Assuming a proportionality factor of 100, cluster 1 corresponds to 40 nodes, cluster 2 to 40 nodes, and cluster 3 to 20 nodes. When a new cluster is added, 100 nodes corresponding to the new cluster are added directly. The algorithm takes the file ID as its input parameter, computes a hash value, looks up the table to obtain a cluster number, and returns that number as its output. Combining a random distribution algorithm with weighting in this way ensures that, overall, the distribution of written files across the storage space matches the weight coefficients of the clusters.
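A minimal sketch of this weighted table lookup is given below. It simplifies the description by mapping table slots directly to cluster IDs rather than keeping separate hash-value and node columns. The choice of SHA-256, the reduction of the digest modulo the table length, and the names `build_node_table` and `pick_cluster` are assumptions for illustration; the source does not name a specific hash function.

```python
import hashlib

SCALE = 100  # proportionality factor from the example above


def build_node_table(weights, scale=SCALE):
    """Return a list of cluster IDs with round(weight * scale) slots per cluster."""
    table = []
    for cluster_id, weight in sorted(weights.items()):
        table.extend([cluster_id] * round(weight * scale))
    if not table:
        raise ValueError("all weight coefficients are zero")
    return table


def pick_cluster(file_id, table):
    """Hash the file ID and look up the cluster that owns the resulting slot."""
    digest = hashlib.sha256(file_id.encode("utf-8")).digest()  # assumed hash choice
    slot = int.from_bytes(digest[:8], "big") % len(table)
    return table[slot]


table = build_node_table({1: 0.4, 2: 0.4, 3: 0.2})
counts = {1: 0, 2: 0, 3: 0}
for i in range(1000):
    counts[pick_cluster(f"file-{i}", table)] += 1
print(len(table), counts)
```

The same file ID always maps to the same slot for a given table, while many different file IDs spread across the clusters roughly in proportion to the slot counts.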

Specifically, the file ID from the file write request is passed to the hash algorithm, which returns an address: the number of the cluster to which the file will be written. A data channel is then established between the client and the Ceph cluster, and the file is transferred and written into the corresponding cluster. Because the hash algorithm is a randomizing algorithm, MySQL must monitor the storage capacity of each cluster in real time and generate weight coefficients dynamically, both to maintain performance when files are written to the storage media and to ensure that each cluster has enough capacity to hold the file.

It should be noted that the examples in this embodiment are given for ease of understanding and do not limit the technical solution of the present invention.

Compared with the prior art, this embodiment treats multiple Ceph distributed storage clusters as a whole to form a Ceph distributed storage cluster array. This avoids the problem that storage resources may not be expandable when a single Ceph distributed storage cluster runs short of them. Because no single Ceph distributed storage cluster has to be expanded, no data rebalancing is needed inside any single cluster, so storage expansion is imperceptible to users and their experience improves. In addition, because a weighting is set in advance for each Ceph distributed storage cluster, writing a file only requires a hash calculation based on the file ID and the weighting of each Ceph distributed storage cluster to obtain the file's write address, which further improves the response speed of the distributed storage system and the evenness of file distribution across the clusters.

The second embodiment of the present invention relates to a distributed storage method. Its flow is shown in FIG. 2 and includes:

Step 201: receive a write request for a file. This step is similar to step 101 in the first embodiment of the present invention and is not repeated here.

Step 202: read the size of the file.

Step 203: obtain the user weight coefficient of each cluster in the storage cluster array.

In this embodiment, operations staff can set the weight coefficient of each cluster as the actual situation requires and save it in MySQL. When MySQL holds both the weight coefficients it generated itself and user weight coefficients set by the user, the proxy server takes the user weight coefficients as the weight coefficients used by the algorithm.
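The precedence of user weight coefficients over generated ones might look like the sketch below. It assumes the override applies per cluster, so that a cluster without a user-set value keeps its generated weight; the source does not say whether the override is per cluster or for the whole scheme. The function name `effective_weights` is hypothetical.

```python
def effective_weights(default, user=None):
    """Prefer a user-set weight coefficient per cluster; otherwise use the default.

    Assumption: the override is applied cluster by cluster. A user value of 0
    is a valid setting and is kept.
    """
    user = user or {}
    return {cid: user.get(cid, weight) for cid, weight in default.items()}


generated = {1: 0.4, 2: 0.4, 3: 0.2}
print(effective_weights(generated, {1: 0.0}))
```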

In practice, the files that users upload to the cloud vary greatly in size; a large file may be hundreds of times the size of a small one. Because each cluster is built from storage media of relatively small capacity, the remaining capacity of a given cluster may not be enough to hold a large file, so a suitable policy is needed to make sure storage space is used sensibly. The user can therefore define several weight coefficient schemes to suit files of different sizes. For example, when cluster 1 among clusters 1, 2, and 3 has little remaining space, the weight coefficient scheme for large files is used when a write request for a large file is received.

Specifically, a cluster weight coefficient scheme for large files is defined first: the weight coefficient of every cluster that cannot hold the large file is set to 0, and the weight is then distributed, by the usual rules, among the clusters with enough capacity to store the file. The weight coefficient scheme for small files distributes the weight across all clusters in the normal way, according to each cluster's storage resources.
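The large-file scheme can be sketched as follows. The "usual rules" are not spelled out in the text, so the sketch assumes the weight is shared among eligible clusters in proportion to their free space; the function name `weights_for_file` and the byte units are also assumptions.

```python
def weights_for_file(file_size, free_bytes):
    """Zero out clusters that cannot hold the file; share weight among the rest.

    Assumption: eligible clusters split the weight in proportion to free space.
    """
    eligible = {cid: free for cid, free in free_bytes.items() if free >= file_size}
    total = sum(eligible.values())
    if total == 0:
        raise ValueError("no cluster can hold the file")
    return {cid: eligible.get(cid, 0) / total for cid in free_bytes}


# Cluster 1 has too little free space for a 50-unit file.
print(weights_for_file(50, {1: 10, 2: 100, 3: 100}))
```

A small file that fits everywhere passes through the same function and simply receives weights over all clusters.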

The example above divides files into only two size classes. In practice, files can be divided by size into N classes, where N is a natural number greater than 2, to make the use of storage capacity more sensible. When the proxy server receives a file write request, it first obtains the size of the file and then selects the appropriate weight coefficient scheme for that size.
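Selecting one of N schemes by file size can be done with a sorted list of size boundaries. The sketch below uses N = 3; the boundary values, the scheme names, and the function name `scheme_for` are hypothetical and chosen only to make the example concrete.

```python
import bisect

# Hypothetical upper bounds (in bytes) separating N = 3 size classes.
SIZE_BOUNDS = [16 * 2**20, 1 * 2**30]
SCHEMES = ["small", "medium", "large"]  # one weight scheme name per class


def scheme_for(file_size):
    """Return the name of the weight scheme for a file of the given size."""
    return SCHEMES[bisect.bisect_right(SIZE_BOUNDS, file_size)]


print(scheme_for(4 * 2**20), scheme_for(100 * 2**20), scheme_for(5 * 2**30))
```

Each scheme name would then key a separate set of per-cluster weight coefficients.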

In another example, the read/write rate of storage media falls as space usage rises. Operations staff can therefore set a space-usage threshold for each storage cluster according to actual conditions. When a cluster's space usage reaches the threshold, its weight coefficient is automatically set to 0, which keeps the read/write rate of every cluster above a level acceptable to users.
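A sketch of the threshold rule follows. The default threshold of 0.8 and the function name `apply_usage_threshold` are assumptions; the source leaves the threshold to operations staff, and this sketch uses one shared value rather than one per cluster.

```python
def apply_usage_threshold(weights, usage, threshold=0.8):
    """Set the weight of any cluster at or above the usage threshold to 0.

    `usage` maps cluster ID -> fraction of space used (0.0 to 1.0). The 0.8
    default is an illustrative value, not one taken from the source.
    """
    return {cid: 0.0 if usage[cid] >= threshold else w for cid, w in weights.items()}


print(apply_usage_threshold({1: 0.4, 2: 0.4, 3: 0.2}, {1: 0.9, 2: 0.5, 3: 0.8}))
```

The remaining weights are left as they were; whether to rescale them so they sum to 1 is a separate choice that this sketch does not make.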

In another example, when new clusters join the storage cluster array through capacity expansion, operations staff can set the weight coefficients of all the old clusters to 0 and distribute the weight evenly among the new clusters. This quickly brings the data to an even distribution across the clusters and makes better use of the storage space in the array.
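The expansion rule above is simple enough to state directly in code. The function name `weights_after_expansion` is hypothetical, and the weights are assumed to sum to 1.

```python
def weights_after_expansion(old_ids, new_ids):
    """Give old clusters weight 0 and split the weight evenly across new clusters."""
    if not new_ids:
        raise ValueError("expansion needs at least one new cluster")
    share = 1.0 / len(new_ids)
    weights = {cid: 0.0 for cid in old_ids}
    weights.update({cid: share for cid in new_ids})
    return weights


print(weights_after_expansion([1, 2, 3], [4, 5]))
```

Once the new clusters have filled to a level comparable with the old ones, operations staff would presumably switch back to capacity-based weights.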

步骤204,根据各集群的权重系数,利用随机分布算法确定用于写入文件的目标集群。该步骤与本发明第一实施方式中步骤103类似,在此不再赘述。Step 204, according to the weight coefficients of each cluster, a random distribution algorithm is used to determine the target cluster for writing the file. This step is similar to step 103 in the first embodiment of the present invention, and will not be repeated here.

Step 205: write the file to the target cluster. This step is similar to step 104 in the first embodiment of the present invention and is not described again here.

The division of the above methods into steps is only for clarity of description. In implementation, steps may be combined into one step, or a step may be split into several steps; as long as the same logical relationship is included, such variants fall within the protection scope of this patent. Adding insignificant modifications or introducing insignificant designs to the algorithm or process without changing its core design also falls within the protection scope of this patent.

Compared with the prior art, in this embodiment operation and maintenance personnel can set weight coefficients for different schemes according to actual conditions and customize the distribution of the files currently to be stored. This makes the use of storage space more flexible, covers more scenarios, and improves the performance experienced by users.

The third embodiment of the present invention relates to a distributed storage method. The flow is shown in FIG. 3 and includes:

Step 301: receive a write request for a file. This step is similar to step 101 in the first embodiment of the present invention and is not described again here.

Step 302: obtain the weight coefficient of each cluster in the storage cluster array. This step is similar to step 102 in the first embodiment of the present invention and is not described again here.

Step 303: determine the target cluster for writing the file by applying a random distribution algorithm to the weight coefficients of the clusters. This step is similar to step 103 in the first embodiment of the present invention and is not described again here.

Step 304: send a write request to the target cluster and determine whether the target cluster is operating normally. If so, go to step 305; if not, go to step 306.

Step 305: write the file to the target cluster. Step 306: write the file to the standby cluster in the standby cluster array that corresponds to the target cluster.

Specifically, as shown in FIG. 4, the entire storage cluster array includes a primary cluster array and a standby cluster array. Each cluster in the primary cluster array has a corresponding standby cluster in the standby cluster array. After the proxy server uses the algorithm to determine the cluster identifier of the target cluster from the ID of the file to be written, it first sends a write request to the cluster in the primary cluster array that corresponds to that identifier, and then waits for the cluster status information returned by the cluster array management server. If the cluster is running normally, the file uploaded by the user is written to it. If the cluster is in an abnormal state, a file write request is sent to its corresponding standby cluster, and the file uploaded by the user is written to that standby cluster.
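The primary/standby decision can be sketched as follows. Clusters are modeled as in-memory dictionaries and the health check as a callback standing in for the status returned by the cluster array management server; all names here are hypothetical.

```python
def write_file(file_id, data, target, is_healthy, primary, standby, standby_of):
    """Write to the target primary cluster, or to its standby if the target is abnormal.

    primary / standby: cluster id -> {file id -> data} (stand-ins for real clusters).
    standby_of: primary cluster id -> corresponding standby cluster id.
    Returns the id of the cluster that actually stored the file.
    """
    if is_healthy(target):
        primary[target][file_id] = data
        return target
    backup = standby_of[target]
    standby[backup][file_id] = data
    return backup
```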

In practice, the standby cluster array serves not only as an emergency fallback when files are written; it can also be used as a mirror of the primary cluster array when users need to read cloud files.

Specifically, after a file has been written to the target cluster, and when the clusters are idle, that is, when their data throughput is low, the cluster array management server copies a mirror of the file to the standby cluster corresponding to the target cluster. This improves the reliability of cloud files without affecting the users' read/write experience. When a cluster fails and cannot be read, users can read the file mirror stored in the standby cluster.

Step 307: generate index information consisting of the file ID and the cluster identifier, and store the index information in the index cluster.

Specifically, the storage system includes an index cluster whose storage media are all high-speed storage media, such as solid-state drives (SSD) and dynamic random access memory (DRAM). When the proxy server receives a user's request to read a cloud file, it first looks up the corresponding cluster identifier in the index cluster by file ID, and then transmits the file data to the user's client. Because all storage media in the index cluster are high-speed, this lookup can be completed in a very short time, giving users a better read experience. In addition, operation and maintenance personnel maintaining the distributed storage system can obtain the specific storage address of each file more quickly, which improves operation and maintenance efficiency.
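The index write in step 307 and the lookup on the read path can be sketched as below. The index cluster is modeled as an in-memory dictionary; the class and function names are hypothetical.

```python
class IndexCluster:
    """Stand-in for the index cluster: maps file ID -> cluster identifier."""

    def __init__(self):
        self._index = {}

    def record(self, file_id, cluster_id):
        """Store the index information generated after a successful write."""
        self._index[file_id] = cluster_id

    def locate(self, file_id):
        """Return the identifier of the cluster that holds the file."""
        return self._index[file_id]


def read_file(file_id, index, clusters):
    """Look up the cluster in the index first, then fetch the file from it."""
    cluster_id = index.locate(file_id)
    return clusters[cluster_id][file_id]
```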

Compared with the prior art, providing a standby cluster array and an index cluster in the distributed storage system improves both the reliability of long-term file storage and the performance users experience when reading files, and also improves the efficiency of operation and maintenance personnel.

The fourth embodiment of the present invention relates to a server. Its structure is shown in FIG. 5 and includes:

a request receiving module 501, configured to receive read or write requests for files;

a computing module 502, configured to obtain and cache the weight coefficient of each cluster in the storage cluster array, and to determine the target cluster for writing the file from the weight coefficients in combination with a hash algorithm;

In one embodiment, after determining the target cluster, the computing module first sends a write request to the target cluster and then determines from the feedback of the storage cluster array's management server whether the target cluster is running normally. If the target cluster is abnormal, a write request is sent to the standby cluster in the standby cluster array that corresponds to the target cluster.

a writing module 503, configured to write the file uploaded by the user to the target cluster determined by the algorithm.

In one embodiment, after writing the user's file to the target cluster, the writing module generates index information for the file from the file ID and the cluster number of the target cluster, and stores the index information in the index cluster.

Compared with the prior art, the server in this embodiment determines the specific storage address of a file by combining a hash algorithm with the weight coefficients, so that the overall distribution of files follows the distribution of the weight coefficients. The space capacity of the distributed storage system is thus used fully and rationally, the degradation of cluster read/write performance caused by data redistribution is avoided, and the performance users experience when reading and writing cloud files is improved.

It is worth mentioning that the modules in this embodiment are all logical modules. In practice, a logical unit may be a physical unit, part of a physical unit, or a combination of several physical units. In addition, to highlight the innovative part of the present invention, units that are not closely related to solving the technical problem addressed by the invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.

The fifth embodiment of the present invention relates to a distributed storage system, as shown in FIG. 6, including:

the server 601 of the fourth embodiment of the present invention;

a storage cluster array 602 built from N storage clusters and used to store files uploaded by users, where N is a natural number greater than 1.

In a specific example, the storage cluster array may include a primary cluster array and a standby cluster array, where the two arrays contain the same number of clusters and the clusters correspond one to one. After the server uses the algorithm to determine the cluster identifier of the target cluster from the ID of the file to be written, it first sends a write request to the cluster in the primary cluster array that corresponds to that identifier, and then waits for the cluster status information returned by the cluster array management server. If the cluster is running normally, the file uploaded by the user is written to it. If the cluster is in an abnormal state, a file write request is sent to its corresponding standby cluster, and the file uploaded by the user is written to that standby cluster.

In a specific example, the standby cluster array serves not only as an emergency fallback when files are written; it can also be used as a mirror of the primary cluster array when users need to read cloud files. After a file has been written to the target cluster, and when the clusters are idle, that is, when their data throughput is low, the cluster array management server copies a mirror of the file to the standby cluster corresponding to the target cluster. This improves the reliability of cloud files without affecting the users' read/write experience. When a cluster fails and cannot be read, users can read the file mirror stored in the standby cluster.

an index cluster 603, used to store the index information of files.

Specifically, the storage media of the index cluster are all high-speed storage media, such as solid-state drives (SSD) and dynamic random access memory (DRAM). When the proxy server receives a user's request to read a cloud file, it first looks up the corresponding cluster identifier in the index cluster by file ID, and then transmits the file data to the user's client. Because all storage media in the index cluster are high-speed, this lookup can be completed in a very short time, giving users a better read experience. In addition, operation and maintenance personnel maintaining the distributed storage system can obtain the specific storage address of each file more quickly, which improves operation and maintenance efficiency.

a database management system 604, used to store and manage the capacity usage information of each cluster in the storage cluster array, and to dynamically calculate and store the default weight coefficients from that capacity usage information.

In one embodiment, the database management system is MySQL. Operation and maintenance personnel can set the weight coefficient of each cluster as actual conditions require and save it in MySQL. When MySQL holds both the weight coefficients it generated itself and user weight coefficients set by the user, the server preferentially takes the user weight coefficients as the weight coefficients used by the algorithm.
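The precedence rule, together with one possible way of deriving default weights, can be sketched as follows. Making the default weight proportional to remaining capacity is an assumption chosen for illustration; the text only says the default coefficients are calculated dynamically from capacity usage information. The function names are hypothetical and no database access is shown.

```python
def default_weights(remaining_capacity):
    """Derive default weights proportional to each cluster's remaining capacity."""
    total = sum(remaining_capacity.values())
    if total <= 0:
        raise ValueError("no remaining capacity in any cluster")
    return {cid: cap / total for cid, cap in remaining_capacity.items()}


def effective_weights(defaults, user_weights=None):
    """Prefer user-set weights when they exist; otherwise use the defaults."""
    return dict(user_weights) if user_weights else dict(defaults)
```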

Compared with the prior art, this embodiment combines multiple Ceph distributed storage clusters into a single Ceph distributed storage cluster array. This avoids the problem that a single Ceph distributed storage cluster may be unable to expand when its storage resources run short. Because no single Ceph cluster needs to be expanded, no data rebalancing is required inside any single cluster, so storage expansion is imperceptible to users. In addition, when the current storage space usage of the clusters is unbalanced, setting different weight coefficients for the clusters brings the file data of the clusters to a balanced distribution as subsequent files are written.

It is easy to see that this embodiment is a system embodiment corresponding to the first, second, and third embodiments, and it can be implemented in cooperation with them. The relevant technical details mentioned in the first, second, and third embodiments remain valid in this embodiment and are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the first, second, and third embodiments.

It is worth mentioning that the modules in this embodiment are all logical modules. In practice, a logical unit may be a physical unit, part of a physical unit, or a combination of several physical units. In addition, to highlight the innovative part of the present invention, units that are not closely related to solving the technical problem addressed by the invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.

The sixth embodiment of the present invention relates to an electronic device which, as shown in FIG. 7, includes at least one processor 701 and a memory 702 communicatively connected to the at least one processor 701. The memory 702 stores instructions executable by the at least one processor 701, and the instructions are executed by the at least one processor 701 so that the at least one processor 701 can perform the distributed storage method of the first, second, or third embodiment.

The memory 702 and the processor 701 are connected by a bus. The bus may include any number of interconnected buses and bridges, and it connects the various circuits of the one or more processors 701 and the memory 702. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, and provides a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor 701 is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor 701. The processor 701 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 702 may be used to store data used by the processor 701 when performing operations.

The seventh embodiment of the present invention relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the above method embodiments.

That is, those skilled in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Those of ordinary skill in the art will understand that the above embodiments are specific examples for implementing the present invention, and that in practice various changes can be made to them in form and detail without departing from the spirit and scope of the present invention.

Claims (10)

1. A distributed storage method, applied to a server, comprising:
receiving a write request of a file;
acquiring a weight coefficient of each cluster in a storage cluster array; the weight coefficient is generated according to the current storage resource of the cluster;
determining a target cluster for writing the file according to the weight coefficient of each cluster;
and writing the file into the target cluster.
2. The distributed storage method of claim 1,
the weight coefficients comprise default weight coefficients;
the default weight coefficient is generated by a database management system in real time according to the current residual storage capacity of the cluster and the network speed of a machine room where the cluster is located;
the default weight coefficient is positively correlated with the remaining storage capacity or the network speed.
3. The distributed storage method of claim 2,
the weight coefficients further comprise user weight coefficients; the user weight coefficient is set by a user according to actual conditions and is stored in the database management system;
when the weight coefficient of each cluster in the storage cluster array is obtained, judging whether the user weight coefficient exists or not;
and if so, acquiring the user weight coefficient.
4. The distributed storage method of claim 1,
the storage cluster array comprises a main cluster array and a standby cluster array;
each cluster in the main cluster array corresponds to each cluster in the standby cluster array;
the target cluster is a cluster in the main cluster array;
and when the target cluster fails, writing the file into a cluster in the standby cluster array corresponding to the target cluster.
5. The distributed storage method according to any one of claims 1 to 4, comprising, after said writing the file in the target cluster:
generating index information consisting of the identification of the file and the identification of the target cluster;
storing the index information into an index cluster; wherein the index cluster is built by a high-speed storage medium.
6. A server, comprising:
the request receiving module is used for receiving a writing request of a file;
the computing module is used for acquiring a weight coefficient of each cluster in the storage cluster array and determining the target cluster for writing the file according to the weight coefficient;
and the writing module is used for writing the file into the target cluster.
7. A distributed storage system, comprising:
the storage cluster array comprises N clusters, wherein N is a natural number greater than 1;
the server of claim 6, configured to store the user's uploaded file in a cluster of the storage cluster array.
8. The distributed storage system according to claim 7, further comprising:
the index cluster is used for storing index information of the files in the storage cluster array;
and the database management system is used for storing and managing the current storage resource information of each cluster in the storage cluster array and dynamically calculating and storing the default weight coefficient according to the storage resource information.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to perform the distributed storage method of any of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the distributed storage method of any one of claims 1 to 5.
CN201910857800.8A 2019-09-11 2019-09-11 Distributed storage method, server, system, electronic device and storage medium Pending CN110636122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910857800.8A CN110636122A (en) 2019-09-11 2019-09-11 Distributed storage method, server, system, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN110636122A true CN110636122A (en) 2019-12-31

Family

ID=68971036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910857800.8A Pending CN110636122A (en) 2019-09-11 2019-09-11 Distributed storage method, server, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN110636122A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100023621A1 (en) * 2008-07-24 2010-01-28 Netapp, Inc. Load-derived probability-based domain name service in a network storage cluster
CN101997884A (en) * 2009-08-18 2011-03-30 升东网络科技发展(上海)有限公司 Distributed storage system and method
CN106527981A (en) * 2016-10-31 2017-03-22 华中科技大学 Configuration-based data fragmentation method for adaptive distributed storage system
CN108011929A (en) * 2017-11-14 2018-05-08 平安科技(深圳)有限公司 Data request processing method, apparatus, computer equipment and storage medium
CN108600316A (en) * 2018-03-23 2018-09-28 深圳市网心科技有限公司 Data managing method, system and the equipment of cloud storage service
CN108614837A (en) * 2016-12-13 2018-10-02 杭州海康威视数字技术股份有限公司 File stores and the method and device of retrieval
CN108763436A (en) * 2018-05-25 2018-11-06 福州大学 A kind of distributed data-storage system based on ElasticSearch and HBase
CN108875035A (en) * 2018-06-25 2018-11-23 郑州云海信息技术有限公司 The date storage method and relevant device of distributed file system
CN109343801A (en) * 2018-10-23 2019-02-15 深圳前海微众银行股份有限公司 Data storage method, device, and computer-readable storage medium
CN109597567A (en) * 2017-09-30 2019-04-09 网宿科技股份有限公司 A kind of data processing method and device
CN110109886A (en) * 2018-02-01 2019-08-09 中兴通讯股份有限公司 The file memory method and distributed file system of distributed file system

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111131457A (en) * 2019-12-25 2020-05-08 上海交通大学 A capacity and bandwidth compromise method and system for heterogeneous distributed storage
CN113110796A (en) * 2020-01-13 2021-07-13 顺丰科技有限公司 Data management method, device, server and storage medium
CN111562884A (en) * 2020-04-28 2020-08-21 北京奇艺世纪科技有限公司 Data storage method and device and electronic equipment
CN111562884B (en) * 2020-04-28 2023-10-27 北京奇艺世纪科技有限公司 Data storage method and device and electronic equipment
CN111736762A (en) * 2020-05-21 2020-10-02 平安国际智慧城市科技股份有限公司 Synchronous updating method, device, equipment and storage medium of data storage network
CN111736762B (en) * 2020-05-21 2023-04-07 平安国际智慧城市科技股份有限公司 Synchronous updating method, device, equipment and storage medium of data storage network
CN111767250A (en) * 2020-06-10 2020-10-13 钛星投资(深圳)有限公司 Decentralized storage method, download method and storage system
CN114254791A (en) * 2020-09-23 2022-03-29 新智数字科技有限公司 Method and device for predicting oxygen content of flue gas
CN114254791B (en) * 2020-09-23 2024-12-06 新奥新智科技有限公司 A method and device for predicting oxygen content in flue gas
CN112637327A (en) * 2020-12-21 2021-04-09 北京奇艺世纪科技有限公司 Data processing method, device and system
CN115687250A (en) * 2021-07-21 2023-02-03 中移(苏州)软件技术有限公司 A storage method, device, system and computer storage medium
CN113721855A (en) * 2021-09-01 2021-11-30 中国建设银行股份有限公司 Storage method and device of storage resources, electronic equipment and computer storage medium
CN113986136A (en) * 2021-10-28 2022-01-28 中国建设银行股份有限公司 Data file splitting method and main cluster device
CN114089917A (en) * 2021-11-19 2022-02-25 中国电信集团系统集成有限责任公司 Distributed object storage cluster, capacity expansion method and device thereof, and electronic equipment
CN116418826A (en) * 2022-11-25 2023-07-11 中移(苏州)软件技术有限公司 Object storage system expansion method, device, system and computer equipment
CN116170275A (en) * 2022-12-30 2023-05-26 中国联合网络通信集团有限公司 A cloud network operation and maintenance management method and device
CN118409717A (en) * 2024-07-03 2024-07-30 济南浪潮数据技术有限公司 A data distribution method, system, computer program product, device and medium
CN118409717B (en) * 2024-07-03 2024-10-11 济南浪潮数据技术有限公司 Data distribution method, system, computer program product, equipment and medium

Similar Documents

Publication Publication Date Title
CN110636122A (en) Distributed storage method, server, system, electronic device and storage medium
CN107590001B (en) Load balancing method and device, storage medium and electronic equipment
US9304815B1 (en) Dynamic replica failure detection and healing
CN101370030B (en) Resource load stabilization method based on contents duplication
CN102523234B (en) Application server cluster implementation method and system
WO2011088767A1 (en) Content delivery method, system and schedule server
CN111600957A (en) File transmission method, device and system and electronic equipment
JP2012118987A (en) Computer implementation method, computer program, and system for memory usage query governor (memory usage query governor)
CN107770259A (en) Dynamic replica count adjustment method based on file heat and node load
CN113923216B (en) Distributed cluster current limiting system and method and distributed cluster node
CN114442912A (en) Method and apparatus for distributed data storage
CN113111038B (en) File storage method, device, server and storage medium
CN102012899A (en) Method, system and equipment for updating data
CN101471845A (en) Method for adjusting data block replica count and metadata server node
CN104767822A (en) Data storage method based on version
CN117407159A (en) Memory space management method and device, equipment and storage medium
WO2023045385A1 (en) Data processing method and related device
JP6035934B2 (en) Data store management device, data providing system, and data providing method
CN105208096A (en) Distributed cache system and method
CN111182011B (en) Service set distribution method and device
CN108287793A (en) Buffering method and server for response messages
CN114443262A (en) Computing resource management method, device, equipment and system
CN115499426B (en) Method, device, equipment and medium for transmitting massive small files
CN117527576A (en) Method, equipment and medium for adjusting static ECN waterline of RoCE
CN115904243A (en) A current limiting method, device, storage medium, and program product for internal IO

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191231
