
HK1111488A - Distributed object-based storage system that stores virtualization maps in object attributes - Google Patents


Info

Publication number
HK1111488A
Authority
HK
Hong Kong
Prior art keywords
file
storage devices
mapping
client
components
Prior art date
Application number
HK08101907.2A
Other languages
Chinese (zh)
Inventor
Marc Jonathan Unangst
Steven Andrew Moyer
Original Assignee
Panasas, Inc.
Priority date
Filing date
Publication date
Application filed by Panasas, Inc.
Publication of HK1111488A


Description

Distributed object-based storage system that stores virtualization maps in object attributes
Technical Field
The present invention relates generally to data storage methods and, more particularly, to object-based methods in which a map of a file object is stored as at least one component object attribute on an object storage device.
Background
As the dependency on electronics for data communication increases, different models are proposed to store large amounts of data efficiently and economically. Data storage mechanisms not only require a sufficient amount of physical disk space to store data, but also require various levels of fault tolerance or redundancy (depending on the criticality of the data) to maintain data integrity in the event of one or more disk failures.
In conventional networked storage systems, a data storage device, such as a hard disk, is associated with a particular server, or with a particular server and its backup server. Access to the data storage device is therefore possible only through the associated server: a client processor that needs to access the data storage device contacts the associated server over the network, and the server accesses the data storage device on the client's behalf. In contrast, in object-based storage systems, each object-based storage device communicates directly with clients over a network, possibly through routers and/or bridges. An example of an object-based storage system is described in co-pending U.S. patent application No. 10/109,998, entitled "Data File Migration from a Mirrored RAID to a Non-Mirrored XOR-Based RAID Without Rewriting the Data," filed on March 29, 2002, which is incorporated herein by reference in its entirety.
Existing object-based storage systems, such as the system described in co-pending application No. 10/109,998, typically include a plurality of object-based storage devices for storing object components, a metadata server, and one or more clients that access distributed, object-based files on the object storage devices. In such systems, a client typically accesses a file object having multiple components on different object storage devices by requesting a map of the file object (i.e., a list of the object storage devices on which the components of the file object reside) from the metadata server, which may include a centralized repository containing a map for each file object in the system. Once the map has been retrieved from the metadata server and provided to the client, the client retrieves the components of the requested file object by issuing access requests to the object storage devices identified in the map.
In existing object-based storage systems, such as the aforementioned systems, the centralized storage of file object maps on the metadata server, together with the requirement that a map be retrieved from the metadata server before a client can access the corresponding file object, often results in a performance bottleneck. To eliminate this bottleneck and improve system performance, there is a need for an object-based storage system that decentralizes the storage of file object maps away from the metadata server.
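The conventional access path described above can be sketched as follows. This is only an illustrative model, not code from the referenced application: the class names, the `lookups` counter, and the device identifiers are all hypothetical, and the file is assumed to be a simple concatenation of its components.

```python
# Hypothetical model of the conventional (centralized-map) flow; all names are illustrative.
class MetadataServer:
    """Holds a centralized repository of file object maps."""

    def __init__(self):
        self.maps = {}     # file object id -> ordered list of storage device ids
        self.lookups = 0   # how many times clients had to contact this server

    def get_map(self, file_id):
        self.lookups += 1
        return list(self.maps[file_id])


class StorageDevice:
    """Stores one component per file object."""

    def __init__(self):
        self.components = {}   # file object id -> component bytes

    def read(self, file_id):
        return self.components[file_id]


def read_file(file_id, metadata_server, devices):
    # Step 1: every access starts with a map request to the metadata server.
    device_ids = metadata_server.get_map(file_id)
    # Step 2: the client then contacts each listed device directly.
    return b"".join(devices[d].read(file_id) for d in device_ids)


devices = {"dev1": StorageDevice(), "dev2": StorageDevice()}
devices["dev1"].components["f"] = b"hello "
devices["dev2"].components["f"] = b"world"
mds = MetadataServer()
mds.maps["f"] = ["dev1", "dev2"]
data = read_file("f", mds, devices)
```

Because every call to `read_file` passes through `get_map`, the metadata server sits on the path of every file access in this model.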
Disclosure of Invention
The present invention relates to a distributed object-based storage system and method. The system includes a plurality of object storage devices for storing object components, a metadata server coupled to each of the object storage devices, and one or more clients that access distributed object-based files on the object storage devices. In the present invention, a file object having multiple components on different object storage devices is accessed by sending a file access request for the file object from a client to an object storage device. In response to the file access request, a map is located that includes a list of the object storage devices on which components of the requested file object reside. The map is stored on an object storage device as at least one component object attribute and, in one embodiment, includes information about the organization of the components of the requested file object on the object storage devices on the list. The map is sent to the client, which retrieves the components of the requested file object by issuing access requests to each of the object storage devices on the list.
In one embodiment, the mapping that is located in response to the file access request is never stored on the metadata server. Alternatively, the mapping may be retrieved from the object store, passed to the metadata server, and then submitted to the client.
In an embodiment, one or more redundant copies of the map are stored on different object storage devices. In this embodiment, each copy is stored as at least one component object attribute on one of the different object storage devices.
By storing the map as at least one component object attribute on the object storage devices, the present invention obtains at least two advantages over the prior art: (1) loss of a metadata server does not result in loss of a map; and (2) object ownership can be transferred without moving data or metadata. In particular, the component object attributes that identify the entity considered to own a component object can be updated without copying or otherwise moving the data associated with the component object.
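The second advantage can be illustrated with a minimal sketch. The attribute names `owner` and `map`, the manager identifiers, and the `ComponentObject` class are assumptions made for the example; the specification does not define an attribute layout.

```python
# Hypothetical component object: a payload plus a dictionary of attributes.
class ComponentObject:
    def __init__(self, data, attributes):
        self.data = data
        self.attributes = dict(attributes)


def transfer_ownership(component, new_owner):
    """Change the owning entity by rewriting one attribute; the payload is untouched."""
    component.attributes["owner"] = new_owner


component = ComponentObject(
    b"component payload",
    {"owner": "manager-1", "map": ["OBD1", "OBD2", "OBD3", "OBD4"]},
)
payload_before = component.data
transfer_ownership(component, "manager-2")
```

In this sketch the transfer touches neither `component.data` nor the stored map, which mirrors the point that ownership changes need not copy or move data.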
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 illustrates a typical network-based file storage system based on object-based secure disks (OBDs); and
FIG. 2 illustrates the decentralized storage of a map of a file object having multiple components on different OBDs according to the present invention.
Detailed Description
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. It is to be understood that the figures and descriptions of the present invention included in this application illustrate and describe elements that are particularly relevant to the present invention, while omitting other elements that may be found in a typical data storage system or network for the sake of clarity.
FIG. 1 illustrates a typical network-based file storage system 100 based on object-based secure disks (OBDs) 20. The file storage system 100 is implemented by a combination of hardware and software elements and generally includes manager software (simply, the "manager") 10, OBDs 20, clients 30, and a metadata server 40. It is to be understood that each manager is application code or software running on a corresponding server, such as the metadata server 40. Clients 30 may run different operating systems and thus present an operating-system-integrated file system interface. Metadata stored on the server 40 may include file and directory object attributes and directory object content; however, in the preferred embodiment, the attributes and directory object content are not stored on the metadata server 40. The term "metadata" generally refers not to the underlying data itself, but to the attributes and information that describe the data.
FIG. 1 shows a number of OBDs 20 attached to a network 50. An OBD 20 is a physical disk drive that stores data files in the network-based system 100 and may have the following properties: (1) it provides an object-oriented interface (rather than a sector-oriented interface); (2) it is attached to a network (e.g., network 50) rather than to a data bus or backplane (i.e., the OBD 20 may be considered a first-class network citizen); and (3) it executes a security module to prevent unauthorized access to the data stored on it.
The basic abstraction exposed by an OBD 20 is an object, which may be defined as an ordered set of bits of variable size. In contrast to prior-art block-based storage disks, OBDs do not expose a sector interface at all during normal operation. Objects on an OBD may be created, removed, written, read, appended to, and so on. OBDs do not make any information about a particular disk geometry visible; instead, they implement all layout optimizations internally, using the higher-level information that the OBD's direct interface with the network 50 provides. In one embodiment, one or more OBD objects are used to store each data file and each file directory in the file system 100. Because data files are stored as objects, file objects may generally be read, written, opened, closed, expanded, created, deleted, moved, arranged, merged, concatenated, named, and renamed, and may carry access restrictions. Each OBD 20 communicates directly with clients 30 on the network, possibly through routers and/or bridges. OBDs, clients, managers, etc. may be considered "nodes" on the network 50. In system 100, no assumptions need be made about the network topology, except that each node should be able to contact the other nodes in the system. Servers in the network 50, such as the metadata server 40, merely enable and facilitate data transfer between the clients and the OBDs; the servers typically do not carry out the transfer themselves.
Logically speaking, the various system "agents" (i.e., the managers 10, the OBDs 20, and the clients 30) are network entities that work independently. The manager 10 may provide day-to-day services related to individual files and directories, and may be responsible for all file-specific and directory-specific state. The manager 10 creates, deletes, and sets attributes on entities (i.e., files or directories) on behalf of clients. The manager 10 also performs aggregation of OBDs for performance and fault tolerance. An "aggregate" object is an object that uses OBDs in parallel and/or in a redundant configuration, yielding higher data availability and/or higher I/O performance. Aggregation is the process of distributing a single data file or file directory over multiple OBD objects for performance (parallel access) and/or fault tolerance (storing redundant data). The aggregation scheme associated with a particular object is stored on the OBD 20 as an attribute of that object. A system administrator (e.g., an operator or software) may select any aggregation scheme for a particular object. Both files and directories may be aggregated. In one embodiment, a new file or directory inherits the aggregation scheme of its immediate parent directory by default. A change in the layout of an object may cause a change in the layout of its parent directory. The manager 10 may be allowed to make layout changes for load or capacity balancing purposes.
The manager 10 may also allow clients to perform their own I/O to aggregate objects (which allows data to flow directly between OBDs and clients), and may provide proxy services if necessary. As previously described, each file and directory in the file system 100 may be represented by a unique OBD object. The manager 10 may also determine exactly how each object is to be laid out, i.e., on which OBD or OBDs the object is to be stored and whether the object is to be mirrored, striped, parity protected, etc. The manager 10 may also provide an interface through which a user expresses minimum requirements for object storage (e.g., "the object must remain accessible after the failure of any one OBD").
Each manager 10 may be a separate component in the sense that the manager 10 may be used in other file system configurations or data system structures. In one embodiment, the topology of system 100 includes a "file system layer" abstraction and a "storage system layer" abstraction. The files and directories in the system 100 may be considered part of the file system layer, while the data storage functionality (including the OBDs 20) may be considered part of the storage system layer. In one topology model, the file system layer sits above the storage system layer.
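To make the layout options named above concrete, the following sketch shows one simple way an object could be striped across OBDs and protected with XOR parity. The specification does not prescribe a striping algorithm; the round-robin placement, the function names, and the stripe unit size are assumptions chosen for illustration.

```python
def stripe(data, n_devices, unit):
    """Distribute data round-robin, in fixed-size units, across n_devices columns."""
    columns = [bytearray() for _ in range(n_devices)]
    for offset in range(0, len(data), unit):
        columns[(offset // unit) % n_devices] += data[offset:offset + unit]
    return [bytes(column) for column in columns]


def xor_parity(columns):
    """Byte-wise XOR of all columns; shorter columns are treated as zero-padded."""
    parity = bytearray(max(len(column) for column in columns))
    for column in columns:
        for i, byte in enumerate(column):
            parity[i] ^= byte
    return bytes(parity)


columns = stripe(b"ABCDEFGH", n_devices=2, unit=2)
parity = xor_parity(columns)

# If the device holding columns[0] fails, XOR-ing the parity with the
# surviving column reproduces the lost column.
rebuilt = bytes(p ^ b for p, b in zip(parity, columns[1]))
```

With two data columns and one parity column, any single column can be rebuilt from the other two, which is one way to satisfy a requirement such as remaining accessible after the failure of one OBD.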
A Storage Access Module (SAM) (not shown) is a program code module that can be compiled into managers and clients. The SAM includes an I/O execution engine that implements the simple I/O, mirroring and mapping retrieval algorithms discussed below. The SAM generates and orders the OBD level operations necessary to implement system level I/O operations for simple and aggregate objects.
Each manager 10 maintains global parameters and a view of which other managers are operational or have failed, and supports up/down state transitions for other managers. A benefit of the present system is that the location information describing on which data storage device or devices (i.e., OBDs) the required data is stored may itself be located on multiple OBDs in the network. Therefore, the client 30 need only identify one of the OBDs that contain location information for the desired data in order to access the data. Data may be returned directly from the OBD to the client without passing through the manager.
FIG. 2 illustrates the decentralized storage of a map 210 of an exemplary file object 200 having multiple components (e.g., components A, B, C, and D) stored on different OBDs 20 in accordance with the present invention. In the example shown, the object-based storage system includes n OBDs (labeled OBD1, OBD2, ..., OBDn), and components A, B, C, and D of the exemplary file object 200 are stored on OBD1, OBD2, OBD3, and OBD4, respectively. The map 210 includes a list 220 of the object storage devices on which the components of the file object 200 reside. The map 210 is stored as at least one component object attribute on an object storage device (e.g., OBD1, OBD3, or both) and includes information about the organization of the components of the file object on the object storage devices on the list. For example, the list 220 indicates that the first, second, third, and fourth components (i.e., components A, B, C, and D) of the file object 200 are stored on OBD1, OBD2, OBD3, and OBD4, respectively. In the embodiment shown, OBD1 and OBD3 contain redundant copies of the map 210.
In the present invention, the exemplary file object 200 having multiple components on different object storage devices is accessed by issuing a file access request for the file object from a client 30 to an object storage device 20 (e.g., OBD1). In response to the file access request, the map 210 (stored on the object storage device as at least one component object attribute) is located on the object storage device and sent to the requesting client 30, which retrieves the components of the requested file object by issuing an access request to each of the object storage devices listed in the map.
In a preferred embodiment, the metadata server 40 does not include a centralized repository of maps. The map 210 may be retrieved from the OBD 20 and transmitted directly to the client 30. Alternatively, when the map 210 is retrieved from the OBD 20, it may be sent to the metadata server 40 and then forwarded to the client 30.
Although the metadata server 40 does not maintain a centralized repository of maps 210, in one embodiment of the present invention the metadata server 40 optionally includes information (or hints) identifying the OBDs on which the map 210 corresponding to a given file object may be located. In this embodiment, a client 30 attempting to access a given file object first fetches the corresponding hint from the metadata server 40. The client 30 then sends a request to the OBD identified by the hint to retrieve the map 210. If the client 30 is unable to locate the requested map 210 on the OBD identified by the hint (i.e., the hint is erroneous), the client 30 may send a request for the map to one or more other OBDs until the map is located. Once the map is located, the client 30 optionally sends information identifying the OBD on which the map was found to the metadata server 40, so that the erroneous hint can be corrected.
Further, a copy of the map hint may be stored on one or more OBDs other than the OBDs on which the map 210 is stored, as an attribute of a component object that does not itself carry the map. This enables the client to access the map 210 without first contacting the manager, and it avoids additional OBD calls when the client initially queries an OBD on which the map 210 is not stored. The client may obtain the map hint from the metadata server, or may obtain the hint directly from an OBD, possibly as part of a directory or other index object.
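The arrangement of FIG. 2 can be modeled as a small data structure. The sketch below is an assumption-laden illustration: the attribute key `map`, the `organization` value `"striped"`, the file identifier, and the class names are all hypothetical, since the specification does not fix an encoding for the map.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ComponentObject:
    data: bytes
    attributes: dict = field(default_factory=dict)


@dataclass
class OBD:
    name: str
    objects: dict = field(default_factory=dict)   # file object id -> ComponentObject


# Map 210: the ordered list 220 of devices plus (hypothetical) organization information.
file_map = {"devices": ["OBD1", "OBD2", "OBD3", "OBD4"], "organization": "striped"}

# Components A, B, C, and D are placed on OBD1..OBD4 in list order.
obds = {name: OBD(name) for name in file_map["devices"]}
for name, payload in zip(file_map["devices"], [b"A", b"B", b"C", b"D"]):
    obds[name].objects["file200"] = ComponentObject(payload)

# Redundant copies of the map are stored as component object attributes on OBD1 and OBD3.
for name in ("OBD1", "OBD3"):
    obds[name].objects["file200"].attributes["map"] = copy.deepcopy(file_map)

holders = [name for name, obd in obds.items()
           if "map" in obd.objects["file200"].attributes]
```

Each copy is independent, so losing either OBD1 or OBD3 in this model still leaves one intact copy of the map.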
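The access sequence just described might look like the following sketch. The `OBD` class, its `get_map` and `read` methods, and the assumption that the file is the plain concatenation of its components in list order are illustrative choices, not details taken from the specification.

```python
class OBD:
    """Hypothetical object storage device holding components and their attributes."""

    def __init__(self, name):
        self.name = name
        self.data = {}         # file object id -> component bytes
        self.attributes = {}   # file object id -> attribute dictionary

    def get_map(self, file_id):
        # The map, if present, is an attribute of the component object.
        return self.attributes.get(file_id, {}).get("map")

    def read(self, file_id):
        return self.data[file_id]


def access_file(file_id, first_obd, obds):
    # Step 1: the client sends the file access request to one OBD, which locates the map.
    file_map = first_obd.get_map(file_id)
    if file_map is None:
        raise LookupError(f"{first_obd.name} holds no map for {file_id!r}")
    # Step 2: the client issues an access request to each OBD listed in the map.
    return b"".join(obds[name].read(file_id) for name in file_map)


obds = {name: OBD(name) for name in ("OBD1", "OBD2", "OBD3", "OBD4")}
for name, payload in zip(obds, [b"A", b"B", b"C", b"D"]):
    obds[name].data["file200"] = payload
obds["OBD1"].attributes["file200"] = {"map": ["OBD1", "OBD2", "OBD3", "OBD4"]}

contents = access_file("file200", obds["OBD1"], obds)
```

Note that no metadata server appears anywhere in this path: the map comes from the same kind of device that holds the data.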
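The hint mechanism can be sketched as a lookup with fallback and optional correction. In this illustration the metadata server is reduced to a dictionary of hints and each OBD to a dictionary of attributes; the probing order for the fallback and all names are assumptions.

```python
def locate_map(file_id, hints, obds):
    """Return (map, name of the OBD it was found on), trying the hinted OBD first.

    hints: file object id -> OBD name (hypothetical metadata server state)
    obds:  OBD name -> {file object id -> attribute dictionary}
    """
    hinted = hints.get(file_id)
    # Probe the hinted OBD first, then every other OBD until the map is found.
    order = ([hinted] if hinted in obds else []) + [n for n in obds if n != hinted]
    for name in order:
        file_map = obds[name].get(file_id, {}).get("map")
        if file_map is not None:
            if name != hinted:
                hints[file_id] = name   # optional step: report the correct location
            return file_map, name
    raise LookupError(f"no map found for {file_id!r}")


obds = {
    "OBD1": {"file200": {}},
    "OBD2": {"file200": {}},
    "OBD3": {"file200": {"map": ["OBD1", "OBD2", "OBD3"]}},
}
hints = {"file200": "OBD2"}   # a stale hint: the map is actually on OBD3

file_map, found_on = locate_map("file200", hints, obds)
```

After the call, the stale hint has been replaced, so a later lookup for the same file goes straight to the right OBD.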
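One possible reading of this paragraph is sketched below: components that do not carry the map carry a hint attribute naming an OBD that does, so a client starting at any OBD reaches the map in at most two calls and never contacts the manager. The attribute names `map` and `map_hint`, and the call counter, are assumptions made for the example.

```python
def fetch_map(file_id, start, obds):
    """Return (map, number of OBD calls), starting from an arbitrary OBD.

    obds: OBD name -> {file object id -> attribute dictionary}
    """
    attributes = obds[start][file_id]
    calls = 1
    if "map" in attributes:
        return attributes["map"], calls
    # This component has no map, but its hint attribute names an OBD that does.
    holder = attributes["map_hint"]
    calls += 1
    return obds[holder][file_id]["map"], calls


obds = {
    "OBD1": {"file200": {"map": ["OBD1", "OBD2", "OBD3", "OBD4"]}},
    "OBD2": {"file200": {"map_hint": "OBD1"}},
    "OBD3": {"file200": {"map": ["OBD1", "OBD2", "OBD3", "OBD4"]}},
    "OBD4": {"file200": {"map_hint": "OBD3"}},
}

map_direct, calls_direct = fetch_map("file200", "OBD1", obds)
map_via_hint, calls_via_hint = fetch_map("file200", "OBD4", obds)
```

Without the hint attribute, a client that started at OBD4 would have to probe other OBDs one by one; with it, a single follow-up call suffices in this model.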
Finally, it will be appreciated by those skilled in the art that changes could be made to the embodiments described above without departing from the broad inventive concept thereof. It is understood, therefore, that this invention is not limited to the particular embodiments disclosed, but it is intended to cover modifications within the spirit and scope of the present invention as defined by the appended claims.

Claims (6)

1. In a distributed object-based storage system including a plurality of object storage devices for storing object components, a metadata server coupled to each of the object storage devices, and one or more clients accessing distributed object-based files on the object storage devices, a method for accessing a file object having a plurality of components on different object storage devices, comprising:
issuing a file access request for a file object from a client to an object storage device;
in response to the file access request, locating a map comprising a list of object storage devices on which components of the requested file object reside, wherein the map is stored as at least one component object attribute on the object storage devices;
sending the map to the client; and
issuing an access request from the client to each of the object storage devices on the list to retrieve the components of the requested file object.
2. The method of claim 1, wherein the map includes information about an organization of the components of the requested file object on the object storage devices on the list.
3. The method of claim 1, wherein the map is never stored on the metadata server.
4. The method of claim 1, wherein the map is retrieved from an object storage device, passed to the metadata server, and then forwarded to the client.
5. The method of claim 1, wherein one or more redundant copies of the map are stored on different object storage devices, each copy being stored as at least one component object attribute on one of the different object storage devices.
6. In a distributed object-based storage system including a plurality of object storage devices for storing object components, a metadata server coupled to each of the object storage devices, and one or more clients accessing distributed object-based files on the object storage devices, a system for accessing file objects having a plurality of components on different object storage devices, comprising:
a client that issues a file access request for a file object to an object storage device;
wherein, in response to the file access request, the object storage device locates a map comprising a list of object storage devices on which components of the requested file object reside and sends the map to the client, wherein the map is stored as at least one component object attribute on the object storage devices; and
wherein, to retrieve the components of the requested file object, the client issues an access request to each of the object storage devices on the list.
HK08101907.2A 2004-08-13 2005-08-04 Distributed object-based storage system that stores virtualization maps in object attributes HK1111488A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/918,200 2004-08-13

Publications (1)

Publication Number Publication Date
HK1111488A (en) 2008-08-08

