CN119201850A

CN119201850A - Data access method, device and system

Info

Publication number: CN119201850A
Application number: CN202311791307.3A
Authority: CN
Inventors: 鹿智婷; 方新
Original assignee: Huawei Cloud Computing Technologies Co Ltd
Current assignee: Huawei Cloud Computing Technologies Co Ltd
Priority date: 2023-06-25
Filing date: 2023-12-22
Publication date: 2024-12-27

Abstract

The present application provides a data access method and related equipment, which is applied to an object storage service node, the object storage service node includes a hierarchical namespace, the hierarchical namespace is used to map at least one object to a file or directory on a directory tree, the method includes: receiving a file creation request, the file creation request carries a first file name and a first file identification code of the parent directory of the first file; obtaining metadata of the parent directory of the first file corresponding to the first file identification code; according to the metadata of the parent directory of the first file, creating a first file under the parent directory, and assigning a second file identification code to the first file, wherein the first file is a file or directory on the directory tree; recording the correspondence between the second file identification code and the first file name in a file index table. In this way, users use file identification codes to access files or directories stored in the object storage service node without having to parse the full path of the file layer by layer, thereby improving user access efficiency.

Description

Data access method, device and system

The present application claims priority from the chinese patent application filed at 25 of 6.2023, with application number 202310752941.X, entitled "method, apparatus and other devices for achieving file semantics for object storage", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of storage technologies, and in particular, to a data access method, device, and system.

Background

File storage is the storage of data in the form of files on a file storage service node, which can be accessed by a user through a file interface provided by the file storage service node. Object storage is the storage of data in the form of objects on an object storage service node, which is accessed by a user through an object interface provided by the object storage service node.

Currently, in order to be compatible with an application (abbreviated as a file application) in which a part can access data only through a file interface, an object storage service node provides a file service based on object storage, that is, the file application can access data on the object storage service node through the file interface provided by the object storage service node. However, since the object storage service node identifies a file by using a full path of the file, when accessing the file stored in the object storage service node, the client of the file application also needs to use the full path of the file for access, resulting in lower access efficiency.

Disclosure of Invention

The application provides a data access method, a data access device and other equipment, wherein a file identification code is distributed to a file or a directory stored by an object storage service node through a file identification code distributor, and the file identification code is globally unique and is used for identifying the file or the directory. The user accesses the files or the catalogues stored by the object storage service node by using the file identification codes, so that the full path of the files does not need to be analyzed layer by layer, and the access efficiency of the user is improved.

The first aspect provides a data access method, which is applied to an object storage service node, wherein at least one object is stored in a storage barrel of the object storage service node, the object storage service node comprises a hierarchical naming space, the hierarchical naming space is used for mapping at least one object into a file or a directory on a directory tree, the method comprises the steps of receiving a file creation request, the file creation request carries a first file identification code of a first file name and a parent directory of the first file, acquiring metadata of the parent directory of the first file corresponding to the first file identification code, creating the first file under the parent directory according to the metadata of the parent directory of the first file, distributing a second file identification code for the first file, wherein the first file is the file or the directory on the directory tree, and recording the corresponding relation between the second file identification code and the first file name into a file index table.

In this embodiment, the object storage service node adds a file identifier allocator and a file index table, and when a user creates a file or a directory, the object storage service node generates a globally unique file identifier for the created file or directory through the file identifier allocator, where the file identifier is a basic attribute of the file or directory, and once created, the file identifier does not change. Meanwhile, when the file identification code is generated, the corresponding relation from the file identification code to the file information is recorded in the file index table, and through the file index table, a user can inquire the metadata information of the file or the catalog through the file identification code. Therefore, the user accesses the files or the catalogues stored by the object storage service node by using the file identification codes, the full path of the files does not need to be analyzed layer by layer, and the access efficiency of the user is improved.

With reference to the first aspect, in some implementations of the first aspect, a file writing request is received, the file writing request carries a first file to be written and a second file identification code, a first file name corresponding to the second file identification code is obtained from a file index table, the first file name is used for indicating metadata of the first file, metadata of the first file is obtained, the metadata of the first file comprises a data layout of the first file, the first file is written in a first storage unit according to the data layout of the first file, and the metadata of the first file is updated.

In the implementation mode, the file storage client uses the file identification code to write the file into the object storage service node, and the object storage service node acquires metadata information corresponding to the file identification code from the file index table, so that the file is written according to the data layout of the file, and therefore, the full path of the file does not need to be analyzed layer by layer, and the data access efficiency is improved.

With reference to the first aspect, in some implementations of the first aspect, the metadata of the first file includes one or more of a file size of the first file, a creation time of the first file, an access right of the first file, a data layout of the first file, where the data layout of the first file includes an identification of a first storage unit storing the first file, an offset of the first file in the first storage unit, and a length of the first file in the first storage unit.

With reference to the first aspect, in some implementations of the first aspect, a first file reading request is received, the first file reading request carries a second file identification code, a first file name corresponding to the second file identification code is obtained from a file index table, metadata of a first file is obtained, the metadata of the first file includes a data layout of the first file, and the first file is read from a first storage unit according to the data layout of the first file.

In the implementation manner, after the object storage service node writes the file, the user can directly carry the file identification code to read the data, acquire the metadata information of the file according to the file identification code, and read the file according to the data layout of the file, so that the full path of the file is not required to be analyzed layer by layer, and the data access efficiency is improved.

With reference to the first aspect, in some implementations of the first aspect, a second file reading request is received, the second file reading request carries an access path of the first file, the access path of the first file is analyzed through a hierarchical namespace to obtain a second file identification code of the first file, a first file name corresponding to the second file identification code is obtained from a file index table, metadata of the first file is obtained, and the first file is read from the first storage unit according to a data layout of the first file.

In this implementation manner, when the file storage client uses the file full path access object storage service node to read the file, the object storage service node can identify two access modes of the file identification code or the file full path, and still access the file in a mode compatible with the file full path, and analyze the file full path through the hierarchical namespace to obtain the file identification code of the file, and further read the file according to the file identification code.

With reference to the first aspect, in some implementations of the first aspect, a third file reading request is received, where the third file reading request includes a file handle, a second file identification code in the file handle is identified through a gateway, a first file name corresponding to the second file identification code is obtained from a file index table, metadata of the first file is obtained, and the first file is read from the first storage unit according to a data layout of the first file.

In the implementation mode, the file storage client reads the file from the object storage service node based on the file protocol, the conversion from the file protocol to the object protocol which can be identified by the object storage service node can be realized through the gateway, and even if the file client passes through the file protocol, the file storage client can still access the file stored by the object storage service node based on the file identification code, so that the data access efficiency is improved. In addition, the file identification code is used as the access identification, so that the file protocol is more easily compatible.

The second aspect provides a data access method, which is applied to a file storage client, wherein the file storage client accesses an object storage service node through a file interface, at least one object is stored in a storage barrel of the object storage service node, the object storage service node comprises a layered naming space, the layered naming space is used for mapping at least one object into a file or a directory on a directory tree, the method comprises the steps of receiving a file creation instruction for a first file sent by an application program, the first file is a file or a directory on the directory tree, acquiring a first file identification code of a father directory of the first file from a mapping relation table, sending a file creation request to the object storage service node, wherein the file creation request carries a first file name and the first file identification code, receiving a second file identification code distributed by the object storage service node, and recording a corresponding relation between the second file identification code and the first file name into the mapping relation table.

In this embodiment, the file storage client builds a mapping relationship table of file names and file identification codes, and when interacting with the object storage service node, the file storage client can directly query the file identification codes in the mapping relationship table cached by the file storage client, thereby reducing the access amount and access pressure of the object storage service node. Meanwhile, for the file storage client, the process of constructing the buffer layer by layer in the full path of the file is reduced, and the data access efficiency is improved.

With reference to the second aspect, in some implementations of the second aspect, a file writing instruction for the first file sent by the application program is received, a second file identification code corresponding to the first file is obtained from a mapping relation table of the file storage client, a file writing request is sent to the object storage service node, the file writing request carries the second file identification code, and a processing result of the file writing request is received.

With reference to the second aspect, in some implementations of the second aspect, a first file reading instruction for the first file sent by the application program is received, a second file identification code corresponding to the first file is obtained from a mapping relation table of the file storage client, a first file reading request is sent to the object storage service node, the first file reading request carries the second file identification code, and a processing result of the first file reading request is received.

In a third aspect, a data access device is provided, where the device is applied to an object storage service node, at least one object is stored in a storage bucket of the object storage service node, the object storage service node includes a hierarchical namespace, the hierarchical namespace is used for mapping the at least one object into a file or a directory on a directory tree, the device includes a receiving module, an obtaining module, a processing module, and a processing module, where the receiving module is used for receiving a file creation request, the file creation request carries a first file name and a first file identification code of a parent directory of the first file, the obtaining module is used for obtaining metadata of the parent directory of the first file corresponding to the first file identification code, the processing module is used for creating the first file under the parent directory according to the metadata of the parent directory of the first file, and distributing a second file identification code to the first file, where the first file is a file or a directory on the directory tree, and the processing module is also used for recording a correspondence between the second file identification code and the first file name into a file index table.

The fourth aspect provides a data access device, which is applied to a file storage client, the file storage client accesses an object storage service node through a file interface, at least one object is stored in a storage barrel of the object storage service node, the object storage service node comprises a layered naming space, the layered naming space is used for mapping at least one object into a file or a directory on a directory tree, the device comprises a receiving module, an obtaining module, a sending module and a receiving module, the receiving module is used for receiving a file creation instruction sent by an application program and aiming at a first file, the first file is a file or a directory on the directory tree, the obtaining module is used for obtaining a first file identification code of a parent directory of the first file from a mapping relation table, the sending module is used for sending a file creation request to the object storage service node, the file creation request carries a first file name and the first file identification code, the receiving module is also used for receiving a second file identification code distributed by the object storage service node, and the corresponding relation between the second file identification code and the first file name is recorded in the mapping relation table.

Regarding the technical principles and advantages of the third and fourth aspects, reference may be made to the foregoing descriptions of the first and second aspects, and details are not repeated herein.

In a fifth aspect, the present application provides a cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory, the processor of the at least one computing device being operable to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of any one of the implementations of the first aspect or the first aspect.

In a sixth aspect, the present application provides a cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory, the processor of the at least one computing device being configured to execute instructions stored in the memory of the at least one computing device, such that the cluster of computing devices performs the method of any one of the implementations of the second aspect or the second aspect described above.

In a seventh aspect, the present application provides a computer program product comprising instructions which, when executed by a cluster of computer devices, cause the cluster of computer devices to perform the method of any one of the implementations of the first aspect or the first aspect.

In an eighth aspect, the application provides a computer program product comprising instructions which, when executed by a cluster of computer devices, cause the cluster of computer devices to perform the method of any one of the implementations of the second or second aspect described above.

In a ninth aspect, the present application provides a computer readable storage medium comprising computer program instructions which, when executed by a cluster of computing devices, perform the method of any one of the implementations of the first aspect or the first aspect described above.

In a tenth aspect, the application provides a computer readable storage medium comprising computer program instructions which, when executed by a cluster of computing devices, perform the method of any one of the implementations of the second or third aspects above.

Drawings

In order to more clearly describe the embodiments of the present application or the technical solutions in the background art, the following description will describe the drawings that are required to be used in the embodiments of the present application or the background art.

Fig. 1 is a schematic view of a scenario of data access according to the present application.

Fig. 2 is a schematic architecture diagram of an object storage client according to the present application.

Fig. 3 is a schematic diagram of an architecture of an object storage service node according to the present application.

FIG. 4 is a schematic diagram of a data access architecture according to the present application.

FIG. 5 is a schematic diagram of a mapping of object storage space and file storage space in accordance with the present application.

Fig. 6 is a schematic diagram of a data access architecture according to the present application.

Fig. 7 is a schematic flow chart of a data access according to the present application.

FIG. 8 is a schematic diagram of mapping an object storage space and a file storage space according to the present application.

Fig. 9 is a schematic flow chart of a data access according to the present application.

Fig. 10 is a schematic flow chart of writing a file provided by the present application.

Fig. 11 is a schematic diagram of a file reading process according to the present application.

Fig. 12 is a schematic diagram of a file reading process according to the present application.

Fig. 13 is a schematic diagram of a file reading process according to the present application.

Fig. 14 is a schematic structural diagram of a data access device provided by the present application.

Fig. 15 is a schematic structural diagram of another data access device provided by the present application.

FIG. 16 is a schematic diagram of a computing device according to the present application.

Fig. 17 is a schematic structural diagram of a computing device cluster according to the present application.

Fig. 18 is a schematic diagram of a configuration of a network connection between computing devices according to the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.

In describing embodiments of the present application, unless otherwise indicated, the term "and/or" herein is an association relationship describing an associated object, and indicates that there may be three relationships, for example, a and/or B, and may indicate that there are three cases where a exists alone, while a and B exist together, and B exists alone. The symbol "/" herein indicates that the associated object is or is a relationship, e.g., A/B indicates A or B. Wherein A, B may be singular or plural.

In describing embodiments of the present application, the term "plurality" herein refers to two or more unless otherwise indicated. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (a, b, or c) of a, b, c, a-b, a-c, b-c, or a-b-c may be represented, wherein a, b, c may be single or plural. For example, the plurality of processing units means two or more processing units and the like, and the plurality of elements means two or more elements and the like.

In addition, in order to facilitate the clear description of the technical solution of the embodiment of the present application, in the embodiment of the present application, the words "first", "second", etc. are used to distinguish the same item or similar items having substantially the same function and effect. It will be appreciated by those of skill in the art that the words "first," "second," and the like do not limit the amount and order of execution, and that the words "first," "second," and the like do not necessarily differ.

Meanwhile, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations or explanations. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.

In the description of the embodiments of the present application, unless otherwise indicated, the meaning of "a plurality of" means two or more, and in order to make the technical solutions provided by the present application clearer, explanation of related terms is first performed before the technical solutions provided by the present application are specifically described.

(1) The file system (FILE SYSTEM) is software in the operating system for managing and storing file information, and comprises a basic file system and a plurality of file subsystems, wherein a user can detach and install the file subsystems according to the requirement. The target tree provided by the file system comprises a root directory, subdirectories and files under the subdirectories, wherein the root directory or the subdirectories can comprise the files and the next-level subdirectories.

(2) The file storage service (File Storage Service, FSS) is to store data in the form of files on the file storage service node. The file storage service node provides a multi-level tree directory storage method based on a file system. The file storage service node is used for providing operations (i.e., file services) conforming to file semantics for users, such as operations of reading files, writing files, creating files, deleting files, opening files, closing files, renaming (Rename) files, and the like. The file storage service node provides a file interface, the file interface can indicate operations conforming to file semantics, for example, the file interface can be a portable operating system interface (Portable Operating SYSTEM INTERFACE, POSIX), and a user can access data on the file storage service node through a POSIX interface. It should be noted that, when a user accesses data through the file interface, the data may be referred to as a file.

(3) An object store service (Object Storage Service, OBS) is to store data in the form of objects in object storage nodes. The object storage node provides a flattened storage mode based on a storage bucket and objects, all the objects in the storage bucket are in the same logic level, and a multi-level tree directory structure in the file storage service node is removed. The object storage system is used to provide operations (i.e., object services) that conform to object semantics, such as uploading objects, downloading objects, listing each object in a bucket, copying objects, moving objects, renaming objects, deleting each object of a bucket, deleting a specified plurality of objects, and so forth, to a user. The object storage node provides an object interface, which may indicate operations conforming to the semantics of the object, for example, the object interface may be a Representational STATE TRANSFER, REST application programming interface (Application Programming Interface, API) through which a user may access data on the object storage service node. It should be noted that, when a user accesses data through the object interface, the data may be referred to as an object.

(4) Objects (objects) are the basic unit of data storage in an Object storage system, an Object being actually an aggregate of data of one file and its associated attribute information (metadata). The data uploaded to the OBS by the tenant is stored in the bucket in the form of an object. The object includes three parts of Key value (Key), metadata (Metadata), and Data (Data). The key, i.e. the object name, is for example a sequence of characters with a length of more than 0 and not more than 1024 encoded by utf-8, each object in a bucket having a unique object key.

(5) Bucket (Bucket) is a container for storing objects in OBS, object storage provides a flattened storage mode based on the Bucket and the objects, all the objects in the Bucket are in the same logic level, and the multi-level tree directory structure in the file system is removed. Each barrel has the own storage category, access right, belonging area and other attributes, and tenants can create barrels with different storage categories and access rights and configure more advanced attributes to meet the storage requirements of different scenes.

(6) The object storage device (Object Storage Device, OSD), which is a basic storage unit of the object storage system, is disposed on a physical disk, specifically, a storage space of a fixed size of the physical disk, and the object storage system manages the physical disks of the plurality of computing devices in the form of OSD.

With the development of storage technology, in order to be able to store unstructured data such as movies, office documents, images, and the like more conveniently, object storage is proposed in the related art. Object storage uses flat address space to store data, where there is no layering of directories and files, so that one object storage system can store large amounts of data.

In the object storage, the tenant can log in the cloud management platform on the public cloud access page through the pre-registered account number and password, and after the login is successful, the tenant selects and purchases the corresponding public cloud service, such as an OBS, a virtual machine service, a container service and the like, on the public cloud access page. For OBS, the tenant can further create multiple buckets through a configuration interface or API on a public cloud access page provided by a cloud platform, the sum of the number and the size of objects stored in each bucket is not limited, the tenant does not need to consider the expandability of data, the OBS is a service based on REST API and hypertext transfer protocol (Hypertext Transport Protocol, HTTP) or hypertext transfer security protocol (Hypertext Transfer Protocol Secure, HTTPs), and the tenant can locate bucket resources through a uniform resource locator (Uniform Resource Identifier, URI), which is also called access domain name, also called domain name for short. In OBS, the bucket names are globally unique and cannot be modified, i.e. the bucket created by the tenant cannot be the same as other bucket names already created by the tenant, nor can the bucket names created by other tenants.

Fig. 1 shows a schematic diagram of at least one application scenario in the present application. As shown in fig. 1, each bucket may include a plurality of objects, the objects between the buckets are isolated from each other, a tenant logs in the cloud management platform 10 through an object storage client, selects and purchases a cloud service of the object storage service on the cloud management platform 10, and after the cloud service is purchased, the tenant can perform object storage based on a function provided by the object storage service. The cloud management platform 10 is mainly used for managing the infrastructure of the running object storage service. For example, the infrastructure running the object store service may include a plurality of data centers disposed in different areas, each data center including a plurality of servers. The data center may provide underlying resources, such as computing resources, storage resources, etc., for the object storage service. Therefore, when purchasing and using the object storage service, the tenant mainly pays for the used resources. Specifically, the object store service provides a domain name of a bucket, and a tenant can access the domain name of the bucket through an object store client to upload data to the bucket or download data from the bucket, wherein the uploaded data is stored in the bucket in an object manner.

When the tenant uploads the object, the storage category of the object can be designated, and if the storage category is not designated, the storage category of the tenant is consistent with the storage category of the bucket by default. After uploading, the storage class of the object may be modified and the bucket may be accessed through the object storage client. The object store client may be a dedicated object store client provided for a locally used browser or cloud platform of the tenant. For example, the tenant may access the bucket through a browser provided in a local computer, and after the input account passes the verification, the tenant may upload the object into the bucket or modify, delete, etc. the object in the bucket by using the browser, where the local computer accesses the internet.

The system architecture to which the present application is applicable is described below with reference to fig. 2, and in this embodiment, the object storage client 20 may be deployed on the user side, such as a terminal device 21 that may be deployed outside the data center. The terminal device 21 may further be provided with an application program 22, and the application is not limited to a specific type of terminal device, for example, the terminal device may be a mobile phone, a notebook computer, a tablet computer, a palm computer, a wireless terminal in a smart city, or a wireless device in a smart home. The object store client 20 can interact with the application 22, as the application 22 can send a data modification request to the object store client 20 under the triggering of the tenant. The object store client 20, upon receiving the data modification request, may send a data write request to the application 22 in accordance with the data modification request.

The cloud management platform 10 is disposed in a data center and is respectively connected to an object storage service node (the object storage service node may be simply referred to as an object storage service node, and in the present application, the object storage service node 201 and the object storage service node 202 are drawn as examples) and a remote connection gateway 204 through a switching device 203.

The tenant may log in to the cloud management platform 10 at the object storage client 20 using an account number, create a bucket through the cloud management platform 10, configure a bucket name, and obtain the bucket domain name. After detecting the operation of the tenant (such as the operation of creating a bucket, configuring a bucket name, etc.), the cloud management platform 10 may issue a creation instruction to the object storage service node (such as the object storage service node 201 or the object storage service node 202) through the switching device 203, where the creation instruction includes information such as the bucket name, the bucket domain name, etc., notifies the object storage service node to create the bucket, and stores the information such as the bucket name, the bucket domain name, etc.

After the bucket is created, the tenant may also access the bucket domain name by operating the object store client 20, locate the bucket in the object store service node, upload data in the bucket (upload objects), download data (download objects). In the present application, the tenant may also modify the objects in the bucket by operating the object store client 20. That is, the tenant may trigger the object storage client 20 to send a data write request to the object storage service node, indicating that the data to be modified in the target object stored in the bucket of the object storage service node is modified.

After receiving the data writing request, the object storage service node can determine the target object in the bucket according to the bucket name and the name of the target object carried in the data writing request, and modify the data to be modified into modified data according to the offset of the data to be modified in the target object carried in the data writing request and the data length of the data to be modified.

For example, the remote connection gateway 204 may be implemented by a virtual private network (Virtual Private Network, VPN) or a private network, and the object storage service node may be implemented by a server provided with a plurality of physical disks.

In the following, the structure of the object storage service node is described by taking the object storage service node 201 as an example, referring to fig. 3, the object storage service node 201 includes a software layer and a hardware layer, the hardware layer includes a disk controller 2015, a physical network card 2016, a physical disk 1 and a physical disk 2 (only taking the object storage service node 201 as an example and not limiting the number of physical disks in the object storage service node in the present application). The software layer includes an object storage device control unit 2011 and an operating system 2012, the control unit 2011 runs on the operating system 2012, the operating system 2012 includes a disk drive 2013 and a physical network card drive 2014, the cloud management platform 10 can communicate with the control unit 2011 through the physical network card 2016, the control unit 2011 controls the disk controller 2015 to set the physical disk 1 and the physical disk 2 to a plurality of persistent storage units (PERSISTENCE LOG, PLOG) through the disk drive 2013, the application only takes PLOG as a basic storage unit as an example, and all storage spaces capable of being used as the basic storage unit are suitable for the application. After receiving the creation instruction of the bucket 1, the cloud management platform 10 notifies the control unit 2011 to create the bucket 1, the control unit 2011 creates the bucket 1 on the physical disk 1 and the physical disk 2 in the hardware layer through the operating system, PLOG of the bucket 1 are distributed in the physical disk 1 and the physical disk 2, and information such as the bucket name, the bucket domain name and the like of the bucket 1 is saved.

When a tenant needs to upload an object into bucket 1 through object store client 20, the tenant may trigger object store service node 201 to write the object into bucket through object store client 20. For example, an object may be written into a bucket by the control unit 2011 in the object storage service node 201. When the tenant needs to download the object in the bucket 1 through the object storage client 20, the tenant may trigger the object storage service node 201 to read the object from the bucket through the object storage client 20, for example, the control unit 2011 in the object storage service node 201 may obtain the object from the bucket, and feed back to the object storage client 20.

The above description is described with respect to the granularity of the object storage service node. From a functional perspective, interactions inside an object store service node can be understood as interactions between functional layers.

OBS stores objects in a flat namespace, provides an Object storage service in the form of a Key-Value pair, and a user can uniquely obtain the contents of an Object according to the name (Key) of the Object. In general, objects can be managed as files, but in order to make it more convenient for tenants to manage Data, object storage services provide a way to simulate folders by adding "/", e.g., "test/123.Jpg", to the names of objects, where "test" is simulated as a folder and "123.Jpg" is simulated as the file name under the "test" folder, and in fact, the object name (Key) is "test/123.Jpg", and the Data (Data) is 123.Jpg file itself. Although the user may use names like test1/test.jpg, this does not indicate that the user's Object is saved under the test1 directory. For OBS, test1/test. Jpg is just one string, and a. Jpg is not an essential difference.

However, many big data, data analysis applications, and artificial intelligence simulations currently store data in a hierarchical namespace having infinite depth of nesting and atomic operations in the namespace, such as many big data and data analysis applications and file system configuration uses. The file system is a typical tree index structure, a file named test1/test. Jpg, and the access process needs to access the directory of test1 first, and then find the file named test. Jpg under the directory. The file system can easily support folder operations such as renaming a directory, deleting a directory, moving a directory, etc.

Thus, in order to be compatible with file applications, the use of a hierarchical directory structure in an object storage system of a flat namespace is implemented, the object storage service node providing a hierarchical namespace for file system operations of the object storage structure, i.e. building file services on the basis of the object storage. That is, the file application may access data on the object storage service node through a file interface provided by the object storage service node.

Fig. 4 shows a schematic architecture of a hierarchical namespace in the present application, taking the object storage service node 201 as an example, to illustrate the hierarchical namespaces in the object storage service node. Referring to fig. 4, an object storage service node 201 stores a plurality of objects. Hierarchical namespace 400 maps object identifiers stored in object storage space 401 into file or directory identifiers (or paths) in file storage space 402, and the mapping is stored in hierarchical namespace 400. That is, a plurality of objects in the object storage space 401 are organized into a multi-level directory hierarchy of the file storage space 402. Where object storage space 401 is a planar namespace and file storage space 402 is a hierarchical namespace, these two namespaces may coexist in object storage service node 201 and allow a user to access data stored in object storage service node 201 in two different ways. Illustratively, a user may access an object 405 stored in a bucket 403 in an object storage space 401 through an object interface based on REST API and HTTP protocol by means of an object storage client 20, and may access a file 407 stored under a directory 406 in a file system 404 in a file storage space 402 through a file storage client 40 based on Network file system (Network FILESYSTEM, NFS) protocol.

Where object storage space 401 stores data using a two-layer data model of bucket-objects (which may also be considered directory-objects), file namespace 402 stores data using a directory tree structure of file system-directory-subdirectory-files. Thus, hierarchical namespace 400 maps bucket 403 in object storage space 401 to file system 404 in file storage space 402 and object 405 to directory 406 or file 407. That is, hierarchical namespace 400 maps a plurality of objects in bucket 403 to a plurality of nodes on the directory tree.

When a node is a file, the node may also be referred to as a file node. When a node is a directory, the node may also be referred to as a directory node. When an object is mapped to a file node, the object may include user data, metadata, and an object ID. When an object is mapped to a directory node, the object may include metadata as well as an object ID.

For example, as shown in FIG. 5, object store service node 201 may map an object "/" in bucket 403 to a root directory "/" in file system 404, an object "/b" to a directory "b", and an object "/b/c" to a file "c", i.e., map an object to a directory or file in the file system in a full path manner.

The object store service node 201 maps bucket-object representations in a flat namespace in an object store space to forward slash "/+ filenames" in a file namespace to emulate folders and organize into a multi-level directory tree structure enabling single atomic operations to rename a directory or file. However, the object storage service node identifies a file by using a full path, when a user accesses the file, the object storage service node needs to analyze the full path layer by layer, access the file stored under the directory in the file system, and the deeper the accessed directory, the larger the consumed resource and the lower the access efficiency. In addition, in the cache of the client, the full path needs to be constructed in a layer-by-layer traversal mode through the directory tree structure, so that the access efficiency of the client and the service node is low when the client and the service node access the file.

In order to solve the problem of low access efficiency of a client and an object storage service node, the application provides a data access method, wherein a file identification code distributor distributes a file identification code for a file or a directory stored in the object storage service node, the file identification code is globally unique and is used for identifying the file or the directory, a user accesses the file or the directory stored in the object storage service node by using the file identification code, the full path of the file is not required to be analyzed layer by layer, and the access efficiency of the user is improved.

For ease of understanding, at least one application scenario of the embodiments of the present application will be described first.

Fig. 6 is a schematic diagram of at least one application scenario in an embodiment of the present application. As shown in fig. 6, a file identifier allocator 610 and a file index table 620 are added to the object storage service node 201. Wherein, when a user creates a file or a directory, the object storage service node 201 generates a globally unique file identification code for the created file or directory through the file identification code distributor 610, the file identification code is a basic attribute of the file or directory, and once created, the file identification code is unchanged. Meanwhile, the file identification code is generated, and at the same time, the corresponding relation from the file identification code to the file information is recorded in the file index table 620, and through the file index table 620, the user can query the metadata information of the file or the directory through the file identification code.

Next, a data access method provided by the present application is described based on the above description. It will be appreciated that the method is set forth based on what has been described above, some or all of which may be found in the description above.

Fig. 7 is a schematic flow chart of a data access method according to the present application. It will be appreciated that the method may be implemented in any suitable computing, processing, storage-capable apparatus, device, platform or cluster of devices. As shown in fig. 7, the method may include steps S701 to S704:

Step S701, a file creation request is received.

In this embodiment, the object storage service node receives a file creation request sent by the file storage client, where the file creation request is used to request the object storage service node to create a first file, where the first file is created under a parent directory of the first file. The object store service node may be the object store service node 201, which is also described below as an example.

Optionally, when the file storage client sends the file creation request, the first file identification code corresponding to the parent directory of the first file may be queried in the cache of the file client, and then the first file name to be created and the first file identification code of the parent directory of the first file may be sent to the object storage service node.

In this embodiment, after receiving a file creation request, the object storage service node first determines an access manner of the file creation request. For example, the file creation request is to use the file identification code to access the object storage service node or to access the object storage service node in the full path of the file. The access identification of the file creation request can be used for judging whether the file creation request is a file identification code or a file full path access object storage service node. The access identifier may be in various forms, such as, but not limited to, a query (query) parameter, or carried by a header field of the file creation request, or carried by a special prefix or a special suffix of the URI.

In one example, the file creation request may carry the identification type of the file identification code with the query parameter, e.g., the file identification code is 100/c with the query parameter, i.e., the file with the file name c is created under the parent directory with the file identification code of 100. For example, the file creation request may be as follows:

create 100/c

Host:xxxx

Connection:xxx

In one example, the file creation request may be carried by a header field of the file creation request, such as using Uritype to identify that the file creation request is accessed over a full path of the file. For example, the file creation request may be as follows:

Uritype:path

in one example, the file creation request may be carried by a special prefix or special suffix of the URI, such as using the special prefix FIFEID _200/c to identify that the file creation request is accessed by a file identification code. For example, the file creation request may be as follows:

create/FILEID_200/c

Step S702, obtain metadata of a parent directory of the first file corresponding to the first file identifier.

In this embodiment, the object storage service node receives a file creation request sent by the file storage client, where the file creation request carries a first file name and a first file identification code of a parent directory of the first file, queries metadata corresponding to the parent directory of the first file from the file index table 620, obtains layout information of the parent directory, and queries whether the parent directory is an effective directory, i.e., determines whether the directory is not deleted, and simultaneously whether the first file exists under the parent directory.

In step S703, the first file is created under the parent directory according to the metadata of the parent directory of the first file, and the second file identification code is allocated to the first file.

In this embodiment, if the first file does not exist in the parent directory corresponding to the first file identifier, and the parent directory is an active directory, the object storage service node creates the first file in the parent directory, and allocates the second file identifier to the first file through the file identifier allocator 610 in the object storage service node. The file identifier allocator 610 may allocate a globally unique file identifier to the first file based on a logic clock or a cluster ID allocator, where the allocation manner only needs to ensure the globally unique file identifier and cannot be repeated, but the specific allocation manner is not limited thereto. In addition, the length, type and composition of the file identification code are not particularly limited in the embodiment of the present application.

In one example, as shown in FIG. 5, when directory b is created, it may be assigned a file identifier of 100 by the file identifier allocator 610, and when file c is created under the parent directory b, it may be assigned a file identifier of 200 by the file identifier allocator 610.

In step S704, the corresponding relationship between the second file identification code and the first file name is recorded in the file index table.

In this embodiment, the object storage service node creates a first file under the parent directory, allocates a second file identifier for the first file, and records a correspondence between the second file identifier and the first file name in the file index table. Metadata information of the first file can be queried through the second file identification code through the file index table.

In one example, as shown in FIG. 8, the object storage service node maps objects in bucket 803 of object storage 801 to files or directories in file system 804 of file storage 802, and assigns file identifier 1024 to the root directory, 50 to directory a, 100 to directory a/b, 200 to a/b/c, and 300 to a/b/d by file identifier allocator 610. After the file identifier allocator 610 allocates a file identifier to a file or directory, the object in the bucket 803 is stored with the file identifier+file name of the parent directory, e.g., directory a is stored as "1024/a" and file a/b/c is stored as "100/c".

In one example, after the file identifier allocator 610 allocates a file identifier to a file or directory, the correspondence of the file identifier to the name of the file or directory is stored in the file index table 620, as shown in table 1. The file identification code is a key in the file index table, and the name of the file or the directory is a value in the file index table.

Table 1 file index table

Key	Value
		50	1024/a
100	50/b
		200	100/c
300	100/d

In one example, metadata information corresponding to a file name in a file index table may be obtained from a metadata index table in an object storage service node, as shown in table 2. Therefore, the metadata information of the file or the catalog can be directly acquired through the file identification code. Metadata of a file may be used to describe attributes of the file. The metadata of the file may include at least one of a file size of the file, a creation time of the file, a modification time of the file, an access right of the file, a data layout of the file, wherein the data layout of the file includes an identification of a storage unit storing the file, an offset of the storage unit of the file, and a length of the storage unit of the file. Second, the metadata of the file may also include a file index code of the file.

Table 2 metadata index table

Key	Value
		1024/a	File metadata information, 50
50/b	File metadata information, 100
		100/c	File metadata information, 200
100/d	File metadata information, 300

In this way, in this embodiment, the file identifier distributor distributes the file identifier for the file or the directory, so that when a user accesses the file, the file identifier is globally unique and is used to identify the file or the directory, and the user accesses the file or the directory stored in the object storage service node by using the file identifier, so that it is not necessary to analyze the full path of the file layer by layer, and the access efficiency of the user is improved.

In some possible implementation manners, in order to improve the data access efficiency, the file storage client may use the file identification code as a key to construct a mapping relationship table as shown in table 1, and when interacting with the object storage service node, the file storage client may directly query the file identification code in the mapping relationship table cached by the file storage client, thereby reducing the access amount and access pressure of the object storage service node, and simultaneously reducing the process of layer-by-layer construction of the full path of the file, and improving the data access efficiency. In this embodiment, as shown in fig. 9, the data access method may further include steps S901 to S904:

Step S901 receives a file creation instruction for a first file sent by an application program.

In this embodiment, the file storage client may receive a file creation instruction for the first file from the application program, the file creation instruction being for instructing creation of the first file at the object storage service node. The first file is a hierarchical namespace that maps at least one object stored in a bucket of an object storage service node into a file or directory on a directory tree. The file storage client may be the file storage client 40 and the object storage service node may be the object storage service node 201, as will be described below.

In one example, the application calls a file creation instruction create (c) for file "c".

Alternatively, the file storage client may be implemented based on a private file client, such as a user space based file system (FILESYSTEM IN Userspace, fuse) implementation, an interface capable of exporting the file system to a Linux kernel through a user space program, enabling an unprivileged user to create his own file system without editing the kernel code. The FUSE-based file storage client provides two APIs, a "high-level" synchronous API and a "low-level" asynchronous API. Both APIs receive requests from the kernel and pass to the main program (fuse_main function), which processes using the corresponding callback function. When using a high-level API, the callback function works with the file name (FILE NAMES) and path (paths) instead of an inodes, and returns when the callback function completes a request process. When using a low-level API, the callback function must work with the inode, responding to the use of a separate set of API functions that must be displayed.

Optionally, the high-level API constructs an access mode of a full path of the file, and uses the full path of the file as an access identifier of the access, but the high-level API has lower implementation cost and lower efficiency, and the file storage client based on FUSE needs to construct layer by layer when transmitting the full path of the file. Moreover, the high-level API cannot be used for constructing a file full-path cache, and renaming of the file cannot be supported. However, by using the low-level API, access identification using the file identification code as access can be supported, metadata cache using the file identification code as index can be maintained, renaming of the file and other operations are supported, and the access amount and access pressure to the object storage service node are reduced by constructing a mapping relation table in the file storage client.

Step S902, a first file identification code of a parent directory of the first file is obtained from the mapping relation table.

In this embodiment, after receiving a file creation instruction for a first file sent by an application program, a file storage client obtains a first file identification code of a parent directory of the first file from a mapping relationship table cached by the file storage client. The mapping relation table stores the corresponding relation between file names and file identification codes, and when the file storage client interacts with the object storage service node, the file storage client can directly inquire the file identification codes in the mapping relation table cached by the file storage client to determine the first file identification code of the father directory of the first file.

For example, the parent directory of the query file "c" in the file storage client's cache is/a/b, with a file identification of 200.

Step S903, a file creation request is sent to the object storage service node.

In this embodiment, after acquiring the first file identifier of the parent directory of the first file from the mapping relationship table, the file storage client sends a file creation request to the object storage service node, where the file creation request is used to request creation of the first file at the object storage service node, and the file creation request carries the first file name and the first file identifier of the parent directory.

In one example, the file storage client sends a file write request PUT/100/curitype = fileid, i.e., instructs access to the object storage service node with file identity fileid, and creates file c under the parent directory with file identity 100.

Optionally, the file storage client may carry an access identifier in the file creation request for identifying whether the file creation request is based on file full path access or based on a file identification code. The access identifier may be in various forms, such as, but not limited to, a query (query) parameter, or carried by a header field of the file creation request, or carried by a special prefix or a special suffix of the URI.

Step S904, a second file identification code distributed by the object storage service node is received, and the corresponding relation between the second file identification code and the first file name is recorded in a mapping relation table.

In this embodiment, after the object storage service node creates the first file under the parent directory and allocates the second file identifier to the first file through the file identifier allocator 610 in the object storage service node, the object storage service node sends the second file identifier of the first file to the file storage client. And after the file storage client receives the second file identification code distributed by the object storage service node, recording the corresponding relation between the second file identification code and the first file name into a mapping relation table.

In this way, in this embodiment, the file storage client may directly query the file identification code in the mapping relation table cached by the file storage client when interacting with the object storage service node by caching the mapping relation table of the file identification code and the file name, thereby reducing the access amount and access pressure of the object storage service node. Meanwhile, for the file storage client, the process of constructing the buffer layer by layer in the full path of the file is reduced, and the data access efficiency is improved.

In some possible implementations, as shown in fig. 10, the data access method further includes steps 1001 to 1007:

in step S1001, the application program transmits a file write instruction for the first file.

In this embodiment, after the object storage service node creates the first file under the parent directory of the first file, the data content of the first file needs to be written. The application program sends a file writing instruction for the first file to the file storage client, wherein the file writing instruction is used for instructing the file storage client to write the data content of the first file in the object storage service node.

In one example, the application invokes a file write instruction write (c) for file "c".

In step S1002, the file storage client obtains the second file identifier corresponding to the first file from the mapping relationship table.

In this embodiment, after receiving a file writing instruction for a first file, a file storage client queries a second file identification code corresponding to the first file from a mapping relationship table cached by the file storage client. For example, the second file identifier of the file "c" is found in the cache to be 200.

In step S1003, the file storage client sends a file write request, where the file write request carries a second file identification code.

In this embodiment, after the file storage client queries the second file identifier corresponding to the first file, a file writing request is sent to the object storage service node, where the file writing request is used to request writing of the data content of the first file into the object storage service node, and the file writing request carries the second file identifier queried by the file storage client in the cache, which indicates that the object storage service node is accessed based on the file identifier.

In one example, the file storage client sends a file write request PUT/200uritype = fileid, i.e., instructs to access the object storage service node with a file identification code fileid and write a file with a file identification code 200 to the object storage service node.

In step S1004, the object storage service node obtains the first file name corresponding to the second file identification code from the file index table.

In this embodiment, after receiving a file writing request sent by a file storage client, the object storage service node determines that a file identification code is accessed according to an access identifier of the file writing request, analyzes a second file identification code in the file writing request, obtains a first file name corresponding to the second file identification code from a file index table, and obtains metadata of the first file based on the first file name.

In one example, the object storage service node receives the file write request PUT/200uritype = fileid, determines through uritype that the file write request is accessed based on the file identification code fileid, and parses the file identification code to be 200.

In step S1005, the object storage service node acquires metadata of the first file.

In this embodiment, the object storage service node obtains a first file name corresponding to the second file identification code from the file index table, and obtains metadata information of the first file from the metadata index table based on the first file name. The metadata of the first file includes one or more of a file size of the first file, a creation time of the first file, an access right of the first file, a data layout of the first file, wherein the data layout of the first file includes an identification of a first storage unit storing the first file, an offset of the first file in the first storage unit, and a length of the first file in the first storage unit.

In step S1006, the object storage service node writes the first file in the first storage unit according to the data layout of the first file.

In this embodiment, after acquiring metadata of the first file based on the second file identification code, the object storage service node determines a first storage unit storing the first file according to a data layout of the first file, and writes the first file in the first storage unit.

Optionally, the file creation request carries an offset in the first storage unit to write the first file, indicating that the first file is written under the offset address of the first storage unit.

In step S1007, the object storage service node updates metadata of the first file.

In this embodiment, after the first storage unit writes the first file, the object storage service node needs to update metadata information of the first file, for example, update a size of the first file, a modification time of the first file, and a data layout of the first file.

In this way, in the embodiment, the file storage client uses the file identification code to write the file in the object storage service node, so that the full path of the file does not need to be analyzed layer by layer, and the data access efficiency is improved.

In some possible implementations, as shown in fig. 11, the data access method further includes steps 1101 to 1106:

In step S1101, the application program transmits a first file reading instruction for the first file.

In this embodiment, after the object storage service node writes the first file, the application program may send a first file reading instruction for the first file to the file storage client, where the first file reading instruction is used to instruct the file storage client to read the first file from the object storage service node.

In one example, the application invokes a first file read instruction read (c) for file "c".

In step S1102, the file storage client obtains a second file identifier corresponding to the first file from the mapping relationship table.

In this embodiment, after receiving a first file reading instruction for a first file, a file storage client queries a second file identification code corresponding to the first file from a mapping relationship table cached by the file storage client. For example, the second file identifier of the file "c" is found in the cache to be 200.

In step S1103, the file storage client sends a first file reading request, where the file reading request carries a second file identification code.

In this embodiment, after the file storage client queries the second file identifier corresponding to the first file, a first file reading request is sent to the object storage service node, where the first file reading request is used to instruct to read the first file from the object storage service node. The first file read request carries the queried second file identification code, indicating that the object storage service node is accessed based on the file identification code.

In one example, the file storage client sends a first file read request GET/200uritype = fileid, i.e., indicates that the object storage service node is accessed with a file identification code fileid, and reads a file with a file identification code 200 from the object storage service node.

In step S1104, the object storage service node obtains the first file name corresponding to the second file identification code from the file index table.

In this embodiment, after receiving a first file reading request sent by a file storage client, the object storage service node determines that a file identification code is accessed according to an access identifier of the first file reading request, analyzes a second file identification code in the first file reading request, obtains a first file name corresponding to the second file identification code from a file index table, and obtains metadata of the first file based on the first file name.

In one example, the object storage service node receives the first file read request GET/200uritype = fileid, determines through uritype that the first file read request is accessed based on the file identification code fileid, and parses the file identification code to be 200.

In step S1105, the object storage service node acquires metadata of the first file.

In step S1106, the object storage service node reads the first file in the first storage unit according to the data layout of the first file.

In this embodiment, after acquiring metadata of the first file based on the second file identification code, the object storage service node determines a first storage unit storing the first file according to a data layout of the first file, and reads the first file from the first storage unit.

Optionally, the first file read request carries an offset and a length of the first file at the first storage unit, indicating that the first file is read at the offset address and the specified length of the first storage unit.

In this way, in the embodiment, the file storage client uses the file identification code to read the file from the object storage service node, so that the full path of the file does not need to be analyzed layer by layer, and the data access efficiency is improved.

In some possible implementations, as shown in fig. 12, the data access method further includes steps 1201 to 1205:

step 1201, a second file read request is received.

In this embodiment, after receiving the second file reading request sent by the file storage client, the object storage service node determines that the file is accessed in a full path according to the access identifier of the second file reading request.

In one example, the file storage client sends a second file read request GET/a/b/curitype = path, i.e., indicates to access the object storage service node with the full path of the file, and reads the file with the full path of/a/b/c from the object storage service node.

Step 1202, parsing the access path of the first file through the hierarchical namespace to obtain a second file identification code of the first file.

In this embodiment, as shown in fig. 8, since the object storage service node maps the object in the bucket 803 of the object storage space 801 to a file or a directory in the file system 804 of the file storage space 802, after the file identifier allocator 610 allocates a file identifier to the file or the directory, the object in the bucket 803 is stored with the file identifier+file name of the parent directory, and thus cannot be directly matched to the file using the full path of the file. Therefore, the access path of the first file needs to be analyzed through the hierarchical namespace to obtain the second file identification code of the first file. The hierarchical namespace may be the hierarchical namespace 400, also described below as an example.

In one example, the hierarchical namespace parses the full path of the file as/a/b/c, resulting in a file identification of 200 for the file c.

Step 1203, obtaining a first file name corresponding to the second file identification code from the file index table.

In this embodiment, after the second file identification code of the first file is obtained by parsing, a first file name corresponding to the second file identification code is obtained from the file index table, and metadata of the first file is obtained based on the first file name.

In step 1204, metadata for the first file is obtained.

In this embodiment, the object storage service node obtains a first file name corresponding to the second file identification code from the file index table, and obtains metadata information of the first file from the metadata index table based on the first file name.

In step 1205, the first file is read from the first storage unit according to the data layout of the first file.

Thus, in this embodiment, when the file storage client uses the file full path access object storage service node to read the file, the object storage service node can identify two access modes of the file identification code or the file full path, and still is compatible with the file full path mode to access the file.

In some possible implementations, as shown in fig. 13, the data access method further includes steps 1301 to 1305:

In step 1301, a third file read request is received, the third file read request including a file handle.

In this embodiment, after receiving the third file read request sent by the file storage client, the object storage service node determines that the access is based on the file handle through the access identifier of the third file read request, that is, the third file read request is based on the file protocol to access the object storage service node. The file handle (FILE HANDLE) is a data structure, typically an integer or pointer, used in the operating system to access files. The file handles are used to identify open files, each having a unique file handle.

Alternatively, the file protocol may include a server message Block (SERVER MESSAGE Block, SMB), a Network file system (Network FILESYSTEM, NFS) protocol, or a Common Network file system protocol (Common INTERNET SYSTEM, CIFS). The SMB protocol is an application-level network protocol that is used primarily for sharing printers, file access, serial ports, and other communications between nodes on a network. SMB is mainly used by Windows systems, and is an authenticated inter-process communication mechanism. NFS allows one system to share directories and files with others over a network, and by using NFS, users and programs can access files on remote systems like local files. The CIFS protocol is a network file system access protocol through an Internet file system, and can realize network file sharing among hosts of the windows system through the CIFS protocol. The Common Internet File System (CIFS) is an enhanced version of microsoft server message block protocol (SMB), a standard method for computer users to share files across intranets and the internet. CIFS enables collaboration over the internet by defining a remote file access protocol compatible with the way applications share data on local disks and network file servers. CIFS runs on TCP/IP, enhancing its scalability with the global domain name service system (DNS) over the internet, while optimizing for slow dial-up connections that are ubiquitous on the internet. The CIFS may send via the network to the remote device using the redirection packet, and the redirector may also send a request to the protocol stack of the local computer using the CIFS.

Alternatively, the NFS or SMB protocol defines a file handle under the protocol, for example, the handle of NFSv3 is 64 bytes and the handle of SMB2 is 16 bytes. For example, in the NFS protocol, a file identification code may be encapsulated in a file handle.

In one example, the file storage client sends a third file read request Rpc: read (200), i.e., encapsulating the file identification 200 into a file handle of the NFS or SMB protocol, wherein the remote procedure call protocol (Remote Procedure Call, rpc) is a protocol for building an appropriate framework that a program can use to request services from a program located on another computer on the network without having to know the details of the network.

Step 1302, identifying, by the gateway, a second file identification code in the file handle.

In this embodiment, the object storage service node includes a gateway, and after receiving the third file reading request sent by the file storage client, the second file identification code encapsulated in the file handle may be identified by the gateway.

In one example, after receiving the third file read request Rpc: read (200), the gateway converts the file protocol request format into a HHTP protocol format that can be recognized by the object storage service node, i.e., into GET/200uritype = fileid, further into accessing the object storage service node with the file identification code fileid, and reads the file with the file identification code 200 from the object storage service node.

Step 1303, obtaining the first file name corresponding to the second file identification code from the file index table.

In this embodiment, after receiving a third file reading request sent by a file storage client, the object storage service node determines that the file is accessed by using an access identifier of the third file reading request, analyzes a second file identifier in the third file reading request, obtains a first file name corresponding to the second file identifier from a file index table, and obtains metadata of the first file based on the first file name.

In one example, the object storage service node receives the GET/200uritype = fileid converted by the gateway, determines through uritype that the third file read request is accessed based on the file identification code fileid, and parses the file identification code to be 200.

In step 1304, metadata for the first file is obtained.

In step 1305, the first file is read from the first storage unit according to the data layout of the first file.

Optionally, the third file read request carries an offset and a length of the first file at the first storage unit, indicating that the first file is read at the offset address and the specified length of the first storage unit.

Thus, in this embodiment, the file storage client reads the file from the object storage service node based on the file protocol, and can implement conversion from the file protocol to the HHTP protocol that can be identified by the object storage service node through the gateway, that is, "NAS over HTTP" is implemented, and even if the file client passes through the file protocol, the file storage client can still access the file stored by the object storage service node based on the file identification code, thereby improving the data access efficiency. In addition, the file identification code is used as the access identification, so that the file protocol is more easily compatible.

The present application also provides a data access apparatus, as shown in fig. 14, the apparatus 1400 includes:

A receiving module 1401, configured to receive a file creation request, where the file creation request carries a first file name and a first file identification code of a parent directory of a first file;

an obtaining module 1402, configured to obtain metadata of a parent directory of the first file corresponding to the first file identifier;

A processing module 1403, configured to create a first file under a parent directory according to metadata of a parent directory of the first file, and allocate a second file identification code to the first file, where the first file is a file or a directory on a directory tree;

the processing module 1403 is further configured to record a correspondence between the second file identifier and the first file name in the file index table.

In some possible implementations, the receiving module 1401 is further configured to receive a file writing request, where the file writing request carries a first file to be written and a second file identification code, the obtaining module 1402 is further configured to obtain a first file name corresponding to the second file identification code from a file index table, where the first file name is used to indicate metadata of the first file, the obtaining module 1402 is further configured to obtain metadata of the first file, where the metadata of the first file includes a data layout of the first file, the processing module 1403 is further configured to write the first file in a first storage unit according to the data layout of the first file, and the processing module is further configured to update the metadata of the first file.

In some possible implementations, the metadata of the first file includes one or more of a file size of the first file, a creation time of the first file, an access right of the first file, a data layout of the first file, the data layout of the first file including an identification of a first storage location storing the first file, an offset of the first file at the first storage location, and a length of the first file at the first storage location.

In some possible implementations, the receiving module 1401 is further configured to receive a first file reading request, where the first file reading request carries a second file identifier, the obtaining module 1402 is further configured to obtain a first file name corresponding to the second file identifier from a file index table, the obtaining module 1402 is further configured to obtain metadata of the first file, where the metadata of the first file includes a data layout of the first file, and the processing module 1403 is further configured to read the first file from the first storage unit according to the data layout of the first file.

In some possible implementations, the receiving module 1401 is further configured to receive a second file reading request, where the second file reading request carries an access path of the first file, the processing module 1403 is further configured to parse the access path of the first file through a hierarchical namespace to obtain a second file identification code of the first file, the obtaining module 1402 is further configured to obtain a first file name corresponding to the second file identification code from a file index table, the obtaining module 1402 is further configured to obtain metadata of the first file, and the processing module 1403 is further configured to read the first file from the first storage unit according to a data layout of the first file.

In some possible implementations, the receiving module 1401 is further configured to receive a third file reading request, where the third file reading request includes a file handle, the processing module 1403 is further configured to identify, by the gateway, a second file identifier in the file handle, the obtaining module 1402 is further configured to obtain a first file name corresponding to the second file identifier from the file index table, the obtaining module 1402 is further configured to obtain metadata of the first file, and the processing module 1403 is further configured to read the first file from the first storage unit according to a data layout of the first file.

The receiving module 1401, the acquiring module 1402 and the processing module 1403 may be implemented by software, or may be implemented by hardware. Illustratively, the implementation of processing module 1403 is described next with processing module 1403 as an example. Similarly, the implementation of the receiving module 1401 and the acquiring module 1402 may refer to the implementation of the processing module 1403.

Module as an example of a software functional unit, the processing module 1403 may include code that runs on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container, among others. Further, the above-described computing examples may be one or more. For example, processing module 1403 may include code running on multiple hosts/virtual machines/containers. It should be noted that, multiple hosts/virtual machines/containers for running the code may be distributed in the same region (region), or may be distributed in different regions. Further, multiple hosts/virtual machines/containers for running the code may be distributed among the same AZ, or may be distributed among different AZs, each AZ including one data center or multiple geographically proximate data centers. Wherein typically a region may comprise a plurality of AZs.

Also, multiple hosts/virtual machines/containers for running the code may be distributed in the same virtual private cloud VPC, or may be distributed among multiple VPCs. In general, one VPC is disposed in one region, and a communication gateway is disposed in each VPC for implementing inter-connection between VPCs in the same region and between VPCs in different regions.

Module as an example of a hardware functional unit, the processing module 1403 may include at least one computing device, such as a server or the like. Alternatively, the processing block 1403 may be a device implemented by an application specific integrated circuit ASIC, or a programmable logic device PLD, or the like. Wherein, the PLD can be CPLD, FPGA, GAL or any combination thereof.

The multiple computing devices included in processing module 1403 may be distributed in the same region or may be distributed in different regions. The plurality of computing devices included in processing module 1403 may be distributed among the same AZ or among different AZ. Likewise, multiple computing devices included in processing module 1403 may be distributed across the same VPC or across multiple VPCs. Wherein the plurality of computing devices may be any combination of computing devices such as servers, ASIC, PLD, CPLD, FPGA, and GAL.

It should be noted that, in other embodiments, the receiving module 1401 may be configured to perform any step in the data access method, the acquiring module 1402 may be configured to perform any step in the data access method, the processing module 1403 may be configured to perform any step in the data access method, the steps that the receiving module 1401, the acquiring module 1402 and the processing module 1403 are responsible for implementing may be specified according to needs, and the receiving module 1401, the acquiring module 1402 and the processing module 1403 implement different steps in the data access method respectively to implement all functions of the data processing apparatus.

The present application also provides a data access apparatus, as shown in fig. 15, the apparatus 1500 includes:

A receiving module 1501, configured to receive a file creation instruction for a first file sent by an application program, where the first file is a file or a directory on a directory tree;

an obtaining module 1502, configured to obtain, from the mapping relationship table, a first file identifier of a parent directory of a first file;

A sending module 1503, configured to send a file creation request to an object storage service node, where the file creation request carries a first file name and a first file identification code;

The receiving module 1501 is further configured to receive a second file identifier allocated by the object storage service node, and record a correspondence between the second file identifier and the first file name in the mapping relationship table.

In some possible implementations, the receiving module 1501 is further configured to receive a file writing instruction for the first file sent by the application program, the obtaining module 1502 is further configured to obtain a second file identification code corresponding to the first file from a mapping relationship table of the file storage client, the sending module 1503 is further configured to send a file writing request to the object storage service node, where the file writing request carries the second file identification code, and the receiving module 1501 is further configured to receive a processing result of the file writing request.

In some possible implementations, the receiving module 1501 is further configured to receive a first file reading instruction for a first file sent by an application program, the obtaining module 1502 is further configured to obtain a second file identification code corresponding to the first file from a mapping relationship table of a file storage client, the sending module 1503 is further configured to send a first file reading request to an object storage service node, where the first file reading request carries the second file identification code, and the receiving module 1501 is further configured to receive a processing result of the first file reading request.

The receiving module 1501, the acquiring module 1502 and the transmitting module 1503 may be implemented by software or may be implemented by hardware. Illustratively, the implementation of the acquisition module 1502 will be described next as an example of the acquisition module 1502. Similarly, the implementation of the receiving module 1501 and the transmitting module 1503 may refer to the implementation of the acquiring module 1502.

Module as an example of a software functional unit, the acquisition module 1502 may include code that runs on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container, among others. Further, the above-described computing examples may be one or more. For example, the acquisition module 1502 may include code that runs on multiple hosts/virtual machines/containers. It should be noted that, multiple hosts/virtual machines/containers for running the code may be distributed in the same region (region), or may be distributed in different regions. Further, multiple hosts/virtual machines/containers for running the code may be distributed among the same AZ, or may be distributed among different AZs, each AZ including one data center or multiple geographically proximate data centers. Wherein typically a region may comprise a plurality of AZs.

Module as an example of a hardware functional unit, the acquisition module 1502 may include at least one computing device, such as a server or the like. Alternatively, the acquisition module 1502 may be a device implemented using an application specific integrated circuit ASIC, or a programmable logic device PLD, or the like. Wherein, the PLD can be CPLD, FPGA, GAL or any combination thereof.

The acquisition module 1502 includes multiple computing devices that may be distributed in the same region or in different regions. The plurality of computing devices included in the acquisition module 1502 may be distributed in the same AZ or may be distributed in different AZs. Likewise, multiple computing devices included in the acquisition module 1502 may be distributed in the same VPC, or may be distributed in multiple VPCs. Wherein the plurality of computing devices may be any combination of computing devices such as servers, ASIC, PLD, CPLD, FPGA, and GAL.

It should be noted that, in other embodiments, the receiving module 1501 may be configured to perform any step in the data access method, the obtaining module 1502 may be configured to perform any step in the data access method, and the sending module 1503 may be configured to perform any step in the data access method. The steps of the receiving module 1501, the acquiring module 1502 and the transmitting module 1503 that are responsible for implementation may be specified as required, and all functions of the data processing apparatus are implemented by implementing different steps in the data access method by the receiving module 1501, the acquiring module 1502 and the transmitting module 1503, respectively.

The present application also provides a computing device 100. As shown in fig. 16, computing device 100 includes a bus 102, a processor 104, a memory 106, and a communication interface 108. Communication between the processor 104, the memory 106, and the communication interface 108 is via the bus 102. Computing device 100 may be a server or a terminal device. It should be understood that the present application is not limited to the number of processors, memories in computing device 100.

Bus 102 may be a peripheral component interconnect standard (PERIPHERAL COMPONENT INTERCONNECT, PCI) bus, or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, only one line is shown in fig. 16, but not only one bus or one type of bus. Bus 104 may include a path to transfer information between various components of computing device 100 (e.g., memory 106, processor 104, communication interface 108).

The processor 104 may include any one or more of a central processing unit (central processing unit, CPU), a graphics processor (graphics processing unit, GPU), a Microprocessor (MP), or a digital signal processor (DIGITAL SIGNAL processor, DSP).

The memory 106 may include volatile memory (RAM), such as random access memory (random access memory). The processor 104 may also include non-volatile memory (non-volatile memory), such as read-only memory (ROM), flash memory, mechanical hard disk (HARD DISK DRIVE, HDD) or Solid State Disk (SSD).

The memory 106 stores therein executable program codes that the processor 104 executes to realize the functions of the aforementioned receiving module 1401, acquiring module 1402, and processing module 1403, respectively, thereby realizing a data access method. That is, the memory 106 has stored thereon instructions for performing the data access method. Fig. 16 shows only an example in which the memory 106 stores program codes realizing the functions of the foregoing receiving module 1401, acquiring module 1402, and processing module 1403.

Communication interface 103 enables communication between computing device 100 and other devices or communication networks using a transceiver module such as, but not limited to, a network interface card, transceiver, or the like.

The embodiment of the application also provides a computing device cluster. The cluster of computing devices includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop, notebook, or smart phone.

As shown in fig. 17, the cluster of computing devices includes at least one computing device 100. The same instructions for performing the data access method may be stored in memory 106 in one or more computing devices 100 in the cluster of computing devices.

In some possible implementations, portions of the instructions for performing the data access method may also be stored separately in the memory 106 of one or more computing devices 100 in the cluster of computing devices. In other words, a combination of one or more computing devices 100 may collectively execute instructions for performing a data access method.

It should be noted that the memories 106 in different computing devices 100 in the computing device cluster may store different instructions for performing part of the functions of the data processing apparatus. That is, the instructions stored by the memory 106 in the different computing devices 100 may implement the functionality of one or more of the receiving module 1401, the acquiring module 1402, and the processing module 1403, or the functionality of one or more of the receiving module 1501, the acquiring module 1502, and the transmitting module 1503.

In some possible implementations, one or more computing devices in a cluster of computing devices may be connected through a network. Wherein the network may be a wide area network or a local area network, etc. Fig. 18 shows one possible implementation. As shown in fig. 18, two computing devices 100A and 100B are connected by a network. Specifically, the connection to the network is made through a communication interface in each computing device. In this type of possible implementation, instructions to perform the functions of the receiving module 1401 are stored in the memory 106 in the computing device 100A. In fig. 18, an example is shown in which an instruction for executing the function of the receiving module 1401 is stored in the memory 106 in the computing device 100A. Meanwhile, instructions to perform the functions of the acquisition module 1402 and the processing module 1403 are stored in the memory 106 in the computing device 100B. In fig. 18, an example is shown in which instructions for executing the functions of the acquisition module 1402 and the processing module 1403 are stored in the memory 106 in the computing device 100B.

The connection between clusters of computing devices shown in fig. 18 may be implemented by computing device 100B in consideration of the large amount of data that is required for the data access method provided by the present application, and thus in consideration of the functions implemented by acquisition module 1402 and processing module 1403.

It should be appreciated that the functionality of computing device 100A shown in fig. 18 may also be performed by multiple computing devices 100. Likewise, the functionality of computing device 100B may also be performed by multiple computing devices 100.

Embodiments of the present application also provide a computer program product comprising instructions. The computer program product may be software or a program product containing instructions capable of running on a computing device or stored in any useful medium. The computer program product, when run on at least one computing device, causes the at least one computing device to perform a data access method, or a data access method.

The embodiment of the application also provides a computer readable storage medium. The computer readable storage medium may be any available medium that can be stored by a computing device or a data storage device such as a data center containing one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc. The computer-readable storage medium includes instructions that instruct a computing device to perform a data access method or instruct a computing device to perform a data access method.

It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the protection scope of the technical solution of the embodiments of the present invention.

Claims

1. A data access method, the method being applied to an object storage service node, the object storage service node having at least one object stored in a storage bucket thereof, the object storage service node comprising a hierarchical namespace for mapping the at least one object to a file or directory on a directory tree, the method comprising:

receiving a file creation request, wherein the file creation request carries a first file name and a first file identification code of a parent directory of a first file;

acquiring metadata of a parent directory of the first file corresponding to the first file identification code;

Creating the first file under the parent directory according to the metadata of the parent directory of the first file, and distributing a second file identification code for the first file, wherein the first file is a file or a directory on the directory tree;

And recording the corresponding relation between the second file identification code and the first file name into a file index table.

2. The method of claim 1, further characterized in that the method further comprises:

Receiving a file writing request, wherein the file writing request carries the first file and the second file identification code to be written;

acquiring the first file name corresponding to the second file identification code from the file index table, wherein the first file name is used for indicating metadata of the first file;

Acquiring metadata of the first file, wherein the metadata of the first file comprise a data layout of the first file;

writing the first file in a first storage unit according to the data layout of the first file;

and updating the metadata of the first file.

3. The method of claim 1 or 2, wherein the metadata of the first file comprises one or more of a file size of the first file, a creation time of the first file, an access right of the first file, a data layout of the first file, the data layout of the first file comprising an identification of the first storage unit storing the first file, an offset of the first file at the first storage unit, and a length of the first file at the first storage unit.

4. A method according to any one of claims 1 to 3, further comprising:

Receiving a first file reading request, wherein the first file reading request carries the second file identification code;

Acquiring the first file name corresponding to the second file identification code from the file index table;

and reading the first file from the first storage unit according to the data layout of the first file.

5. The method according to claim 1, wherein the method further comprises:

Receiving a second file reading request, wherein the second file reading request carries an access path of the first file;

Analyzing the access path of the first file through the hierarchical naming space to obtain the second file identification code of the first file;

acquiring metadata of the first file;

6. The method of claim 1, wherein the object storage service node comprises a gateway, the method further comprising:

Receiving a third file read request, the third file read request including a file handle;

identifying the second file identification code in the file handle through the gateway;

acquiring metadata of the first file;

7. A data access method, wherein the method is applied to a file storage client, the file storage client accesses an object storage service node through a file interface, at least one object is stored in a storage bucket of the object storage service node, the object storage service node comprises a hierarchical namespace, and the hierarchical namespace is used for mapping the at least one object into a file or a directory on a directory tree, the method comprises:

Receiving a file creation instruction for a first file sent by an application program, wherein the first file is a file or a directory on the directory tree;

acquiring a first file identification code of a parent directory of the first file from the mapping relation table;

sending a file creation request to the object storage service node, wherein the file creation request carries the first file name and the first file identification code;

and receiving a second file identification code distributed by the object storage service node, and recording the corresponding relation between the second file identification code and the first file name into the mapping relation table.

8. The method of claim 7, wherein the method further comprises:

Receiving a file writing instruction aiming at the first file, which is sent by the application program;

Acquiring the second file identification code corresponding to the first file from a mapping relation table of the file storage client;

Sending a file writing request to the object storage service node, wherein the file writing request carries the second file identification code;

And receiving a processing result of the file writing request.

9. The method according to claim 7 or 8, characterized in that the method further comprises:

receiving a first file reading instruction aiming at the first file, which is sent by the application program;

Sending a first file reading request to the object storage service node, wherein the first file reading request carries the second file identification code;

and receiving a processing result of the first file reading request.

10. A data access apparatus for application to an object storage service node having at least one object stored in a bucket of the object storage service node, the object storage service node comprising a hierarchical namespace for mapping the at least one object to a file or directory on a directory tree, the apparatus comprising:

the receiving module is used for receiving a file creation request, wherein the file creation request carries a first file name and a first file identification code of a parent directory of a first file;

the acquisition module is used for acquiring metadata of a parent directory of the first file corresponding to the first file identification code;

The processing module is used for creating the first file under the father catalog according to the metadata of the father catalog of the first file and distributing a second file identification code for the first file, wherein the first file is a file or catalog on the catalog tree;

The processing module is further configured to record a correspondence between the second file identification code and the first file name in a file index table.

11. The apparatus of claim 10, wherein the apparatus further comprises:

The receiving module is further configured to receive a file writing request, where the file writing request carries the first file and the second file identification code to be written;

the obtaining module is further configured to obtain, from the file index table, the first file name corresponding to the second file identification code, where the first file name is used to indicate metadata of the first file;

the acquisition module is further configured to acquire metadata of the first file, where the metadata of the first file includes a data layout of the first file;

The processing module is further configured to write the first file in a first storage unit according to the data layout of the first file;

the processing module is further configured to update metadata of the first file.

12. The apparatus of claim 10 or 11, wherein the metadata of the first file comprises one or more of a file size of the first file, a creation time of the first file, an access right of the first file, a data layout of the first file, the data layout of the first file comprising an identification of the first storage unit storing the first file, an offset of the first file at the first storage unit, and a length of the first file at the first storage unit.

13. The apparatus according to any one of claims 10 to 12, further comprising:

The receiving module is further configured to receive a first file reading request, where the first file reading request carries the second file identification code;

The obtaining module is further configured to obtain the first file name corresponding to the second file identification code from the file index table;

The processing module is further configured to read the first file from the first storage unit according to a data layout of the first file.

14. The apparatus of claim 10, wherein the apparatus further comprises:

the receiving module is further configured to receive a second file reading request, where the second file reading request carries an access path of the first file;

The processing module is further configured to parse the access path of the first file through the hierarchical namespace to obtain the second file identification code of the first file;

The acquisition module is further used for acquiring metadata of the first file;

15. The apparatus of claim 10, wherein the object storage service node comprises a gateway, the apparatus further comprising:

The receiving module is further configured to receive a third file reading request, where the third file reading request includes a file handle;

The processing module is further configured to identify, through the gateway, the second file identification code in the file handle;

16. A data access apparatus, the apparatus being applied to a file storage client, the file storage client accessing an object storage service node through a file interface, at least one object being stored in a bucket of the object storage service node, the object storage service node comprising a hierarchical namespace for mapping the at least one object to a file or directory on a directory tree, the apparatus comprising:

The receiving module is used for receiving a file creation instruction for a first file sent by an application program, wherein the first file is a file or a directory on the directory tree;

The acquisition module is used for acquiring a first file identification code of a father directory of the first file from the mapping relation table;

The sending module is used for sending a file creation request to the object storage service node, wherein the file creation request carries the first file name and the first file identification code;

The receiving module is further configured to receive a second file identification code allocated by the object storage service node, and record a correspondence between the second file identification code and the first file name in the mapping relationship table.

17. The apparatus of claim 16, wherein the apparatus further comprises:

the receiving module is further configured to receive a file writing instruction for the first file, where the file writing instruction is sent by the application program;

The obtaining module is further configured to obtain the second file identification code corresponding to the first file from a mapping relationship table of the file storage client;

the sending module is further configured to send a file writing request to the object storage service node, where the file writing request carries the second file identification code;

The receiving module is further configured to receive a processing result of the file writing request.

18. The apparatus according to claim 16 or 17, characterized in that the apparatus further comprises:

the receiving module is further configured to receive a first file reading instruction for the first file, where the first file reading instruction is sent by the application program;

the sending module is further configured to send a first file reading request to the object storage service node, where the first file reading request carries the second file identification code;

The receiving module is further configured to receive a processing result of the first file reading request.

19. A cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory;

the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of any one of claims 1 to 6.

20. A cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory;

the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of any of claims 7 to 9.

21. A computer program product containing instructions that, when executed by a cluster of computing devices, cause the cluster of computing devices to perform the method of any of claims 1 to 6.

22. A computer program product containing instructions which, when executed by a cluster of computing devices, cause the cluster of computing devices to perform the method of any of claims 7 to 9.

23. A computer readable storage medium comprising computer program instructions which, when executed by a cluster of computing devices, perform the method of any of claims 1 to 6.

24. A computer readable storage medium comprising computer program instructions which, when executed by a cluster of computing devices, perform the method of any of claims 7 to 9.