[go: up one dir, main page]

WO2011128454A1 - Storing a directory tree structure - Google Patents

Storing a directory tree structure Download PDF

Info

Publication number
WO2011128454A1
WO2011128454A1 PCT/EP2011/056067 EP2011056067W WO2011128454A1 WO 2011128454 A1 WO2011128454 A1 WO 2011128454A1 EP 2011056067 W EP2011056067 W EP 2011056067W WO 2011128454 A1 WO2011128454 A1 WO 2011128454A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
index
node
current
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2011/056067
Other languages
French (fr)
Inventor
Paul Anderson
Simon Wilkinson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Edinburgh
Original Assignee
University of Edinburgh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Edinburgh filed Critical University of Edinburgh
Publication of WO2011128454A1 publication Critical patent/WO2011128454A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • G06F11/1453Management of the data involved in backup or backup restore using de-duplication of the data

Definitions

  • Embodiments of the present invention relate to methods, apparatus, computer programs for storing a directory tree structure and for retrieving a stored directory tree structure. In particular, they relate to back up and/or restore. BACKGROUND TO THE INVENTION
  • a directory tree structure is a hierarchical organization of data structures typically comprising folders and files.
  • a folder can depend from another folder but cannot depend from a file.
  • a file can depend from a folder but cannot depend from a file.
  • the directory tree structure forms an acyclic directed rooted tree graph with the files as leaf nodes and the folders as internal nodes. It is desirable to copy a directory tree structure.
  • an apparatus comprising:
  • means for backing up the current node comprising:
  • a storage system comprising: one or more storage apparatus that stored a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document; and
  • an interface configured to receive an index and to provide in reply a document associated with the index
  • a sub-set of the plurality of documents and the plurality of associated indexes define a directory tree having internal nodes and leaf nodes, wherein any document of the sub-set comprising one or more indexes represents a respective internal node of the directory tree and the one or more indexes access documents representing child nodes of the respective internal node and
  • documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the directory tree.
  • index uses the index to query a database storing a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document;
  • a sub-set of the plurality of documents and the plurality of associated indexes define the directory tree having internal nodes and leaf nodes, wherein any document of the sub-set comprising one or more indexes represents a respective internal node of the directory tree and the one or more indexes access documents representing child nodes of the respective internal node and
  • documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the directory tree, and receiving in reply a document of the sub-set comprising one or more indexes and representing a respective root node of the directory tree, wherein the one or more indexes access documents representing child nodes of the root node in the directory tree structure.
  • an index-addressable storage providing the document to an index-addressable storage; and storing an index for the provided document in a further document represented by a parent node of the identified node in the graph wherein the index has been generated based on the provided document and has a value that represents the content of the provided document.
  • an apparatus comprising:
  • means for processing the document comprising:
  • a storage system comprising: one or more storage apparatus that stored a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document; and
  • an interface configured to receive an index and to provide in reply a document associated with the index
  • a sub-set of the plurality of documents and the plurality of associated indexes are represented by a graph having internal nodes and leaf nodes, wherein any document of the sub-set comprising one or more indexes represents a respective internal node of the graph and the one or more indexes access documents representing child nodes of the respective internal node and
  • documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the graph.
  • index uses the index to query a database storing a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document;
  • a sub-set of the plurality of documents and the plurality of associated indexes are represented by the graph having internal nodes and leaf nodes, wherein any document of the sub-set comprising one or more indexes represents a respective internal node of the graph and the one or more indexes access documents representing child nodes of the respective internal node and
  • documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the graph, and receiving in reply a document of the sub-set comprising one or more indexes and representing a respective root node of the graph, wherein the one or more indexes access documents representing child nodes of the root node in the graph.
  • Fig. 1 schematically illustrates a content-addressable storage (CAS) system
  • Fig 2 schematically illustrates a directory tree structure
  • Fig 3 schematically illustrates an alternative representation of the directory structure in a form in which it is stored in the content addressable storage system
  • Fig 4 schematically illustrates a method suitable for storing the directory structure in the content addressable storage system
  • Fig 5 schematically illustrates an optional document encryption block
  • Fig 6 illustrates one example of encryption
  • Fig 7 schematically illustrates a security method
  • Fig 8 schematically illustrates an example of a remote back up system
  • Fig 9 schematically illustrates a suitable controller for a host client apparatus or a host server apparatus.
  • Fig. 1 schematically illustrates a content-addressable storage (CAS) system 2.
  • the content-addressable storage (CAS) system 2 is an index addressable storage system in which a document dependent index 3 is used to reference a document 5.
  • the system 2 comprises: one or more storage apparatuses 4 and an interface 6.
  • the document 5 may be any collection of data whether structured or unstructured.
  • the storage apparatus or apparatuses 4 store a plurality of documents 5 and a plurality of associated indexes 3. Each index 3 is used to access a respective associated document 5 and has a value that represents the content of the respective associated document 5.
  • the value uniquely represents the content at least with respect to the storage apparatus or apparatuses 4 and may, in some embodiments, be a globally unique representation of the content.
  • the interface 6 is configured to handle database queries in the form of a received index 3. If the received index 3 is associated in the storage apparatus 4 with a document 5 then that document 5 is returned via the interface 6. If the received index 3 is not associated in the storage apparatus
  • the interface 6 is configured to augment the database by storing a document
  • the system 2 may have an index generator (not shown) which would generate an index 3 from a received document 5. In this implementation, it is not necessary for the index 3 to be received by the interface 6.
  • the index 3 performs two functions. It operates as content-address in the CAS system 2 and it has a value that uniquely represents the content of the document from which it is formed and with which it is associated in the CAS system 2. This latter property of the index in combination with the fact that the CAS system 2 only stores a document if its index is not already stored, prevents duplication of documents.
  • the index 3 may, for example be a cryptographic hash value produced by providing the document 5 as input to a cryptographic hash function.
  • a cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value. Any change to the data will change the hash value.
  • Fig 2 schematically illustrates an example of a directory structure 10 for the storage and organization of data structures.
  • the directory structure 10 comprises folders 1 1 and files 12.
  • a folder can contain a file but a file cannot contain a folder.
  • the directory structure forms a tree with a root folder 1 1 0 o-
  • the directory tree structure 10 may be represented as a directed acyclic (rooted tree) graph.
  • the graph has internal nodes that have one or more dependent nodes.
  • the root node is a special case of an internal node as it does not depend from another node.
  • the graph has leaf nodes that depend from one or more internal nodes but that do not
  • Fig 3 schematically illustrates an alternative representation of the directory tree structure 10 in a form in which it is stored in the content addressable storage system 2.
  • the directory tree structure 10 is represented as a directed acyclic (rooted tree) graph 20.
  • the graph 20 has internal nodes that have one or more dependent nodes.
  • the root node 21 is a special case of an internal node as it does not depend from another node.
  • the graph 20 has leaf nodes that depend from one or more internal nodes but that do not have dependent nodes.
  • a node that depends from another node is called a 'child' or 'dependent' node.
  • a node from which one or more other nodes dependent is called a 'parent' node.
  • the graph 20 has a similar arrangement of nodes as the graph for the directory tree structure as illustrated in Fig 2. However, the graph 20 represents, at each node, a document 5.
  • the root node 22 0 o is separately connected to two dependent child nodes- a first node 22i 0 and a second node 22n via respective edges.
  • the first node 22i 0 is an internal node and is separately connected to dependent child nodes 22 2 o, 22 2 i via respective edges.
  • the second node 22i 0 is a leaf node and is not connected to any dependent child nodes.
  • Each node 22 nm has a corresponding document 5 nm . If a node 22 nm is a leaf node the corresponding document 5 nm is the file 12 nm . If a node 22 nm is an internal node, the corresponding document 5 nm is a document that comprises one or more indexes 3.
  • the one or more indexes 3 are usable to access documents at child nodes of the respective internal node.
  • the one or more indexes 3 may, for example, be cryptographic hash values of the documents at child nodes of the respective internal node.
  • the root node 22 0 o is an internal node
  • the corresponding document 5oo at this node comprises index 3io and index 3n .
  • Index 3io is the cryptographic hash value of document 5io which is stored at internal node 22i 0 . It can be used to access the document 5io in the CAS system 2.
  • Index 3n is the cryptographic hash value of document 5n which is stored at leaf node 22n . It can be used to access the document 5n in the CAS system 2.
  • Fig 4 schematically illustrates a method suitable for storing the directory tree structure 10 in the content addressable system 2.
  • the method 10 may be a back up method.
  • the method starts 21 , then at block 32 the directory tree structure 10 is descended.
  • a leaf node 22 of the directory tree structure 10 that has not been backed up is identified as a 'current' node and its file as a current document 5.
  • a node 22 is backed up when the document 5 at the node is stored in the CAS system 2.
  • the current node is backed up. This can be divided into a number of sub-blocks.
  • a current index 3 for the current node is generated using the current document 5.
  • the current index 3 has a value that uniquely represents the content of the current document 5.
  • the current index 3 may, for example, be a cryptographic hash value of the current document 5.
  • the current document 5 is provided to an index-addressable storage 2. It is assumed in this implementation that the index-addressable storage 2 is configured to generate an index 3 from the current document 5. If this were not the case, the index 3 could be sent with the current document 5.
  • the current index 3 is stored in a 'parent document', that is the document 5 at a parent node of the current node.
  • the root node 22 0 o is an internal node, its corresponding document 5oo at this node comprises index 3io for document 5io at node 22i 0 and index 3i for document 5i ⁇ at node 22n .
  • the method moves to block 37, where the directory tree structure 10 is ascended one level by identifying the parent node of the current node as the new current node and the parent document as the new current document.
  • the parent document can be closed and the method moves to block 39.
  • the directory tree structure 10 is traversed by identifying a child node of the current node that has not been backed up, as the new current node and returning to block 32.
  • the directory tree structure 10 is descended, if possible, to a leaf node 22 that directly or indirectly depends from the current node and the leaf node is identified as the new current node. The method then moves to block 33 to back up the leaf node.
  • the directory tree structure is, as described above, backed-up from the bottom up.
  • Each document at an internal node inherits, via block 36, an index generated from each of its child node documents.
  • the method described requires the identification of at least leaf nodes 22 of the directory tree structure 10 that have not been backed up. This may be achieved by generating an index 3 for each node using its document.
  • the index 3 has a value that uniquely represents the content of the document 5.
  • the current index 3 may, for example, be a cryptographic hash value of the document. If the index 3 already exists as an index for the accessing a document in the CAS system 2, then it or an equivalent has already been backed up at the CAS system 2. If the index 3 does not already exist as an index for accessing a document in the CAS system 2, then the node needs to be backed up.
  • step of generating an index 3 for a node illustrated in step 34 of Fig 4 may occur as prelude to the method 30 when the nodes that require back up are identified. If the indexes that are generated for identifying the nodes that require back up are cached, then they may be re-used later during back up. In this case, at step 34 of Fig 4, rather than generating the index, the step would instead comprise obtaining the cached index.
  • Fig 5 schematically illustrates an optional document encryption block 34A, which may occur in the method 30 after block 34 and before block 35.
  • the current document is encrypted before providing the (encrypted) current document to the index-addressable storage at block 35.
  • Fig 6 illustrates one example of encryption.
  • the document 5 is provided to a cryptographic hash algorithm which produces a hash value 43.
  • the hash value 43 is used as a cryptographic key for encrypting the document.
  • the document is provided to symmetric key encryption algorithm and the hash value is used as the symmetric key.
  • the encrypted document 5 is provided to a cryptographic hash algorithm which produces a hash value. This hash value is used as an index for the document.
  • the index and the cryptographic key are both stored in the parent document.
  • Additional properties of the document 5 may also be stored in the parent document.
  • the properties may, for example, include metadata such as revision date.
  • Fig 7 schematically illustrates a method 50 that may be performed when the method 30 ends 40.
  • the current node is a root node of the directory tree structure 10.
  • a current index for the current node is generated using the current document.
  • the current index may, for example, be a cryptographic hash value of the current document.
  • the current index has a value that uniquely represents the content of the current document.
  • the current document is provided to the index-addressable storage system 2.
  • the current index is stored in a master document
  • the master document is encrypted using a secret.
  • the secret is typically personal to the user like a unique ID or password.
  • the encrypted master document may be stored in an apparatus that hosts the directory structure 10, alternatively, the encrypted master document may be stored on a portable storage apparatus that can be carried by a user.
  • the method of restoring the directory tree structure 10 from back up comprises:
  • a sub-set of the plurality of documents and the plurality of associated indexes define a directory tree having internal nodes and leaf nodes. Any document of the sub-set comprising one or more indexes represents a respective internal node of the directory tree and the one or more indexes access documents representing child nodes of the respective internal node. Documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the directory tree, and
  • the indexes are used to retrieve further documents containing further indexes or files.
  • the process of using recovered indexes to recover more documents which in turn provide more indices or files is repeated until the directory tree structure 10 is reproduced.
  • Fig 8 schematically illustrates an example of a remote back up system 60.
  • the system 60 comprises a server apparatus 64 connected to the content addressable storage system 2.
  • the server apparatus 64 and the content addressable storage system may be co-located at location 62 or located at different locations.
  • the system 60 also comprises a plurality of client apparatus 68A, 68B, 68C, 68D which connect to the server apparatus 64 via network 66.
  • the network 66 may be a local area network, a metropolitan area network or a large area network for example.
  • the network 66 may be comprised of a plurality of interconnected networks.
  • Each client 68 has a directory structure 10 and is configured to communicate with the server 64 to back up the directory server as described above.
  • the methods illustrated in Figs 4, 5, 6 and 7 (or equivalents or alternatives) may be performed at a client 48.
  • the client 48 at each iteration of the back up block 33 in the method 30, sends the document stored at the current node to the server apparatus 64.
  • the client 48 is also configured to determine which nodes of the directory tree require backing-up.
  • the content addressable storage 2 comprises: one or more storage apparatus 4 (Fig 1 ) that store a plurality of documents 5 and a plurality of associated indexes 3, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document; and an interface 6 (Fig 1 ) configured to receive an index 3 and to provide in reply a document 5 associated with the index.
  • a sub-set of the plurality of documents and the plurality of associated indexes define a directory tree having internal nodes and leaf nodes. Any document of the sub-set comprising one or more indexes represents a respective internal node of the directory tree and the one or more indexes access documents representing child nodes of the respective internal node. Documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the directory tree.
  • FIG 9 schematically illustrates a suitable controller for a host client apparatus 68 or a host server apparatus 64.
  • processor have certain aspects in software including firmware alone or can be a combination of hardware and software (including firmware).
  • the controller may be implemented using instructions that enable hardware functionality, for example, by using executable computer program instructions in a general-purpose or special-purpose processor that may be stored on a computer readable storage medium (disk, memory etc) to be executed by such a processor.
  • a general-purpose or special-purpose processor that may be stored on a computer readable storage medium (disk, memory etc) to be executed by such a processor.
  • the controller comprises at least one processor 70 connected to read from and write to at least one memory 72.
  • the memory 72 stores a computer program 74 comprising machine readable instructions which when loaded in to the processor control the operation of the host.
  • the computer program 74 provides the logic and routines that enables the apparatus to perform the methods illustrated in any one or more of Figs 4, 5 , 6, 7.
  • the processor 70 by reading the memory 72 is able to load and execute the computer program 74.
  • the computer program may arrive at the memory 72 via any suitable delivery mechanism.
  • the delivery mechanism may be, for example, a computer- readable storage medium, a computer program product, a memory device, a record medium such as a CD-ROM or DVD, an article of manufacture that tangibly embodies the computer program 74.
  • the delivery mechanism may be a signal configured to reliably transfer the computer program 74.
  • the controller may propagate or transmit the computer program 74 as a computer data signal.
  • memory 72 is illustrated as a single component it may be implemented as one or more separate components some or all of which may be integrated/removable and/or may provide permanent/semi-permanent/ dynamic/cached storage.
  • the blocks illustrated in the Figs 4 to 7 may represent steps in a method and/or sections of code in the computer program 74.
  • the illustration of a particular order to the blocks does not necessarily imply that there is a required or preferred order for the blocks and the order and arrangement of the block may be varied. Furthermore, it may be possible for some steps to be omitted.
  • a document may be broken-up into parts for back up and each part is then processed separately as a separate document.
  • the document may be represented as a parent node and the parts of the documents represented as dependent nodes from the parent node.
  • the index does not have to be produced directly from a document.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

A method comprising: descending a directory tree structure by identifying a leaf node of the directory tree structure that has not been backed up as a current node, wherein the current node has a current document that has not been backed up; and backing up the current node by: providing the current document to an index-addressable storage; and storing a current index for the current node in a parent document for a parent node of the current node wherein the current index for the current node has been generated using the current document and has a value that uniquely represents the content of the current document.

Description

TITLE
Storing a directory tree structure FIELD OF THE INVENTION
Embodiments of the present invention relate to methods, apparatus, computer programs for storing a directory tree structure and for retrieving a stored directory tree structure. In particular, they relate to back up and/or restore. BACKGROUND TO THE INVENTION
A directory tree structure is a hierarchical organization of data structures typically comprising folders and files. A folder can depend from another folder but cannot depend from a file. A file can depend from a folder but cannot depend from a file.
The directory tree structure forms an acyclic directed rooted tree graph with the files as leaf nodes and the folders as internal nodes. It is desirable to copy a directory tree structure.
It is desirable to copy a directory tree structure to a remote storage facility to back up the directory tree structure and to enable restore of the directory tree structure at a later time.
It would be desirable to have a remote storage facility that can be used to store many directory tree structures for different users.
If two directory tree structures have identical sub-trees or identical files, it would be desirable to avoid duplicating the back up of common sub-trees or files that can be backed up once and then shared. BRIEF DESCRIPTION OF VARIOUS EMBODIMENTS OF THE INVENTION
According to various, but not necessarily all, embodiments of the invention there is provided a method comprising:
descending a directory tree structure by identifying a leaf node of the directory tree structure that has not been backed up as a current node, wherein the current node has a current document that has not been backed up; and backing up the current node by:
providing the current document to an index-addressable storage; and storing a current index for the current node in a parent document for a parent node of the current node wherein the current index for the current node has been generated using the current document and has a value that uniquely represents the content of the current document. According to various, but not necessarily all, embodiments of the invention there is provided an apparatus comprising:
means for descending a directory tree structure by identifying a leaf node of the directory tree structure that has not been backed up as a current node, wherein the current node has a current document that has not been backed up; and
means for backing up the current node comprising:
means for generating a current index for the current node using the current document, wherein the current index has a value that uniquely represents the content of the current document;
means for providing the current document to an index-addressable storage; and
means for storing the current index in a parent document for a parent node of the current node.
According to various, but not necessarily all, embodiments of the invention there is provided a storage system comprising: one or more storage apparatus that stored a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document; and
an interface configured to receive an index and to provide in reply a document associated with the index,
wherein a sub-set of the plurality of documents and the plurality of associated indexes define a directory tree having internal nodes and leaf nodes, wherein any document of the sub-set comprising one or more indexes represents a respective internal node of the directory tree and the one or more indexes access documents representing child nodes of the respective internal node and
wherein documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the directory tree.
According to various, but not necessarily all, embodiments of the invention there is provided a method comprising:
decrypting an encrypted master document comprising an index representing a root node of a directory tree to obtain the index;
using the index to query a database storing a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document;
wherein a sub-set of the plurality of documents and the plurality of associated indexes define the directory tree having internal nodes and leaf nodes, wherein any document of the sub-set comprising one or more indexes represents a respective internal node of the directory tree and the one or more indexes access documents representing child nodes of the respective internal node and
wherein documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the directory tree, and receiving in reply a document of the sub-set comprising one or more indexes and representing a respective root node of the directory tree, wherein the one or more indexes access documents representing child nodes of the root node in the directory tree structure.
According to various, but not necessarily all, embodiments of the invention there is provided a method comprising:
identifying a node of a graph that represents a document that has not processed; and
processing the document by:
providing the document to an index-addressable storage; and storing an index for the provided document in a further document represented by a parent node of the identified node in the graph wherein the index has been generated based on the provided document and has a value that represents the content of the provided document.
According to various, but not necessarily all, embodiments of the invention there is provided an apparatus comprising:
means for identifying a node of a graph that represents a document that has not been processed; and
means for processing the document comprising:
means for providing the document to an index-addressable storage; and
means for storing an index for the provided document in a further document represented by a parent node of the identified node in the graph wherein the index has been generated based on the provided document and has a value that represents the content of the provided document.
According to various, but not necessarily all, embodiments of the invention there is provided a storage system comprising: one or more storage apparatus that stored a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document; and
an interface configured to receive an index and to provide in reply a document associated with the index,
wherein a sub-set of the plurality of documents and the plurality of associated indexes are represented by a graph having internal nodes and leaf nodes, wherein any document of the sub-set comprising one or more indexes represents a respective internal node of the graph and the one or more indexes access documents representing child nodes of the respective internal node and
wherein documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the graph.
According to various, but not necessarily all, embodiments of the invention there is provided a method comprising:
decrypting an encrypted master document comprising an index representing a node of a graph to obtain the index;
using the index to query a database storing a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document;
wherein a sub-set of the plurality of documents and the plurality of associated indexes are represented by the graph having internal nodes and leaf nodes, wherein any document of the sub-set comprising one or more indexes represents a respective internal node of the graph and the one or more indexes access documents representing child nodes of the respective internal node and
wherein documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the graph, and receiving in reply a document of the sub-set comprising one or more indexes and representing a respective root node of the graph, wherein the one or more indexes access documents representing child nodes of the root node in the graph.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding of various examples of embodiments of the present invention reference will now be made by way of example only to the accompanying drawings in which:
Fig. 1 schematically illustrates a content-addressable storage (CAS) system; Fig 2 schematically illustrates a directory tree structure;
Fig 3 schematically illustrates an alternative representation of the directory structure in a form in which it is stored in the content addressable storage system;
Fig 4 schematically illustrates a method suitable for storing the directory structure in the content addressable storage system;
Fig 5 schematically illustrates an optional document encryption block;
Fig 6 illustrates one example of encryption;
Fig 7 schematically illustrates a security method;
Fig 8 schematically illustrates an example of a remote back up system; and Fig 9 schematically illustrates a suitable controller for a host client apparatus or a host server apparatus. DETAILED DESCRIPTION OF VARIOUS EMBODIMENTS OF THE INVENTION
Fig. 1 schematically illustrates a content-addressable storage (CAS) system 2. In this example, the content-addressable storage (CAS) system 2 is an index addressable storage system in which a document dependent index 3 is used to reference a document 5. The system 2 comprises: one or more storage apparatuses 4 and an interface 6.
The document 5 may be any collection of data whether structured or unstructured.
The storage apparatus or apparatuses 4 store a plurality of documents 5 and a plurality of associated indexes 3. Each index 3 is used to access a respective associated document 5 and has a value that represents the content of the respective associated document 5. The value uniquely represents the content at least with respect to the storage apparatus or apparatuses 4 and may, in some embodiments, be a globally unique representation of the content.
The interface 6 is configured to handle database queries in the form of a received index 3. If the received index 3 is associated in the storage apparatus 4 with a document 5 then that document 5 is returned via the interface 6. If the received index 3 is not associated in the storage apparatus
4 with a document 5 then a document 5 is not returned via the interface 6.
The interface 6 is configured to augment the database by storing a document
5 and its associated index 3 when received via the interface 6. Storage only occurs if the index 3 is not already associated with a document 5 in the storage apparatus 4. Alternatively, the system 2 may have an index generator (not shown) which would generate an index 3 from a received document 5. In this implementation, it is not necessary for the index 3 to be received by the interface 6.
The index 3 performs two functions. It operates as content-address in the CAS system 2 and it has a value that uniquely represents the content of the document from which it is formed and with which it is associated in the CAS system 2. This latter property of the index in combination with the fact that the CAS system 2 only stores a document if its index is not already stored, prevents duplication of documents.
The index 3 may, for example be a cryptographic hash value produced by providing the document 5 as input to a cryptographic hash function. A cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value. Any change to the data will change the hash value. Fig 2 schematically illustrates an example of a directory structure 10 for the storage and organization of data structures. The directory structure 10 comprises folders 1 1 and files 12. A folder can contain a file but a file cannot contain a folder. The directory structure forms a tree with a root folder 1 10o- The directory tree structure 10 may be represented as a directed acyclic (rooted tree) graph. The graph has internal nodes that have one or more dependent nodes. The root node is a special case of an internal node as it does not depend from another node. The graph has leaf nodes that depend from one or more internal nodes but that do not have dependent nodes.
A node Xnm in the graph is positioned at hierarchical level n (n=0, 1 , 2..) and at lateral index m (m=0, 1 , 2, 3...). The root node Xoo is a folder (X=1 1 ) separately connected to two dependent child nodes- a first node Xio and a second node Xn via respective edges 13io , 13n . The first node Xio is an internal node (folder, X=1 1 ) and is separately connected to dependent child nodes X20, X21 via respective edges 122o, 122i - The second node X10 is a leaf node (file, X=12) and is not connected to any dependent child nodes.
Fig 3 schematically illustrates an alternative representation of the directory tree structure 10 in a form in which it is stored in the content addressable storage system 2. The directory tree structure 10 is represented as a directed acyclic (rooted tree) graph 20. The graph 20 has internal nodes that have one or more dependent nodes. The root node 21 is a special case of an internal node as it does not depend from another node. The graph 20 has leaf nodes that depend from one or more internal nodes but that do not have dependent nodes. A node that depends from another node is called a 'child' or 'dependent' node. A node from which one or more other nodes dependent is called a 'parent' node. The graph 20 has a similar arrangement of nodes as the graph for the directory tree structure as illustrated in Fig 2. However, the graph 20 represents, at each node, a document 5.
A node 22nm in the graph is positioned at hierarchical level n (n=0, 1 , 2..) and at lateral index m (m=1 , 2, 3...). The root node 220o is separately connected to two dependent child nodes- a first node 22i0 and a second node 22n via respective edges. The first node 22i0 is an internal node and is separately connected to dependent child nodes 222o, 222i via respective edges. The second node 22i0 is a leaf node and is not connected to any dependent child nodes.
Each node 22nm has a corresponding document 5nm. If a node 22nm is a leaf node the corresponding document 5nm is the file 12nm. If a node 22nm is an internal node, the corresponding document 5nm is a document that comprises one or more indexes 3. The one or more indexes 3 are usable to access documents at child nodes of the respective internal node. The one or more indexes 3 may, for example, be cryptographic hash values of the documents at child nodes of the respective internal node.
As an example, referring to Fig 3, the root node 220o is an internal node, the corresponding document 5oo at this node comprises index 3io and index 3n . Index 3io is the cryptographic hash value of document 5io which is stored at internal node 22i0. It can be used to access the document 5io in the CAS system 2. Index 3n is the cryptographic hash value of document 5n which is stored at leaf node 22n . It can be used to access the document 5n in the CAS system 2.
Fig 4 schematically illustrates a method suitable for storing the directory tree structure 10 in the content addressable system 2. The method 10 may be a back up method.
The method starts 21 , then at block 32 the directory tree structure 10 is descended. A leaf node 22 of the directory tree structure 10 that has not been backed up is identified as a 'current' node and its file as a current document 5.
A node 22 is backed up when the document 5 at the node is stored in the CAS system 2. Next at block 33, the current node is backed up. This can be divided into a number of sub-blocks.
At sub-block 34, a current index 3 for the current node is generated using the current document 5. The current index 3 has a value that uniquely represents the content of the current document 5. The current index 3 may, for example, be a cryptographic hash value of the current document 5.
At sub-block 35, the current document 5 is provided to an index-addressable storage 2. It is assumed in this implementation that the index-addressable storage 2 is configured to generate an index 3 from the current document 5. If this were not the case, the index 3 could be sent with the current document 5. At sub-block 36, the current index 3 is stored in a 'parent document', that is the document 5 at a parent node of the current node.
As an example, referring to Fig 3, the root node 220o is an internal node, its corresponding document 5oo at this node comprises index 3io for document 5io at node 22i0 and index 3i for document 5i ^ at node 22n .
Next the method moves to block 37, where the directory tree structure 10 is ascended one level by identifying the parent node of the current node as the new current node and the parent document as the new current document.
At block 37, if all the child nodes of the parent node have been backed up, then the parent document can be closed and the method moves to block 39. At block 39, it is determined whether or not the current node is the root node. If it is the root node, the method ends 40. If it is not the root node, the current node is backed up at block 33 and the method continues as described above.
At block 37, if all the child nodes of the parent node have not been backed up, then the parent document remains open and the method moves to block 38.
At block 38, the directory tree structure 10 is traversed by identifying a child node of the current node that has not been backed up, as the new current node and returning to block 32. At block 32, the directory tree structure 10 is descended, if possible, to a leaf node 22 that directly or indirectly depends from the current node and the leaf node is identified as the new current node. The method then moves to block 33 to back up the leaf node.
It will therefore be appreciated, that ascending the directory tree structure only occurs when traversing is not or no longer available and block 37 exits to block 39 rather than block 38. After having ascended one level in the directory structure to the parent node and after backing up that parent node, traversing the directory structure will occur, if available, (block 38) before further ascending up the directory structure (block 39).
The directory tree structure is, as described above, backed-up from the bottom up. Each document at an internal node inherits, via block 36, an index generated from each of its child node documents.
It should be appreciated that the method described requires the identification of at least leaf nodes 22 of the directory tree structure 10 that have not been backed up. This may be achieved by generating an index 3 for each node using its document. The index 3 has a value that uniquely represents the content of the document 5. The current index 3 may, for example, be a cryptographic hash value of the document. If the index 3 already exists as an index for the accessing a document in the CAS system 2, then it or an equivalent has already been backed up at the CAS system 2. If the index 3 does not already exist as an index for accessing a document in the CAS system 2, then the node needs to be backed up.
Thus the step of generating an index 3 for a node illustrated in step 34 of Fig 4, may occur as prelude to the method 30 when the nodes that require back up are identified. If the indexes that are generated for identifying the nodes that require back up are cached, then they may be re-used later during back up. In this case, at step 34 of Fig 4, rather than generating the index, the step would instead comprise obtaining the cached index.
The following pseudo code defines an alternative example of a back up method 30:
BackupNode(N) {
If N is a directory, then let O = DirectoryObjectFor(N)
Otherwise, let O = contents of N
Let H = Hash(O) if there is no item with index H in the backup store, then {
Store O in the backup store with index H
If N is a directory {
For each entry E in the directory, BackupNode(E) }
}
}
DirectoryObjectFor(D) {
Create an empty directory object N
For each entry E in the directory D {
If E is a directory, then let O = DirectoryObjectFor(E) Otherwise, let O = contents of E
Let H = Hash(O)
Add the metadata for E, and the hash H to N
}
Return N
}
Fig 5 schematically illustrates an optional document encryption block 34A, which may occur in the method 30 after block 34 and before block 35. At block 34A the current document is encrypted before providing the (encrypted) current document to the index-addressable storage at block 35.
Fig 6 illustrates one example of encryption. The document 5 is provided to a cryptographic hash algorithm which produces a hash value 43. The hash value 43 is used as a cryptographic key for encrypting the document. In the illustrated example, the document is provided to symmetric key encryption algorithm and the hash value is used as the symmetric key. The encrypted document 5 is provided to a cryptographic hash algorithm which produces a hash value. This hash value is used as an index for the document. The index and the cryptographic key are both stored in the parent document.
Additional properties of the document 5 may also be stored in the parent document. The properties may, for example, include metadata such as revision date.
Fig 7 schematically illustrates a method 50 that may be performed when the method 30 ends 40. The current node is a root node of the directory tree structure 10.
At block 51 a current index for the current node is generated using the current document. The current index may, for example, be a cryptographic hash value of the current document. The current index has a value that uniquely represents the content of the current document.
Next at block 52, the current document is provided to the index-addressable storage system 2.
Next at block 53, the current index is stored in a master document;
Next at block 54, the master document is encrypted using a secret. The secret is typically personal to the user like a unique ID or password.
Next at block 56, the encrypted master document is stored. The encrypted master document may be stored in an apparatus that hosts the directory structure 10, alternatively, the encrypted master document may be stored on a portable storage apparatus that can be carried by a user.
The method of restoring the directory tree structure 10 from back up comprises:
a) decrypting the encrypted master document comprising an index representing a root node of a directory tree to obtain the index; b) using the index to query a database storing a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document. A sub-set of the plurality of documents and the plurality of associated indexes define a directory tree having internal nodes and leaf nodes. Any document of the sub-set comprising one or more indexes represents a respective internal node of the directory tree and the one or more indexes access documents representing child nodes of the respective internal node. Documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the directory tree, and
c) receiving in reply a document of the sub-set comprising one or more indexes and representing a respective root node of the directory tree, wherein the one or more indexes access documents representing child nodes of the root node in the directory tree structure.
The indexes are used to retrieve further documents containing further indexes or files. The process of using recovered indexes to recover more documents which in turn provide more indices or files is repeated until the directory tree structure 10 is reproduced.
Fig 8 schematically illustrates an example of a remote back up system 60.
The system 60 comprises a server apparatus 64 connected to the content addressable storage system 2. The server apparatus 64 and the content addressable storage system may be co-located at location 62 or located at different locations.
The system 60 also comprises a plurality of client apparatus 68A, 68B, 68C, 68D which connect to the server apparatus 64 via network 66. The network 66 may be a local area network, a metropolitan area network or a large area network for example. The network 66 may be comprised of a plurality of interconnected networks. Each client 68 has a directory structure 10 and is configured to communicate with the server 64 to back up the directory server as described above. The methods illustrated in Figs 4, 5, 6 and 7 (or equivalents or alternatives) may be performed at a client 48. The client 48, at each iteration of the back up block 33 in the method 30, sends the document stored at the current node to the server apparatus 64. The client 48 is also configured to determine which nodes of the directory tree require backing-up.
It will be appreciated that if the client apparatuses 68A, 68B. 68C have subtrees of their directory trees that are identical, then the document(s) used to represent that sub-tree is only stored once rather than once per client.
The content addressable storage 2 comprises: one or more storage apparatus 4 (Fig 1 ) that store a plurality of documents 5 and a plurality of associated indexes 3, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document; and an interface 6 (Fig 1 ) configured to receive an index 3 and to provide in reply a document 5 associated with the index. A sub-set of the plurality of documents and the plurality of associated indexes define a directory tree having internal nodes and leaf nodes. Any document of the sub-set comprising one or more indexes represents a respective internal node of the directory tree and the one or more indexes access documents representing child nodes of the respective internal node. Documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the directory tree.
An index for any document representing a node of the directory tree has a value that uniquely represents the structure and content of the sub-tree that has the node as its root such that if the structure or content of the sub-tree where to change the document would change and the index would change. Furthermore, each document in the path to the root would change. Fig 9 schematically illustrates a suitable controller for a host client apparatus 68 or a host server apparatus 64.
Implementation of controller can be in hardware alone ( a circuit, a
processor...), have certain aspects in software including firmware alone or can be a combination of hardware and software (including firmware).
The controller may be implemented using instructions that enable hardware functionality, for example, by using executable computer program instructions in a general-purpose or special-purpose processor that may be stored on a computer readable storage medium (disk, memory etc) to be executed by such a processor.
In the illustrated example, the controller comprises at least one processor 70 connected to read from and write to at least one memory 72. The memory 72 stores a computer program 74 comprising machine readable instructions which when loaded in to the processor control the operation of the host. The computer program 74 provides the logic and routines that enables the apparatus to perform the methods illustrated in any one or more of Figs 4, 5 , 6, 7. The processor 70 by reading the memory 72 is able to load and execute the computer program 74.
The computer program may arrive at the memory 72 via any suitable delivery mechanism. The delivery mechanism may be, for example, a computer- readable storage medium, a computer program product, a memory device, a record medium such as a CD-ROM or DVD, an article of manufacture that tangibly embodies the computer program 74. The delivery mechanism may be a signal configured to reliably transfer the computer program 74. The controller may propagate or transmit the computer program 74 as a computer data signal.
Although the memory 72 is illustrated as a single component it may be implemented as one or more separate components some or all of which may be integrated/removable and/or may provide permanent/semi-permanent/ dynamic/cached storage.
The blocks illustrated in the Figs 4 to 7 may represent steps in a method and/or sections of code in the computer program 74. The illustration of a particular order to the blocks does not necessarily imply that there is a required or preferred order for the blocks and the order and arrangement of the block may be varied. Furthermore, it may be possible for some steps to be omitted.
Reference has been made to cryptographic hash algorithms. Current suitable algorithms include MD5 and SHA-1 .
Reference has been made to symmetric-key encryption algorithms. Current suitable algorithms include AES-128, AES-256.
It would be a simple and straightforward modification to adapt the above described back up process (Fig 4) so that only modified sub-trees of the directory tree are processed. For example, the date/time of last back up may be stored and compared against property values of the files and folders to determine which may be created or modified since the last back up. The nodes in the directory tree structure may be flagged and traversing the directory tree structure would be selective based on the flags. Block 37 would be modified to ask whether 'all flagged child nodes are backed up?'. Block 38 would be modified to 'go to the next flagged child node that is not backed up'. Block 32 would be modified to 'go to the lowest flagged dependent leaf node'. Although embodiments of the present invention have been described in the preceding paragraphs with reference to various examples, it should be appreciated that modifications to the examples given can be made without departing from the scope of the invention as claimed.
Technically it is possible for two different documents to produce the same hash value. However, with a proper hash function the chances of such a hash collision are sufficiently small to be insignificant. .A hash value may therefore be considered to be practically globally unique.
In some embodiments, a document may be broken-up into parts for back up and each part is then processed separately as a separate document. In this case, the document may be represented as a parent node and the parts of the documents represented as dependent nodes from the parent node..
It should be appreciated that the index does not have to be produced directly from a document.
Features described in the preceding description may be used in combinations other than the combinations explicitly described.
Although functions have been described with reference to certain features, those functions may be performable by other features whether described or not. Although features have been described with reference to certain
embodiments, those features may also be present in other embodiments whether described or not.
Whilst endeavoring in the foregoing specification to draw attention to those features of the invention believed to be of particular importance it should be understood that the Applicant claims protection in respect of any patentable feature or combination of features hereinbefore referred to and/or shown in the drawings whether or not particular emphasis has been placed thereon.
I/we claim:

Claims

1. A method comprising:
descending a directory tree structure by identifying a leaf node of the directory tree structure that has not been backed up as a current node, wherein the current node has a current document that has not been backed up; and backing up the current node by:
providing the current document to an index-addressable storage; and storing a current index for the current node in a parent document for a parent node of the current node wherein the current index for the current node has been generated using the current document and has a value that uniquely represents the content of the current document.
2. A method as claimed in claim 1 , further comprising: traversing the directory tree structure by identifying as a current node a leaf node of the directory tree structure that directly or indirectly depends from the parent node and has not been backed up, wherein the current node has a current document that has not been backed up and backing up the current node.
3. A method as claimed in claim 1 or 2, further comprising: ascending the directory tree by identifying the parent node as the current node and the parent document as the current document and backing up the current node.
4. A method as claimed in claim 3, further comprising: ascending the directory tree structure only when traversing is not or no longer available.
5. A method as claimed in claim 3 or 4, further comprising: after ascending the directory tree a level, traversing the directory tree structure.
6. A method as claimed in any preceding claim, wherein the current index is a cryptographic hash value of the current document.
7. A method as claimed in any preceding claim, further comprising: encrypting the current document before providing the current document to the index- addressable storage.
8. A method as claimed in claim 7, comprising encrypting the current document using the current index.
9. A method as claimed in claim 8, comprising encrypting the current document using the current index as a symmetric key provided to a symmetric key encryption algorithm that operates on the current document.
10. A method as claimed in any preceding claim comprising, when the current node is a root node of the directory tree structure:
generating a current index for the current node using the current document, wherein the current index has a value that uniquely represents the content of the current document;
providing the current document to the index-addressable storage;
storing the current index in a master document.
1 1 . A method as claimed in claim 10, further comprising:
encrypting the master document using personal secret; and
storing encrypted master document.
12. A method as claimed in any preceding claim comprising: identifying a leaf node of the directory tree structure that has not been backed up by generating an index for the leaf node using a document of the leaf node, wherein the index has a value that uniquely represents the content of the document and determining whether the
13. A method as claimed in any preceding claim comprising: identifying a subtree of the directory tree structure that has not been backed up by generating an index for a root node of the sub-tree using a document of the root node of the sub-tree, wherein the index has a value that uniquely represents the content of the document and determining that the generated index is new.
14. A method as claimed in any preceding claim comprising: identifying a subtree of the directory tree structure that has been backed up by generating an index for a root node of the sub-tree using a document of the root node of the sub-tree, wherein the index has a value that uniquely represents the content of the document and determining that the generated index is not newly generated.
15. A method comprising:
identifying a file that has not been backed up;
providing the file and index to an index-addressable storage;
storing the index in a document associated with the folder comprising the file in the directory
16. A computer program which when loaded into a processor instructs the processor to enable the method of any one of claims 1 to 15.
17. An apparatus comprising: at least one processor and at least one memory wherein the at least one memory stores computer program instructions which, when loaded into the at least one processor, enable the apparatus to perform the method of any one of claims 1 to 15.
18. An apparatus comprising:
means for descending a directory tree structure by identifying a leaf node of the directory tree structure that has not been backed up as a current node, wherein the current node has a current document that has not been backed up; and means for backing up the current node comprising:
means for generating a current index for the current node using the current document, wherein the current index has a value that uniquely represents the content of the current document;
means for providing the current document to an index-addressable storage; and
means for storing the current index in a parent document for a parent node of the current node.
19. An apparatus comprising means for performing a method as claimed in any of claims 1 to 15.
20. An apparatus as claimed in claim 18 or 19 configured to operate as a client apparatus.
21. A storage system comprising:
one or more storage apparatus that stored a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document; and
an interface configured to receive an index and to provide in reply a document associated with the index,
wherein a sub-set of the plurality of documents and the plurality of associated indexes define a directory tree having internal nodes and leaf nodes, wherein any document of the sub-set comprising one or more indexes represents a respective internal node of the directory tree and the one or more indexes access documents representing child nodes of the respective internal node and
wherein documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the directory tree.
22. A storage system as claimed in claim 21 , wherein an index for any document representing a node of the directory tree has a value that uniquely represents the structure and content of the sub-tree that has the node as its root such that if the structure or content of the sub-tree where to change the document would change and the index would change.
23. A method comprising:
decrypting an encrypted master document comprising an index representing a root node of a directory tree to obtain the index;
using the index to query a database storing a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document;
wherein a sub-set of the plurality of documents and the plurality of associated indexes define the directory tree having internal nodes and leaf nodes, wherein any document of the sub-set comprising one or more indexes represents a respective internal node of the directory tree and the one or more indexes access documents representing child nodes of the respective internal node and
wherein documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the directory tree, and
receiving in reply a document of the sub-set comprising one or more indexes and representing a respective root node of the directory tree, wherein the one or more indexes access documents representing child nodes of the root node in the directory tree structure.
24. A method comprising:
identifying a node of a graph that represents a document that has not processed; and
processing the document by:
providing the document to an index-addressable storage; and storing an index for the provided document in a further document represented by a parent node of the identified node in the graph wherein the index has been generated based on the provided document and has a value that represents the content of the provided document.
25. A method as claimed in claim 24, wherein the graph is an acyclic graph.
26. A method as claimed in any one of claims 24 to 25 comprising identifying a node of the graph representing a document that has not been processed by generating an index for the node using a document represented by the node, wherein the index has a value that represents the content of the document and determining that the generated index is new.
27. A method as claimed in any one of claims 24 to 26 comprising: identifying a sub-graph of the graph representing a document that has not been processed by generating an index for an initial node of the sub-graph using an initial document represented by the initial node of the sub-graph, wherein the index has a value that represents the content of the initial document and determining that the generated index is new.
28. A method as claimed in any one of claims 24 to 27 comprising identifying all the sub-graphs of at least the sub-graph representing documents that have not been processed by generating an index for each initial node of the subgraphs using an initial document represented by the respective initial nodes of the sub-graphs, wherein the index has a value that represents the content of the initial documents and determining that the generated index is new.
29. A method as claimed in any one of claims 24 to 28, comprising performing an ordered processing of the documents, represented by nodes of the graph, that have not been processed by, for each document, providing the document to an index-addressable storage; and storing an index for the provided document in a further document represented by a parent node of the node representing the provided document wherein the index has been generated based on the provided document and has a value that represents the content of the provided document.
30. A method as claimed in claim 29, wherein ordered processing of the documents that have not been processed requires that a document represented by a parent node in the graph is processed only after all of the documents that are represented by dependent nodes of the parent node have been processed,
31 . A method as claimed in any one of claims 24 to 30 comprising identifying a sub-graph of the graph that has been processed by generating an index for an initial node of the sub-graph using an initial document represented by the initial node of the sub-graph, wherein the index has a value that represents the content of the initial document and determining that the generated index is not new.
32. A method as claimed in any one of claims 24 to 32, wherein if it is not possible to process the document by storing the index for the provided document in a further document represented by a parent node of the identified node in the graph, then storing the index for the provided document in a master document.
33. A method as claimed in claim 32, further comprising:
encrypting the master document using a personal secret; and
storing the encrypted master document.
34. An apparatus comprising: means for identifying a node of a graph that represents a document that has not been processed; and
means for processing the document comprising:
means for providing the document to an index-addressable storage; and
means for storing an index for the provided document in a further document represented by a parent node of the identified node in the graph wherein the index has been generated based on the provided document and has a value that represents the content of the provided document.
35. An apparatus comprising means for performing a method as claimed in any of claims 24 to 33.
36. An apparatus as claimed in claim 34 or 35 configured to operate as a client apparatus.
37. A storage system comprising:
one or more storage apparatus that stored a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document; and
an interface configured to receive an index and to provide in reply a document associated with the index,
wherein a sub-set of the plurality of documents and the plurality of associated indexes are represented by a graph having internal nodes and leaf nodes, wherein any document of the sub-set comprising one or more indexes represents a respective internal node of the graph and the one or more indexes access documents representing child nodes of the respective internal node and
wherein documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the graph.
38. A storage system as claimed in claim 37, wherein an index for any document representing a node of the graph has a value that uniquely represents the structure and content of the sub-graph that has the node as its root such that if the structure or content of the sub-graph where to change the document would change and the index would change.
39. A method comprising:
decrypting an encrypted master document comprising an index representing a node of a graph to obtain the index;
using the index to query a database storing a plurality of documents and a plurality of associated indexes, where each index is used to access a respective document and has a value that uniquely represents the content of the respective document;
wherein a sub-set of the plurality of documents and the plurality of associated indexes are represented by the graph having internal nodes and leaf nodes, wherein any document of the sub-set comprising one or more indexes represents a respective internal node of the graph and the one or more indexes access documents representing child nodes of the respective internal node and
wherein documents of the sub-set comprising data and not comprising one or more indexes represent leaf nodes of the graph, and
receiving in reply a document of the sub-set comprising one or more indexes and representing a respective root node of the graph, wherein the one or more indexes access documents representing child nodes of the root node in the graph.
PCT/EP2011/056067 2010-04-17 2011-04-15 Storing a directory tree structure Ceased WO2011128454A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1006442.6 2010-04-17
GB201006442A GB201006442D0 (en) 2010-04-17 2010-04-17 Storing a directory tree structure

Publications (1)

Publication Number Publication Date
WO2011128454A1 true WO2011128454A1 (en) 2011-10-20

Family

ID=42245376

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2011/056067 Ceased WO2011128454A1 (en) 2010-04-17 2011-04-15 Storing a directory tree structure

Country Status (2)

Country Link
GB (1) GB201006442D0 (en)
WO (1) WO2011128454A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308284A (en) * 2018-09-28 2019-02-05 中国平安财产保险股份有限公司 Report menu generating method, device, computer equipment and storage medium
CN111339042A (en) * 2020-03-26 2020-06-26 佛山中科芯蔚科技有限公司 Data operation processing method and system and scheduling server
CN111984941A (en) * 2020-06-29 2020-11-24 深圳亿络科技有限公司 File processing method and device, terminal equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006039492A2 (en) * 2004-09-30 2006-04-13 Emc Corporation File index processing
US20070112895A1 (en) * 2005-11-04 2007-05-17 Sun Microsystems, Inc. Block-based incremental backup
US20080104146A1 (en) * 2006-10-31 2008-05-01 Rebit, Inc. System for automatically shadowing encrypted data and file directory structures for a plurality of network-connected computers using a network-attached memory with single instance storage
US20080313236A1 (en) * 2007-06-15 2008-12-18 Bala Vijayakumar Process for cataloging data objects backed up from a content addressed storage system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006039492A2 (en) * 2004-09-30 2006-04-13 Emc Corporation File index processing
US20070112895A1 (en) * 2005-11-04 2007-05-17 Sun Microsystems, Inc. Block-based incremental backup
US20080104146A1 (en) * 2006-10-31 2008-05-01 Rebit, Inc. System for automatically shadowing encrypted data and file directory structures for a plurality of network-connected computers using a network-attached memory with single instance storage
US20080313236A1 (en) * 2007-06-15 2008-12-18 Bala Vijayakumar Process for cataloging data objects backed up from a content addressed storage system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308284A (en) * 2018-09-28 2019-02-05 中国平安财产保险股份有限公司 Report menu generating method, device, computer equipment and storage medium
CN109308284B (en) * 2018-09-28 2023-09-19 中国平安财产保险股份有限公司 Report menu generation method and device, computer equipment and storage medium
CN111339042A (en) * 2020-03-26 2020-06-26 佛山中科芯蔚科技有限公司 Data operation processing method and system and scheduling server
CN111339042B (en) * 2020-03-26 2024-03-01 北京快映互娱传媒有限公司 Data operation processing method, system and scheduling server
CN111984941A (en) * 2020-06-29 2020-11-24 深圳亿络科技有限公司 File processing method and device, terminal equipment and readable storage medium

Also Published As

Publication number Publication date
GB201006442D0 (en) 2010-06-02

Similar Documents

Publication Publication Date Title
US11709948B1 (en) Systems and methods for generation of secure indexes for cryptographically-secure queries
US11726993B1 (en) Systems and methods for cryptographically-secure queries using filters generated by multiple parties
Cash et al. Dynamic proofs of retrievability via oblivious RAM
US9703981B1 (en) Mobile device data encryption
CN100524153C (en) Additional hash functions in content-based addressing
US20130046974A1 (en) Dynamic symmetric searchable encryption
KR20190079324A (en) Method and system for enhancing integrity of batabase based on the block chain system
CN103119594A (en) Searchable encryption processing system
EP2652646A1 (en) Distributed file systems
KR101801865B1 (en) Method for transporting relational data
CN112532650A (en) Block chain-based multi-backup safe deletion method and system
US20200042497A1 (en) Distributed ledger system
GB2567146A (en) Method and system for secure storage of digital data
Rizomiliotis et al. ORAM based forward privacy preserving dynamic searchable symmetric encryption schemes
CN102033924A (en) Data storage method and system
CN114416720B (en) Efficient, flexible and verifiable multi-attribute range retrieval method and system in cloud environment
CN115225409A (en) Cloud data safety deduplication method based on multi-backup joint verification
Handa et al. A cluster based multi-keyword search on outsourced encrypted cloud data
WO2011128454A1 (en) Storing a directory tree structure
CN107094075A (en) A kind of data block dynamic operation method based on convergent encryption
CN107315539A (en) A kind of date storage method and data extraction method
KR20160050930A (en) Apparatus for Processing Transaction with Modification of Data in Large-Scale Distributed File System and Computer-Readable Recording Medium with Program
US12254108B2 (en) Disallowing reads on files associated with compromised data encryption keys
Aritomo et al. A privacy-preserving similarity search scheme over encrypted word embeddings
JP5174255B2 (en) Storage service providing apparatus, system, service providing method, and service providing program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11716867

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11716867

Country of ref document: EP

Kind code of ref document: A1