[go: up one dir, main page]

WO2008014004A2 - Load-balanced distributed authentication structures - Google Patents

Load-balanced distributed authentication structures Download PDF

Info

Publication number
WO2008014004A2
WO2008014004A2 PCT/US2007/017046 US2007017046W WO2008014004A2 WO 2008014004 A2 WO2008014004 A2 WO 2008014004A2 US 2007017046 W US2007017046 W US 2007017046W WO 2008014004 A2 WO2008014004 A2 WO 2008014004A2
Authority
WO
WIPO (PCT)
Prior art keywords
nodes
node
dag
distributed
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2007/017046
Other languages
French (fr)
Other versions
WO2008014004A3 (en
Inventor
Michael T. Goodrich
Roberto Tamassia
Nikolaos Triandopoulos
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Brown University
Original Assignee
Brown University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Brown University filed Critical Brown University
Publication of WO2008014004A2 publication Critical patent/WO2008014004A2/en
Publication of WO2008014004A3 publication Critical patent/WO2008014004A3/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/08Network architectures or network communication protocols for network security for authentication of entities
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/104Peer-to-peer [P2P] networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/104Peer-to-peer [P2P] networks
    • H04L67/1061Peer-to-peer [P2P] networks using node-based peer discovery mechanisms
    • H04L67/1065Discovery involving distributed pre-established resource-based relationships among peers, e.g. based on distributed hash tables [DHT] 

Definitions

  • the teachings in accordance with the exemplary embodiments of this invention relate generally to peer-to-peer networks and, more specifically, relate to authenticating membership queries in a peer-to-peer network.
  • An overlay peer-to-peer (p2p) network is a structure imposed on a subset of machines from an underlying larger network.
  • p2p networks the purpose of such p2p networks is to store and share a collection of data amongst the computers owned by a group of like-minded users. Any pair of machines can communicate if one of them is given the network address (or pointer) for the other machine.
  • the overlay network is defined by a graph that specifies which machines are linked by these pointers, and it should provide algorithms for storing and locating data of interest in the overlay network.
  • a method for constructing a distributed authentication structure includes: mapping a plurality of first nodes of a distributed network to a plurality of source nodes of a directed acyclic graph (DAG) structure, wherein each first node stores at least one data element of a plurality of data elements, wherein the DAG structure further includes a plurality of sink nodes and zero or more intermediate nodes disposed between the plurality of source nodes and the plurality of sink nodes, wherein a verification path exists between each source node and each sink node, wherein the DAG structure is traffic load balanced such that all nodes of the DAG structure are accessed with a substantially uniform distribution; mapping zero or more second nodes of the distributed network to the zero or more intermediate nodes; and mapping a plurality of third nodes of the distributed network to the plurality of sink nodes, wherein the plurality of third nodes each store a signed digest for the plurality of data elements.
  • DAG directed acyclic graph
  • a communication network includes: a plurality of first nodes corresponding to a plurality of source nodes of a directed acyclic graph (DAG) structure, wherein each first node stores at least one data element of a plurality of data elements, wherein the DAG structure further includes a plurality of sink nodes and zero or more intermediate nodes disposed between the plurality of source nodes and the plurality of sink nodes, wherein a verification path exists between each source node and each sink node, wherein the DAG structure is traffic load balanced such that all nodes of the DAG structure are accessed with a substantially uniform distribution; zero or more second nodes corresponding to the zero or more intermediate nodes; and a plurality of third nodes corresponding to the plurality of sink nodes, wherein the plurality of third nodes each store a signed digest for the plurality of data elements.
  • DAG directed acyclic graph
  • FIG. 1 illustrates a distributed acyclic graph (DAG) tree having six nodes
  • FIG. 2 depicts an exemplary hashing structure
  • FIG. 3 illustrates another exemplary hashing structure
  • FIG. 4 shows another exemplary hashing structure
  • FIG. 5 shows an exemplary block structure
  • FIG. 6 shows a system in which the exemplary embodiments of the invention may be employed.
  • FIG. 7 depicts a flowchart illustrating one non-limiting example of a method for practicing the exemplary embodiments of this invention.
  • One challenging problem with such networks is that of maintaining correct functionality in the presence of faulty or malicious nodes that actively seek to disrupt the network.
  • One critical dimension of the problem which corresponds to a fundamental security property in data management, is data authentication, i.e., the ability to verify that the retrieved data — distributively stored in the network and shared by the peer nodes — is authentic.
  • data authentication i.e., the ability to verify that the retrieved data — distributively stored in the network and shared by the peer nodes — is authentic.
  • This problem is important since some overlay networks, in particular peer-to- peer ( ⁇ 2p) networks, do not impose restrictions on who may become a member and thus are vulnerable to attacks against data integrity and authenticity. Due tot his, contents may be maliciously modified imposing a serious security threat. While faulty nodes are trouble enough, adversarial p2p nodes can be especially troublesome.
  • adversaries may respond with data that appears to be a file of interest (e.g., a video of a scientific lecture or a data file describing a DNA sequence), but is instead of degraded quality, virus-infected, maliciously altered, contains incorrect material or, simply, is not up-to- date.
  • a file of interest e.g., a video of a scientific lecture or a data file describing a DNA sequence
  • distributed authentication structures are considered, that is, constructions over distributed (e.g., p2p) networks that can provide efficient and secure distributed data authentication even in the presence of faulty and misbehaving network nodes according to some adversarial model. That is, the authentication of contents distributed, stored and accessed over a p2p network is examined. With respect to content protection, it is especially desirable to detect content forgery and old-data replay attacks. Among the design considerations, special attention is placed on the design of authentication schemes that achieve load balance. That is, it is desirable that the authentication structures evenly distribute the load due to authentication, so that hot-spots are reduced or eliminated.
  • DHTs distributed hash tables
  • DHTs distributed hash tables
  • n the number of nodes of the DHT.
  • a network may support queries using O(logr ⁇ ) messages and ⁇ 9(logn) words of memory corresponding to the O(log «) degree of the underlying graph.
  • the Merkle tree is a widely-used scheme in security applications and cryptographic constructions for signing and certifying members of a large data set.
  • the idea is to use a tree and a cryptographic collision-resistant hash function to produce a short cryptographic description of the data set.
  • the data set can then be certified by signing or publishing this short description.
  • the Merkle tree is the simplest form for realizing what is known as the signature amortization technique, which is the state-of-the-art methodology for efficient data authentication. That is, using cryptographic techniques one signature is amortized over a large collection of data elements owned by an individual data source.
  • signature refreshing where individually signed pieces of data along with a time-stamp (denoting the freshness of the data) are periodically resigned.
  • time-stamp denoting the freshness of the data
  • old invalid instances of the data cannot be considered as authentic since they do not belong to the current time quantum (i.e., they do not carry the most up-to-date time- stamp).
  • signature refreshing is used to solve the play-back attack, this generally leads to inefficient schemes: the re-signing overhead is linear on the number of items currently existing in the system owned by a source 5 with a distinct (signing) identity.
  • signature amortization one can achieve efficiency in data authentication since the cost due to signature refreshing is constant with respect to the data size.
  • Tamassia and Triandopoulos introduced a distributed data authentication model in which both query answering and cryptographic proofs are distributed over a distributed (e.g., p2p) network using a distributed version of a Merkle tree for content authentication. See R. Tamassia and Triandopoulos, "Efficient content authentication over distributed hash tables," Technical Report, Center for Geometric Computing, Brown University, 2005. This new model of authentication captures the security needs of many distributed systems or emerging applications built on them, and is useful in peer-to-peer security.
  • any distributed hash table Over any distributed location system — for instance, any distributed hash table — and using only the basic object-location operation, they realized a distributed version of a Merkle tree, which in turn yields distributed versions of numerous security protocols and cryptographic constructions based on it.
  • n nodes In a totally distributed setting over a peer-to-peer network with n nodes, they show that one can efficiently authenticate content membership in a fully dynamic set of m data elements in (2(log «log/w) time using O(mlogm) storage, with similar complexities for supporting insertions and deletions.
  • Their construction is also used to design an efficient authenticated distributed hash table (ADHT), an authentication structure that does not depend on the implementation of the distributed location system, which allows them to leverage existing peer-to-peer architectures. This technique uses signature amortization and, thus, provides a solution.
  • ADHT efficient authenticated distributed hash table
  • a DAG is a directed graph with no directed cycles.
  • a DAG comprises a plurality of nodes including at least one source or source node and at least one sink or sink node.
  • a source node has no incoming edges (e.g., connections) whereas a sink node has no outgoing edges (e.g., connections).
  • indeg(y) denote the in-degree of a node v, i.e., the number of incoming edges of v
  • outdeg(y) denote the out-degree of a node v, i.e., the number of outgoing edges of v.
  • source nodes may also be referred to as- leaf nodes or leaves and are usually shown at the bottom of a visual depiction (e.g., a graph illustrating the structure of the DAG tree).
  • sink nodes may also be referred to as root nodes or roots and are usually shown at the top of the DAG tree graph.
  • nodes that are neither leaf nodes nor root nodes may be referred to as tree nodes (i.e., intermediate nodes).
  • FIG. 1 illustrates a DAG tree 10 having six nodes.
  • Node A is the only root node.
  • Nodes B and C are tree nodes.
  • Nodes D, E and F are leaf nodes.
  • the direction is indicated as generally proceeding from bottom to top, this is more of a convention than a rule.
  • communication between physical nodes assigned to the respective nodes of the DAG tree 10 may be bilateral instead of the depicted unilateral directions.
  • the illustrated direction may generally be viewed as indicating access/retrieval since it is customary to assume that data is stored at source/leaf nodes and accessed or retrieved from sink/root nodes (i.e., queries on the stored data are sent from the sink/root nodes).
  • Nodes of a same relative position within a DAG tree may be said to belong to a same level.
  • Levels are generally numbered from 0 to k, where source/leaf nodes are located at level 0 and sink/root nodes are located at level k.
  • the leaf nodes nodes D, E and F
  • the root node node A
  • Nodes within a DAG tree are often discussed in relation to one another.
  • a parent is considered to be a node of a higher level that has an edge with (i.e., is connected to) the node in question.
  • node B is considered a parent of nodes D and E.
  • a child is considered to be anode of a lower level that has an edge with (i.e., is connected to) the node in question.
  • nodes B and C are considered children of node A.
  • source/leaf nodes have no children while sink/root nodes have no parents.
  • intermediate/tree nodes have both at least one child and at least one parent.
  • a first node may be said to be a. predecessor of a second node if the first node is a child of the second node.
  • a second node may be said to be a successor of a first node if the second node is a parent of the first node.
  • Results are presented with respect to the content authentication problem in ⁇ 2p storage systems.
  • An exemplary efficient authenticated p2p data structure is provided that implements a distributed dictionary with substantially uniform workload.
  • the data structure may be viewed as a load-balanced distributed version of the Merkle hash tree and can be used as a building block for the development of other authenticated distributed data structures.
  • This construction makes use of a distributed hash table (DHT) to store certification information. Note that any suitable DHT may be utilized in conjunction with the exemplary embodiments of the invention.
  • DHT distributed hash table
  • a DHT structure that provides assurances that this information can be located in the presence of faulty or adversarial nodes. That is, authenticating the routing of DHTs over p2p networks, a requirement which is orthogonal to the content authentication security problem, is an important additional property in p2p security.
  • the constructions are built over the minimal functionality offered by DHTs and without making any specific assumptions (thus, without the design depending on any particular DHT implementation) generality and additional security are achieved: one can simply use a DHT implementation that enjoys any additional properties, such as routing authentication, as a non-limiting example.
  • the performance and security properties of the distributed authentication structure are summarized as follows. Given a distributed hash table H with n network nodes such that H supports object location in O( ⁇ og ⁇ ) time using O(logn) storage overhead per network node, one can build over H a distributed authentication structure for storing m data items such that:
  • the data structure uses O(m log m) space.
  • the data structure uses public information consisting of O(log n) signatures.
  • authenticating content distributively stored at network nodes through the use of a distributed hash table (D ⁇ T) is discussed. That is, it is desirable to show that data files claimed to have been added by a user were really put in the p2p network by that user and have not been modified.
  • D ⁇ T distributed hash table
  • the D ⁇ T stores key- value pairs of the type (k,x) , where keys are unique identifiers and values are associated with keys. These pairs are inserted through operation put ⁇ -,-) , where put(Jc,x) inserts and stores data item x under key k . With respect to data retrieval, the D ⁇ T supports query get(Jc) , which returns the value associated with key k . Both operations may be implemented on top of the lower-level object location operation supported by the underlying p2p system, where given an object identifier (e.g., an IP address) the network node storing the object is returned.
  • object identifier e.g., an IP address
  • the fundamental authentication problem that arises in this case is data integrity, namely ensuring that any query returns the correct, authentic corresponding value. So, if (/fc,x) is the currently valid pair that has been (most recently) inserted in the system, in terms of security and information assurance, the property that (at least) must be satisfied is that any query on key k must return the correct corresponding value x .
  • One approach to the authentication of queries is to individually sign each item stored in the data structure. Namely, when data source S wishes to add (A:, x) , it computes the signature ⁇ of the pair (k,x) using its private key and inserts (A ⁇ , (C ⁇ , ⁇ )) into the data structure.
  • a public key infrastructure is assumed for the owners of stored data items in the network and the users of the storage system.
  • a query for key k now returns the pair ( ⁇ , x) , where the signature ⁇ allows one to verify whether x is the valid answer.
  • this "sign-all" approach introduces significant performance overhead and allows for replay attacks for old values.
  • signature freshness that is, a mechanism that ensures that only recent signatures are used to validate answers to queries.
  • This can be achieved by introducing time-stamps in the signed values, so that only recent (up to a convention, e.g., within a specified time quantum) valid signatures are accepted.
  • the signature refreshing cost is linear in the number of the current items in the DHT: if m items exist in the system that are owned by an individual data source, at any fixed time quantum, the signature refreshing cost is O(m) , since all valid key-value pairs need to be re-signed.
  • signature amortization can be used.
  • the technique consists of signing only one, specially designed for the authentication purposes often through the use of a Merkle tree (or extensions of it), digest of the entire collection of stored data items owned by the same source. This way, signatures can be refreshed at a constant computational overhead. Signature amortization may be a valid avenue for achieving efficiency with respect to the overheads due to signature-refreshing.
  • p2p systems achieve load-balancing, meaning that data should be accessed without creating hot-spots, i.e., network nodes accessed with skewed, rather than uniform, distribution.
  • hot-spots i.e., network nodes accessed with skewed, rather than uniform, distribution.
  • realizing efficient authentication schemes for p2p networks that preserve load-balancing is a particularly challenging task. Note that the "sign-all" solution, although inefficient, achieves load- balancing in a trivial way.
  • DHTs are presented that significantly outperform the "sign-all” solution.
  • This distributed authentication structure can be built on top of the core functionality of any existing DHT. That is, it uses the basic DHT functionality locate(k) , which returns the network node
  • locate(K) takes O( ⁇ ogri) time, where n is the number of network nodes.
  • the scheme extends the underlying DHT by supporting authenticated versions of operations put and get while preserving efficiency and load balancing.
  • Exemplary embodiments of the new scheme for realizing a distributed authentication structure have two main design goals: signature amortization, which will aim to incur low signature refreshing cost, and load-balancing, which will attempt to diminish or not create any hot spots with respect to the distribution of traffic in the underlying p2p network.
  • An exemplary authentication structure is used that is similar to Merkle trees and, additionally, a technique that uses redundancy such that accessing the structure can be done in many different ways. By choosing the particular "accessing path" in a randomized fashion one effectively achieves a reduction in or elimination of hot- spots in accessing the structure.
  • the redundancy used by the new structure slightly increases the storage needs and the computational overhead due to updates, but both are increased only by a logarithmic factor as will be seen below.
  • the data structure can achieve signature amortization by systematically applying a hash function (e.g., a collision-resistant hash function) over the data items stored in the DHT.
  • a hashing scheme is a directed acyclic graph (DAG) G where nodes in G correspond to hash values; in this case, the hash value of node v in G is computed as the hash of the concatenation of the hash values corresponding to the predecessor nodes of v . It may be desirable to have the concatenation formed according to a fixed ordering.
  • Source nodes store the hash value of a data item (e.g., the hash of a key- value pair) and sink nodes store the digest of the entire collection of data items (i.e., copies of a same digest that comprises the digest of the entire collection of data items).
  • a data item can be verified to be authentic by providing a verification path in DAG G from the data item to a digest (e.g., comprising or consisting of hash values needed for re-computing the digest given the data item) and by verifying the signed digest to be authentic.
  • a hashing scheme may extend to DAGs using a Merkle tree, for example.
  • DAG G is distributed to the network nodes of the underlying DHT by appropriately indexing the hash values and storing them as special key- value pairs.
  • the structure of the hashing scheme G in the DHT may be preserved, for example, by having each network node storing the hash value of node v in G store the keys that can be used to retrieve the hash values of the successor nodes in G .
  • the hashing scheme consists of k + 1 levels, where at level 0 one has the m leaves of the m trees corresponding to the m data elements.
  • Each level i , O ⁇ i ⁇ k, consists of m graph nodes, each of which is the parent of two nodes of level / — 1.
  • the parent-child relation at each level is defined in a systematic way such that graph nodes at level k are the roots of m perfect binary trees over the data set and such that each graph node at level i is the root of a perfect binary subtree of size exactly 2' +1 — 1.
  • nodes of any level are numbered with ⁇ 0, 1, ..., m - 1 ⁇ and from left to right, one has (see FIG. 2):
  • FlG. 2 depicts the exemplary hashing structure 20. Using redundancy, m
  • V V, , ..., Vg .
  • graph G is a DAG with m source nodes and m sink nodes,
  • h be a cryptographic collision-resistant hash function.
  • Graph G is used as an exemplary hashing scheme to compute m digests of the data elements.
  • d(z)) . Else (v is at level 0 ), define d(y) h(k ⁇ x) , where (k, x) is the data item associated with node v .
  • d(y) is the digest of the set of the data items that the leaves of the subtree defined by v in G correspond to and it is hierarchically computed according to the structure of hashing scheme G . Because of symmetry, the nqdes of .G at level / > 0 store 2 k ⁇ ' distinct digests. Thus, all the nodes of level k (i.e., the sink/root nodes) store the same digest. The source of the data items signs this digest and makes it (the digest and its signature) available as public information.
  • Each node v of G is indexed by a unique id and is inserted in the DHT using this id as the key and the triplet ( ⁇ , v, , V 2 ) as the value, where d is the digest of v , and v, , v 2 are the (ids of the) children of v in G (or special null values if v is a leaf node in G ).
  • the following exemplary method to index the nodes of G may be used.
  • nodes are given ids that encode their position in G .
  • the j th node of level / has the id /(/,/) .
  • f is an injective function (thus, ids are unique).
  • the parent-child relation at each level is defined in a systematic way such that graph nodes at level k are the roots of m perfect binary trees over the data set. Since the exemplary hashing structure 30 is configured around each node having three parents and/or children, this configuration could be represented using a three dimensional model. As a non-limiting example, and in conjunction with such a three dimensional model, the nodes could be identified using three coordinates indicative of each node's relative position within the three dimensional model.
  • the exemplary hashing structures 20, 30 shown in FIGS.2 and 3 are uniformly load-balanced.
  • nodes of level 0 have three parents
  • nodes of level 1 have three children and two parents
  • nodes of level 2 have two children and two parents
  • nodes of level 3 have two children.
  • the exemplary hashing structures 20, 30, 40 shown in FIGS.2, 3 and 4 are presented as non-limiting examples.
  • the exemplary embodiments of the invention may utilize any suitable structure (e.g., hashing structure) or configuration that is traffic load balanced and includes a plurality of source nodes and a plurality of sink nodes with a plurality of verification pathways (e.g., many equivalent verification paths from a source node to a corresponding data digest, such as one stored at the plurality of sink nodes).
  • FIG. 5 shows an exemplary block structure 50.
  • the static structure can be dynamized using Overmars' technique: blocks of consecutive logarithmic size are treated separately.
  • Block B 1 has data digest h t , which is stored at the 2' nodes corresponding to the roots of B 1 . Note that each block may be seen to correspond to a different DAG, similar to the exemplary structures shown in FIGS.2-4, with each having a different size.
  • the system 100 includes a client 102 communicating with a network (e.g., a p2p network 104).
  • the client 102 comprises an electronic device capable of communication with the p2p network 104.
  • the client 102 may comprise at least one data processor, at least one memory, a transceiver, and a user interface comprising a user input and a display device.
  • the client 102 and/or p2p network 104 includes one or more components capable of implementing the exemplary embodiments of the invention.
  • an encryption component may be employed.
  • the encryption component may be a separate entity (e.g. an integrated circuit, an Application Specific Integrated Circuit or ASIC) or may be integrated with other components (e.g. a program run by a data processor, functionality enabled by a data processor).
  • the client 102 and the network e.g., the p2p network 104 may utilize a computer program product to implement the exemplary embodiments of the invention.
  • the computer program product may comprise program instructions tangibly embodied on a computer readable medium, the execution of which result in operations comprising the exemplary embodiments of the invention.
  • node W initiates a randomized process for generating a verification path for (k,x) .
  • W flips a coin" to determine the next node in the path, between its two parents of level 1 in G , to contact next (through a location operation on the corresponding id of the parent node, first).
  • each network node storing the digest d of node v in G knows the id of v (e.g., knows v 's position in G ). Accordingly, in some exemplary embodiments, (a non-leaf in G ) network node W contacts and retrieves the corresponding digest of the sibling node in the verification path of the previously visited network node. In further exemplary embodiments, instead of the ids of the children nodes, one can actually store their corresponding digests when embedding the hashing scheme into the p2p network. In such a manner, this last step can be avoided.
  • a network node V at level J randomly chooses the next node in the verification path network node to be contacted, independently and with probability 1/2.
  • a query results in a verification path of length O(log m) , using O(logw) location operations (thus, resulting in ⁇ 9(log w log «) computation and communication costs).
  • O(logw) location operations thus, resulting in ⁇ 9(log w log «) computation and communication costs.
  • the verification path is returned by the DHT and, given this path, one can authenticate the answer of the operation by processing the verification path and verifying the publicly available signed digest.
  • the total storage required by the authentication structure is O ⁇ m log m) .
  • a caching technique used in Tamassia and Triandopoulos can be also be used in this case to improve the creation of the verification paths.
  • the data structure described in the previous section is generally static. To support updates, it can be modified using a dynamization technique, which allows one to transform a static data structure into a corresponding dynamic structure.
  • the idea is to partition a data set of size m into sequence of O(log m) blocks, where the size of each block is twice the size of the previous block, and to completely rebuild blocks after updates, as necessary (see, e.g., FIG. 5).
  • This exemplary technique may be applied to support insertions of data items with new keys.
  • mapping e.g., a hash function
  • S is partitioned into [log/wj+l blocks S Q ,S x ,...,S k , each a subset of 5 , according to the weights of the bits of m , e.g.,
  • S 1 ⁇ b, ⁇ 2' .
  • G(O , for 0 ⁇ / ⁇ k denote the hashing DAG described in the previous section for the elements of block iS ( .
  • DAG G(Z) has b, - 2' nodes.
  • DAGs G(O), G(I),..., G(k) are used separately as authentication structures.
  • each DAG G(O is distributed over the network nodes as before and for a data item in block S 1 , the corresponding verification path in DAG G(O is retrieved using O( ⁇ ) location operations. Accordingly, ⁇ 9(log m) W
  • expired items may be removed during the construction of a new DAG G(t) .
  • This may entail minor modifications to the above insertion algorithm.
  • Lemma 1 An insertion of a data item in a dynamic graph authentication structure G of size m distributed over a network of size n takes 6>(log m log ri) expected amortized time.
  • O(k2 k ) O ⁇ m log ni) , which gives an amortized per insertion location cost of O(log rri) .
  • a location operation in the underlying DHT takes O(log ⁇ ) expected time to perform.
  • An optimal distributed hash table over n network nodes is where objects are located in 0(log «) time at O(log «) per-network node storage overhead.
  • Theorem 1 There exists a distributed authentication structure for storing m data items over an optimal distributed hash table of size n , such that:
  • New data items can be inserted at (9(log n log rri) expected amortized time complexity.
  • the public information has size O(logn) .
  • the signature freshness cost is O(log rri) .
  • Described above are exemplary methods, computer programs, devices, networks and systems for implementing the exemplary embodiments of the invention.
  • Exemplary methods are proposed for enhancing information assurance in distributed adversarial environments by designing a new distributed authentication structure.
  • the scheme is generally time and space efficient and achieves load-balancing.
  • the scheme provides for efficient authentication of contents distributed and stored, for example, over a DHT and, in particular, is secure against replay attacks.
  • a method as described above, wherein the signed digest comprises a hash value of a distributed authentication structure corresponding to the DAG structure.
  • a method as described above, wherein the distributed authentication structure is based on a cryptographic collision-resistant hash function.
  • the distributed authentication structure comprises a plurality of Merkle trees.
  • the distributed authentication structure comprises a corresponding hash value for each node of the DAG structure, wherein the corresponding hash value for each first node comprises a hash of the at least one data element stored at the respective first node, wherein the corresponding hash value for each second node comprises a concatenation of corresponding hash values for predecessor nodes of the respective second node.
  • the distributed authentication structure comprises a layered butterfly DAG equipped with a hierarchical hashing scheme, wherein the plurality of source nodes of the DAG are associated with the plurality of data elements.
  • a method as in any described above further comprising: authenticating the corresponding answer by hashing along the answer and the proof to obtain a digest and comparing the resulting digest with the signed digest.
  • the distributed authentication structure comprises a plurality of blocks with each block realized as a layered butterfly DAG equipped with a hashing scheme, the method further comprising: updating the distributed authentication structure and corresponding hash values by combining one or more pairs of blocks after a new data element is inserted into the plurality of data elements.
  • a method as in any described above further comprising: in response to a query, providing a corresponding answer and a proof, wherein the corresponding answer comprises at least one corresponding data element, wherein the proof comprises proof information indicative of a randomly-selected verification path in the DAG structure from the source node storing the at least one corresponding data element to a sink node storing the signed digest.
  • a method as described above further comprising: authenticating the corresponding answer by comparing the proof with the signed digest.
  • each data element of the plurality of data elements comprises a key-value pair.
  • the plurality of source nodes consists of m source nodes
  • the plurality of sink nodes consists of m sink nodes
  • the DAG structure comprises m trees that share the m source nodes.
  • a computer program or computer program product includes program instructions embodied on a tangible computer- readable medium, execution of the program instructions resulting in operations including: mapping a plurality of first nodes of a distributed network to a plurality of source nodes of a directed acyclic graph (DAG) structure, wherein each first node stores at least one data element of a plurality of data elements, wherein the DAG structure further comprises a plurality of sink nodes and zero or more intermediate nodes disposed between the plurality of source nodes and the plurality of sink nodes, wherein a verification path exists between each source node and each sink node, wherein the DAG structure is traffic load balanced such that all nodes of the DAG structure are accessed with a substantially uniform distribution; mapping zero or more second nodes of the distributed network to the zero or more intermediate nodes; and mapping a plurality of third nodes of the distributed network to the plurality of sink nodes, wherein the plurality of third nodes each store a signed digest for the pluralit
  • a computer program as described above, wherein the signed digest comprises a hash value of a distributed authentication structure corresponding to the DAG structure.
  • a computer program as described above, wherein the distributed authentication structure is based on a cryptographic collision-resistant hash function.
  • a computer program as in any described above, wherein the distributed authentication structure comprises a plurality of Merkle trees.
  • the distributed authentication structure comprises a corresponding hash value for each node of the DAG structure, wherein the corresponding hash value for each first node comprises a hash of the at least one data element stored at the respective first node, wherein the corresponding hash value for each second node comprises a concatenation of corresponding hash values for predecessor nodes of the respective second node.
  • a computer program as in any described above wherein the distributed authentication structure comprises a layered butterfly DAG equipped with a hierarchical hashing scheme, wherein the plurality of source nodes of the DAG are associated with the plurality of data elements.
  • execution of the program instructions resulting in operations further comprising: authenticating the corresponding answer by hashing along the answer and the proof to obtain a digest and comparing the resulting digest with the signed digest.
  • execution of the program instructions resulting in operations further comprising: authenticating the corresponding answer by comparing the proof with the signed digest.
  • each data element of the plurality of data elements comprises a key-value pair.
  • the plurality of source nodes consists of m source nodes
  • the plurality of sink nodes consists of m sink nodes
  • the DAG structure comprises m trees that share the m source nodes.
  • the nodes of the DAG structure are partitioned into k+ ⁇ levels, wherein m — n k , wherein n > 2.
  • a communication network includes: a plurality of first nodes corresponding to a plurality of source nodes of a directed acyclic graph (DAG) structure, wherein each first node stores at least one data element of a plurality of data elements, wherein the DAG structure further comprises a plurality of sink nodes and zero or more intermediate nodes disposed between the plurality of source nodes and the plurality of sink nodes, wherein a verification path exists between each source node and each sink node, wherein the DAG structure is traffic load balanced such that all nodes of the DAG structure are accessed with a substantially uniform distribution; zero or more second nodes corresponding to the zero or more intermediate nodes; and a plurality of third nodes corresponding to the plurality of sink nodes, wherein the plurality of third nodes each store a signed digest for the plurality of data elements.
  • DAG directed acyclic graph
  • a communication network as described above, wherein the signed digest comprises a hash value of a distributed authentication structure corresponding to the DAG structure.
  • the distributed authentication structure comprises a corresponding hash value for each node of the DAG structure, wherein the corresponding hash value for each first node comprises a hash of the at least one data element stored at the respective first node, wherein the corresponding hash value for each second node comprises a concatenation of corresponding hash values for predecessor nodes of the respective second node.
  • the signed digest comprises a hash value of a block of the distributed authentication structure.
  • the distributed authentication structure comprises a plurality of blocks with each block realized as a layered butterfly DAG equipped with a hashing scheme, wherein the distributed authentication structure and corresponding hash values are configured to be updated by combining one or more pairs of blocks after a new data element is inserted into the plurality of data elements.
  • a communication network as described above, wherein the nodes of the DAG structure are partitioned into k+l levels, wherein m n k , wherein n > 2.
  • various exemplary embodiments of the invention can be implemented in different mediums, such as software, hardware, logic, special purpose circuits or any combination thereof.
  • some aspects may be implemented in software which may be run on a computing device, while other aspects may be implemented in hardware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multi Processors (AREA)
  • Computer And Data Communications (AREA)

Abstract

The exemplary embodiments of the invention describe methods, computer programs, apparatus, systems and networks that enable distributed content authentication through the use of load-balance distributed authentication structures for realizing signature amortization. These features, namely load balancing and signature amortization, allow for high degrees of efficiency and scalability, which are important properties for applications built over networks, such as p2p networks and other distributed networks. One exemplary method for constructing a distributed authentication structure includes: mapping a plurality of first nodes of a distributed network to a plurality of source nodes of a directed acyclic graph (DAG) structure (10), wherein each first node stores at least one data element of a plurality of data elements, wherein the DAG structure further includes a plurality of sink nodes and zero or more intermediate nodes disposed between the plurality of source nodes and the plurality of sink nodes, wherein a verification path exists between each source node and each sink node, wherein the DAG structure (10) is traffic load balanced such that all nodes of the DAG structure are accessed with a substantially uniform distribution; mapping zero or more second nodes of the distributed network to the zero or more intermediate nodes; and mapping a plurality of third nodes of the distributed network to the plurality of sink nodes, wherein the plurality of third nodes each store a signed digest for the plurality of data elements.

Description

LOAD-BALANCED DISTRIBUTED AUTHENTICATION STRUCTURES
TECHNICAL FIELD:
[0001] The teachings in accordance with the exemplary embodiments of this invention relate generally to peer-to-peer networks and, more specifically, relate to authenticating membership queries in a peer-to-peer network.
BACKGROUND:
[0002] An overlay peer-to-peer (p2p) network is a structure imposed on a subset of machines from an underlying larger network. Generally, the purpose of such p2p networks is to store and share a collection of data amongst the computers owned by a group of like-minded users. Any pair of machines can communicate if one of them is given the network address (or pointer) for the other machine. The overlay network is defined by a graph that specifies which machines are linked by these pointers, and it should provide algorithms for storing and locating data of interest in the overlay network.
SUMMARY:
[0003] In an exemplary aspect of the invention, a method for constructing a distributed authentication structure includes: mapping a plurality of first nodes of a distributed network to a plurality of source nodes of a directed acyclic graph (DAG) structure, wherein each first node stores at least one data element of a plurality of data elements, wherein the DAG structure further includes a plurality of sink nodes and zero or more intermediate nodes disposed between the plurality of source nodes and the plurality of sink nodes, wherein a verification path exists between each source node and each sink node, wherein the DAG structure is traffic load balanced such that all nodes of the DAG structure are accessed with a substantially uniform distribution; mapping zero or more second nodes of the distributed network to the zero or more intermediate nodes; and mapping a plurality of third nodes of the distributed network to the plurality of sink nodes, wherein the plurality of third nodes each store a signed digest for the plurality of data elements.
[0004] In another exemplary aspect of the invention, a communication network includes: a plurality of first nodes corresponding to a plurality of source nodes of a directed acyclic graph (DAG) structure, wherein each first node stores at least one data element of a plurality of data elements, wherein the DAG structure further includes a plurality of sink nodes and zero or more intermediate nodes disposed between the plurality of source nodes and the plurality of sink nodes, wherein a verification path exists between each source node and each sink node, wherein the DAG structure is traffic load balanced such that all nodes of the DAG structure are accessed with a substantially uniform distribution; zero or more second nodes corresponding to the zero or more intermediate nodes; and a plurality of third nodes corresponding to the plurality of sink nodes, wherein the plurality of third nodes each store a signed digest for the plurality of data elements.
BRIEF DESCRIPTION OF THE DRAWINGS:
[0005] The foregoing and other aspects of embodiments of this invention are made more evident in the following Detailed Description, when read in conjunction with the attached Drawing Figures, wherein:
[0006] FIG. 1 illustrates a distributed acyclic graph (DAG) tree having six nodes;
[0007] FIG. 2 depicts an exemplary hashing structure;
[0008] FIG. 3 illustrates another exemplary hashing structure;
[0009] FIG. 4 shows another exemplary hashing structure;
[0010] FIG. 5 shows an exemplary block structure;
[0011] -, FIG. 6 shows a system in which the exemplary embodiments of the invention may be employed; and
[0012] FIG. 7 depicts a flowchart illustrating one non-limiting example of a method for practicing the exemplary embodiments of this invention.
DETAILED DESCRIPTION:
[0013] One challenging problem with such networks is that of maintaining correct functionality in the presence of faulty or malicious nodes that actively seek to disrupt the network. One critical dimension of the problem, which corresponds to a fundamental security property in data management, is data authentication, i.e., the ability to verify that the retrieved data — distributively stored in the network and shared by the peer nodes — is authentic. This problem is important since some overlay networks, in particular peer-to- peer (ρ2p) networks, do not impose restrictions on who may become a member and thus are vulnerable to attacks against data integrity and authenticity. Due tot his, contents may be maliciously modified imposing a serious security threat. While faulty nodes are trouble enough, adversarial p2p nodes can be especially troublesome. For example, adversaries may respond with data that appears to be a file of interest (e.g., a video of a scientific lecture or a data file describing a DNA sequence), but is instead of degraded quality, virus-infected, maliciously altered, contains incorrect material or, simply, is not up-to- date.
[0014] Herein, distributed authentication structures are considered, that is, constructions over distributed (e.g., p2p) networks that can provide efficient and secure distributed data authentication even in the presence of faulty and misbehaving network nodes according to some adversarial model. That is, the authentication of contents distributed, stored and accessed over a p2p network is examined. With respect to content protection, it is especially desirable to detect content forgery and old-data replay attacks. Among the design considerations, special attention is placed on the design of authentication schemes that achieve load balance. That is, it is desirable that the authentication structures evenly distribute the load due to authentication, so that hot-spots are reduced or eliminated.
[0015] The fundamental problems of data authentication and content origination in a distributed setting are considered, where data is stored and shared among the nodes of an overlay peer-to-peer network. A new distributed authentication structure over peer- to-peer networks for authenticating membership queries is designed. The new scheme is efficient, resilient against node failures, secure against replay attacks and achieves load balancing.
[0016] There is a large and growing literature on p2p overlay networks. One popular class of overlay networks is that of distributed hash tables (DHTs), fundamental distributed structures that make use of consistent hashing to efficiently support queries for exact matches with data keys. In particular, distributed hash tables (DHTs) support the basic functionality of put and get operations on key-value pairs. DHTs are based on randomized searching techniques in distributed environments. For a broad class of DHTs, an object is located with O(log ή) expected communication steps, where n is the number of nodes of the DHT. As an example of performance, a network may support queries using O(logrø) messages and <9(logn) words of memory corresponding to the O(log«) degree of the underlying graph. With advances in distributed object searching and the development of DHTs, several practical distributed storage systems over p2p networks have been designed and implemented. Examples include ' PAST, CAN, CFS and OpenHDT. Related distributed data structures for searching ordered data in a p2p network based on the randomized skip-list data structure have been studied. These structures achieve expected O(log«) query time and expected O(logn) update times, using n hosts, each of size O(logή) . In one network, an adaptation of skip graphs is presented that uses OQ.) space per network node.
[0017] The Merkle tree is a widely-used scheme in security applications and cryptographic constructions for signing and certifying members of a large data set. The idea is to use a tree and a cryptographic collision-resistant hash function to produce a short cryptographic description of the data set. The data set can then be certified by signing or publishing this short description. In essence, the Merkle tree is the simplest form for realizing what is known as the signature amortization technique, which is the state-of-the-art methodology for efficient data authentication. That is, using cryptographic techniques one signature is amortized over a large collection of data elements owned by an individual data source.
[0018] Existing p2p storage systems support an elementary authentication service for the stored data, where the retrieval of a stored data object is verified to be authentic by the requesting entity, but the techniques used are of the "sign-all" type in which data items are individually signed. Signature amortization is used in some systems (by adopting the so-called self certified data), but only on an item-by-item basis, not among all items inserted by a data source S . Thus, the "play-back" (or replay) attack is feasible for malicious network nodes: malicious nodes are able to return out-of-date, but still verifiable, instances of data. That is, old data that is currently not valid, expired or simply incorrect can be verified as authentic. This constitutes a serious security threat. A straightforward solution for this type of attack that may be used in conjunction with the "sign-all" authentication technique is signature refreshing where individually signed pieces of data along with a time-stamp (denoting the freshness of the data) are periodically resigned. Thus, although verifiable by the signature verification algorithm, old invalid instances of the data cannot be considered as authentic since they do not belong to the current time quantum (i.e., they do not carry the most up-to-date time- stamp). However, it is easy to observe that if signature refreshing is used to solve the play-back attack, this generally leads to inefficient schemes: the re-signing overhead is linear on the number of items currently existing in the system owned by a source 5 with a distinct (signing) identity. In contrast, using signature amortization one can achieve efficiency in data authentication since the cost due to signature refreshing is constant with respect to the data size.
[0019] * Tamassia and Triandopoulos introduced a distributed data authentication model in which both query answering and cryptographic proofs are distributed over a distributed (e.g., p2p) network using a distributed version of a Merkle tree for content authentication. See R. Tamassia and Triandopoulos, "Efficient content authentication over distributed hash tables," Technical Report, Center for Geometric Computing, Brown University, 2005. This new model of authentication captures the security needs of many distributed systems or emerging applications built on them, and is useful in peer-to-peer security. Over any distributed location system — for instance, any distributed hash table — and using only the basic object-location operation, they realized a distributed version of a Merkle tree, which in turn yields distributed versions of numerous security protocols and cryptographic constructions based on it. In a totally distributed setting over a peer-to-peer network with n nodes, they show that one can efficiently authenticate content membership in a fully dynamic set of m data elements in (2(log«log/w) time using O(mlogm) storage, with similar complexities for supporting insertions and deletions. Their construction is also used to design an efficient authenticated distributed hash table (ADHT), an authentication structure that does not depend on the implementation of the distributed location system, which allows them to leverage existing peer-to-peer architectures. This technique uses signature amortization and, thus, provides a solution.
[0020] However, due to the need for a centralized root node in their use of Merkle tree authentication, their scheme may have issues in terms of load-balanced queries, leading to "hot-spots." In particular, the construction of Tamassia and Triandopoulos realizing a distributed Merkle tree is designed to provide an efficient dynamic scheme. However, this scheme has a particular characteristic: it inherently creates hot-spots, network nodes with high traffic-congestion, namely, network nodes that are accessed more frequently than other nodes in the network. Indeed, in any query or update operation network nodes higher in the tree are accessed more frequently than nodes lower in the tree. For specific applications where one expects high volumes of queries and high update rates, this can be a drawback. Thus, it may be desirable to design load-balanced distributed authentication structures, that is, a scheme where content is authenticated by uniformly accessing network nodes.
[0021] Finally, note that this work is also related to authenticated data structures which provide a client-server model for data authentication where data is queried not from the trusted data source, but rather from a different, untrusted entity. The cryptographic technique of signature amortization can be used, similar to the Merkle tree. A significant amount of work has been done on developing efficient authenticated data structures, starting from the certificate revocation problem and the design of authenticated dictionaries and continuing with authenticated data structures for more general queries.
[0022] Before discussing the exemplary embodiments of the invention, a generalized description of a directed acyclic graph (DAG) is presented. A DAG is a directed graph with no directed cycles. A DAG comprises a plurality of nodes including at least one source or source node and at least one sink or sink node. A source node has no incoming edges (e.g., connections) whereas a sink node has no outgoing edges (e.g., connections). Let indeg(y) denote the in-degree of a node v, i.e., the number of incoming edges of v, and similarly, let outdeg(y) denote the out-degree of a node v, i.e., the number of outgoing edges of v.
[0023] For discussion purposes, within the context of a tree structure (i.e., a DAG constructed in a hierarchical tree-shape, also referred to herein as a "DAG tree"), source nodes may also be referred to as- leaf nodes or leaves and are usually shown at the bottom of a visual depiction (e.g., a graph illustrating the structure of the DAG tree). Similarly, in such a DAG tree, sink nodes may also be referred to as root nodes or roots and are usually shown at the top of the DAG tree graph. Furthermore, nodes that are neither leaf nodes nor root nodes may be referred to as tree nodes (i.e., intermediate nodes).
[0024] FIG. 1 illustrates a DAG tree 10 having six nodes. Node A is the only root node. Nodes B and C are tree nodes. Nodes D, E and F are leaf nodes. Although the direction is indicated as generally proceeding from bottom to top, this is more of a convention than a rule. In practice, communication between physical nodes assigned to the respective nodes of the DAG tree 10 may be bilateral instead of the depicted unilateral directions. Notwithstanding this, the illustrated direction may generally be viewed as indicating access/retrieval since it is customary to assume that data is stored at source/leaf nodes and accessed or retrieved from sink/root nodes (i.e., queries on the stored data are sent from the sink/root nodes).
[0025] Nodes of a same relative position within a DAG tree may be said to belong to a same level. Levels are generally numbered from 0 to k, where source/leaf nodes are located at level 0 and sink/root nodes are located at level k. In the DAG tree 10 of FIG. 1, the leaf nodes (nodes D, E and F) are located at level 0 while the root node (node A) is located at level 2 (i.e., k = 2).
[0026] Nodes within a DAG tree are often discussed in relation to one another. A parent is considered to be a node of a higher level that has an edge with (i.e., is connected to) the node in question. For example, in the DAG tree of FIG. 1, node B is considered a parent of nodes D and E. Similarly, a child is considered to be anode of a lower level that has an edge with (i.e., is connected to) the node in question. For example, in FIG. 1, nodes B and C are considered children of node A. As can be appreciated, source/leaf nodes have no children while sink/root nodes have no parents. In contrast, intermediate/tree nodes have both at least one child and at least one parent. Furthermore, a first node may be said to be a. predecessor of a second node if the first node is a child of the second node. Similarly, a second node may be said to be a successor of a first node if the second node is a parent of the first node. [0027] The exemplary embodiments of the invention describe distributed content authentication through the use of load-balance distributed authentication structures for realizing signature amortization. These features, namely load balancing and signature amortization, allow for high degrees of efficiency and scalability, which are important properties for applications built over networks, such as p2p networks and other distributed networks. However, at the same time, and as explained in further detail below, these properties impose two inherently contradictory and intuitively inconsistent, and thus, hard to meet, design goals. Notwithstanding, exemplary techniques are introduced that achieve both features.
[0028] Results are presented with respect to the content authentication problem in ρ2p storage systems. An exemplary efficient authenticated p2p data structure is provided that implements a distributed dictionary with substantially uniform workload. In some exemplary embodiments, the data structure may be viewed as a load-balanced distributed version of the Merkle hash tree and can be used as a building block for the development of other authenticated distributed data structures. This construction makes use of a distributed hash table (DHT) to store certification information. Note that any suitable DHT may be utilized in conjunction with the exemplary embodiments of the invention.
[0029] While the exemplary embodiments of the invention are discussed below primarily in the context of a p2p network, it should be appreciated that any suitable network (e.g., a distributed network, a network utilizing a DHT) may be used in conjunction with the exemplary embodiments of the invention.
[0030] It is preferable to have a DHT structure that provides assurances that this information can be located in the presence of faulty or adversarial nodes. That is, authenticating the routing of DHTs over p2p networks, a requirement which is orthogonal to the content authentication security problem, is an important additional property in p2p security. The constructions are built over the minimal functionality offered by DHTs and without making any specific assumptions (thus, without the design depending on any particular DHT implementation) generality and additional security are achieved: one can simply use a DHT implementation that enjoys any additional properties, such as routing authentication, as a non-limiting example. [0031 ] The performance and security properties of the distributed authentication structure are summarized as follows. Given a distributed hash table H with n network nodes such that H supports object location in O(\ogή) time using O(logn) storage overhead per network node, one can build over H a distributed authentication structure for storing m data items such that:
[0032] 1. The data structure uses O(m log m) space.
[0033] 2. Stored content retrieved by a query can be authenticated with
(9(log n log m) expected time complexity and uniform workload over the network nodes, using certifications of <?(log m) size.
[0034] 3. The insertion of a new data item has <3(log«logrø) expected amortized time complexity.
[0035] 4. The data structure uses public information consisting of O(log n) signatures.
[0036] 5. The system is secure against replay attacks.
[0037] The problem of authenticating content is considered below. In particular, authenticating content distributively stored at network nodes through the use of a distributed hash table (DΗT) is discussed. That is, it is desirable to show that data files claimed to have been added by a user were really put in the p2p network by that user and have not been modified.
[0038] Consider a standard query model where the DΗT stores key- value pairs of the type (k,x) , where keys are unique identifiers and values are associated with keys. These pairs are inserted through operation put{-,-) , where put(Jc,x) inserts and stores data item x under key k . With respect to data retrieval, the DΗT supports query get(Jc) , which returns the value associated with key k . Both operations may be implemented on top of the lower-level object location operation supported by the underlying p2p system, where given an object identifier (e.g., an IP address) the network node storing the object is returned. The fundamental authentication problem that arises in this case is data integrity, namely ensuring that any query returns the correct, authentic corresponding value. So, if (/fc,x) is the currently valid pair that has been (most recently) inserted in the system, in terms of security and information assurance, the property that (at least) must be satisfied is that any query on key k must return the correct corresponding value x .
[0039] By definition, distributed hash tables only support data item insertions and retrievals (by exporting only the basic functionality put and get ), leading to two important consequences. First, explicit data item deletions are not supported, though in some DHTs data items are automatically deleted after apre-specified amount of time and, effectively, items are being expired after a time interval in the system. Second, updates on data items (values) are performed through a new separate insertion. Therefore, the data authentication problem becomes more challenging in this case since replay attacks can be easily launched: a malicious network node can at any time report an invalid value that corresponds to old data.
[0040] One approach to the authentication of queries is to individually sign each item stored in the data structure. Namely, when data source S wishes to add (A:, x) , it computes the signature σ of the pair (k,x) using its private key and inserts (AΓ, (CΓ, ΛΓ)) into the data structure. A public key infrastructure is assumed for the owners of stored data items in the network and the users of the storage system. A query for key k now returns the pair (σ, x) , where the signature σ allows one to verify whether x is the valid answer. Unfortunately, this "sign-all" approach introduces significant performance overhead and allows for replay attacks for old values. Indeed, this approach does not provide a mechanism for invalidating old signatures on pairs that have been removed from the DHT or whose values have been modified. Therefore, one may end up with the following crucial security threat: a malicious network node can easily perform a "replay" attack, by simply reporting on a get(-) operation an invalid, but still verifiable, value that corresponds to an old (expired or out-of-date) key-value pair.
[0041] In general, it is desirable to provide signature freshness, that is, a mechanism that ensures that only recent signatures are used to validate answers to queries. This can be achieved by introducing time-stamps in the signed values, so that only recent (up to a convention, e.g., within a specified time quantum) valid signatures are accepted. Still, even with this extension, the signature refreshing cost is linear in the number of the current items in the DHT: if m items exist in the system that are owned by an individual data source, at any fixed time quantum, the signature refreshing cost is O(m) , since all valid key-value pairs need to be re-signed. To overcome this linear authentication cost, signature amortization can be used. The technique consists of signing only one, specially designed for the authentication purposes often through the use of a Merkle tree (or extensions of it), digest of the entire collection of stored data items owned by the same source. This way, signatures can be refreshed at a constant computational overhead. Signature amortization may be a valid avenue for achieving efficiency with respect to the overheads due to signature-refreshing.
[0042] With respect to the desired property of load-balancing, however, amortizing a single signature over a large collection of data items may be catastrophic. Indeed, achieving both load-balancing and efficient content authentication in p2p systems turns out to be a challenging problem. Intuitively, satisfying these two requirements seems to imply contradictory design goals. As claimed, the primary technique used for achieving efficient authentication is signature amortization, wherein the data source signs a digest of the entire data set that is computed using a hierarchical hashing structure (e.g., a Merkle tree). However, signature amortization leads to authentication information that is heavily centralized, meaning that the nodes of hashing structure close to the root and the signature over the digest are accessed more often for answer verification. At the same time, by their design goals, it is preferable that p2p systems achieve load-balancing, meaning that data should be accessed without creating hot-spots, i.e., network nodes accessed with skewed, rather than uniform, distribution. Thus, realizing efficient authentication schemes for p2p networks that preserve load-balancing is a particularly challenging task. Note that the "sign-all" solution, although inefficient, achieves load- balancing in a trivial way.
[0043] Exemplary efficient, load-balanced distributed authentication schemes for
DHTs are presented that significantly outperform the "sign-all" solution. This distributed authentication structure can be built on top of the core functionality of any existing DHT. That is, it uses the basic DHT functionality locate(k) , which returns the network node
(id) storing the given key k (object identifier). For typical DHT implementations, locate(K) takes O(\ogri) time, where n is the number of network nodes. The scheme extends the underlying DHT by supporting authenticated versions of operations put and get while preserving efficiency and load balancing.
[0044] For simplicity, in the below discussion, it will be assumed that a single data source is inserting items in the system. For more data sources, one simply makes use of multiple invocations of the scheme. Following the public parameter model, one assumes the existence of some public information that can be easily accessed and updated independently of the underlying DHT structure. More importantly, the size of this information is only logarithmic in the number of data items stored in the system. This public information imposes no limitations on the construction herein. For instance, and as a non-limiting example, in practice this assumption may be implemented easily and efficiently through a web service that reliably posts a small amount of data to a website.
[0045] The performance and security properties of the scheme can be summarized as follows. Given a distributed hash table H with n network nodes such that H supports object location in 0(log«) time using <9(log«) storage overhead per-network node, one can build over H a distributed authentication structure for storing m data items such that: the data structure uses O{m log m) space in total, evenly distributed over the network (data load-balanced), the answer to a query can be authenticated with O(\og n log m) expected time complexity and uniform workload over the network nodes (network traffic load-balance), using certifications of O(log m) size, the insertion of new data items has (9(log n log m) expected amortized time complexity, the public information has size O(log«) , the signature refreshing cost is O(log m) and the system is secure against replay attacks.
[0046] Exemplary embodiments of the new scheme for realizing a distributed authentication structure have two main design goals: signature amortization, which will aim to incur low signature refreshing cost, and load-balancing, which will attempt to diminish or not create any hot spots with respect to the distribution of traffic in the underlying p2p network. An exemplary authentication structure is used that is similar to Merkle trees and, additionally, a technique that uses redundancy such that accessing the structure can be done in many different ways. By choosing the particular "accessing path" in a randomized fashion one effectively achieves a reduction in or elimination of hot- spots in accessing the structure. Naturally enough though, the redundancy used by the new structure slightly increases the storage needs and the computational overhead due to updates, but both are increased only by a logarithmic factor as will be seen below.
[0047] As a non-limiting example, the data structure can achieve signature amortization by systematically applying a hash function (e.g., a collision-resistant hash function) over the data items stored in the DHT. A hashing scheme is a directed acyclic graph (DAG) G where nodes in G correspond to hash values; in this case, the hash value of node v in G is computed as the hash of the concatenation of the hash values corresponding to the predecessor nodes of v . It may be desirable to have the concatenation formed according to a fixed ordering. Source nodes store the hash value of a data item (e.g., the hash of a key- value pair) and sink nodes store the digest of the entire collection of data items (i.e., copies of a same digest that comprises the digest of the entire collection of data items). Thus, only the digests need be signed and a data item can be verified to be authentic by providing a verification path in DAG G from the data item to a digest (e.g., comprising or consisting of hash values needed for re-computing the digest given the data item) and by verifying the signed digest to be authentic. A hashing scheme may extend to DAGs using a Merkle tree, for example.
[0048] The main idea in the construction is to use a DAG G of high expansion rate as the hashing scheme for signature amortization, such that for any data element, there exist many equivalent verification paths to the data digest. DAG G is distributed to the network nodes of the underlying DHT by appropriately indexing the hash values and storing them as special key- value pairs. The structure of the hashing scheme G in the DHT may be preserved, for example, by having each network node storing the hash value of node v in G store the keys that can be used to retrieve the hash values of the successor nodes in G . In some exemplary embodiments, one can then randomize the verification path search procedure over the network such that any two verification paths are totally independent from each other and pass through different network nodes. In this way, signature amortization is implemented and at the same time load-balancing is achieved.
[0049] The intuition behind this construction can be alternatively viewed as follows. In an exemplary embodiment, to build the hashing scheme G over m total data items and distribute this structure to the network nodes, instead of using a single tree, m trees (directed towards their roots) are used that share mlogm tree nodes. These trees may form a special graph structure (see, e.g., FIG. 2) that has certain beneficial properties with respect to the way one creates (e.g., defines in a hierarchical way) and stores (e.g., distributes to the network nodes) the disclosed authentication structure (e.g., hash values and digests). The graph resembles the Fast Fourier Transform (FFT) computation graph or a special form of a butterfly network.
[0050] The hashing structure G for m data items and its embedding into a DHT with n network nodes is described herein in respect to an exemplary embodiment of the invention. Assume for simplicity (and without loss of generality, as will be seen), that m = 2k . The hashing scheme consists of k + 1 levels, where at level 0 one has the m leaves of the m trees corresponding to the m data elements. Each level i , O ≤ i ≤ k, consists of m graph nodes, each of which is the parent of two nodes of level / — 1. The parent-child relation at each level is defined in a systematic way such that graph nodes at level k are the roots of m perfect binary trees over the data set and such that each graph node at level i is the root of a perfect binary subtree of size exactly 2'+1 — 1. In particular, assuming nodes of any level are numbered with {0, 1, ..., m - 1} and from left to right, one has (see FIG. 2):
[0051] the J -th graph node of level i , where 0 ≤ j ≤ m-l and 0 < / < k , is the parent of the / -th and /-th graph nodes of level /" — 1 , where / = [J/ 2' J2' + [(/ + 2'-1 )mod T J .
[0052] FlG. 2 depicts the exemplary hashing structure 20. Using redundancy, m
(where m = 8) shared binary trees (rooted at nodes r,,...,rg) are built on top of leaves
V, , ..., Vg .
[0053] The edges of graph G are directed towards parent nodes (higher-level nodes). Thus, graph G is a DAG with m source nodes and m sink nodes,
[0054] Let h be a cryptographic collision-resistant hash function. Graph G is used as an exemplary hashing scheme to compute m digests of the data elements. Each node v of G stores a digest d(y) (a hash value). If v is at a level / > 0 and has children w and z , define d(v) = h(d(w) || d(z)) . Else (v is at level 0 ), define d(y) = h(k \\ x) , where (k, x) is the data item associated with node v . Thus, d(y) is the digest of the set of the data items that the leaves of the subtree defined by v in G correspond to and it is hierarchically computed according to the structure of hashing scheme G . Because of symmetry, the nqdes of .G at level / > 0 store 2k~' distinct digests. Thus, all the nodes of level k (i.e., the sink/root nodes) store the same digest. The source of the data items signs this digest and makes it (the digest and its signature) available as public information. Each node v of G is indexed by a unique id and is inserted in the DHT using this id as the key and the triplet (β, v, , V2 ) as the value, where d is the digest of v , and v, , v2 are the (ids of the) children of v in G (or special null values if v is a leaf node in G ). In particular, to facilitate the support of updates, the following exemplary method to index the nodes of G may be used. In the exemplary method, nodes are given ids that encode their position in G . For example, the j th node of level / has the id /(/,/) . In further exemplary embodiments, f is an injective function (thus, ids are unique).
[0055] FIG. 3 depicts another exemplary hashing structure 30. While the exemplary hashing structure 20 of FIG. 2 is constructed under the assumption that m = 2k , the exemplary hashing structure 30 of FIG. 3 is constructed under the assumption that m = 3k . Similarly, the hashing scheme 30 consists of k + 1 levels, where at level 0 one has the m leaves of the m trees corresponding to the m data elements. Each level / , 0 < i < k , consists of m nodes, each of which is the parent of three nodes of level / - 1. The parent-child relation at each level is defined in a systematic way such that graph nodes at level k are the roots of m perfect binary trees over the data set. Since the exemplary hashing structure 30 is configured around each node having three parents and/or children, this configuration could be represented using a three dimensional model. As a non-limiting example, and in conjunction with such a three dimensional model, the nodes could be identified using three coordinates indicative of each node's relative position within the three dimensional model.
[0056] It is easy to see that, assuming a uniform distribution of access, the exemplary hashing structures 20, 30 shown in FIGS.2 and 3 are uniformly load-balanced. One can ensure a uniform distribution of access, for example, by randomly-selecting the verification path used for a given source node (i.e., for accessing a data element stored at a corresponding source node). If the verification path is chosen at random, no node is more likely to be accessed than any other node.
[0057] In order for a hashing structure to be considered traffic load balanced, for nodes of each pair of immediately successive levels i and H-I, it should hold that: outdeg(leve\ i) = indeg(leγel z+1).
[0058] FIG. 4 illustrates another exemplary hashing structure 40 where m = 12.
Note that: nodes of level 0 have three parents, nodes of level 1 have three children and two parents, nodes of level 2 have two children and two parents, and nodes of level 3 have two children. Even though nodes of different levels have different numbers of parents and/or children, the exemplary hashing structure 40 is still traffic load balanced since, for nodes of each pair of immediately successive levels / and z+1 , out deg(\eve\ i) = indeg(levQl z+1). Assuming the verification path is chosen at random, no node is more likely to be accessed than any other node.
[0059] The exemplary hashing structures 20, 30, 40 shown in FIGS.2, 3 and 4 are presented as non-limiting examples. The exemplary embodiments of the invention may utilize any suitable structure (e.g., hashing structure) or configuration that is traffic load balanced and includes a plurality of source nodes and a plurality of sink nodes with a plurality of verification pathways (e.g., many equivalent verification paths from a source node to a corresponding data digest, such as one stored at the plurality of sink nodes).
[0060] FIG. 5 shows an exemplary block structure 50. The static structure can be dynamized using Overmars' technique: blocks of consecutive logarithmic size are treated separately. Block B1 has data digest ht , which is stored at the 2' nodes corresponding to the roots of B1 . Note that each block may be seen to correspond to a different DAG, similar to the exemplary structures shown in FIGS.2-4, with each having a different size.
[0061] Referring to FIG. 6, a system 100 is shown in which the exemplary embodiments of the invention may be employed. The system 100 includes a client 102 communicating with a network (e.g., a p2p network 104). The client 102 comprises an electronic device capable of communication with the p2p network 104. As a non-limiting example, the client 102 may comprise at least one data processor, at least one memory, a transceiver, and a user interface comprising a user input and a display device. The client 102 and/or p2p network 104 includes one or more components capable of implementing the exemplary embodiments of the invention. As a non-limiting example, an encryption component may be employed. As further non-limiting examples, the encryption component may be a separate entity (e.g. an integrated circuit, an Application Specific Integrated Circuit or ASIC) or may be integrated with other components (e.g. a program run by a data processor, functionality enabled by a data processor). As a further non- limiting example, one or more of the client 102 and the network (e.g., the p2p network 104) may utilize a computer program product to implement the exemplary embodiments of the invention. The computer program product may comprise program instructions tangibly embodied on a computer readable medium, the execution of which result in operations comprising the exemplary embodiments of the invention.
[0062] How get operations are handled is now described with reference to further exemplary embodiments. For illustration purposes, the below discussion assumes the use of a hashing structure/network 20 as shown in FIG. 2. Begin by performing a query according to the underlying DHT structure. Then, given a data element (k,x) stored at a network node W , node W initiates a randomized process for generating a verification path for (k,x) . For example, W "flips a coin" to determine the next node in the path, between its two parents of level 1 in G , to contact next (through a location operation on the corresponding id of the parent node, first). Assume that each network node storing the digest d of node v in G knows the id of v (e.g., knows v 's position in G ). Accordingly, in some exemplary embodiments, (a non-leaf in G ) network node W contacts and retrieves the corresponding digest of the sibling node in the verification path of the previously visited network node. In further exemplary embodiments, instead of the ids of the children nodes, one can actually store their corresponding digests when embedding the hashing scheme into the p2p network. In such a manner, this last step can be avoided.
[0063] In general, a network node V at level J randomly chooses the next node in the verification path network node to be contacted, independently and with probability 1/2. Thus, it can be seen that a query results in a verification path of length O(log m) , using O(logw) location operations (thus, resulting in <9(log w log «) computation and communication costs). Through the randomized search process, the verification path for a fixed data element is actually an independent and identically distributed random variable. Thus, no hot-spots are created. The verification path is returned by the DHT and, given this path, one can authenticate the answer of the operation by processing the verification path and verifying the publicly available signed digest. The total storage required by the authentication structure is O{m log m) . That is, assuming perfect mapping functions from keys to network nodes (e.g., in some DHTs a cryptographic hash function is used), the storage is logarithmic in n per network node, for m = O(n) , i.e., still optimal (many DHTs use routing tables of logarithmic in n size). A caching technique used in Tamassia and Triandopoulos can be also be used in this case to improve the creation of the verification paths.
[0064] The data structure described in the previous section is generally static. To support updates, it can be modified using a dynamization technique, which allows one to transform a static data structure into a corresponding dynamic structure. The idea is to partition a data set of size m into sequence of O(log m) blocks, where the size of each block is twice the size of the previous block, and to completely rebuild blocks after updates, as necessary (see, e.g., FIG. 5). This exemplary technique may be applied to support insertions of data items with new keys.
[0065] Let S be a set of m data items and let m = (bk, bk_x , ...,i>, ,60 )2 be number m written in binary, with bk = 1 . Note that items in S are not assumed to be sorted.
Recall that any element is located using a particular mapping (e.g., a hash function). Thus, ordering may not be needed for locating an existing element.
[0066] S is partitioned into [log/wj+l blocks SQ,Sx,...,Sk , each a subset of 5 , according to the weights of the bits of m , e.g., | S1 \= b, 2' . Then let G(O , for 0 < / < k , denote the hashing DAG described in the previous section for the elements of block iS( . DAG G(Z) has b, - 2' nodes. DAGs G(O), G(I),..., G(k) are used separately as authentication structures. That is, for i = 0,...,k , if b, — 1 , the source signs the top-level digest h, of DAG G(O (see FIGS. 2 and 5). Each DAG G(O is distributed over the network nodes as before and for a data item in block S1 , the corresponding verification path in DAG G(O is retrieved using O(ι) location operations. Accordingly, <9(log m) W
signed digests (one for each block) are made available as public information.
[0067J As a non-limiting example, insertions of data items may be performed as follows. Let / be the smallest / such that b, = 0 or / = k + 1 if no such / exists. To insert an item x into S 5 merge DAGs G(O), G(I)5... , G(Z-I) to create DAG G(Z) for the new block S1 = S0 u ...u _?,_, u x . Note that | S1 1= 1 + ∑"J02J = 2' .
[0068] As a non-limiting example, to support item expiration, expired items may be removed during the construction of a new DAG G(t) . This may entail minor modifications to the above insertion algorithm. One has that the insertion of a data item into a set of size m stored into a DHT of size n takes O(logwlogra) expected amortized time.
[0069] Furthermore, if one allows for the insertion of new data items that replace old data items, one can work around the replay attack by introducing time-stamps at the data item level, for example, as an extra field in the data item. Even so, one has the following property: the system naturally supports item expiration in the sense that no item can be more than ml 2 steps old.
[0070] Lemma 1. An insertion of a data item in a dynamic graph authentication structure G of size m distributed over a network of size n takes 6>(log m log ri) expected amortized time.
[0071] Proof. One does not exactly destroy the graph structures of the blocks that one needs to discard after an update, but rather they are composed into the new graph structure G(/) . It is easy to see that only O(2J) location operations are needed to update the graph distribution of the graph structure over the network. If instead one used the naive method to rebuild the new structure from scratch, then O(j2J ) location operations would be needed, resulting in O(log2 mlogri) total cost. To facilitate the new block creation, encode in the hash keys information about the block that they correspond to, and also their level and position in the blocks. Using analysis, one gets that for a structure of size m = O(2k+l) , the total number of location operations due to insertions is bounded by
^] / 02k~' OQ! ) = O(k2k ) = O{m log ni) , which gives an amortized per insertion location cost of O(log rri) . A location operation in the underlying DHT takes O(log ή) expected time to perform.
[0072] Overall, one has the following result with respect to the disclosed distributed authentication structure. An optimal distributed hash table over n network nodes is where objects are located in 0(log«) time at O(log«) per-network node storage overhead.
[0073] Theorem 1. There exists a distributed authentication structure for storing m data items over an optimal distributed hash table of size n , such that:
[0074] 1. Stored content is authenticated with 0(log« log m) expected time complexity and <9(log m) space complexity.
[0075] 2. New data items can be inserted at (9(log n log rri) expected amortized time complexity.
[0076] 3. The public information has size O(logn) .
[0077] 4. The signature freshness cost is O(log rri) .
[0078] 5 The system is secure against replay attacks.
[0079] Described above are exemplary methods, computer programs, devices, networks and systems for implementing the exemplary embodiments of the invention. Exemplary methods are proposed for enhancing information assurance in distributed adversarial environments by designing a new distributed authentication structure. The scheme is generally time and space efficient and achieves load-balancing. The scheme provides for efficient authentication of contents distributed and stored, for example, over a DHT and, in particular, is secure against replay attacks.
[0080] In one exemplary embodiment, and as shown in FIG. 7, a method (for constructing a distributed authentication structure) includes: mapping a plurality of first nodes of a distributed network to a plurality of source nodes of a directed acyclic graph (DAG) structure, wherein each first node stores at least one data element of a plurality of data elements, wherein the DAG structure further comprises a plurality of sink nodes and zero or more intermediate nodes disposed between the plurality of source nodes and the plurality of sink nodes, wherein a verification path exists between each source node and each sink node, wherein the DAG structure is traffic load balanced such that all nodes of the DAG structure are accessed with a substantially uniform distribution (box 71); mapping zero or more second nodes of the distributed network to the zero or more intermediate nodes (box 72); and mapping a plurality of third nodes of the distributed network to the plurality of sink nodes, wherein the plurality of third nodes each store a signed digest for the plurality of data elements (box 73).
[0081] A method as described above, wherein the signed digest comprises a hash value of a distributed authentication structure corresponding to the DAG structure. A method as described above, wherein the distributed authentication structure is based on a cryptographic collision-resistant hash function. A method as in any described above, wherein the distributed authentication structure comprises a plurality of Merkle trees. A method as in any described above, wherein the distributed authentication structure comprises a corresponding hash value for each node of the DAG structure, wherein the corresponding hash value for each first node comprises a hash of the at least one data element stored at the respective first node, wherein the corresponding hash value for each second node comprises a concatenation of corresponding hash values for predecessor nodes of the respective second node.
[0082] A method as in any described above, wherein the distributed authentication structure comprises a layered butterfly DAG equipped with a hierarchical hashing scheme, wherein the plurality of source nodes of the DAG are associated with the plurality of data elements. A method as in any described above, further comprising: authenticating the corresponding answer by hashing along the answer and the proof to obtain a digest and comparing the resulting digest with the signed digest. A method as in any described above, wherein the distributed authentication structure comprises a plurality of blocks with each block realized as a layered butterfly DAG equipped with a hashing scheme, the method further comprising: updating the distributed authentication structure and corresponding hash values by combining one or more pairs of blocks after a new data element is inserted into the plurality of data elements.
[0083] A method as in any described above, further comprising: in response to a query, providing a corresponding answer and a proof, wherein the corresponding answer comprises at least one corresponding data element, wherein the proof comprises proof information indicative of a randomly-selected verification path in the DAG structure from the source node storing the at least one corresponding data element to a sink node storing the signed digest. A method as described above, further comprising: authenticating the corresponding answer by comparing the proof with the signed digest.
[0084] A method as in any described above, wherein each data element of the plurality of data elements comprises a key-value pair. A method as in any described above, wherein the plurality of source nodes consists of m source nodes, wherein the plurality of sink nodes consists of m sink nodes, wherein the DAG structure comprises m trees that share the m source nodes. A method as described above, wherein the nodes of the DAG structure are partitioned into k+\ levels, wherein m ?= nk, wherein n ≥ 1. A method as in any described above, implemented as a computer program. A method as in any described above, further comprising one or more additional aspects of the exemplary embodiments of the invention as described in further detail herein.
[0085] In another exemplary embodiment, a computer program or computer program product includes program instructions embodied on a tangible computer- readable medium, execution of the program instructions resulting in operations including: mapping a plurality of first nodes of a distributed network to a plurality of source nodes of a directed acyclic graph (DAG) structure, wherein each first node stores at least one data element of a plurality of data elements, wherein the DAG structure further comprises a plurality of sink nodes and zero or more intermediate nodes disposed between the plurality of source nodes and the plurality of sink nodes, wherein a verification path exists between each source node and each sink node, wherein the DAG structure is traffic load balanced such that all nodes of the DAG structure are accessed with a substantially uniform distribution; mapping zero or more second nodes of the distributed network to the zero or more intermediate nodes; and mapping a plurality of third nodes of the distributed network to the plurality of sink nodes, wherein the plurality of third nodes each store a signed digest for the plurality of data elements.
[0086] A computer program as described above, wherein the signed digest comprises a hash value of a distributed authentication structure corresponding to the DAG structure. A computer program as described above, wherein the distributed authentication structure is based on a cryptographic collision-resistant hash function. A computer program as in any described above, wherein the distributed authentication structure comprises a plurality of Merkle trees. A computer program as in any described above, wherein the distributed authentication structure comprises a corresponding hash value for each node of the DAG structure, wherein the corresponding hash value for each first node comprises a hash of the at least one data element stored at the respective first node, wherein the corresponding hash value for each second node comprises a concatenation of corresponding hash values for predecessor nodes of the respective second node.
[0087] A computer program as in any described above, wherein the distributed authentication structure comprises a layered butterfly DAG equipped with a hierarchical hashing scheme, wherein the plurality of source nodes of the DAG are associated with the plurality of data elements. A computer program as in any described above, execution of the program instructions resulting in operations further comprising: authenticating the corresponding answer by hashing along the answer and the proof to obtain a digest and comparing the resulting digest with the signed digest. A computer program as in any described above, wherein the distributed authentication structure comprises a plurality of blocks with each block realized as a layered butterfly DAG equipped with a hashing scheme, execution of the program instructions resulting in operations further comprising: updating the distributed authentication structure and corresponding hash values by combining one or more pairs of blocks after a new data element is inserted into the plurality of data elements.
[0088] A computer program as in any described above, execution of the program instructions resulting in operations further comprising: in response to a query, providing a corresponding answer and a proof, wherein the corresponding answer comprises at least one corresponding data element, wherein the proof comprises proof information indicative of a randomly-selected verification path in the DAG structure from the source node storing the at least one corresponding data element to a sink node storing the signed digest. A computer program as described above, execution of the program instructions resulting in operations further comprising: authenticating the corresponding answer by comparing the proof with the signed digest.
[0089] A computer program as in any described above, wherein each data element of the plurality of data elements comprises a key-value pair. A computer program as in any described above, wherein the plurality of source nodes consists of m source nodes, wherein the plurality of sink nodes consists of m sink nodes, wherein the DAG structure comprises m trees that share the m source nodes. A computer program as described above, wherein the nodes of the DAG structure are partitioned into k+ϊ levels, wherein m — nk, wherein n > 2. A computer program as in any described above, further comprising one or more additional aspects of the exemplary embodiments of the invention as described in further detail herein.
[0090] In another exemplary embodiment, a communication network includes: a plurality of first nodes corresponding to a plurality of source nodes of a directed acyclic graph (DAG) structure, wherein each first node stores at least one data element of a plurality of data elements, wherein the DAG structure further comprises a plurality of sink nodes and zero or more intermediate nodes disposed between the plurality of source nodes and the plurality of sink nodes, wherein a verification path exists between each source node and each sink node, wherein the DAG structure is traffic load balanced such that all nodes of the DAG structure are accessed with a substantially uniform distribution; zero or more second nodes corresponding to the zero or more intermediate nodes; and a plurality of third nodes corresponding to the plurality of sink nodes, wherein the plurality of third nodes each store a signed digest for the plurality of data elements.
[0091] A communication network as described above, wherein the signed digest comprises a hash value of a distributed authentication structure corresponding to the DAG structure. A communication network as described above, wherein the distributed authentication structure comprises a corresponding hash value for each node of the DAG structure, wherein the corresponding hash value for each first node comprises a hash of the at least one data element stored at the respective first node, wherein the corresponding hash value for each second node comprises a concatenation of corresponding hash values for predecessor nodes of the respective second node. A communication network as in any described above, wherein the signed digest comprises a hash value of a block of the distributed authentication structure. [0092] A communication network as in any described above, wherein the distributed authentication structure comprises a layered butterfly DAG equipped with a hierarchical hashing scheme, wherein the plurality of source nodes of the DAG are associated with the plurality of data elements. A communication network as in any described above, wherein the corresponding answer can be authenticated by hashing along the answer and the proof to obtain a digest and comparing the resulting digest with the signed digest. A communication network as in any described above, wherein the distributed authentication structure comprises a plurality of blocks with each block realized as a layered butterfly DAG equipped with a hashing scheme, wherein the distributed authentication structure and corresponding hash values are configured to be updated by combining one or more pairs of blocks after a new data element is inserted into the plurality of data elements.
[0093] A communication network as in any described above, wherein the communication network is configured, in response to a query, to provide a corresponding answer and a proof, wherein the corresponding answer comprises at least one corresponding data element, wherein the proof comprises proof information indicative of a randomly-selected verification path in the DAG structure from the source node storing the at least one corresponding data element to a sink node storing the signed digest. A communication network as described above, wherein the corresponding answer is authenticated by comparing the proof with the signed digest.
[0094] A communication network as in any described above, wherein each data element of the plurality of data elements comprises a key- value pair. A communication network as in any described above, wherein the plurality of source nodes consists of m source nodes, wherein the plurality of sink nodes consists of m sink nodes, wherein the DAG structure comprises m trees that share the m source nodes. A communication network as described above, wherein the nodes of the DAG structure are partitioned into k+l levels, wherein m = nk, wherein n > 2. A communication network as in any described above, wherein the communication network comprises a peer-to-peer network. A communication network as in any described above, further comprising one or more additional aspects of the exemplary embodiments of the invention as described in further detail herein. [0095] Generally, various exemplary embodiments of the invention can be implemented in different mediums, such as software, hardware, logic, special purpose circuits or any combination thereof. As a non-limiting example, some aspects may be implemented in software which may be run on a computing device, while other aspects may be implemented in hardware.
[0096] The foregoing description has provided by way of exemplary and non- limiting examples a full and informative description of the best method and apparatus presently contemplated by the inventors for carrying out the invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention.
[0097] Furthermore, some of the features of the preferred embodiments of this invention could be used to advantage without the corresponding use of other features. As such, the. foregoing description should be considered as merely illustrative of the principles of the invention, and not in limitation thereof.

Claims

CLAIMSWhat is claimed is:
1. A method for constructing a distributed authentication structure comprising: mapping a plurality of first nodes of a distributed network to a plurality of source nodes of a directed acyclic graph (DAG) structure, wherein each first node stores at least one data element of a plurality of data elements, wherein the DAG structure further comprises a plurality of sink nodes and zero or more intermediate nodes disposed between the plurality of source nodes and the plurality of sink nodes, wherein a verification path exists between each source node and each sink node, wherein the DAG structure is traffic load balanced such that all nodes of the DAG structure are accessed with a substantially uniform distribution; mapping zero or more second nodes of the distributed network to the zero or more intermediate nodes; and mapping a plurality of third nodes of the distributed network to the plurality of sink nodes, wherein the plurality of third nodes each store a signed digest for the plurality of data elements.
2. The method of claim 1, wherein the signed digest comprises a hash value of a distributed authentication structure corresponding to the DAG structure.
3. The method of claim 2, wherein the distributed authentication structure is based on a cryptographic collision-resistant hash function.
4. The method of claim 2 or 3, wherein the distributed authentication structure comprises a layered butterfly DAG equipped with a hierarchical hashing scheme, wherein the plurality of source nodes of the DAG are associated with the plurality of data elements.
5. The method of claim 2, 3 or 4, wherein the distributed authentication structure comprises a corresponding hash value for each node of the DAG structure, wherein the corresponding hash value for each first node comprises a hash of the at least one data element stored at the respective first node, wherein the corresponding hash value for each second node comprises a concatenation of corresponding hash values for predecessor nodes of the respective second node.
6. The method of claim 1, 2, 3, 4 or 5, further comprising: in response to a query, providing a corresponding answer and a proof, wherein the corresponding answer comprises at least one corresponding data element, wherein the proof comprises proof information indicative of a randomly-selected verification path in the DAG structure from the source node storing the at least one corresponding data element to a sink node storing the signed digest.
7. The method of claim 6, further comprising: authenticating the corresponding answer by hashing along the answer and the proof to obtain a digest and comparing the resulting digest with the signed digest.
8. The method of claim 6, wherein the distributed authentication structure comprises a plurality of blocks with each block realized as a layered butterfly DAG equipped with a hashing scheme, the method further comprising: updating the distributed authentication structure and corresponding hash values by combining one or more pairs of blocks after a new data element is inserted into the plurality of data elements.
9. The method of claim 1, 2, 3, 4, 5, 6, 7 or' 8, wherein each data element of the plurality of data elements comprises a key- value pair.
10. The method of claim 1, 2, 3, 4, 5, 6, 7, 8 or 9, implemented as a computer program.
11. A communication network comprising: a plurality of first nodes corresponding to a plurality of source nodes of a directed acyclic graph (DAG) structure, wherein each first node stores at least one data element of a plurality of data elements, wherein the DAG structure further comprises a plurality of sink nodes and zero or more intermediate nodes disposed between the plurality of source nodes and the plurality of sink nodes, wherein a verification path exists between each source node and each sink node, wherein the DAG structure is traffic load balanced such that all nodes of the DAG structure are accessed with a substantially uniform distribution; zero or more second nodes corresponding to the zero or more intermediate nodes; and a plurality of third nodes corresponding to the plurality of sink nodes, wherein the plurality of third nodes each store a signed digest for the plurality of data elements.
12. The communication network of claim 11, wherein the signed digest comprises a hash value of a block of the distributed authentication structure.
13. The communication network of claim 12, wherein the distributed authentication structure comprises a corresponding hash value for each node of the DAG structure, wherein the corresponding hash value for each first node comprises a hash of the at least one data element stored at the respective first node, wherein the corresponding hash value for each second node comprises a concatenation of corresponding hash values for predecessor nodes of the respective second node.
14. The communication network of claim 11, 12 or 13, wherein the communication network is configured, in response to a query, to provide a corresponding answer and a proof, wherein the corresponding answer comprises at least one corresponding data element, wherein the proof comprises proof information indicative of a randomly- selected verification path in the DAG structure from the source node storing the at least one corresponding data element to a sink node storing the signed digest.
15. The communication network of claim 14, wherein the corresponding answer is authenticated by hashing along the answer and the proof and comparing the resulting digest with the signed digest..
16. The communication network of claim 11, 12, 13, 14 or 15, wherein each data element of the plurality of data elements comprises a key- value pair.
17. The communication network of claim 11, 12, 13, 14, 15 or 16, wherein the communication network comprises a peer-to-peer network.
PCT/US2007/017046 2006-07-28 2007-07-30 Load-balanced distributed authentication structures Ceased WO2008014004A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US83387606P 2006-07-28 2006-07-28
US60/833,876 2006-07-28

Publications (2)

Publication Number Publication Date
WO2008014004A2 true WO2008014004A2 (en) 2008-01-31
WO2008014004A3 WO2008014004A3 (en) 2008-09-04

Family

ID=38982145

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/017046 Ceased WO2008014004A2 (en) 2006-07-28 2007-07-30 Load-balanced distributed authentication structures

Country Status (1)

Country Link
WO (1) WO2008014004A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8402530B2 (en) 2010-07-30 2013-03-19 Microsoft Corporation Dynamic load redistribution among distributed servers
US8726034B2 (en) 2008-08-29 2014-05-13 Brown University Cryptographic accumulators for authenticated hash tables
CN113901485A (en) * 2021-12-07 2022-01-07 展讯通信(天津)有限公司 Application program loading method, electronic device and storage medium
CN114896434A (en) * 2022-07-13 2022-08-12 之江实验室 Hash code generation method and device based on center similarity learning
CN119847775A (en) * 2025-03-20 2025-04-18 广东奥飞数据科技股份有限公司 Intelligent management method and system for computing network resources

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100606341B1 (en) * 2003-12-22 2006-07-28 엘지노텔 주식회사 Port number setting method in subscriber device of ATM switching system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8726034B2 (en) 2008-08-29 2014-05-13 Brown University Cryptographic accumulators for authenticated hash tables
US9098725B2 (en) 2008-08-29 2015-08-04 Brown University Cryptographic accumulators for authenticated hash tables
US8402530B2 (en) 2010-07-30 2013-03-19 Microsoft Corporation Dynamic load redistribution among distributed servers
CN113901485A (en) * 2021-12-07 2022-01-07 展讯通信(天津)有限公司 Application program loading method, electronic device and storage medium
CN114896434A (en) * 2022-07-13 2022-08-12 之江实验室 Hash code generation method and device based on center similarity learning
CN114896434B (en) * 2022-07-13 2022-11-18 之江实验室 Hash code generation method and device based on center similarity learning
CN119847775A (en) * 2025-03-20 2025-04-18 广东奥飞数据科技股份有限公司 Intelligent management method and system for computing network resources

Also Published As

Publication number Publication date
WO2008014004A3 (en) 2008-09-04

Similar Documents

Publication Publication Date Title
US7974221B2 (en) Efficient content authentication in peer-to-peer networks
Tamassia Authenticated data structures
Sun et al. Verifiable privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking
Li et al. Authenticated index structures for aggregation queries
Rady et al. Integrity and confidentiality in cloud outsourced data
Li et al. Integrity-verifiable conjunctive keyword searchable encryption in cloud storage
US20040111608A1 (en) Secure recovery in a serverless distributed file system
Hahn et al. Enabling fast public auditing and data dynamics in cloud services
Wang et al. A dynamic-efficient structure for secure and verifiable location-based skyline queries
Wang et al. Efficient and secure storage for outsourced data: a survey
Liu et al. Secure similarity-based cloud data deduplication in ubiquitous city
Cheng et al. Query assurance verification for outsourced multi-dimensional databases
Eaton et al. Improving the privacy of Tor onion services
WO2008014004A2 (en) Load-balanced distributed authentication structures
Tamassia et al. Efficient content authentication in peer-to-peer networks
He et al. Enabling Decentralized and Dynamic Data Integrity Verification for Secure Cloud Storage via T‐Merkle Hash Tree Based Blockchain
Wang et al. A lightweight data integrity verification with data dynamics for mobile edge computing
Liu et al. Forward secure searchable encryption with conjunctive-keyword supporting multi-user
Prakasha et al. Efficient digital certificate verification in wireless public key infrastructure using enhanced certificate revocation list
Lu et al. Secure dynamic big graph data: Scalable, low-cost remote data integrity checking
Zhang et al. Achieving public verifiability and data dynamics for cloud data in the standard model
Wang et al. Bucket‐based authentication for outsourced databases
Hou et al. Provable Multiple-Replica Dynamic Data Possession for Big Data Storage in Cloud Computing.
Ghosh et al. Efficient verifiable range and closest point queries in zero-knowledge
Ren et al. Blockchain-based cross-domain query integrity verification mechanism for outsourced database

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07810909

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: RU

122 Ep: pct application non-entry in european phase

Ref document number: 07810909

Country of ref document: EP

Kind code of ref document: A2