BitTorrent extension for arbitrary DHT store
Author: | Arvid Norberg, arvid@rasterbar.com |
---|---|
Version: | Draft |
This is a proposal for an extension to the BitTorrent DHT to allow storing and retrieving arbitrary data.
It supports both storing immutable items, where the key is the SHA-1 hash of the data itself, and mutable items, where the key is the public key of the key pair used to sign the data.
There are two new proposed messages, put and get.
terminology
In this document, a storage node refers to the node in the DHT to which an item is announced and on which it is stored. A subscribing node refers to a node which makes look-ups in the DHT to find the storage nodes, to request items from them, and possibly re-announce those items to keep them alive.
messages
The proposed new messages get and put are similar to the existing get_peers and announce_peer.
Responses to get should always include nodes and nodes6, which have the same semantics as in a get_peers response. The response should also include a write token, token, with the same semantics as in get_peers.
The id field in these messages has the same semantics as in the standard DHT messages, i.e. it is the node ID of the node sending the message, to maintain the structure of the DHT network.
The token field also has the same semantics as in the standard DHT messages get_peers and announce_peer: it is returned when requesting an item and presented when writing an item, respectively.
The distinction between storing mutable and immutable items is the inclusion of a public key, a sequence number and a signature (k, seq and sig). The distinction between retrieving a mutable and an immutable item is the inclusion of the public key spill-over (k) in the get request, i.e. the part of the public key that does not fit in the 20 byte target.
The v key is the value to be stored. It is allowed to be any bencoded type (list, dict, string or integer). When it is being hashed (to verify its signature or to calculate its key), its flattened, bencoded form is used.
Storing nodes SHOULD reject put requests where the bencoded form of v is longer than 767 bytes.
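As an illustration of the size rule above, here is a minimal sketch of a bencoder together with the length check a storing node might apply. The names `bencode`, `MAX_VALUE_SIZE` and `value_is_acceptable` are hypothetical and not part of this proposal; text strings are assumed to be UTF-8 encoded before bencoding.

```python
MAX_VALUE_SIZE = 767  # limit on the bencoded form of "v", taken from the text above


def bencode(value) -> bytes:
    """Return the flattened, bencoded form of an int, string, list or dict."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is stored as UTF-8
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and are emitted in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def value_is_acceptable(value) -> bool:
    """True if the bencoded form of the value is at most 767 bytes long."""
    return len(bencode(value)) <= MAX_VALUE_SIZE
```

Note that the limit applies to the encoded form, so a string value gives up a few bytes to its length prefix.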
immutable items
Immutable items are stored under their SHA-1 hash, and since they cannot be modified, there is no need to authenticate their origin. This makes immutable items simple.
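A small sketch of how the key of an immutable item could be derived and checked, assuming the value has already been bencoded. The function names `immutable_target` and `matches_target` are hypothetical helpers for illustration only.

```python
import hashlib


def immutable_target(encoded_value: bytes) -> bytes:
    """Return the 20 byte DHT key of an immutable item: the SHA-1 of its bencoded form."""
    return hashlib.sha1(encoded_value).digest()


def matches_target(encoded_value: bytes, target: bytes) -> bool:
    """Check that a value received in a get response hashes to the requested target."""
    return immutable_target(encoded_value) == target
```

Because the key is derived from the data, a subscribing node can detect a storage node that returns a different value than the one requested.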
put message
Request:

    {
        "a":
        {
            "id": <20 byte id of sending node (string)>,
            "v": <any bencoded type, whose encoded size < 768>
        },
        "t": <transaction-id (string)>,
        "y": "q",
        "q": "put"
    }
Response:

    {
        "r": { "id": <20 byte id of sending node (string)> },
        "t": <transaction-id (string)>,
        "y": "r"
    }
get message
Request:

    {
        "a":
        {
            "id": <20 byte id of sending node (string)>,
            "target": <SHA-1 hash of item (string)>
        },
        "t": <transaction-id (string)>,
        "y": "q",
        "q": "get"
    }
Response:

    {
        "r":
        {
            "id": <20 byte id of sending node (string)>,
            "token": <write token (string)>,
            "v": <any bencoded type whose SHA-1 hash matches 'target'>,
            "nodes": <IPv4 nodes close to 'target'>,
            "nodes6": <IPv6 nodes close to 'target'>
        },
        "t": <transaction-id>,
        "y": "r"
    }
mutable items
Mutable items can be updated without changing their DHT keys. To ensure that only the original publisher can update an item, it is signed with a private key generated by the original publisher.
To prevent a malicious node from overwriting an item with an old version, the sequence number seq must be monotonically increasing for each update, and a node storing the item MUST NOT downgrade it from a higher sequence number to a lower one, only upgrade.
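The sequence number rule can be sketched as a small in-memory store on a storage node. This is only an illustration under stated assumptions: the class `MutableItemStore` and its methods are hypothetical, signature and token checks are left out, and a put with a sequence number equal to the stored one is accepted because the text only forbids lower ones.

```python
class MutableItemStore:
    """Keeps the latest (seq, value, sig) per target and refuses downgrades."""

    def __init__(self):
        self._items = {}  # target (bytes) -> (seq, encoded value, signature)

    def put(self, target: bytes, seq: int, value: bytes, sig: bytes) -> bool:
        """Store the item; return False if seq is lower than what is already stored."""
        current = self._items.get(target)
        if current is not None and seq < current[0]:
            return False  # never replace a higher sequence number with a lower one
        self._items[target] = (seq, value, sig)
        return True

    def get(self, target: bytes):
        """Return the stored (seq, value, sig) tuple, or None if nothing is stored."""
        return self._items.get(target)
```

A real storage node would also verify the write token and the signature before calling `put`.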
The signature is a 2048 bit RSA signature of the SHA-1 hash of the bencoded sequence number and v key, e.g. something like this: 3:seqi4e1:v12:Hello world!.
put message
Request:

    {
        "a":
        {
            "id": <20 byte id of sending node (string)>,
            "k": <RSA-2048 public key (268 bytes string)>,
            "seq": <monotonically increasing sequence number (integer)>,
            "sig": <RSA-2048 signature (256 bytes string)>,
            "token": <write-token (string)>,
            "v": <any bencoded type, whose encoded size < 768>
        },
        "t": <transaction-id (string)>,
        "y": "q",
        "q": "put"
    }

Storing nodes receiving a put request where seq is lower than what's already stored on the node MUST reject the request.
Response:

    {
        "r": { "id": <20 byte id of sending node (string)> },
        "t": <transaction-id (string)>,
        "y": "r"
    }
get message
Request:

    {
        "a":
        {
            "id": <20 byte id of sending node (string)>,
            "target": <first 20 bytes of public key (string)>,
            "k": <remaining 248 bytes of public key (string)>
        },
        "t": <transaction-id (string)>,
        "y": "q",
        "q": "get"
    }
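The split of the 268 byte public key into target and k could be sketched as follows. The helper names `split_public_key` and `join_public_key` are hypothetical; the sizes come from the request layout above.

```python
from typing import Tuple

PUBLIC_KEY_SIZE = 268  # size of the public key string given in this proposal
TARGET_SIZE = 20       # size of a DHT target


def split_public_key(public_key: bytes) -> Tuple[bytes, bytes]:
    """Split a public key into (target, k): the first 20 bytes and the remaining 248."""
    if len(public_key) != PUBLIC_KEY_SIZE:
        raise ValueError("expected a %d byte public key" % PUBLIC_KEY_SIZE)
    return public_key[:TARGET_SIZE], public_key[TARGET_SIZE:]


def join_public_key(target: bytes, spill_over: bytes) -> bytes:
    """Rebuild the full public key from the target and k fields of a get request."""
    public_key = target + spill_over
    if len(public_key) != PUBLIC_KEY_SIZE:
        raise ValueError("target and k do not add up to a full public key")
    return public_key
```

A storage node receiving the request can use `join_public_key` to recover the full key the subscribing node is asking about.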
Response:

    {
        "r":
        {
            "id": <20 byte id of sending node (string)>,
            "k": <RSA-2048 public key (268 bytes string)>,
            "seq": <monotonically increasing sequence number (integer)>,
            "sig": <RSA-2048 signature (256 bytes string)>,
            "token": <write-token (string)>,
            "v": <any bencoded type, whose encoded size < 768>
        },
        "t": <transaction-id (string)>,
        "y": "r"
    }
signature verification
In order to make it maximally difficult to attack the bencoding parser, signing and verification of the value and sequence number should be done as follows:
- encode value and sequence number separately
- concatenate "3:seqi" seq "e1:v" and the encoded value. Sequence number 1 of value "Hello World!" would be converted to: 3:seqi1e1:v12:Hello World! In this way it is not possible to convince a node that part of the length is actually part of the sequence number even if the parser contains certain bugs. Furthermore, a verification failure cannot be caused by a bencoding serializer altering the order of entries in the dictionary.
- hash the concatenated string with SHA-1
- sign or verify the hash digest.
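The first three steps above can be sketched as follows, assuming the value has already been bencoded separately. The function names `signing_buffer` and `signing_digest` are hypothetical; the final RSA sign or verify step is left to whatever cryptography library the implementation uses and is not shown.

```python
import hashlib


def signing_buffer(seq: int, encoded_value: bytes) -> bytes:
    """Concatenate "3:seqi" seq "e1:v" and the separately bencoded value."""
    return b"3:seqi%de1:v" % seq + encoded_value


def signing_digest(seq: int, encoded_value: bytes) -> bytes:
    """Return the SHA-1 digest of the buffer; this is what gets signed or verified."""
    return hashlib.sha1(signing_buffer(seq, encoded_value)).digest()
```

For the example in the text, `signing_buffer(1, b"12:Hello World!")` yields `3:seqi1e1:v12:Hello World!`.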
expiration
Without re-announcement, these items MAY expire in 2 hours. In order to keep items alive, they SHOULD be re-announced once an hour.
Subscribing nodes MAY help out by announcing items they are interested in to the DHT, to keep them alive.
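The timing rules above might translate into checks like the following. This is a sketch only: the constants mirror the 2 hour and 1 hour figures from the text, while the function names and the use of plain timestamps in seconds are assumptions.

```python
EXPIRY_SECONDS = 2 * 60 * 60           # items MAY expire after 2 hours
REANNOUNCE_INTERVAL_SECONDS = 60 * 60  # items SHOULD be re-announced once an hour


def may_expire(last_announce: float, now: float) -> bool:
    """True if a storage node is allowed to drop an item last announced at last_announce."""
    return now - last_announce >= EXPIRY_SECONDS


def due_for_reannounce(last_announce: float, now: float) -> bool:
    """True if a publishing or subscribing node should announce the item again."""
    return now - last_announce >= REANNOUNCE_INTERVAL_SECONDS
```

Re-announcing at half the expiry time leaves room for one failed announce before an item may be dropped.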