premiere-libtorrent/docs/dht_store.rst

245 lines
6.5 KiB
ReStructuredText
Raw Normal View History

============================================
BitTorrent extension for arbitrary DHT store
============================================
:Author: Arvid Norberg, arvid@rasterbar.com
:Version: Draft
.. contents:: Table of contents
:depth: 2
:backlinks: none
This is a proposal for an extension to the BitTorrent DHT to allow
storing and retrieving of arbitrary data.
It supports both storing *immutable* items, where the key is
the SHA-1 hash of the data itself, and *mutable* items, where
the key is the public key of the key pair used to sign the data.
There are two new proposed messages, ``put`` and ``get``.
terminology
-----------
In this document, a *storage node* refers to the node in the DHT to which
an item is being announced and stored on. A *subscribing node* refers to
a node which makes look-ups in the DHT to find the storage nodes, to
request items from them, and possibly re-announce those items to keep them
alive.
messages
--------
The proposed new messages ``get`` and ``put`` are similar to the existing ``get_peers``
and ``announce_peer``.
Responses to ``get`` should always include ``nodes`` and ``nodes6`` has the same
semantics as in its ``get_peers`` response. It should also include a write token,
``token``, with the same semantics as ``get_peers``.
The ``id`` field in these messages has the same semantics as the standard DHT messages,
i.e. the node ID of the node sending the message, to maintain the structure of the DHT
network.
The ``token`` field also has the same semantics as the standard DHT message ``get_peers``
and ``announce_peer``, when requesting an item and to write an item respectively.
The distinction between storing mutable and immutable items is the inclusion
of a public key, a sequence number and signature (``k``, ``seq`` and ``sig``).
The distinction betwewn retrieving a mutable and immutable item is the inclusion of
the public key spill-over (``k``) in the ``get`` request.
The ``v`` key is the *value* to be stored. It is allowed to be any bencoded type (list,
dict, string or integer). When it's being hashed (for verifying its signature or to calculate
its key), its flattened, bencoded, form is used.
Storing nodes are SHOULD reject ``put`` requests where the bencoded form of ``v`` is longer
than 767 bytes.
immutable items
---------------
Immutable items are stored under their SHA-1 hash, and since they cannot be modified,
there is no need to authenticate the origin of them. This makes immutable items simple.
put message
...........
Request:
.. parsed-literal::
{
"a":
{
"id": *<20 byte id of sending node (string)>*,
"v": *<any bencoded type, whose encoded size < 768>*
},
"t": *<transaction-id (string)>*,
"y": "q",
"q": "put"
}
Response:
.. parsed-literal::
{
"r": { "id": *<20 byte id of sending node (string)>* },
"t": *<transaction-id (string)>*,
"y": "r",
}
get message
...........
Request:
.. parsed-literal::
{
"a":
{
"id": *<20 byte id of sending node (string)>*,
"target": *<SHA-1 hash of item (string)>*,
},
"t": *<transaction-id (string)>*,
"y": "q",
"q": "get"
}
Response:
.. parsed-literal::
{
"r":
{
"id": *<20 byte id of sending node (string)>*,
"token": *<write token (string)>*,
"v": *<any bencoded type whose SHA-1 hash matches 'target'>*,
"nodes": *<IPv4 nodes close to 'target'>*
"nodes6": *<IPv6 nodes close to 'target'>*
},
"t": *<transaction-id>*,
"y": "r",
}
mutable items
-------------
Mutable items can be updated, without changing their DHT keys. To authenticate
that only the original publisher can update an item, it is signed by a private key
generated by the original publisher.
In order to avoid a malicious node to overwrite the list head with an old
version, the sequence number ``seq`` must be monotonically increasing for each update,
and a node hosting the list node MUST not downgrade a list head from a higher sequence
number to a lower one, only upgrade.
The signature is a 2048 bit RSA signature of the SHA-1 hash of the bencoded sequence
number and ``v`` key. e.g. something like this:: ``3:seqi4e1:v12:Hello world!``.
put message
...........
Request:
.. parsed-literal::
{
"a":
{
"id": *<20 byte id of sending node (string)>*,
"k": *<RSA-2048 public key (268 bytes string)>*,
"seq": *<monotonically increasing sequence number (integer)>*,
"sig": *<RSA-2048 signature (256 bytes string)>*,
"token": *<write-token (string)>*,
"v": *<any bencoded type, whose encoded size < 768>*
},
"t": *<transaction-id (string)>*,
"y": "q",
"q": "put"
}
Storing nodes receiving a ``put`` request where ``seq`` is lower than what's already
stored on the node, MUST reject the request.
Response:
.. parsed-literal::
{
"r": { "id": *<20 byte id of sending node (string)>* },
"t": *<transaction-id (string)>*,
"y": "r",
}
get message
...........
Request:
.. parsed-literal::
{
"r":
{
"id": *<20 byte id of sending node (string)>*,
"target:" *<first 20 bytes of public key (string)>*,
"k": *<remaining 248 bytes of public key (string)>*
},
"t": *<transaction-id (string)>*,
"y": "r",
"q": "get"
}
Response:
.. parsed-literal::
{
"r":
{
"id": *<20 byte id of sending node (string)>*,
"k": *<RSA-2048 public key (268 bytes string)>*,
"seq": *<monotonically increasing sequence number (integer)>*,
"sig": *<RSA-2048 signature (256 bytes string)>*,
"token": *<write-token (string)>*,
"v": *<any bencoded type, whose encoded size < 768>*
},
"t": *<transaction-id (string)>*,
"y": "r",
}
signature verification
----------------------
In order to make it maximally difficult to attack the bencoding parser, signing and verification of the
value and sequence number should be done as follows:
1. encode value and sequence number separately
2. concatenate "3:seqi" ``seq`` "e1:v" and the encoded value.
sequence number 1 of value "Hello World!" would be converted to: 3:seqi1e1:v12:Hello World!
In this way it is not possible to convince a node that part of the length is actually part of the
sequence number even if the parser contains certain bugs. Furthermore it is not possible to have a
verification failure if a bencoding serializer alters the order of entries in the dictionary.
3. hash the concatenated string with SHA-1
4. sign or verify the hash digest.
expiration
----------
Without re-announcement, these items MAY expire in 2 hours. In order
to keep items alive, they SHOULD be re-announced once an hour.
Subscriber nodes MAY help out in announcing items the are interested in to the DHT,
to keep them alive.
test vectors
------------