======================================
BitTorrent extension for DHT RSS feeds
======================================

:Author: Arvid Norberg, arvid@rasterbar.com
:Version: Draft

.. contents:: Table of contents
  :depth: 2
  :backlinks: none

This is a proposal for an extension to the BitTorrent DHT to allow
for decentralized, RSS-feed-like functionality.

The intention is to allow the creation of repositories of torrents
where only a single identity has the authority to add new content, while
keeping the repository robust against network failures and resilient
to attacks at the source.

The target ID under which the repository is stored in the DHT is the
SHA-1 hash of a feed name and the 512-bit public key. The private key
of this pair MUST be used to sign every item stored in the repository.
Every message that contains signed items MUST also include the public key, to
allow the receiver to verify the key itself against the target ID, as well
as the validity of the signatures of the items. Every recipient of a
message with feed items in it MUST verify both the validity of the public
key against the target ID it is stored under and the validity of
the signatures of each individual item.

Any peer who is subscribing to a DHT feed SHOULD also participate in
regularly re-announcing items that it knows about. Every participant
SHOULD store items in long-term storage, across sessions, in order to
keep items alive for as long as possible, with as few sources as possible.

As with normal DHT announces, the write-token mechanism is used to
prevent spoof attacks.

There are two new proposed messages, ``announce_item`` and ``get_item``.
Every valid item that is announced should be stored. In response to a request
for items, as many items as fit in a normal-sized UDP packet should be returned.
If there are more items than can fit, a random subset should be returned.

*Is there a better heuristic here? Should there be a bias towards newer items?
If so, there needs to be a signed timestamp as well, which might get messy.*

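The random-subset rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each stored item is an (``ih``, ``sig``) pair of byte strings, the caller passes the number of bytes left in the packet for the two lists, and an optional ``excluded`` predicate stands in for the ``filter`` argument of ``get_item``. The names ``select_items`` and ``bencoded_str_len`` are hypothetical, and skipping a multi-hash item only when all of its info-hashes are excluded is an assumption the proposal does not address.

```python
import random
from typing import Callable, List, Optional, Tuple

# Sketch only: all names here are hypothetical, not part of the proposal.
INFO_HASH_LEN = 20

Item = Tuple[bytes, bytes]  # (ih: n * 20 bytes of info-hashes, sig: signature)


def bencoded_str_len(n: int) -> int:
    """Encoded size of an n-byte bencoded string, e.g. b"20:" followed by 20 bytes."""
    return len(str(n)) + 1 + n


def select_items(items: List[Item], budget: int,
                 excluded: Callable[[bytes], bool] = lambda ih: False,
                 rng: Optional[random.Random] = None) -> List[Item]:
    """Return a random subset of items whose "ih" and "sig" entries fit in budget bytes.

    An item is skipped only if every info-hash it carries is excluded
    (an assumption of this sketch).
    """
    if rng is None:
        rng = random.Random()
    candidates = [
        (ih, sig) for ih, sig in items
        if not all(excluded(ih[i:i + INFO_HASH_LEN])
                   for i in range(0, len(ih), INFO_HASH_LEN))
    ]
    # shuffle so that, when not everything fits, a random subset is returned
    rng.shuffle(candidates)
    chosen: List[Item] = []
    used = 0
    for ih, sig in candidates:
        cost = bencoded_str_len(len(ih)) + bencoded_str_len(len(sig))
        if used + cost <= budget:
            chosen.append((ih, sig))
            used += cost
    return chosen
```

With one info-hash and one 64-byte signature per item, each item costs 23 + 67 = 90 encoded bytes, so the budget directly bounds how many items a single response can carry.
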
target ID
---------

The target, i.e. the ID in the DHT key space that feeds are announced to, MUST always
be SHA-1(*feed_name* + *public_key*). Any request where this condition is not met
MUST be dropped.

Using the feed name as part of the target means a feed publisher only needs one
public-private keypair for any number of feeds, as long as the feeds have different
names.

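The target derivation and the drop check can be sketched in Python with the standard ``hashlib`` module. The sketch assumes the feed name is UTF-8 encoded and concatenated with the raw binary public key, with no separator; ``feed_target`` and ``target_matches`` are illustrative names.

```python
import hashlib

# Sketch only: function names are hypothetical.


def feed_target(feed_name: str, public_key: bytes) -> bytes:
    """Return the 20-byte DHT target: SHA-1(feed_name + public_key).

    Assumes the UTF-8 feed name is followed directly by the binary key.
    """
    return hashlib.sha1(feed_name.encode("utf-8") + public_key).digest()


def target_matches(target: bytes, feed_name: str, public_key: bytes) -> bool:
    """True if a request's target agrees with its name and key; if not, drop it."""
    return target == feed_target(feed_name, public_key)
```

A node receiving ``get_item`` or ``announce_item`` would run a check like ``target_matches`` before looking at anything else in the request. Because the name is hashed together with the key, one keypair yields a distinct target per feed name.
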
messages
--------

These are the proposed new message formats.

requesting items
................

.. parsed-literal::

    {
        "a":
        {
            "filter": *<variable size bloom-filter>*,
            "id": *<20 byte id of origin node>*,
            "key": *<64 byte public curve25519 key for this feed>*,
            "n": *<feed-name>*,
            "target": *<target-id as derived from feed name and public key>*
        },
        "q": "get_item",
        "t": *<transaction-id>*,
        "y": "q"
    }

The ``target`` MUST always be SHA-1(*feed_name* + *public_key*). Any request where
this condition is not met MUST be dropped.

The ``n`` field is the name of this feed. It MUST be a UTF-8 encoded string and it
MUST match the name of the feed in the receiving node.

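A sketch of how a ``get_item`` query could be serialized, assuming it is bencoded like other DHT (KRPC) messages, with dictionary keys sorted as raw byte strings. The ``bencode`` and ``make_get_item`` helpers are hypothetical; the target is computed as SHA-1 of the UTF-8 feed name followed by the binary key.

```python
import hashlib
from typing import Optional

# Sketch only: helper names are hypothetical.


def bencode(value) -> bytes:
    """Minimal bencoder for int, str (UTF-8), bytes, list and dict."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # bencoded dictionaries are sorted by their raw byte-string keys
        pairs = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        pairs.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError("cannot bencode %s" % type(value).__name__)


def make_get_item(transaction_id: bytes, node_id: bytes, feed_name: str,
                  public_key: bytes, bloom: Optional[bytes] = None) -> bytes:
    """Serialize a get_item query; the bloom filter is omitted when None."""
    args = {
        "id": node_id,          # 20-byte id of the querying node
        "key": public_key,      # feed public key, binary
        "n": feed_name,         # UTF-8 feed name
        "target": hashlib.sha1(feed_name.encode("utf-8") + public_key).digest(),
    }
    if bloom is not None:
        args["filter"] = bloom  # raw filter bits; size is implied by the length
    return bencode({"a": args, "q": "get_item", "t": transaction_id, "y": "q"})
```

Because ``filter`` is optional, the sketch leaves the key out entirely rather than sending an empty string.
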
The bloom filter argument (``filter``) in the ``get_item`` request is optional.
If included in a request, it represents info-hashes that should be excluded from
the response. In this case, the response should be a random subset of the non-excluded
items, or all of the non-excluded items if they all fit within a packet.

If the bloom filter is specified, its size MUST be a whole multiple of 8 bits. The size
is implied by the length of the string. For each info-hash to exclude from the response,
set the bits derived from that info-hash as described below.

There are no hash functions for the bloom filter. Since the info-hash is already a
hash digest, each pair of bytes, starting with the first bytes (MSB), is used as the
result of one of the imaginary hash functions for the bloom filter. k is 3 in this bloom
filter. This means the first 6 bytes of the info-hash are used to set 3 bits in the bloom
filter. Each pair of bytes pulled out of the info-hash is interpreted as a big-endian
16-bit value.

Bits are indexed in bytes from left to right, and within bytes from LSB to MSB, i.e. to
set bit 12: ``bitfield[12 / 8] |= 1 << (12 % 8)``.

Example:

    To indicate that you are not interested in the info-hash that
    starts with 0x4f7d25a..., using a bloom filter of 80 bits, set bits
    (0x4f % 80), (0x7d % 80) and (0x25 % 80) in the bloom filter bitmask.

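The bit-setting rules can be sketched in Python as follows. This sketch follows the big-endian 16-bit rule stated in the text (rather than the single-byte values used in the short example) and, as the example does, reduces each value modulo the filter size in bits. Function names are hypothetical.

```python
# Sketch only: function names are hypothetical; k = 3 as in the proposal.


def bloom_bit_positions(info_hash: bytes, size_bits: int, k: int = 3) -> list:
    """Return k bit indices taken from the first 2*k bytes of info_hash.

    Each pair of bytes is read as a big-endian 16-bit value and reduced
    modulo the filter size (the modulo step is taken from the example).
    """
    return [int.from_bytes(info_hash[2 * i:2 * i + 2], "big") % size_bits
            for i in range(k)]


def bloom_add(bitfield: bytearray, info_hash: bytes) -> None:
    """Set the bits for info_hash; the filter size is 8 * len(bitfield)."""
    for bit in bloom_bit_positions(info_hash, len(bitfield) * 8):
        # bytes are indexed left to right, bits within a byte from LSB to MSB
        bitfield[bit // 8] |= 1 << (bit % 8)


def bloom_contains(bitfield: bytes, info_hash: bytes) -> bool:
    """True if all k bits are set, i.e. info_hash is (probably) excluded."""
    return all(bitfield[bit // 8] & (1 << (bit % 8))
               for bit in bloom_bit_positions(info_hash, len(bitfield) * 8))
```

A requesting client would call ``bloom_add`` once for each info-hash it already has and send the resulting bytes as ``filter``; the answering node would skip items for which ``bloom_contains`` is true. As with any bloom filter, false positives can hide an item the client does not actually have.
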
request item response
.....................

.. parsed-literal::

    {
        "r":
        {
            "ih":
            [
                *<n * 20 byte(s) info-hash>*,
                ...
            ],
            "sig":
            [
                *<64 byte curve25519 signature of info-hash>*,
                ...
            ],
            "id": *<20 byte id of origin node>*,
            "token": *<write-token>*,
            "nodes": *<n * compact IPv4-port pair>*,
            "nodes6": *<n * compact IPv6-port pair>*
        },
        "t": *<transaction-id>*,
        "y": "r"
    }

Since the data being signed is already a hash (i.e. an info-hash), the signature
of each hash entry is simply the hash encrypted with the feed's private key.

The ``ih`` and ``sig`` lists MUST have an equal number of items. Each item in ``sig``
is the signature of the full string in the corresponding item in the ``ih`` list.

Each item in the ``ih`` list may contain any positive number of 20-byte info-hashes.

The rationale behind using lists of strings, where each string may contain multiple
info-hashes, is to allow the publisher of a feed to sign multiple info-hashes
together, thus saving space in the UDP packets and allowing nodes to transfer more
info-hashes per packet. Original publishers of a feed MAY re-announce items lumped
together over time to make the feed more efficient.

A client receiving a ``get_item`` response MUST verify each signature in the ``sig``
list against the corresponding item in the ``ih`` list, using the feed's public key.
Any item whose signature fails to verify MUST be discarded.

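A sketch of the client-side checks on a ``get_item`` response. The proposal does not fix a concrete signature primitive or library, so signature checking is delegated to a caller-supplied ``verify(public_key, message, signature)`` function, which is hypothetical. Rejecting the whole response when the list lengths differ, and discarding items whose length is not a positive multiple of 20 bytes, are also assumptions of this sketch.

```python
from typing import Callable, List

# Sketch only: names are hypothetical.
INFO_HASH_LEN = 20
SIG_LEN = 64

# Hypothetical signature check: verify(public_key, message, signature) -> bool.
# The proposal does not name a concrete signature scheme or library.
Verifier = Callable[[bytes, bytes, bytes], bool]


def valid_info_hashes(ih_list: List[bytes], sig_list: List[bytes],
                      public_key: bytes, verify: Verifier) -> List[bytes]:
    """Return the info-hashes of every item whose signature verifies."""
    if len(ih_list) != len(sig_list):
        # the lists MUST be the same length; rejecting everything is an assumption
        raise ValueError("'ih' and 'sig' must have the same number of items")
    accepted: List[bytes] = []
    for ih, sig in zip(ih_list, sig_list):
        if not ih or len(ih) % INFO_HASH_LEN != 0 or len(sig) != SIG_LEN:
            continue  # malformed item: discard it
        if not verify(public_key, ih, sig):
            continue  # signature over the full ih string fails: discard it
        # split the signed string into its individual 20-byte info-hashes
        accepted.extend(ih[i:i + INFO_HASH_LEN]
                        for i in range(0, len(ih), INFO_HASH_LEN))
    return accepted
```

Note that the signature is checked over the whole ``ih`` string before it is split, so one valid signature accepts every info-hash lumped into that item.
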
``nodes`` and ``nodes6`` are optional and have the same semantics as in the standard
``get_peers`` response. The intention is to be able to use this ``get_item`` request
in the same way, to search for the nodes responsible for the feed.

announcing items
................

.. parsed-literal::

    {
        "a":
        {
            "ih":
            [
                *<n * 20 byte info-hash(es)>*,
                ...
            ],
            "sig":
            [
                *<64 byte curve25519 signature of info-hash(es)>*,
                ...
            ],
            "id": *<20 byte node-id of origin node>*,
            "key": *<64 byte public curve25519 key for this feed>*,
            "n": *<feed name>*,
            "target": *<target-id as derived from feed name and public key>*,
            "token": *<write-token as obtained by previous req.>*
        },
        "y": "q",
        "q": "announce_item",
        "t": *<transaction-id>*
    }

An announce can include any number of items, as long as they fit in a packet.

Subscribers to a feed SHOULD also announce the items they know of to the feed.
In order to make the repository of torrents as reliable as possible, subscribers
SHOULD announce random items from their local repository of items. When re-announcing
items, a random subset of all known items should be announced, randomized
independently for each node it is announced to. This makes it a little harder
to determine the IP address an item originated from, since that would require
seeing the very first announce of the item and knowing that it had not been
announced anywhere else before.

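The per-node randomization can be sketched like this, assuming the caller already knows how many items fit in one ``announce_item`` packet (``per_node``, a hypothetical parameter). ``plan_reannounce`` is likewise an illustrative name. Drawing a fresh sample for each node is what keeps the announces sent to different nodes independent of each other.

```python
import random
from typing import Dict, List, Sequence

# Sketch only: names and the per_node parameter are hypothetical.


def plan_reannounce(known_items: Sequence[bytes], nodes: Sequence[bytes],
                    per_node: int, rng: random.Random) -> Dict[bytes, List[bytes]]:
    """Map each node ID to its own random selection of known items.

    per_node is the number of items that fit in one announce_item packet,
    supplied by the caller.
    """
    population = list(known_items)
    count = min(per_node, len(population))
    # a fresh sample per node, so the subsets are drawn independently
    return {node: rng.sample(population, count) for node in nodes}
```

Using a single shared subset for all nodes would be simpler, but it would let an observer correlate announces across nodes, which is what the independent draws are meant to avoid.
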
Every subscriber and publisher SHOULD re-announce items every 30 minutes. If
a feed does not receive any announced items in 60 minutes, a peer MAY time
it out and remove it.

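One way to track the 30-minute re-announce interval and the 60-minute timeout is sketched below. It keys timers by feed target ID rather than by individual item, which is a simplification; ``FeedTable`` and its methods are hypothetical names, and the clock is injectable so the timing logic can be tested without waiting.

```python
import time
from typing import Callable, Dict, List

# Sketch only: class and method names are hypothetical.
REANNOUNCE_INTERVAL = 30 * 60  # seconds between re-announces (SHOULD)
FEED_TIMEOUT = 60 * 60         # seconds of silence before a feed MAY be dropped


class FeedTable:
    """Per-feed timers, keyed by target ID (a simplification of per-item state)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.last_heard: Dict[bytes, float] = {}      # last incoming announce_item
        self.last_announced: Dict[bytes, float] = {}  # last time we re-announced

    def on_announce_received(self, target: bytes) -> None:
        self.last_heard[target] = self.clock()

    def due_for_reannounce(self, target: bytes) -> bool:
        last = self.last_announced.get(target)
        return last is None or self.clock() - last >= REANNOUNCE_INTERVAL

    def mark_reannounced(self, target: bytes) -> None:
        self.last_announced[target] = self.clock()

    def expire(self) -> List[bytes]:
        """Drop and return feeds with no incoming announce for over FEED_TIMEOUT."""
        now = self.clock()
        stale = [t for t, ts in self.last_heard.items() if now - ts > FEED_TIMEOUT]
        for t in stale:
            del self.last_heard[t]
            self.last_announced.pop(t, None)
        return stale
```

Since the timeout is twice the re-announce interval, a feed with even one active participant should not expire; a missed re-announce round alone is not enough to drop it.
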
Subscribers and publishers SHOULD announce random items.

example
.......

This is an example of an ``announce_item`` message::

    {
        "a":
        {
            "ih":
            [
                "7ea94c240691311dc0916a2a91eb7c3db2c6f3e4",
                "0d92ad53c052ac1f49cf4434afffafa4712dc062e4168d940a48e45a45a0b10808014dc267549624"
            ],
            "sig":
            [
                "980774404e404941b81aa9da1da0101cab54e670cff4f0054aa563c3b5abcb0fe3c6df5dac1ea25266035f09040bf2a24ae5f614787f1fe7404bf12fee5e6101",
                "3fee52abea47e4d43e957c02873193fb9aec043756845946ec29cceb1f095f03d876a7884e38c53cd89a8041a2adfb2d9241b5ec5d70268714d168b9353a2c01"
            ],
            "id": "b46989156404e8e0acdb751ef553b210ef77822e",
            "key": "6bc1de5443d1a7c536cdf69433ac4a7163d3c63e2f9c92d78f6011cf63dbcd5b638bbc2119cdad0c57e4c61bc69ba5e2c08b918c2db8d1848cf514bd9958d307",
            "n": "my stuff",
            "target": "b4692ef0005639e86d7165bf378474107bf3a762",
            "token": "23ba"
        },
        "y": "q",
        "q": "announce_item",
        "t": "a421"
    }

Strings are printed in hex for printability, but the actual encoding is binary. The
message contains 3 feed items, starting with "7ea94c", "0d92ad" and "e4168d".
These 3 items are not published optimally. If they were merged into a single
string in the ``ih`` list, more than 64 bytes would be saved (because of having
one fewer signature).

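The saving can be checked by computing bencoded sizes, assuming the lists are bencoded (``<length>:<bytes>`` per string, ``l`` ... ``e`` around each list). Only lengths matter, so zero bytes stand in for the real hashes and signatures; the helper names are hypothetical.

```python
# Sketch only: helper names are hypothetical; zero bytes are placeholders.


def bencoded_len(s: bytes) -> int:
    """Size of one bencoded string: decimal length, ':' and the raw bytes."""
    return len(str(len(s))) + 1 + len(s)


def lists_size(ih_items, sig_items) -> int:
    """Encoded size of the "ih" and "sig" list values, each wrapped in 'l' ... 'e'."""
    ih_size = 2 + sum(bencoded_len(s) for s in ih_items)
    sig_size = 2 + sum(bencoded_len(s) for s in sig_items)
    return ih_size + sig_size


# Layout of the example: one 1-hash item and one 2-hash item, two signatures.
split = lists_size([bytes(20), bytes(40)], [bytes(64), bytes(64)])
# The same three info-hashes merged into a single 60-byte item, one signature.
merged = lists_size([bytes(60)], [bytes(64)])
print(split, merged, split - merged)  # → 204 134 70
```

The 70 bytes are 67 for the dropped bencoded signature (``64:`` plus 64 bytes) and 3 for the length prefix that disappears when ``20:`` and ``40:`` become a single ``60:``.
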
Note that ``target`` is in fact SHA-1('my stuff' + *key*), where *key* is the binary
value of the ``key`` field above. The private key used in this example is
980f4cd7b812ae3430ea05af7c09a7e430275f324f42275ca534d9f7c6d06f5b.

URI scheme
----------

The proposed URI scheme for DHT feeds is:

.. parsed-literal::

    magnet:?xt=btfd:*<base16-curve25519-public-key>*\ &dn=\ *<feed name>*

Note that, unlike regular magnet links to torrents, which use **btih**, feed links
use **btfd**.

The *feed name* is mandatory since it is used in the request and when
calculating the target ID.

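A sketch of building and parsing such a link with Python's standard ``urllib.parse`` module, assuming the key is written as lowercase hex and the feed name is percent-encoded in ``dn``. The function names are hypothetical; rejecting links without ``dn`` follows from the feed name being mandatory.

```python
from typing import Tuple
from urllib.parse import parse_qs, quote, urlsplit

# Sketch only: function names are hypothetical.


def make_feed_uri(public_key: bytes, feed_name: str) -> str:
    """Build magnet:?xt=btfd:<hex key>&dn=<feed name>, percent-encoding the name."""
    return "magnet:?xt=btfd:" + public_key.hex() + "&dn=" + quote(feed_name, safe="")


def parse_feed_uri(uri: str) -> Tuple[bytes, str]:
    """Return (public_key, feed_name); raise ValueError for non-feed links."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet link")
    query = parse_qs(parts.query)
    xt = query.get("xt", [""])[0]
    if not xt.startswith("btfd:"):
        raise ValueError("not a DHT feed link (expected xt=btfd:...)")
    if "dn" not in query:
        # the feed name is needed for the request and for the target ID
        raise ValueError("feed name (dn) is mandatory")
    return bytes.fromhex(xt[len("btfd:"):]), query["dn"][0]
```

The parsed key and name are exactly the two inputs needed to compute the feed's target ID.
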
rationale
---------

The reason to use curve25519_ instead of, for instance, RSA is to fit more signatures
(i.e. items) in a single DHT packet. One packet is typically restricted to between
1280 and 1480 bytes. According to http://cr.yp.to/, curve25519 is free from patent claims
and there are open implementations in both C and Java.

.. _curve25519: http://cr.yp.to/ecdh.html