BitTorrent extension for DHT RSS feeds

Author: Arvid Norberg, arvid@rasterbar.com
Version: Draft

This is a proposal for an extension to the BitTorrent DHT to allow for decentralized, RSS-feed-like functionality.

The intention is to allow the creation of repositories of torrents where only a single identity has the authority to add new content. The repository should be robust against network failures and resilient to attacks at the source.

The target ID under which the repository is stored in the DHT is the SHA-1 hash of a feed name and the 512 bit public key. The private key in this pair MUST be used to sign every item stored in the repository. Every message that contains signed items MUST also include the public key, to allow the receiver to verify the key itself against the target ID as well as the validity of the signatures of the items. Every recipient of a message with feed items in it MUST verify both the validity of the public key against the target ID it is stored under and the validity of the signature of each individual item.

Any peer who is subscribing to a DHT feed SHOULD also participate in regularly re-announcing items that it knows about. Every participant SHOULD store items in long term storage, across sessions, in order to keep items alive for as long as possible, with as few sources as possible.

As with normal DHT announces, the write-token mechanism is used to prevent spoof attacks.

There are two new proposed messages, announce_item and get_item. Every valid item that is announced should be stored. In a response to a get_item request, as many items as fit in a normal UDP packet should be returned. If there are more items than can fit, a random subset should be returned.

Is there a better heuristic here? Should there be a bias towards newer items? If so, there needs to be a signed timestamp as well, which might get messy.
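A minimal sketch of the random-subset rule for get_item responses. It assumes items are held as (ih, sig) pairs of byte strings and that the size budget is supplied by the caller; the function name pick_items and the budget parameter are illustrative and not part of the proposal. The cost of each item includes the "<length>:" prefix that bencoding, the encoding used by DHT messages, adds to every string.

```python
import random

def pick_items(items, budget, rng=random):
    """Return a random subset of (ih, sig) pairs whose encoded size fits in budget bytes."""
    candidates = list(items)
    rng.shuffle(candidates)  # random order, so what is kept is a random subset
    chosen, used = [], 0
    for ih, sig in candidates:
        # a bencoded string is "<length>:" followed by the payload
        cost = len(str(len(ih))) + 1 + len(ih) + len(str(len(sig))) + 1 + len(sig)
        if used + cost > budget:
            continue  # does not fit, but a smaller item might
        chosen.append((ih, sig))
        used += cost
    return chosen
```

The budget should be the packet size minus whatever the rest of the response needs, which this sketch leaves to the caller.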

target ID

The target, i.e. the ID in the DHT key space that feeds are announced to, MUST always be SHA-1(feed_name + public_key). Any request where this condition is not met MUST be dropped.

Using the feed name as part of the target means a feed publisher only needs one public-private keypair for any number of feeds, as long as the feeds have different names.
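The target derivation and the check a receiving node performs can be sketched as follows. This assumes "+" means byte concatenation of the UTF-8 encoded feed name and the raw 64 byte key; the function names are illustrative.

```python
import hashlib

def feed_target(feed_name: str, public_key: bytes) -> bytes:
    """Return the 20 byte DHT target, SHA-1(feed_name + public_key)."""
    if len(public_key) != 64:
        raise ValueError("public key must be 64 bytes")
    return hashlib.sha1(feed_name.encode("utf-8") + public_key).digest()

def target_matches(target: bytes, feed_name: str, public_key: bytes) -> bool:
    """Check a received target against the name and key carried in the same message."""
    return target == feed_target(feed_name, public_key)
```

A node would drop any request for which target_matches returns False.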

messages

These are the proposed new message formats.

requesting items

{
        "a":
        {
                "filter": <variable size bloom-filter>,
                "id": <20 byte id of origin node>,
                "key": <64 byte public curve25519 key for this feed>,
                "n": <feed-name>,
                "target": <target-id as derived from public key>
        },
        "q": "get_item",
        "t": <transaction-id>,
        "y": "q"
}

The target MUST always be SHA-1(feed_name + public_key). Any request where this condition is not met MUST be dropped.

The n field is the name of this feed. It MUST be a UTF-8 encoded string and it MUST match the name of the feed in the receiving node.

The bloom filter argument (filter) in the get_item requests is optional. If included in a request, it represents info-hashes that should be excluded from the response. In this case, the response should be a random subset of the non-excluded items, or all of the non-excluded items if they all fit within a packet size.

If the bloom filter is specified, its size MUST be a whole multiple of 8 bits. The size is implied by the length of the string. For each info-hash to exclude from the response, bits are set in the filter as described in the following paragraphs.

There are no hash functions for the bloom filter. Since the info-hash is already a hash digest, each pair of bytes, starting with the first bytes (MSB), is used as the result of an imaginary hash function for the bloom filter. k is 3 in this bloom filter. This means the first 6 bytes of the info-hash are used to set 3 bits in the bloom filter. Each pair of bytes pulled out of the info-hash is interpreted as a big-endian 16 bit value, which is reduced modulo the size of the filter in bits to give the index of the bit to set.

Bits are indexed in bytes from left to right, and within bytes from LSB to MSB, i.e. to set bit 12: bitfield[12/8] |= 1 << (12 % 8).

Example:
To indicate that you are not interested in the info-hash that starts with 0x4f7d25a..., using a bloom filter of 80 bits, set bit (0x4f7d % 80), followed by the bits given by the second and third byte pairs of the info-hash, each taken modulo 80.
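The bloom filter rules can be sketched as below: three big-endian 16 bit values taken from the first 6 bytes of the info-hash, each reduced modulo the filter size in bits, with bits numbered from the LSB within each byte. The function names are illustrative and the filter is assumed to be held as a bytearray.

```python
def bloom_bit_indices(info_hash: bytes, filter_bytes: int) -> list[int]:
    """Return the k = 3 bit indices an info-hash maps to in a filter of filter_bytes bytes."""
    nbits = filter_bytes * 8
    # byte pairs at offsets 0, 2 and 4 act as the three "hash functions"
    return [int.from_bytes(info_hash[i:i + 2], "big") % nbits for i in (0, 2, 4)]

def bloom_add(bloom: bytearray, info_hash: bytes) -> None:
    """Mark an info-hash as excluded from the response."""
    for bit in bloom_bit_indices(info_hash, len(bloom)):
        bloom[bit // 8] |= 1 << (bit % 8)

def bloom_contains(bloom: bytes, info_hash: bytes) -> bool:
    """True if all three bits are set, i.e. the info-hash is (probably) excluded."""
    return all(bloom[bit // 8] & (1 << (bit % 8))
               for bit in bloom_bit_indices(info_hash, len(bloom)))

bloom = bytearray(10)  # 80 bit filter
bloom_add(bloom, bytes.fromhex("7ea94c240691311dc0916a2a91eb7c3db2c6f3e4"))
```

As with any bloom filter, bloom_contains can report false positives, so a responder may occasionally withhold an item the requester does not actually have.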

request item response

{
        "r":
        {
                "ih":
                [
                        <n * 20 byte(s) info-hash>,
                        ...
                ],
                "sig":
                [
                        <64 byte curve25519 signature of info-hash>,
                        ...
                ],
                "id": <20 byte id of origin node>,
                "token": <write-token>,
                "nodes": <n * compact IPv4-port pair>,
                "nodes6": <n * compact IPv6-port pair>
        },
        "t": <transaction-id>,
        "y": "r"
}

Since the data being signed is already a hash (i.e. an info-hash), the signature of each hash-entry is simply the hash encrypted with the feed's private key.

The ih and sig lists MUST have an equal number of items. Each item in sig is the signature of the full string in the corresponding item in the ih list.

Each item in the ih list may contain any positive number of 20 byte info-hashes.

The rationale behind using lists of strings where the strings contain multiple info-hashes is to allow the publisher of a feed to sign multiple info-hashes together, thus saving space in the UDP packets and allowing nodes to transfer more info-hashes per packet. Original publishers of a feed MAY re-announce items lumped together over time to make the feed more efficient.

A client receiving a get_item response MUST verify each signature in the sig list against the corresponding item in the ih list using the feed's public key. Any item whose signature is not valid MUST be ignored.

nodes and nodes6 are optional and have the same semantics as in the standard get_peers response. The intention is to be able to use this get_item request in the same way, searching for the nodes responsible for the feed.
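A rough sketch of how a client might process the ih and sig lists of a response. The signature check itself is passed in as a callable, verify(public_key, message, signature), which is a hypothetical stand-in for whatever curve25519 library is used. Skipping malformed or unverifiable items, rather than rejecting the whole response, is an assumption of this sketch.

```python
from typing import Callable

HASH_LEN = 20  # bytes per info-hash
SIG_LEN = 64   # bytes per signature

def accepted_info_hashes(ih: list[bytes], sig: list[bytes], public_key: bytes,
                         verify: Callable[[bytes, bytes, bytes], bool]) -> list[bytes]:
    """Return the individual info-hashes from items whose signature verifies."""
    if len(ih) != len(sig):
        raise ValueError("ih and sig lists must have an equal number of items")
    accepted = []
    for item, signature in zip(ih, sig):
        if len(item) == 0 or len(item) % HASH_LEN != 0:
            continue  # not a positive number of 20 byte info-hashes
        if len(signature) != SIG_LEN:
            continue
        # the signature covers the full string, not each info-hash separately
        if not verify(public_key, item, signature):
            continue
        accepted.extend(item[i:i + HASH_LEN] for i in range(0, len(item), HASH_LEN))
    return accepted
```

Because each signature covers the whole string, a multi-hash item is accepted or skipped as a unit.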

announcing items

{
        "a":
        {
                "ih":
                [
                        <n * 20 byte info-hash(es)>,
                        ...
                ],
                "sig":
                [
                        <64 byte curve25519 signature of info-hash(es)>,
                        ...
                ],
                "id": <20 byte node-id of origin node>,
                "key": <64 byte public curve25519 key for this feed>,
                "n": <feed name>,
                "target": <target-id as derived from public key>,
                "token": <write-token as obtained by previous req.>
        },
        "y": "q",
        "q": "announce_item",
        "t": <transaction-id>
}

An announce can include any number of items, as long as they fit in a packet.

Subscribers to a feed SHOULD also announce items that they know of, to the feed. In order to make the repository of torrents as reliable as possible, subscribers SHOULD announce random items from their local repository of items. When re-announcing items, a random subset of all known items should be announced, randomized independently for each node it's announced to. This makes it a little bit harder to determine the IP address an item originated from, since it's a matter of seeing the first announce, and knowing that it wasn't announced anywhere else first.

Every subscriber and publisher SHOULD re-announce items every 30 minutes. If a feed does not receive any announced items in 60 minutes, a peer MAY time it out and remove it.

Subscribers and publishers SHOULD announce random items.
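The re-announce behaviour described above could be sketched like this. The per_node limit and the function names are illustrative assumptions; the proposal only requires that the subset is random and drawn independently for each node.

```python
import random

REANNOUNCE_INTERVAL = 30 * 60  # seconds between re-announces
FEED_TIMEOUT = 60 * 60         # seconds without announces before a feed may be dropped

def plan_reannounce(items: list, nodes: list, per_node: int, rng=random) -> dict:
    """Pick an independent random subset of known items for every target node."""
    count = min(per_node, len(items))
    return {node: rng.sample(items, count) for node in nodes}

def feed_expired(last_announce: float, now: float) -> bool:
    """True once a feed has gone FEED_TIMEOUT seconds without an announce."""
    return now - last_announce >= FEED_TIMEOUT
```

Drawing a fresh sample per node means no two nodes are guaranteed to see the same items from the same announcer.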

example

This is an example of an announce_item message:

{
        "a":
        {
                "ih":
                [
                        "7ea94c240691311dc0916a2a91eb7c3db2c6f3e4",
                        "0d92ad53c052ac1f49cf4434afffafa4712dc062e4168d940a48e45a45a0b10808014dc267549624"
                ],
                "sig":
                [
                        "980774404e404941b81aa9da1da0101cab54e670cff4f0054aa563c3b5abcb0fe3c6df5dac1ea25266035f09040bf2a24ae5f614787f1fe7404bf12fee5e6101",
                        "3fee52abea47e4d43e957c02873193fb9aec043756845946ec29cceb1f095f03d876a7884e38c53cd89a8041a2adfb2d9241b5ec5d70268714d168b9353a2c01"
                ],
                "id": "b46989156404e8e0acdb751ef553b210ef77822e",
                "key": "6bc1de5443d1a7c536cdf69433ac4a7163d3c63e2f9c92d78f6011cf63dbcd5b638bbc2119cdad0c57e4c61bc69ba5e2c08b918c2db8d1848cf514bd9958d307",
                "n": "my stuff",
                "target": "b4692ef0005639e86d7165bf378474107bf3a762",
                "token": "23ba"
        },
        "y": "q",
        "q": "announce_item",
        "t": "a421"
}

Strings are printed in hex for printability, but the actual encoding is binary. The message contains 3 feed items, starting with "7ea94c", "0d92ad" and "e4168d". These 3 items are not published optimally. If they were merged into a single string in the ih list, more than 64 bytes would be saved (because there would be one less signature).

Note that target is in fact SHA-1('my stuff' + key), where key is the public key given in the key field. The private key used in this example is 980f4cd7b812ae3430ea05af7c09a7e430275f324f42275ca534d9f7c6d06f5b.
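To illustrate the binary encoding and the space saved by merging items, the sketch below bencodes the example message (DHT messages are bencoded dictionaries) in two layouts and compares the sizes. The signatures are 64 byte placeholders, since only their length matters here, and the bencode and announce_item helpers are illustrative, not a reference implementation.

```python
def bencode(value) -> bytes:
    """Minimal bencoder for the str, bytes, list and dict values used here."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        pairs = sorted((k.encode("utf-8"), v) for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError("unsupported type: %r" % type(value))

def announce_item(ih: list, sig: list) -> bytes:
    """Build the example announce_item message with the given ih and sig lists."""
    return bencode({
        "a": {
            "ih": ih,
            "sig": sig,
            "id": bytes.fromhex("b46989156404e8e0acdb751ef553b210ef77822e"),
            "key": KEY,
            "n": "my stuff",
            "target": bytes.fromhex("b4692ef0005639e86d7165bf378474107bf3a762"),
            "token": bytes.fromhex("23ba"),
        },
        "q": "announce_item",
        "t": bytes.fromhex("a421"),
        "y": "q",
    })

KEY = bytes.fromhex(
    "6bc1de5443d1a7c536cdf69433ac4a7163d3c63e2f9c92d78f6011cf63dbcd5b"
    "638bbc2119cdad0c57e4c61bc69ba5e2c08b918c2db8d1848cf514bd9958d307")
FIRST = bytes.fromhex("7ea94c240691311dc0916a2a91eb7c3db2c6f3e4")
REST = bytes.fromhex(
    "0d92ad53c052ac1f49cf4434afffafa4712dc062e4168d940a48e45a45a0b10808014dc267549624")
PLACEHOLDER_SIG = bytes(64)  # stands in for a real signature of the same length

as_published = announce_item([FIRST, REST], [PLACEHOLDER_SIG, PLACEHOLDER_SIG])
merged = announce_item([FIRST + REST], [PLACEHOLDER_SIG])
saved = len(as_published) - len(merged)
print(saved)  # prints 70
```

The 70 bytes consist of one 64 byte signature with its 3 byte length prefix, plus one 3 byte length prefix in the ih list.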

URI scheme

The proposed URI scheme for DHT feeds is:

magnet:?xt=btfd:<base16-curve25519-public-key>&dn=<feed name>

Note that, unlike regular magnet links to torrents, which use btih, feed links use btfd.

The feed name is mandatory since it is used in the request and when calculating the target ID.
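A small sketch of building and parsing such a link. Percent-encoding the feed name is an assumption made here, since the proposal does not say how the name is escaped, and the function names are illustrative.

```python
from urllib.parse import parse_qs, quote, urlsplit

def make_feed_uri(public_key: bytes, feed_name: str) -> str:
    """Build a feed magnet link from the raw public key and the feed name."""
    return "magnet:?xt=btfd:" + public_key.hex() + "&dn=" + quote(feed_name, safe="")

def parse_feed_uri(uri: str) -> tuple[bytes, str]:
    """Return (public_key, feed_name), or raise ValueError if the link is not a feed link."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet link")
    query = parse_qs(parts.query)
    xt = query.get("xt", [""])[0]
    names = query.get("dn")
    if not xt.startswith("btfd:"):
        raise ValueError("not a feed link (expected xt=btfd:...)")
    if not names:
        raise ValueError("feed name (dn) is mandatory")
    return bytes.fromhex(xt[len("btfd:"):]), names[0]
```

The parsed key and name are exactly the two inputs needed to derive the target ID.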

rationale

The reason to use curve25519 instead of, for instance, RSA is to fit more signatures (i.e. items) in a single DHT packet. One packet is typically restricted to between 1280 and 1480 bytes. According to http://cr.yp.to/, curve25519 is free from patent claims and there are open implementations in both C and Java.
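As a back-of-the-envelope illustration of the packet-size argument, the sketch below counts how many single-hash items fit in a packet for a 64 byte signature versus a 128 byte one. The 128 byte figure assumes a 1024 bit RSA key, chosen only for comparison. The fixed overhead of the rest of the message is ignored by default, so the results are upper bounds.

```python
def bencoded_len(payload_bytes: int) -> int:
    """Size of a bencoded string: '<length>:' prefix plus the payload."""
    return len(str(payload_bytes)) + 1 + payload_bytes

def items_per_packet(packet_bytes: int, sig_bytes: int,
                     hash_bytes: int = 20, overhead: int = 0) -> int:
    """Upper bound on the number of (info-hash, signature) items in one packet."""
    per_item = bencoded_len(hash_bytes) + bencoded_len(sig_bytes)
    return (packet_bytes - overhead) // per_item

for limit in (1280, 1480):
    print(limit, items_per_packet(limit, 64), items_per_packet(limit, 128))
```

Under these assumptions the shorter signature allows roughly 14 to 16 items per packet, against 8 to 9 for the longer one.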