From ba0aed2282a0f933b3065e34fb2197937a5079e3 Mon Sep 17 00:00:00 2001 From: Arvid Norberg Date: Wed, 19 Jan 2011 05:57:44 +0000 Subject: [PATCH] initial support for DHT RSS feeds --- ChangeLog | 1 + Makefile.am | 2 + bindings/python/src/session_settings.cpp | 2 + docs/dht_rss.html | 447 ++++++++++++------- docs/dht_rss.rst | 451 +++++++++++++------- docs/manual.html | 8 +- docs/manual.rst | 10 +- include/libtorrent/kademlia/node.hpp | 52 ++- include/libtorrent/kademlia/rpc_manager.hpp | 6 +- include/libtorrent/session_settings.hpp | 12 +- src/kademlia/dht_tracker.cpp | 5 +- src/kademlia/node.cpp | 287 +++++++++++-- src/kademlia/rpc_manager.cpp | 9 +- test/test_dht.cpp | 311 +++++++++++--- test/test_primitives.cpp | 79 ++++ 15 files changed, 1253 insertions(+), 429 deletions(-) diff --git a/ChangeLog b/ChangeLog index abd82514f..698b22718 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,4 @@ + * added support for DHT rss feeds (storing only) * added support for RSS feeds * fixed up some edge cases in DHT routing table and improved unit test of it * added error category and error codes for HTTP errors diff --git a/Makefile.am b/Makefile.am index 430c69edb..aa7426495 100644 --- a/Makefile.am +++ b/Makefile.am @@ -78,6 +78,8 @@ DOCS_PAGES = \ docs/projects.rst \ docs/python_binding.html \ docs/python_binding.rst \ + docs/dht_rss.html \ + docs/dht_rss.rst \ docs/running_tests.html \ docs/running_tests.rst \ docs/tuning.html \ diff --git a/bindings/python/src/session_settings.cpp b/bindings/python/src/session_settings.cpp index fb90b8483..b8e5882f4 100644 --- a/bindings/python/src/session_settings.cpp +++ b/bindings/python/src/session_settings.cpp @@ -183,6 +183,8 @@ void bind_session_settings() .def_readwrite("service_port", &dht_settings::service_port) #endif .def_readwrite("max_fail_count", &dht_settings::max_fail_count) + .def_readwrite("max_torrents", &dht_settings::max_torrents) + .def_readwrite("max_feed_items", &dht_settings::max_feed_items) 
.def_readwrite("restrict_routing_ips", &dht_settings::restrict_routing_ips) .def_readwrite("restrict_search_ips", &dht_settings::restrict_search_ips) ; diff --git a/docs/dht_rss.html b/docs/dht_rss.html index 9f9001e1a..f94eaddaf 100644 --- a/docs/dht_rss.html +++ b/docs/dht_rss.html @@ -57,16 +57,24 @@

Table of contents

This is a proposal for an extension to the BitTorrent DHT to allow @@ -84,195 +92,303 @@ as the validity of the signatures of the items. Every recipient of a message with feed items in it MUST verify both the validity of the public key against the target ID it is stored under, as well as the validity of the signatures of each individual item.

-

Any peer who is subscribing to a DHT feed SHOULD also participate in -regularly re-announcing items that it knows about. Every participant -SHOULD store items in long term storage, across sessions, in order to -keep items alive for as long as possible, with as few sources as possible.

As with normal DHT announces, the write-token mechanism is used to -prevent spoof attacks.

+prevent IP spoof attacks.

There are two new proposed messages, announce_item and get_item. -Every valid item that is announced, should be stored. In a request to get items, -as many items as can fit in a normal UDP packet size should be returned. If -there are more items than can fit, a random sub-set should be returned.

-

Is there a better heuristic here? Should there be a bias towards newer items? -If so, there needs to be a signed timestamp as well, which might get messy

-
-

target ID

-

The target, i.e. the ID in the DHT key space feeds are announced to, MUST always -be SHA-1(feed_name + public_key). Any request where this condition is not met, -MUST be dropped.

-

Using the feed name as part of the target means a feed publisher only needs one -public-private keypair for any number of feeds, as long as the feeds have different -names.

+Every valid item that is announced, should be stored.

+
+

terminology

+

In this document, a storage node refers to the node in the DHT to which +an item is being announced. A subscribing node refers to a node which +makes lookups in the DHT to find the storage nodes, to request items +from them.

+
+
+

linked lists

+

Items are chained together in a general singly linked list. A linked +list does not necessarily contain RSS items, and no RSS related items +are mandatory. However, RSS items will be used as examples in this BEP:

+
+key = SHA1(name + key)
++---------+
+| head    |           key = SHA1(bencode(item))
+| +---------+         +---------+
+| | next    |-------->| item    |          key = SHA1(bencode(item))
+| | key     |         | +---------+        +---------+
+| | name    |         | | next    |------->| item    |
+| | seq     |         | | key     |        | +---------+
+| | ...     |         | | ...     |        | | next    |--->0
+| +---------+         | +---------+        | | key     |
+| sig     |           | sig     |          | | ...     |
++---------+           +---------+          | +---------+
+                                           | sig     |
+                                           +---------+
+
+

The next pointer is an ID of at least 20 bytes in the DHT key space, pointing to where the next +item in the list is announced. The list is terminated with an ID of all zeroes.

+

The ID an item is announced to is determined by the SHA1 hash of the bencoded representation +of the item itself. This contains all fields in the item, except the signature. +The only mandatory fields in an item are next, key and sig.

+

The key field MUST match the public key of the list head node. The sig field +MUST be the signature of the bencoded representation of item or head (whichever +is included in the message).

+

All subscribers MUST verify that the item is announced under the correct DHT key +and MUST verify the signature is valid and MUST verify the public key is the same +as the list-head. If an item fails any of these checks, it MUST be ignored and the +chain of items considered terminated.

+

Each item holds a bencoded dictionary with arbitrary keys, except two mandatory keys: +next and key. The signature sig is transferred outside of this dictionary +and is the signature of all of it. An implementation should store any arbitrary keys that +are announced to an item, within reasonable restrictions such as nesting, size and numeric +range of integers.
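
As a rough illustration of the rule above (an item's DHT key is the SHA1 hash of its bencoded dictionary, with sig kept outside), here is a minimal Python sketch. The helper names `bencode` and `item_target` are hypothetical and are not part of libtorrent; the encoder only covers the types used in this document.

```python
import hashlib

def bencode(value):
    """Minimal bencoder for ints, byte strings, text, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # bencoded dictionaries list their keys in sorted byte order
        pairs = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        pairs.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError("cannot bencode %r" % type(value))

def item_target(item):
    """DHT key of a list item: SHA1 of the bencoded item dict (no 'sig')."""
    return hashlib.sha1(bencode(item)).digest()
```

A storage node or subscriber could compare `item_target(item)` against the target the item was announced under before checking the signature; signature verification itself is left out of this sketch.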

+
+
+

skip lists

+

The next key stored in the list head and the items is a string of at least +20 bytes; it may be any length divisible by 20. Each 20 bytes are the ID of the next +item in the list, the item 2 hops away, 4 hops away, 8 hops away, and so on. For +simplicity, only the first ID (1 hop) in the next field is illustrated above.

+

A publisher of an item SHOULD include as many IDs in the next field as the remaining +size of the list warrants, within reason.

+

These skip lists allow for parallelized lookups of items and also make it more efficient +to search for specific items. They also mitigate the effect of a list breaking when some items are missing.

+

Figure of the skip list in the first list item:

+
+n      Item0  Item1  Item2  Item3  Item4  Item5  Item6  Item7  Item8  Item9  Item10
+0        O----->
+20       O------------>
+40       O-------------------------->
+60       O------------------------------------------------------>
+
+

n refers to the byte offset into the next field.
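
To make the layout of the next field concrete, the following Python sketch splits it into 20 byte IDs and pairs each with its hop distance (1, 2, 4, 8, ...), matching the byte offsets in the figure. The function name `parse_next` and the constant `END_OF_LIST` are hypothetical, chosen for illustration only.

```python
ID_LEN = 20
END_OF_LIST = bytes(ID_LEN)  # an ID of all zeroes terminates the list

def parse_next(next_field):
    """Split a 'next' field into (hops, id) pairs.

    The ID at byte offset n * 20 points 2**n hops ahead in the list.
    """
    if len(next_field) < ID_LEN or len(next_field) % ID_LEN != 0:
        raise ValueError("next must be a non-empty multiple of 20 bytes")
    return [(1 << (offset // ID_LEN), next_field[offset:offset + ID_LEN])
            for offset in range(0, len(next_field), ID_LEN)]
```

A subscribing node could issue lookups for several of the returned IDs in parallel, and fall back to a longer hop if the item 1 hop away cannot be found.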

+
+
+

list-head

+

The list head item is special in that it can be updated, without changing its +DHT key. This is required to prepend new items to the linked list. To authenticate +that only the original publisher can update the head, the whole linked list head +is signed. In order to prevent a malicious node from overwriting the list head with an old +version, the sequence number seq must be monotonically increasing for each update, +and a node hosting the list node MUST NOT downgrade a list head from a higher sequence +number to a lower one, only upgrade.

+

The list head's DHT key (which it is announced to) MUST be the SHA1 hash of the name +(n) and key fields concatenated.

+

Any node MUST reject any list head which is announced under any other ID.
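
The two list-head rules above (the DHT key is the SHA1 hash of name and key concatenated, and seq may never go backwards) can be sketched in Python as follows. The function names are hypothetical, and treating an equal sequence number as acceptable is an assumption; the text only rules out moving to a lower one.

```python
import hashlib

def head_target(name, public_key):
    """DHT key of a list head: SHA1(name + key), name as UTF-8 bytes."""
    return hashlib.sha1(name.encode("utf-8") + public_key).digest()

def accept_head(target, name, public_key, new_seq, stored_seq=None):
    """Return True if a storage node may store this list head.

    Assumption: a re-announce with an equal seq is accepted; only a
    lower seq counts as a downgrade.
    """
    if target != head_target(name, public_key):
        return False  # announced under the wrong ID
    if stored_seq is not None and new_seq < stored_seq:
        return False  # would downgrade the stored head
    return True
```

Signature verification is deliberately omitted here; a real storage node would also check sig against the bencoded head before storing it.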

messages

-

These are the proposed new message formats.

+

These are the messages to deal with linked lists.

+

The id field in these messages has the same semantics as the standard DHT messages, +i.e. the node ID of the node sending the message, to maintain the structure of the DHT +network.

+

The token field also has the same semantics as the standard DHT message get_peers +and announce_peer, when requesting an item and to write an item respectively.

+

nodes and nodes6 have the same semantics as in the get_peers response.

requesting items

+

This message can be used to request both a list head and a list item. When requesting +a list head, the n (name) field MUST be specified. When requesting a list item the +n field is not required.

 {
-        "a":
-        {
-                "filter": <variable size bloom-filter>,
-                "id": <20 byte id of origin node>,
-                "key": <64 byte public curve25519 key for this feed>,
-                "n": <feed-name>
-                "target": <target-id as derived from public key>
-        },
-        "q": "get_item",
-        "t": <transaction-id>,
-        "y": "q",
+   "a":
+   {
+      "id": <20 byte ID of sending node>,
+      "key": <64 byte public curve25519 key for this list>,
+      "n": <list name>,
+      "target": <target-id for 'head' or 'item'>
+   },
+   "q": "get_item",
+   "t": <transaction-id>,
+   "y": "q",
 }
 
-

The target MUST always be SHA-1(feed_name + public_key). Any request where -this condition is not met, MUST be dropped.

-

The n field is the name of this feed. It MUST be UTF-8 encoded string and it -MUST match the name of the feed in the receiving node.

-

The bloom filter argument (filter) in the get_item requests is optional. -If included in a request, it represents info-hashes that should be excluded from -the response. In this case, the response should be a random subset of the non-excluded -items, or all of the non-excluded items if they all fit within a packet size.

-

If the bloom filter is specified, its size MUST be an even multiple of 8 bits. The size -is implied by the length of the string. For each info-hash to exclude from the response,

-

There are no hash functions for the bloom filter. Since the info-hash is already a -hash digest, each pair of bytes, starting with the first bytes (MSB), are used as the -results from the imaginary hash functions for the bloom filter. k is 3 in this bloom -filter. This means the first 6 bytes of the info-hash is used to set 3 bits in the bloom -filter. The pairs of bytes pulled out from the info-hash are interpreted as a big-endian -16 bit value.

-

Bits are indexed in bytes from left to right, and within bytes from LSB to MSB. i.e., to -set bit 12: bitfield[12/8] |= (12 % 8).

-
-
Example:
-
To indicate that you are not interested in knowing about the info-hash that -starts with 0x4f7d25a... and you choose a bloom filter of size 80 bits. Set bits -(0x4f % 80), (0x7d % 80) and (0x25 % 80) in the bloom filter bitmask.
-
+

When requesting a list-head the target MUST always be SHA-1(feed_name + public_key). +target is the target node ID the item was written to.

+

The n field is the name of the list. If specified, it MUST be a UTF-8 encoded string +and it MUST match the name of the feed in the receiving node.

request item response

+

This is the format of a response of a list head:

 {
-        "r":
-        {
-                "ih":
-                [
-                        <n * 20 byte(s) info-hash>,
-                        ...
-                ],
-                "sig":
-                [
-                        <64 byte curve25519 signature of info-hash>,
-                        ...
-                ],
-                "id": <20 byte id of origin node>,
-                "token": <write-token>
-                "nodes": <n * compact IPv4-port pair>
-                "nodes6": <n * compact IPv6-port pair>
-        },
-        "t": <transaction-id>,
-        "y": "r",
+   "r":
+   {
+      "head":
+      {
+         "key": <64 byte public curve25519 key for this list>,
+         "next": <20 bytes item ID>,
+         "n": <name of the linked list>,
+         "seq": <monotonically increasing sequence number>
+      },
+      "sig": <curve25519 signature of 'head' entry (in bencoded form)>,
+      "id": <20 byte id of sending node>,
+      "token": <write-token>,
+      "nodes": <n * compact IPv4-port pair>,
+      "nodes6": <n * compact IPv6-port pair>
+   },
+   "t": <transaction-id>,
+   "y": "r",
 }
 
-

Since the data that's being signed by the public key already is a hash (i.e. -an info-hash), the signature of each hash-entry is simply the hash encrypted -by the feed's private key.

-

The ih and sig lists MUST have equal number of items. Each item in sig -is the signature of the full string in the corresponding item in the ih list.

-

Each item in the ih list may contain any positive number of 20 byte info-hashes.

-

The rationale behind using lists of strings where the strings contain multiple -info-hashes is to allow the publisher of a feed to sign multiple info-hashes -together, and thus saving space in the UDP packets, allowing nodes to transfer more -info-hashes per packet. Original publishers of a feed MAY re-announce items lumped -together over time to make the feed more efficient.

-

A client receiving a get_item response MUST verify each signature in the sig -list against each corresponding item in the ih list using the feed's public key. -Any item whose signature

-

nodes and nodes6 are optional and have the same semantics as the standard -get_peers request. The intention is to be able to use this get_item request -in the same way, searching for the nodes responsible for the feed.

+

This is the format of a response of a list item:

+
+{
+   "r":
+   {
+      "item":
+      {
+         "key": <64 byte public curve25519 key for this list>,
+         "next": <20 bytes item ID>,
+         ...
+      },
+      "sig": <curve25519 signature of 'item' entry (in bencoded form)>,
+      "id": <20 byte id of sending node>,
+      "token": <write-token>,
+      "nodes": <n * compact IPv4-port pair>,
+      "nodes6": <n * compact IPv6-port pair>
+   },
+   "t": <transaction-id>,
+   "y": "r",
+}
+
+

A client receiving a get_item response MUST verify the signature in the sig +field against the bencoded representation of the item field, using the key as +the public key. The key MUST match the public key of the feed.

+

The item dictionary MAY contain arbitrary keys, and all keys MUST be stored for +items.

announcing items

+

The message format for announcing a list head:

 {
-        "a":
-        {
-                "ih":
-                [
-                        <n * 20 byte info-hash(es)>,
-                        ...
-                ],
-                "sig":
-                [
-                        <64 byte curve25519 signature of info-hash(es)>,
-                        ...
-                ],
-                "id": <20 byte node-id of origin node>,
-                "key": <64 byte public curve25519 key for this feed>,
-                "n": <feed name>
-                "target": <target-id as derived from public key>,
-                "token": <write-token as obtained by previous req.>
-        },
-        "y": "q",
-        "q": "announce_item",
-        "t": <transaction-id>
+   "a":
+   {
+      "head":
+      {
+         "key": <64 byte public curve25519 key for this list>,
+         "next": <20 bytes item ID>,
+         "n": <name of the linked list>,
+         "seq": <monotonically increasing sequence number>
+      },
+      "sig": <curve25519 signature of 'head' entry (in bencoded form)>,
+      "id": <20 byte node-id of origin node>,
+      "target": <target-id as derived from public key and name>,
+      "token": <write-token as obtained by previous request>
+   },
+   "y": "q",
+   "q": "announce_item",
+   "t": <transaction-id>
 }
 
-

An announce can include any number of items, as long as they fit in a packet.

-

Subscribers to a feed SHOULD also announce items that they know of, to the feed. -In order to make the repository of torrents as reliable as possible, subscribers -SHOULD announce random items from their local repository of items. When re-announcing -items, a random subset of all known items should be announced, randomized -independently for each node it's announced to. This makes it a little bit harder -to determine the IP address an item originated from, since it's a matter of -seeing the first announce, and knowing that it wasn't announced anywhere else -first.

-

Any subscriber and publisher SHOULD re-announce items every 30 minutes. If -a feed does not receive any announced items in 60 minutes, a peer MAY time -it out and remove it.

-

Subscribers and publishers SHOULD announce random items.

+

The message format for announcing a list item:

+
+{
+   "a":
+   {
+      "item":
+      {
+         "key": <64 byte public curve25519 key for this list>,
+         "next": <20 bytes item ID>,
+         ...
+      },
+      "sig": <curve25519 signature of 'item' entry (in bencoded form)>,
+      "id": <20 byte node-id of origin node>,
+      "target": <target-id as derived from item dict>,
+      "token": <write-token as obtained by previous request>
+   },
+   "y": "q",
+   "q": "announce_item",
+   "t": <transaction-id>
+}
+
+

A storage node MAY reject items and heads whose bencoded representation is +greater than 1024 bytes.

+
+
+

re-announcing

+

In order to keep feeds alive, subscriber nodes SHOULD help out in announcing +items they have downloaded to the DHT.

+

Every subscriber node SHOULD store items in long term storage, across sessions, +in order to keep items alive for as long as possible, with as few sources as possible.

+

Subscribers to a feed SHOULD also announce items that they know of, to the feed. +Since a feed may have many subscribers and many items, subscribers should re-announce +items according to the following algorithm.

+
+1. pick one random item (i) from the local repository (except
+   items already announced this round)
+2. If all items in the local repository have been announced
+  2.1 terminate
+3. look up item i in the DHT
+4. If fewer than 8 nodes returned the item
+  4.1 announce i to the DHT
+  4.2 goto 1
+
+

This ensures a balanced load on the DHT while still keeping items alive.
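
The re-announce steps above could be sketched in Python roughly as follows. The callables `lookup` (returning how many nodes returned the item) and `announce` are hypothetical stand-ins for DHT operations. The listed steps do not say what happens when 8 or more nodes already hold an item; this sketch reads them literally and ends the round at that point, which is an assumption.

```python
import random

def reannounce_round(items, lookup, announce, min_copies=8, rng=random):
    """Run one re-announce round and return the items that were announced.

    Shuffling up front is equivalent to repeatedly picking a random item
    that has not yet been handled this round (steps 1 and 2).
    """
    pending = list(items)
    rng.shuffle(pending)
    announced = []
    for item in pending:
        if lookup(item) >= min_copies:
            break  # assumption: a well-replicated item ends the round
        announce(item)          # step 4.1
        announced.append(item)  # step 4.2: continue with the next pick
    return announced
```

Passing a seeded `random.Random` instance as `rng` makes a round reproducible, which may be useful when testing.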

+
+
+

timeouts

+

Items SHOULD be announced to the DHT every 30 minutes. A storage node MAY time +out an item after 60 minutes of no one announcing it.

+

A storing node MAY extend the timeout when it receives a request for it. Since +items are immutable, the data doesn't go stale. Therefore it doesn't matter if +the storing node no longer is in the set of the 8 closest nodes.

+
+
+

RSS feeds

+

For RSS feeds, the following keys are mandatory in the list item's item dictionary.

+
+
ih
+
The torrent's info hash
+
size
+
The size (in bytes) of all files in the torrent
+
n
+
name of the torrent
+

example

This is an example of an announce_item message:

 {
-        "a":
-        {
-                "ih":
-                [
-                        "7ea94c240691311dc0916a2a91eb7c3db2c6f3e4",
-                        "0d92ad53c052ac1f49cf4434afffafa4712dc062e4168d940a48e45a45a0b10808014dc267549624"
-                ],
-                "sig":
-                [
-                        "980774404e404941b81aa9da1da0101cab54e670cff4f0054aa563c3b5abcb0fe3c6df5dac1ea25266035f09040bf2a24ae5f614787f1fe7404bf12fee5e6101",
-                        "3fee52abea47e4d43e957c02873193fb9aec043756845946ec29cceb1f095f03d876a7884e38c53cd89a8041a2adfb2d9241b5ec5d70268714d168b9353a2c01"
-                ],
-                "id": "b46989156404e8e0acdb751ef553b210ef77822e",
-                "key": "6bc1de5443d1a7c536cdf69433ac4a7163d3c63e2f9c92d78f6011cf63dbcd5b638bbc2119cdad0c57e4c61bc69ba5e2c08b918c2db8d1848cf514bd9958d307",
-                "n": "my stuff"
-                "target": "b4692ef0005639e86d7165bf378474107bf3a762"
-                "token": "23ba"
-        },
-        "y": "q",
-        "q": "announce_item",
-        "t": "a421"
+   "a":
+   {
+      "item":
+      {
+         "key": "6bc1de5443d1a7c536cdf69433ac4a7163d3c63e2f9c92d
+            78f6011cf63dbcd5b638bbc2119cdad0c57e4c61bc69ba5e2c08
+            b918c2db8d1848cf514bd9958d307",
+         "info-hash": "7ea94c240691311dc0916a2a91eb7c3db2c6f3e4",
+         "size": 24315329,
+         "n": "my stuff",
+         "next": "c68f29156404e8e0aas8761ef5236bcagf7f8f2e"
+      },
+      "sig": <signature>,
+      "id": "b46989156404e8e0acdb751ef553b210ef77822e",
+      "target": "b4692ef0005639e86d7165bf378474107bf3a762",
+      "token": "23ba"
+   },
+   "y": "q",
+   "q": "announce_item",
+   "t": "a421"
 }
 
-

Strings are printed in hex for printability, but actual encoding is binary. The -response contains 3 feed items, starting with "7ea94c", "0d92ad" and "e4168d". -These 3 items are not published optimally. If they were to be merged into a single -string in the ih list, more than 64 bytes would be saved (because of having -one less signature).

-

Note that target is in fact SHA1('my stuff' + 'key'). The private key -used in this example is 980f4cd7b812ae3430ea05af7c09a7e430275f324f42275ca534d9f7c6d06f5b.

+

Strings are printed in hex for printability, but actual encoding is binary.

+

Note that target is in fact the SHA1 hash of the same data the signature sig +is the signature of, i.e.:

+
+d9:info-hash20:7ea94c240691311dc0916a2a91eb7c3db2c6f3e43:key64:6bc1de5443d1
+a7c536cdf69433ac4a7163d3c63e2f9c92d78f6011cf63dbcd5b638bbc2119cdad0c57e4c61
+bc69ba5e2c08b918c2db8d1848cf514bd9958d3071:n8:my stuff4:next20:c68f29156404
+e8e0aas8761ef5236bcagf7f8f2e4:sizei24315329ee
+
+

(note that binary data is printed as hex)

-
-

URI scheme

+
+

RSS feed URI scheme

The proposed URI scheme for DHT feeds is:

 magnet:?xt=btfd:<base16-curve25519-public-key> &dn= <feed name>
@@ -284,10 +400,9 @@ calculating the target ID.

rationale

-

The reason to use curve25519 instead of, for instance, RSA is to fit more signatures -(i.e. items) in a single DHT packet. One packet is typically restricted to between -1280 - 1480 bytes. According to http://cr.yp.to/, curve25519 is free from patent claims -and there are open implementations in both C and Java.

+

The reason to use curve25519 instead of, for instance, RSA is compactness. According to +http://cr.yp.to/, curve25519 is free from patent claims and there are open implementations +in both C and Java.