updated IP->ID formula for DHT

Arvid Norberg 2011-05-26 17:04:53 +00:00
parent cfbd5bfc4b
commit e6640de205
10 changed files with 349 additions and 57 deletions


@ -58,10 +58,12 @@
<p class="topic-title first">Table of contents</p> <p class="topic-title first">Table of contents</p>
<ul class="simple"> <ul class="simple">
<li><a class="reference internal" href="#id1" id="id2">BitTorrent DHT security extension</a></li> <li><a class="reference internal" href="#id1" id="id2">BitTorrent DHT security extension</a></li>
<li><a class="reference internal" href="#node-ids" id="id3">node IDs</a></li> <li><a class="reference internal" href="#considerations" id="id3">considerations</a></li>
<li><a class="reference internal" href="#bootstrapping" id="id4">bootstrapping</a></li> <li><a class="reference internal" href="#node-id-restriction" id="id4">Node ID restriction</a></li>
<li><a class="reference internal" href="#enforcement" id="id5">enforcement</a></li> <li><a class="reference internal" href="#bootstrapping" id="id5">bootstrapping</a></li>
<li><a class="reference internal" href="#backwards-compatibility-and-transition" id="id6">backwards compatibility and transition</a></li> <li><a class="reference internal" href="#enforcement" id="id6">enforcement</a></li>
<li><a class="reference internal" href="#backwards-compatibility-and-transition" id="id7">backwards compatibility and transition</a></li>
<li><a class="reference internal" href="#forward-compatibility" id="id8">forward compatibility</a></li>
</ul> </ul>
</div> </div>
<div class="section" id="id1"> <div class="section" id="id1">
@ -78,18 +80,106 @@ peers.</p>
<p>The proposed guard against this is to enforce restrictions on which node-ID <p>The proposed guard against this is to enforce restrictions on which node-ID
a node can choose, based on its external IP address.</p> a node can choose, based on its external IP address.</p>
</div> </div>
<div class="section" id="node-ids"> <div class="section" id="considerations">
<h1>node IDs</h1> <h1>considerations</h1>
<p>The proposed formula for restricting node IDs is that the 4 first bytes of <p>One straightforward scheme to tie the node ID to an IP would be to hash
the node ID MUST match the 4 first bytes of <tt class="docutils literal"><span class="pre">SHA-1(IP_address)</span></tt>. That is, the IP and force the node ID to share the prefix of that hash. One main
the raw, big endian, storage of the address, either IPv4 or IPv6, hashed drawback of this approach is that an entity's control over the DHT key
with SHA-1.</p> space grows linearly with its control over the IP address space.</p>
<p>Example:</p> <p>In order to successfully launch an attack, you just need to find 8 IPs
whose hash will be <em>closest</em> to the target info-hash. Given the current
size of the DHT, that is quite likely to be possible by anyone in control
of a /8 IP block.</p>
<p>The size of the DHT is approximately 8.4 million nodes. This is estimated
by observing that a typical routing table has about 20 of its
top routing table buckets full. That means the key space is dense enough
to contain 8 nodes for every combination of the 20 top bits of node IDs.</p>
<blockquote> <blockquote>
An IP address 89.5.5.5 has a big endian byte representation of <tt class="docutils literal"><span class="pre">2^20</span> <span class="pre">*</span> <span class="pre">8</span> <span class="pre">=</span> <span class="pre">8388608</span></tt></blockquote>
<tt class="docutils literal"><span class="pre">0x59</span> <span class="pre">0x05</span> <span class="pre">0x05</span> <span class="pre">0x05</span></tt>. The SHA-1 hash of this byte sequence is <p>By controlling that many IP addresses, an attacker could snoop any info-hash.
<tt class="docutils literal"><span class="pre">656d41da810a0a6d92fd2f6a8ba3b466e35ab368</span></tt>. The DHT node must choose By controlling 8 times that many IP addresses, an attacker could actually
a node ID which starts with <tt class="docutils literal"><span class="pre">656d41da</span></tt>.</blockquote> take over any info-hash.</p>
<p>With IPv4, snooping would require a /8 IP block, giving access to 16.7 million
IPs.</p>
<p>Another problem with hashing the IP is that multiple users behind a NAT are
forced to run their DHT nodes on the same node ID.</p>
</div>
<div class="section" id="node-id-restriction">
<h1>Node ID restriction</h1>
<p>To keep the number of node IDs an attacker controls from growing linearly with
the number of IPs, and to allow more than one node ID per external IP, the node
ID can be restricted at each class level of the IP.</p>
<p>The expression to calculate a valid ID prefix (from an IPv4 address) is:</p>
<pre class="literal-block">
sha1((A * (B * (C * (D * (rand() % 8) % 0x100) % 0x4000) % 0x100000)) % 0x4000000)
</pre>
<p>Where <tt class="docutils literal"><span class="pre">A</span></tt>, <tt class="docutils literal"><span class="pre">B</span></tt>, <tt class="docutils literal"><span class="pre">C</span></tt> and <tt class="docutils literal"><span class="pre">D</span></tt> are the four octets of an IPv4 address.</p>
<p>The pattern is that the modulus constant is shifted left by 6 for each octet.
It generalizes to IPv6 by only considering the first 64 bits of the IP (since
the low 64 bits are controlled by the host) and shifting the modulus by 3 for
each octet instead.</p>
<p>To implement this, evaluate the expression, store the
result in a big endian 32 bit integer and hash those 4 bytes with SHA-1.
The first 4 bytes of the node ID used in the DHT MUST match the first 4
bytes in the resulting hash. The last byte of the node ID MUST match the
random number used to generate the hash.</p>
<img alt="ip_id_v4.png" src="ip_id_v4.png" />
<img alt="ip_id_v6.png" src="ip_id_v6.png" />
<p>Example code for calculating a valid node ID:</p>
<pre class="literal-block">
uint8_t* ip; // our external IPv4 or IPv6 address (network byte order)
int num_octets; // the number of octets to consider in ip (4 or 8)
uint8_t node_id[20]; // resulting node ID
uint32_t r = rand() &amp; 0xff; // random byte, stored in the last byte of the ID
uint32_t modulus = 0x100;
uint32_t seed = r &amp; 0x7;
int mod_shift = 6 * 4 / num_octets; // 6 or 3, depending on IPv4 and IPv6
while (num_octets)
{
    seed *= ip[num_octets - 1]; // octets are consumed last to first
    seed &amp;= (modulus-1);
    modulus &lt;&lt;= mod_shift;
    --num_octets;
}
seed = htonl(seed);
SHA_CTX ctx;
SHA1_Init(&amp;ctx);
SHA1_Update(&amp;ctx, (unsigned char*)&amp;seed, sizeof(seed));
SHA1_Final(node_id, &amp;ctx); // digest buffer first, then context
for (int i = 4; i &lt; 19; ++i) node_id[i] = rand();
node_id[19] = r;
</pre>
<p>Example code to verify a node ID:</p>
<pre class="literal-block">
uint8_t* ip; // incoming IPv4 or IPv6 address (network byte order)
int num_octets; // the number of octets to consider in ip (4 or 8)
uint8_t node_id[20]; // incoming node ID
uint32_t modulus = 0x100;
uint32_t seed = node_id[19] &amp; 0x7;
int mod_shift = 6 * 4 / num_octets; // 6 or 3, depending on IPv4 and IPv6
while (num_octets)
{
    seed *= ip[num_octets - 1]; // octets are consumed last to first
    seed &amp;= (modulus-1);
    modulus &lt;&lt;= mod_shift;
    --num_octets;
}
seed = htonl(seed);
SHA_CTX ctx;
SHA1_Init(&amp;ctx);
SHA1_Update(&amp;ctx, (unsigned char*)&amp;seed, sizeof(seed));
uint8_t digest[20];
SHA1_Final(digest, &amp;ctx); // digest buffer first, then context
if (memcmp(digest, node_id, 4) != 0)
    return false; // failed verification
else
    return true; // verification passed
</pre>
<p>test vectors:</p>
</div> </div>
<div class="section" id="bootstrapping"> <div class="section" id="bootstrapping">
<h1>bootstrapping</h1> <h1>bootstrapping</h1>
@ -108,9 +198,9 @@ nodes, from separate searches, tells you your node ID is incorrect.</p>
</div> </div>
<div class="section" id="enforcement"> <div class="section" id="enforcement">
<h1>enforcement</h1> <h1>enforcement</h1>
<p>Write tokens from peers whose node ID does not match its external IP should be <p>Once enforced, write tokens from peers whose node ID does not match its external
considered dropped. In other words, a peer that uses a non-matching ID MUST IP should be considered dropped. In other words, a peer that uses a non-matching
never be used to store information on, regardless of which request. In the ID MUST never be used to store information on, regardless of which request. In the
original DHT specification only <tt class="docutils literal"><span class="pre">announce_peer</span></tt> stores data in the network, original DHT specification only <tt class="docutils literal"><span class="pre">announce_peer</span></tt> stores data in the network,
but any future extension which stores data in the network SHOULD use the same but any future extension which stores data in the network SHOULD use the same
restriction.</p> restriction.</p>
@ -137,6 +227,13 @@ should not be blocked.</p>
<p>Requests from peers whose node ID does not match their external IP should <p>Requests from peers whose node ID does not match their external IP should
always be serviced, even after the transition period. The attack this protects always be serviced, even after the transition period. The attack this protects
from is storing data on an attacker's node, not servicing an attacker's request.</p> from is storing data on an attacker's node, not servicing an attacker's request.</p>
</div>
<div class="section" id="forward-compatibility">
<h1>forward compatibility</h1>
<p>If the total size of the DHT grows to the point where the inherent size limit
in this proposal is too small, the modulus constants can be updated in a new
proposal, followed by another transition period during which both sets of
modulus constants are accepted.</p>
</div> </div>
</div> </div>
<div id="footer"> <div id="footer">


@ -26,20 +26,118 @@ peers.
The proposed guard against this is to enforce restrictions on which node-ID The proposed guard against this is to enforce restrictions on which node-ID
a node can choose, based on its external IP address. a node can choose, based on its external IP address.
node IDs considerations
-------- --------------
The proposed formula for restricting node IDs is that the 4 first bytes of One straightforward scheme to tie the node ID to an IP would be to hash
the node ID MUST match the 4 first bytes of ``SHA-1(IP_address)``. That is, the IP and force the node ID to share the prefix of that hash. One main
the raw, big endian, storage of the address, either IPv4 or IPv6, hashed drawback of this approach is that an entity's control over the DHT key
with SHA-1. space grows linearly with its control over the IP address space.
Example: In order to successfully launch an attack, you just need to find 8 IPs
whose hash will be *closest* to the target info-hash. Given the current
size of the DHT, that is quite likely to be possible by anyone in control
of a /8 IP block.
An IP address 89.5.5.5 has a big endian byte representation of The size of the DHT is approximately 8.4 million nodes. This is estimated
``0x59 0x05 0x05 0x05``. The SHA-1 hash of this byte sequence is by observing that a typical routing table has about 20 of its
``656d41da810a0a6d92fd2f6a8ba3b466e35ab368``. The DHT node must choose top routing table buckets full. That means the key space is dense enough
a node ID which starts with ``656d41da``. to contain 8 nodes for every combination of the 20 top bits of node IDs.
``2^20 * 8 = 8388608``
By controlling that many IP addresses, an attacker could snoop any info-hash.
By controlling 8 times that many IP addresses, an attacker could actually
take over any info-hash.
With IPv4, snooping would require a /8 IP block, giving access to 16.7 million
IPs.
Another problem with hashing the IP is that multiple users behind a NAT are
forced to run their DHT nodes on the same node ID.
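The size estimate above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the constant names are chosen here and are not part of the proposal.

```python
# Figures taken from the text: about 20 full top-level routing table
# buckets, each holding 8 nodes.
BUCKET_SIZE = 8
FULL_BUCKETS = 20

# Estimated number of nodes in the DHT.
estimated_nodes = (2 ** FULL_BUCKETS) * BUCKET_SIZE

# Number of addresses in an IPv4 /8 block (24 host bits).
slash8_addresses = 2 ** (32 - 8)

print(estimated_nodes)                      # 8388608
print(slash8_addresses)                     # 16777216
print(slash8_addresses / estimated_nodes)   # 2.0
```

A /8 block therefore holds about twice as many addresses as the estimated number of nodes, which is why plain hashing of the IP would put snooping within reach of anyone controlling such a block.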
Node ID restriction
-------------------
To keep the number of node IDs an attacker controls from growing linearly with
the number of IPs, and to allow more than one node ID per external IP, the node
ID can be restricted at each class level of the IP.
The expression to calculate a valid ID prefix (from an IPv4 address) is::
sha1((A * (B * (C * (D * (rand() % 8) % 0x100) % 0x4000) % 0x100000)) % 0x4000000)
Where ``A``, ``B``, ``C`` and ``D`` are the four octets of an IPv4 address.
The pattern is that the modulus constant is shifted left by 6 for each octet.
It generalizes to IPv6 by only considering the first 64 bits of the IP (since
the low 64 bits are controlled by the host) and shifting the modulus by 3 for
each octet instead.
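As a small illustration of that pattern, the modulus constants for both address families can be listed like this. The helper name ``modulus_sequence`` is hypothetical and only used for this sketch.

```python
def modulus_sequence(num_octets):
    """Return the modulus applied after each octet is multiplied in.

    num_octets is 4 for IPv4 or 8 for the considered half of an IPv6 address.
    """
    shift = 6 * 4 // num_octets  # 6 for IPv4, 3 for IPv6
    return [0x100 << (shift * i) for i in range(num_octets)]

print([hex(m) for m in modulus_sequence(4)])
# ['0x100', '0x4000', '0x100000', '0x4000000']
print([hex(m) for m in modulus_sequence(8)])
# ['0x100', '0x800', '0x4000', '0x20000', '0x100000', '0x800000', '0x4000000', '0x20000000']
```

The IPv4 list matches the constants in the expression above; the IPv6 list follows from shifting by 3 over 8 octets.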
To implement this, evaluate the expression, store the
result in a big endian 32 bit integer and hash those 4 bytes with SHA-1.
The first 4 bytes of the node ID used in the DHT MUST match the first 4
bytes in the resulting hash. The last byte of the node ID MUST match the
random number used to generate the hash.
.. image:: ip_id_v4.png
.. image:: ip_id_v6.png
Example code for calculating a valid node ID::
uint8_t* ip; // our external IPv4 or IPv6 address (network byte order)
int num_octets; // the number of octets to consider in ip (4 or 8)
uint8_t node_id[20]; // resulting node ID
uint32_t r = rand() & 0xff; // random byte, stored in the last byte of the ID
uint32_t modulus = 0x100;
uint32_t seed = r & 0x7;
int mod_shift = 6 * 4 / num_octets; // 6 or 3, depending on IPv4 and IPv6
while (num_octets)
{
    seed *= ip[num_octets - 1]; // octets are consumed last to first
    seed &= (modulus-1);
    modulus <<= mod_shift;
    --num_octets;
}
seed = htonl(seed);
SHA_CTX ctx;
SHA1_Init(&ctx);
SHA1_Update(&ctx, (unsigned char*)&seed, sizeof(seed));
SHA1_Final(node_id, &ctx); // digest buffer first, then context
for (int i = 4; i < 19; ++i) node_id[i] = rand();
node_id[19] = r;
Example code to verify a node ID::
uint8_t* ip; // incoming IPv4 or IPv6 address (network byte order)
int num_octets; // the number of octets to consider in ip (4 or 8)
uint8_t node_id[20]; // incoming node ID
uint32_t modulus = 0x100;
uint32_t seed = node_id[19] & 0x7;
int mod_shift = 6 * 4 / num_octets; // 6 or 3, depending on IPv4 and IPv6
while (num_octets)
{
    seed *= ip[num_octets - 1]; // octets are consumed last to first
    seed &= (modulus-1);
    modulus <<= mod_shift;
    --num_octets;
}
seed = htonl(seed);
SHA_CTX ctx;
SHA1_Init(&ctx);
SHA1_Update(&ctx, (unsigned char*)&seed, sizeof(seed));
uint8_t digest[20];
SHA1_Final(digest, &ctx); // digest buffer first, then context
if (memcmp(digest, node_id, 4) != 0)
    return false; // failed verification
else
    return true; // verification passed
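For quick experimentation, the generate and verify steps can also be sketched in Python. This is a rough sketch rather than a reference implementation: the function names ``id_prefix``, ``generate_id`` and ``verify_id`` are chosen here, and it assumes the octets are multiplied in from the last considered octet to the first, as in the expression given earlier.

```python
import hashlib
import os
import struct

def id_prefix(octets, r):
    """First 4 bytes a node ID must have for these IP octets and random byte r."""
    shift = 6 * 4 // len(octets)       # 6 for IPv4 (4 octets), 3 for IPv6 (8 octets)
    modulus = 0x100
    seed = r & 0x7                     # only the low 3 bits of r feed the seed
    for octet in reversed(octets):     # last considered octet first
        seed = (seed * octet) & (modulus - 1)
        modulus <<= shift
    # Hash the seed stored as a big endian 32 bit integer.
    return hashlib.sha1(struct.pack(">I", seed)).digest()[:4]

def generate_id(octets):
    """Build a 20 byte node ID: 4 prefix bytes, 15 random bytes, then r."""
    r = os.urandom(1)[0]
    return id_prefix(octets, r) + os.urandom(15) + bytes([r])

def verify_id(node_id, octets):
    """Check that the ID prefix matches the one derived from the source IP."""
    return len(node_id) == 20 and node_id[:4] == id_prefix(octets, node_id[19])

nid = generate_id([89, 5, 5, 5])
print(verify_id(nid, [89, 5, 5, 5]))   # True
```

Flipping any bit in the first 4 bytes of ``nid`` makes ``verify_id`` return ``False``, since the prefix no longer equals the one derived from the IP and the last byte.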
test vectors:
bootstrapping bootstrapping
------------- -------------
@ -61,9 +159,9 @@ nodes, from separate searches, tells you your node ID is incorrect.
enforcement enforcement
----------- -----------
Write tokens from peers whose node ID does not match its external IP should be Once enforced, write tokens from peers whose node ID does not match its external
considered dropped. In other words, a peer that uses a non-matching ID MUST IP should be considered dropped. In other words, a peer that uses a non-matching
never be used to store information on, regardless of which request. In the ID MUST never be used to store information on, regardless of which request. In the
original DHT specification only ``announce_peer`` stores data in the network, original DHT specification only ``announce_peer`` stores data in the network,
but any future extension which stores data in the network SHOULD use the same but any future extension which stores data in the network SHOULD use the same
restriction. restriction.
@ -94,3 +192,11 @@ Requests from peers whose node ID does not match their external IP should
always be serviced, even after the transition period. The attack this protects always be serviced, even after the transition period. The attack this protects
from is storing data on an attacker's node, not servicing an attacker's request. from is storing data on an attacker's node, not servicing an attacker's request.
forward compatibility
---------------------
If the total size of the DHT grows to the point where the inherent size limit
in this proposal is too small, the modulus constants can be updated in a new
proposal, followed by another transition period during which both sets of
modulus constants are accepted.

docs/ip_id_v4.png (new binary file, 4.0 KiB)

docs/ip_id_v6.png (new binary file, 4.3 KiB)

docs/ips.py (new file, 50 lines)

@ -0,0 +1,50 @@
#!/bin/python
# Python 2 script (uses the "print >>f" statement and integer division)
import os
import sys

def num_ids(bits, total_bits):
    ret = 8
    modulus = 0x100
    mod_shift = 6 * 32 / total_bits
    while bits >= 0:
        ret *= min(1 << bits, 256)
        ret = min(ret, modulus)
        bits -= 8
        modulus <<= mod_shift
    return ret

f = open('ip_id_v4.dat', 'w+')
for i in range(0, 33):
    print >>f, '%d\t%d\t%d' % (i, num_ids(i, 32), 1 << i)
f.close()

f = open('ip_id_v6.dat', 'w+')
for i in range(0, 65):
    print >>f, '%d\t%d\t%d' % (i, num_ids(i, 64), 1 << i)
f.close()
f = open('ip_id.gnuplot', 'w+')
f.write('''
set term png size 600,300
set output "ip_id_v4.png"
set logscale y
set title "Number of possible node IDs"
set ylabel "possible node IDs"
set xlabel "bits controlled in IPv4"
set xtics 4
set grid
plot "ip_id_v4.dat" using 1:2 title "octet-wise modulus" with lines, \
"ip_id_v4.dat" using 1:3 title "hash of IP" with lines
set output "ip_id_v6.png"
set title "Number of possible node IDs"
set xlabel "bits controlled in IPv6"
plot "ip_id_v6.dat" using 1:2 title "octet-wise modulus" with lines, \
"ip_id_v6.dat" using 1:3 title "hash of IP" with lines
''')
f.close()
os.system('gnuplot ip_id.gnuplot')


@ -56,7 +56,8 @@ bool TORRENT_EXPORT compare_ref(node_id const& n1, node_id const& n2, node_id co
// useful for finding out which bucket a node belongs to // useful for finding out which bucket a node belongs to
int TORRENT_EXPORT distance_exp(node_id const& n1, node_id const& n2); int TORRENT_EXPORT distance_exp(node_id const& n1, node_id const& n2);
node_id TORRENT_EXPORT generate_id(address const& external_ip = address()); node_id TORRENT_EXPORT generate_id(address const& external_ip);
node_id TORRENT_EXPORT generate_random_id();
bool TORRENT_EXPORT verify_id(node_id const& nid, address const& source_ip); bool TORRENT_EXPORT verify_id(node_id const& nid, address const& source_ip);


@ -97,6 +97,60 @@ int distance_exp(node_id const& n1, node_id const& n2)
struct static_ { static_() { std::srand((unsigned int)std::time(0)); } } static__; struct static_ { static_() { std::srand((unsigned int)std::time(0)); } } static__;
node_id generate_id_impl(address const& ip, boost::uint32_t r)
{
boost::uint32_t seed = r & 0x7;
uint32_t modulus = 0x100;
boost::uint8_t* p = 0;
int num_octets = 0;
int mod_shift = 0;
address_v4::bytes_type b4;
#if TORRENT_USE_IPV6
address_v6::bytes_type b6;
if (ip.is_v6())
{
b6 = ip.to_v6().to_bytes();
p = &b6[0];
num_octets = 8;
mod_shift = 3;
}
else
#endif
{
b4 = ip.to_v4().to_bytes();
p = &b4[0];
num_octets = 4;
mod_shift = 6;
}
while (num_octets)
{
    seed *= p[num_octets - 1]; // index num_octets itself is one past the last octet
    seed &= (modulus-1);
    modulus <<= mod_shift;
    --num_octets;
}
seed = htonl(seed);
node_id id = hasher((const char*)&seed, sizeof(seed)).final();
for (int i = 4; i < 19; ++i) id[i] = rand();
id[19] = r;
return id;
}
node_id generate_random_id()
{
char random[20];
for (int i = 0; i < 20; ++i) random[i] = rand();
return hasher(random, 20).final();
}
// verifies whether a node-id matches the IP it's used from // verifies whether a node-id matches the IP it's used from
// returns true if the node-id is OK coming from this source // returns true if the node-id is OK coming from this source
// and false otherwise. // and false otherwise.
@ -105,29 +159,13 @@ bool verify_id(node_id const& nid, address const& source_ip)
// no need to verify local IPs, they would be incorrect anyway // no need to verify local IPs, they would be incorrect anyway
if (is_local(source_ip)) return true; if (is_local(source_ip)) return true;
node_id h; node_id h = generate_id_impl(source_ip, nid[19]);
hash_address(source_ip, h);
return memcmp(&nid[0], &h[0], 4) == 0; return memcmp(&nid[0], &h[0], 4) == 0;
} }
node_id generate_id(address const& external_ip) node_id generate_id(address const& ip)
{ {
node_id h; return generate_id_impl(ip, rand());
char random[20];
#ifdef _MSC_VER
std::generate(random, random + 20, &rand);
#else
std::generate(random, random + 20, &std::rand);
#endif
h = hasher(random, 20).final();
if (!is_local(external_ip))
{
node_id ph;
hash_address(external_ip, ph);
memcpy(&h[0], &ph[0], 4);
}
return h;
} }
} } // namespace libtorrent::dht } } // namespace libtorrent::dht


@ -219,7 +219,7 @@ bool routing_table::need_refresh(node_id& target) const
if (now - m_last_refresh < seconds(45)) return false; if (now - m_last_refresh < seconds(45)) return false;
// generate a random node_id within the given bucket // generate a random node_id within the given bucket
target = generate_id(address()); target = generate_random_id();
int num_bits = std::distance(m_buckets.begin(), i) + 1; int num_bits = std::distance(m_buckets.begin(), i) + 1;
node_id mask(0); node_id mask(0);
for (int i = 0; i < num_bits; ++i) mask[i/8] |= 0x80 >> (i&7); for (int i = 0; i < num_bits; ++i) mask[i/8] |= 0x80 >> (i&7);


@ -40,7 +40,7 @@ POSSIBILITY OF SUCH DAMAGE.
#include <libtorrent/io.hpp> #include <libtorrent/io.hpp>
#include <libtorrent/invariant_check.hpp> #include <libtorrent/invariant_check.hpp>
#include <libtorrent/kademlia/node_id.hpp> // for generate_id #include <libtorrent/kademlia/node_id.hpp> // for generate_random_id
#include <libtorrent/kademlia/rpc_manager.hpp> #include <libtorrent/kademlia/rpc_manager.hpp>
#include <libtorrent/kademlia/logging.hpp> #include <libtorrent/kademlia/logging.hpp>
#include <libtorrent/kademlia/routing_table.hpp> #include <libtorrent/kademlia/routing_table.hpp>
@ -169,7 +169,7 @@ rpc_manager::rpc_manager(node_id const& our_id
, m_our_id(our_id) , m_our_id(our_id)
, m_table(table) , m_table(table)
, m_timer(time_now()) , m_timer(time_now())
, m_random_number(generate_id()) , m_random_number(generate_random_id())
, m_allocated_observers(0) , m_allocated_observers(0)
, m_destructing(false) , m_destructing(false)
, m_ext_ip(ext_ip) , m_ext_ip(ext_ip)


@ -107,7 +107,7 @@ void traversal_algorithm::add_entry(node_id const& id, udp::endpoint addr, unsig
observer_ptr o = new_observer(ptr, addr, id); observer_ptr o = new_observer(ptr, addr, id);
if (id.is_all_zeros()) if (id.is_all_zeros())
{ {
o->set_id(generate_id()); o->set_id(generate_random_id());
o->flags |= observer::flag_no_id; o->flags |= observer::flag_no_id;
} }