update dht_sec document
parent
d86c8dcc4d
commit
1179c137d6
|
@ -55,14 +55,14 @@
|
|||
<div class="contents topic" id="table-of-contents">
|
||||
<p class="topic-title first">Table of contents</p>
|
||||
<ul class="simple">
|
||||
<li><a class="reference internal" href="#id1" id="id3">BitTorrent DHT security extension</a></li>
|
||||
<li><a class="reference internal" href="#considerations" id="id4">considerations</a></li>
|
||||
<li><a class="reference internal" href="#node-id-restriction" id="id5">Node ID restriction</a></li>
|
||||
<li><a class="reference internal" href="#bootstrapping" id="id6">bootstrapping</a></li>
|
||||
<li><a class="reference internal" href="#rationale" id="id7">rationale</a></li>
|
||||
<li><a class="reference internal" href="#enforcement" id="id8">enforcement</a></li>
|
||||
<li><a class="reference internal" href="#backwards-compatibility-and-transition" id="id9">backwards compatibility and transition</a></li>
|
||||
<li><a class="reference internal" href="#forward-compatibility" id="id10">forward compatibility</a></li>
|
||||
<li><a class="reference internal" href="#id1" id="id2">BitTorrent DHT security extension</a></li>
|
||||
<li><a class="reference internal" href="#considerations" id="id3">considerations</a></li>
|
||||
<li><a class="reference internal" href="#node-id-restriction" id="id4">Node ID restriction</a></li>
|
||||
<li><a class="reference internal" href="#bootstrapping" id="id5">bootstrapping</a></li>
|
||||
<li><a class="reference internal" href="#rationale" id="id6">rationale</a></li>
|
||||
<li><a class="reference internal" href="#enforcement" id="id7">enforcement</a></li>
|
||||
<li><a class="reference internal" href="#backwards-compatibility-and-transition" id="id8">backwards compatibility and transition</a></li>
|
||||
<li><a class="reference internal" href="#forward-compatibility" id="id9">forward compatibility</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="section" id="id1">
|
||||
|
@ -109,21 +109,21 @@ forced to run their DHT nodes on the same node ID.</p>
|
|||
of IPs, as well as allowing more than one node ID per external IP, the node
|
||||
ID can be restricted at each class level of the IP.</p>
|
||||
<p>Another important property of the restriction put on node IDs is that the
|
||||
distribution of the IDs remoain uniform. This is why CRC32 was chosen
|
||||
as the hash function. See <a class="reference external" href="http://blog.libtorrent.org/2012/12/dht-security/">comparisons of hash functions</a>.</p>
|
||||
distribution of the IDs remoain uniform. This is why CRC32C (Castagnoli) was
|
||||
chosen as the hash function.</p>
|
||||
<p>The expression to calculate a valid ID prefix (from an IPv4 address) is:</p>
|
||||
<pre class="literal-block">
|
||||
crc32((ip & 0x030f3fff) .. r)
|
||||
crc32c((ip & 0x030f3fff) | (r << 29))
|
||||
</pre>
|
||||
<p>And for an IPv6 address (<tt class="docutils literal">ip</tt> is the high 64 bits of the address):</p>
|
||||
<pre class="literal-block">
|
||||
crc32((ip & 0x0103070f1f3f7fff) .. r)
|
||||
crc32c((ip & 0x0103070f1f3f7fff) | (r << 61))
|
||||
</pre>
|
||||
<p><tt class="docutils literal">r</tt> is a random number in the range [0, 7]. The resulting integer,
|
||||
representing the masked IP address is supposed to be big-endian before
|
||||
hashed. The ".." means concatenation.</p>
|
||||
hashed. The "|" operator means bit-wise OR.</p>
|
||||
<p>The details of implementing this is to evaluate the expression, store the
|
||||
result in a big endian 64 bit integer and hash those 8 bytes with CRC32.</p>
|
||||
result in a big endian 64 bit integer and hash those 8 bytes with CRC32C.</p>
|
||||
<p>The first (most significant) 21 bits of the node ID used in the DHT MUST
|
||||
match the first 21 bits of the resulting hash. The last byte of the hash MUST
|
||||
match the random number (<tt class="docutils literal">r</tt>) used to generate the hash.</p>
|
||||
|
@ -144,10 +144,10 @@ for (int i = 0; i < num_octets; ++i)
|
|||
|
||||
uint32_t rand = std::rand() & 0xff;
|
||||
uint8_t r = rand & 0x7;
|
||||
ip[0] |= r << 5;
|
||||
|
||||
uint32_t crc = crc32(0, nullptr, 0);
|
||||
crc = crc32(crc, ip, num_octets);
|
||||
crc = crc32(crc, &r, 1);
|
||||
uint32_t crc = 0;
|
||||
crc = crc32c(crc, ip, num_octets);
|
||||
|
||||
// only take the top 21 bits from crc
|
||||
node_id[0] = (crc >> 24) & 0xff;
|
||||
|
@ -160,15 +160,15 @@ node_id[19] = rand;
|
|||
<pre class="literal-block">
|
||||
IP rand example node ID
|
||||
============ ===== ==========================================
|
||||
124.31.75.21 1 <strong>d2a6df</strong> f10c5d6a4ec8a88e4c6ab4c28b95eee4 <strong>01</strong>
|
||||
21.75.31.124 86 <strong>51d029</strong> c14e7a08645677bbd1cfe7d8f956d532 <strong>56</strong>
|
||||
65.23.51.170 22 <strong>fd334a</strong> 20bc8f112a3d426c84764f8c2a1150e6 <strong>16</strong>
|
||||
84.124.73.14 65 <strong>6aa169</strong> dd1bb1fe518101ceef99462b947a01ff <strong>41</strong>
|
||||
43.213.53.83 90 <strong>eb6434</strong> bf5b7c4be0237986d5243b87aa6d5130 <strong>5a</strong>
|
||||
124.31.75.21 1 <strong>5fbfbf</strong> f10c5d6a4ec8a88e4c6ab4c28b95eee4 <strong>01</strong>
|
||||
21.75.31.124 86 <strong>5a3ce9</strong> c14e7a08645677bbd1cfe7d8f956d532 <strong>56</strong>
|
||||
65.23.51.170 22 <strong>a5d432</strong> 20bc8f112a3d426c84764f8c2a1150e6 <strong>16</strong>
|
||||
84.124.73.14 65 <strong>1b0321</strong> dd1bb1fe518101ceef99462b947a01ff <strong>41</strong>
|
||||
43.213.53.83 90 <strong>e56f6c</strong> bf5b7c4be0237986d5243b87aa6d5130 <strong>5a</strong>
|
||||
</pre>
|
||||
<p>The bold parts of the node ID are the important parts. The rest are
|
||||
random numbers. The last bold number of each row has only its most significant
|
||||
bit pulled from the CRC function. The lower 3 bits are random.</p>
|
||||
bit pulled from the CRC32C function. The lower 3 bits are random.</p>
|
||||
</div>
|
||||
<div class="section" id="bootstrapping">
|
||||
<h1>bootstrapping</h1>
|
||||
|
@ -191,17 +191,19 @@ nodes, from separate searches, tells you your node ID is incorrect.</p>
|
|||
</div>
|
||||
<div class="section" id="rationale">
|
||||
<h1>rationale</h1>
|
||||
<p>The choice of using CRC32 instead of a more traditional cryptographic hash
|
||||
<p>The choice of using CRC32C instead of a more traditional cryptographic hash
|
||||
function is justified primarily of these reasons:</p>
|
||||
<ol class="arabic simple">
|
||||
<li>it is a fast function</li>
|
||||
<li>produces well distributed results</li>
|
||||
<li>there is no need for the hash function to be one-way (the input set is
|
||||
so small that any hash function could be reversed).</li>
|
||||
<li>CRC32C (Castagnoli) is supported in hardware by SSE 4.2, which can
|
||||
significantly speed up computation</li>
|
||||
</ol>
|
||||
<p>There are primarily two tests run on SHA-1 and CRC32 to establish the
|
||||
<p>There are primarily two tests run on SHA-1 and CRC32C to establish the
|
||||
distribution of results. The first one is the number of bits in the output
|
||||
set that contain every possible combination of bits. The CRC function
|
||||
set that contain every possible combination of bits. The CRC32C function
|
||||
has a longer such prefix in its output than SHA-1. This means nodes will still
|
||||
have well uniformly distributed IDs, even when IP addresses in use are not
|
||||
uniformly distributed.</p>
|
||||
|
@ -213,7 +215,7 @@ reserved for local networks, multicast and other things. It also takes into
|
|||
account that some /8 blocks are not in use by end-users and exremely unlikely
|
||||
to ever run a DHT node. This makes the results likely to be very similar to
|
||||
what we would see in the wild.</p>
|
||||
<p>These results indicate that CRC32 provides the best uniformity in the results
|
||||
<p>These results indicate that CRC32C provides the best uniformity in the results
|
||||
in terms of bit prefixes where all possibilities are represented, and that
|
||||
no more than 21 bits should be used from the result. If more than 21 bits
|
||||
were to be used, there would be certain node IDs that would be impossible to
|
||||
|
@ -223,9 +225,13 @@ The target space (32 bit interger) is divided up into 1000 buckets. Every valid
|
|||
IP and <tt class="docutils literal">r</tt> input is run through the algorithm and the result is put in the
|
||||
bucket it falls in. The expectation is that each bucket has roughly an equal
|
||||
number of results falling into it. The following graph shows the resulting
|
||||
histogram, comparing SHA-1 and CRC32.</p>
|
||||
histogram, comparing SHA-1 and CRC32C.</p>
|
||||
<img alt="hash_distribution.png" src="hash_distribution.png" />
|
||||
<p>The source code for these tests can be found <a class="reference external" href="https://github.com/arvidn/hash_complete_prefix">here</a>.</p>
|
||||
<p>The reason to use CRC32C instead of the CRC32 implemented by zlib is that
|
||||
Intel CPUs have hardware support for the CRC32C calculations. The input
|
||||
being exactly 4 bytes is also deliberate, to make it fit in a single
|
||||
instruction.</p>
|
||||
</div>
|
||||
<div class="section" id="enforcement">
|
||||
<h1>enforcement</h1>
|
||||
|
|
|
@ -64,25 +64,23 @@ of IPs, as well as allowing more than one node ID per external IP, the node
|
|||
ID can be restricted at each class level of the IP.
|
||||
|
||||
Another important property of the restriction put on node IDs is that the
|
||||
distribution of the IDs remoain uniform. This is why CRC32 was chosen
|
||||
as the hash function. See `comparisons of hash functions`__.
|
||||
|
||||
__ http://blog.libtorrent.org/2012/12/dht-security/
|
||||
distribution of the IDs remoain uniform. This is why CRC32C (Castagnoli) was
|
||||
chosen as the hash function.
|
||||
|
||||
The expression to calculate a valid ID prefix (from an IPv4 address) is::
|
||||
|
||||
crc32((ip & 0x030f3fff) .. r)
|
||||
crc32c((ip & 0x030f3fff) | (r << 29))
|
||||
|
||||
And for an IPv6 address (``ip`` is the high 64 bits of the address)::
|
||||
|
||||
crc32((ip & 0x0103070f1f3f7fff) .. r)
|
||||
crc32c((ip & 0x0103070f1f3f7fff) | (r << 61))
|
||||
|
||||
``r`` is a random number in the range [0, 7]. The resulting integer,
|
||||
representing the masked IP address is supposed to be big-endian before
|
||||
hashed. The ".." means concatenation.
|
||||
hashed. The "|" operator means bit-wise OR.
|
||||
|
||||
The details of implementing this is to evaluate the expression, store the
|
||||
result in a big endian 64 bit integer and hash those 8 bytes with CRC32.
|
||||
result in a big endian 64 bit integer and hash those 8 bytes with CRC32C.
|
||||
|
||||
The first (most significant) 21 bits of the node ID used in the DHT MUST
|
||||
match the first 21 bits of the resulting hash. The last byte of the hash MUST
|
||||
|
@ -106,10 +104,10 @@ Example code code for calculating a valid node ID::
|
|||
|
||||
uint32_t rand = std::rand() & 0xff;
|
||||
uint8_t r = rand & 0x7;
|
||||
ip[0] |= r << 5;
|
||||
|
||||
uint32_t crc = crc32(0, nullptr, 0);
|
||||
crc = crc32(crc, ip, num_octets);
|
||||
crc = crc32(crc, &r, 1);
|
||||
uint32_t crc = 0;
|
||||
crc = crc32c(crc, ip, num_octets);
|
||||
|
||||
// only take the top 21 bits from crc
|
||||
node_id[0] = (crc >> 24) & 0xff;
|
||||
|
@ -124,15 +122,15 @@ test vectors:
|
|||
|
||||
IP rand example node ID
|
||||
============ ===== ==========================================
|
||||
124.31.75.21 1 **d2a6df** f10c5d6a4ec8a88e4c6ab4c28b95eee4 **01**
|
||||
21.75.31.124 86 **51d029** c14e7a08645677bbd1cfe7d8f956d532 **56**
|
||||
65.23.51.170 22 **fd334a** 20bc8f112a3d426c84764f8c2a1150e6 **16**
|
||||
84.124.73.14 65 **6aa169** dd1bb1fe518101ceef99462b947a01ff **41**
|
||||
43.213.53.83 90 **eb6434** bf5b7c4be0237986d5243b87aa6d5130 **5a**
|
||||
124.31.75.21 1 **5fbfbf** f10c5d6a4ec8a88e4c6ab4c28b95eee4 **01**
|
||||
21.75.31.124 86 **5a3ce9** c14e7a08645677bbd1cfe7d8f956d532 **56**
|
||||
65.23.51.170 22 **a5d432** 20bc8f112a3d426c84764f8c2a1150e6 **16**
|
||||
84.124.73.14 65 **1b0321** dd1bb1fe518101ceef99462b947a01ff **41**
|
||||
43.213.53.83 90 **e56f6c** bf5b7c4be0237986d5243b87aa6d5130 **5a**
|
||||
|
||||
The bold parts of the node ID are the important parts. The rest are
|
||||
random numbers. The last bold number of each row has only its most significant
|
||||
bit pulled from the CRC function. The lower 3 bits are random.
|
||||
bit pulled from the CRC32C function. The lower 3 bits are random.
|
||||
|
||||
bootstrapping
|
||||
-------------
|
||||
|
@ -160,17 +158,19 @@ nodes, from separate searches, tells you your node ID is incorrect.
|
|||
rationale
|
||||
---------
|
||||
|
||||
The choice of using CRC32 instead of a more traditional cryptographic hash
|
||||
The choice of using CRC32C instead of a more traditional cryptographic hash
|
||||
function is justified primarily of these reasons:
|
||||
|
||||
1. it is a fast function
|
||||
2. produces well distributed results
|
||||
3. there is no need for the hash function to be one-way (the input set is
|
||||
so small that any hash function could be reversed).
|
||||
4. CRC32C (Castagnoli) is supported in hardware by SSE 4.2, which can
|
||||
significantly speed up computation
|
||||
|
||||
There are primarily two tests run on SHA-1 and CRC32 to establish the
|
||||
There are primarily two tests run on SHA-1 and CRC32C to establish the
|
||||
distribution of results. The first one is the number of bits in the output
|
||||
set that contain every possible combination of bits. The CRC function
|
||||
set that contain every possible combination of bits. The CRC32C function
|
||||
has a longer such prefix in its output than SHA-1. This means nodes will still
|
||||
have well uniformly distributed IDs, even when IP addresses in use are not
|
||||
uniformly distributed.
|
||||
|
@ -186,7 +186,7 @@ account that some /8 blocks are not in use by end-users and exremely unlikely
|
|||
to ever run a DHT node. This makes the results likely to be very similar to
|
||||
what we would see in the wild.
|
||||
|
||||
These results indicate that CRC32 provides the best uniformity in the results
|
||||
These results indicate that CRC32C provides the best uniformity in the results
|
||||
in terms of bit prefixes where all possibilities are represented, and that
|
||||
no more than 21 bits should be used from the result. If more than 21 bits
|
||||
were to be used, there would be certain node IDs that would be impossible to
|
||||
|
@ -197,7 +197,7 @@ The target space (32 bit interger) is divided up into 1000 buckets. Every valid
|
|||
IP and ``r`` input is run through the algorithm and the result is put in the
|
||||
bucket it falls in. The expectation is that each bucket has roughly an equal
|
||||
number of results falling into it. The following graph shows the resulting
|
||||
histogram, comparing SHA-1 and CRC32.
|
||||
histogram, comparing SHA-1 and CRC32C.
|
||||
|
||||
.. image:: hash_distribution.png
|
||||
|
||||
|
@ -205,6 +205,11 @@ The source code for these tests can be found here_.
|
|||
|
||||
.. _here: https://github.com/arvidn/hash_complete_prefix
|
||||
|
||||
The reason to use CRC32C instead of the CRC32 implemented by zlib is that
|
||||
Intel CPUs have hardware support for the CRC32C calculations. The input
|
||||
being exactly 4 bytes is also deliberate, to make it fit in a single
|
||||
instruction.
|
||||
|
||||
enforcement
|
||||
-----------
|
||||
|
||||
|
|
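The IPv4 derivation described in the updated text can be sketched as a small self-contained program fragment. This is only an illustration under stated assumptions, not the project's implementation: the names `crc32c` and `node_id_prefix_v4` are hypothetical, the CRC32C here is a plain bitwise software version rather than the SSE 4.2 instruction mentioned in the rationale, and it hashes the 4 masked octets as the example code in the diff does.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Bitwise software CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
// Slow but dependency-free; a table-driven or hardware version should
// produce the same value.
std::uint32_t crc32c(const std::uint8_t* data, std::size_t len)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical helper: takes the IPv4 octets in network order and the
// random byte, and returns the 21 bits that the node ID must start with.
std::uint32_t node_id_prefix_v4(std::array<std::uint8_t, 4> ip, std::uint32_t rand)
{
    static const std::uint8_t mask[4] = {0x03, 0x0f, 0x3f, 0xff};
    for (std::size_t i = 0; i < ip.size(); ++i)
        ip[i] &= mask[i];

    // r is the low 3 bits of the random byte, placed in the top 3 bits
    // of the first octet (equivalent to "| (r << 29)" on the 32 bit value).
    ip[0] |= static_cast<std::uint8_t>((rand & 0x7u) << 5);

    // Keep only the top 21 bits of the 32 bit hash.
    return crc32c(ip.data(), ip.size()) >> 11;
}
```

Calling `node_id_prefix_v4({124, 31, 75, 21}, 1)` should yield the top 21 bits of the `5fbfbf` prefix listed in the updated test vectors. The low 3 bits of the third node ID byte are random, so a comparison against the table has to ignore them.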