Streaming implementation
========================

.. include:: header.rst

This document describes the algorithm libtorrent uses to satisfy time critical
piece requests, i.e. streaming.

piece picking
-------------

The standard bittorrent piece picker is peer-centric. A peer unchokes us or we
complete a block from a peer and we want to make another request to that peer.
The piece picker answers the question: which block should we request from this
peer?

When streaming, we have a number of *time critical* pieces, the ones the video
or audio player will need next to keep up with the stream. To meet the deadlines
of these pieces, we need a mechanism to answer the question: I want to request
blocks from this piece, which peer is most likely to deliver them to me the
soonest?

This question is answered by ``torrent::request_time_critical_pieces()`` in
libtorrent.

At a high level, this algorithm keeps a list of peers, sorted by the estimated
download queue time. That is, the estimated time until the data for a new
request to this peer starts arriving. The bottom 10th percentile of the peers
(the 10% slowest peers) are ignored and not included in the peer list. Peers
that have choked us, are not interesting, are on parole, are disconnecting,
have too many outstanding block requests or are snubbed are also excluded from
the peer list.

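As a rough illustration of the filtering described above, the following sketch
builds such a candidate list. It is not libtorrent's code: ``peer_info``, its
fields, ``max_outstanding_requests`` and ``build_candidate_list()`` are
hypothetical names, and computing the 10% cut-off before the other exclusions
is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-peer snapshot. Field names are illustrative only.
struct peer_info
{
    int id;
    double queue_time_s;       // estimated download queue time, in seconds
    bool choked_us;
    bool interesting;
    bool on_parole;
    bool disconnecting;
    bool snubbed;
    int outstanding_requests;
};

// Assumed cap on outstanding block requests per peer (illustrative value).
constexpr int max_outstanding_requests = 16;

std::vector<peer_info> build_candidate_list(std::vector<peer_info> peers)
{
    // Sort by estimated download queue time, shortest first.
    std::sort(peers.begin(), peers.end()
        , [](peer_info const& a, peer_info const& b)
        { return a.queue_time_s < b.queue_time_s; });

    // Ignore the slowest 10% of the peers (assumption: computed over all
    // peers, before the other exclusions are applied).
    std::size_t const drop = peers.size() / 10;
    peers.erase(peers.end() - static_cast<std::ptrdiff_t>(drop), peers.end());

    // Exclude peers we cannot, or should not, request from.
    peers.erase(std::remove_if(peers.begin(), peers.end()
        , [](peer_info const& p)
        {
            return p.choked_us || !p.interesting || p.on_parole
                || p.disconnecting || p.snubbed
                || p.outstanding_requests >= max_outstanding_requests;
        }), peers.end());

    return peers;
}
```

The result stays sorted, so the first element is always the peer expected to
deliver a new request the soonest.
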
The time critical pieces are also kept sorted by their deadline, with earlier
deadlines first. This list of pieces is iterated, starting at the top, and
blocks are requested from a piece until we cannot make any more requests from
it. We then move on to the next piece and request blocks from it until we
cannot make any more. The peer each request is sent to is the one with the
lowest `download queue time`_. Each time a request is made, this estimate is
updated and the peer is re-sorted in this list.

Any peer that doesn't have the piece is ignored until we move on to the next
piece.

If the top peer's download queue time is more than 2 seconds, the loop is
terminated. This is to avoid over-requesting.
``request_time_critical_pieces()`` is called once per second, so a 2 second
queue keeps the request pipeline full with some margin.

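The loop described in this section can be sketched as follows. This is a
simplified model, not the body of ``request_time_critical_pieces()``: the
``peer``, ``piece`` and ``block_request`` types and the function
``pick_time_critical()`` are hypothetical, each peer is assumed to have a
positive rate estimate, and applying the 2 second limit to the selected peer
as well as to the top peer is an assumption of the sketch.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical types, for illustration only.
struct peer
{
    int id;
    double queue_time_s;         // estimated download queue time
    double rate_bytes_per_s;     // estimated download rate, assumed > 0
    std::vector<bool> has_piece; // indexed by piece index
};

struct piece { int index; double deadline_s; int blocks_left; };
struct block_request { int piece_index; int peer_id; };

constexpr double max_queue_time_s = 2.0;
constexpr double block_size_bytes = 16.0 * 1024.0;

std::vector<block_request> pick_time_critical(std::vector<piece> pieces
    , std::vector<peer>& peers)
{
    auto const by_queue = [](peer const& a, peer const& b)
        { return a.queue_time_s < b.queue_time_s; };

    // Earliest deadline first, shortest queue first.
    std::sort(pieces.begin(), pieces.end()
        , [](piece const& a, piece const& b)
        { return a.deadline_s < b.deadline_s; });
    std::sort(peers.begin(), peers.end(), by_queue);

    std::vector<block_request> out;
    for (piece& p : pieces)
    {
        while (p.blocks_left > 0)
        {
            // Stop entirely once even the best peer has a full queue.
            if (peers.empty() || peers.front().queue_time_s > max_queue_time_s)
                return out;

            // First peer in queue-time order that has this piece.
            auto const it = std::find_if(peers.begin(), peers.end()
                , [&p](peer const& c)
                { return c.has_piece[p.index]
                    && c.queue_time_s <= max_queue_time_s; });
            if (it == peers.end()) break; // move on to the next piece

            out.push_back({p.index, it->id});
            --p.blocks_left;

            // Update the estimate and restore the sort order.
            it->queue_time_s += block_size_bytes / it->rate_bytes_per_s;
            std::sort(peers.begin(), peers.end(), by_queue);
        }
    }
    return out;
}
```

With two peers of equal rate, requests alternate between them until both
queues reach the limit, at which point the loop ends even if later pieces
still have blocks left to request.
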
download queue time
-------------------

Each peer maintains the number of bytes that have been requested from it but
not yet received. This is referred to as ``outstanding_bytes``. This number
is incremented by the size of each outgoing request and decremented for each
*payload* byte received.

|
This counter is divided by an estimated download rate from the peer to form
|
|
the estimated *download queue time*. That is, the estimated time it will take
|
|
any new request to this peer to begin being received.
|
|
|
|
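A minimal sketch of this bookkeeping is shown below. Only the name
``outstanding_bytes`` comes from the text; ``peer_counters``, its member
functions and ``download_queue_time()`` are hypothetical. Clamping the counter
at zero and returning infinity for a zero rate are assumptions of the sketch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical per-peer counter, for illustration only.
struct peer_counters
{
    std::int64_t outstanding_bytes = 0;

    // Called when a block request is sent to the peer.
    void on_request_sent(int bytes) { outstanding_bytes += bytes; }

    // Called for payload bytes only, not for protocol overhead.
    void on_payload_received(int bytes)
    {
        outstanding_bytes -= bytes;
        // Defensive clamp (an assumption of this sketch).
        if (outstanding_bytes < 0) outstanding_bytes = 0;
    }
};

// Estimated seconds before the response to a new request starts arriving.
double download_queue_time(peer_counters const& p, double est_rate_bytes_per_s)
{
    if (est_rate_bytes_per_s <= 0.0)
        return std::numeric_limits<double>::infinity();
    return static_cast<double>(p.outstanding_bytes) / est_rate_bytes_per_s;
}
```

For example, with 24 kiB outstanding and an estimated rate of 8 kiB/s, the
download queue time is 3 seconds.
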
Estimating the download rate of a peer is not trivial. There may not be any
outstanding requests to the peer, in which case the payload download rate
will be zero. That would not be a reasonable estimate of the rate we would see
once we make a request.

If we have not received any payload from a peer in the last 30 seconds, we
must use an alternative estimate of the download rate. If we have received
payload from this peer previously, we can use the peak download rate.

If we have received less than 2 blocks (32 kiB) and we were unchoked less than
5 seconds ago, we use the average download rate of all peers (that have
outstanding requests).

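One possible reading of these rules is sketched below. The text does not say
in which order the rules are checked, nor what happens for a peer that has
never sent payload and was unchoked long ago. The order used here and the
final fall-back to the swarm average are assumptions, and ``peer_stats`` and
``estimate_download_rate()`` are hypothetical names.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical per-peer statistics, for illustration only.
struct peer_stats
{
    double current_rate;                      // recent payload rate, bytes/s
    double peak_rate;                         // highest rate seen, bytes/s
    std::int64_t payload_received;            // total payload bytes received
    std::chrono::seconds since_last_payload;
    std::chrono::seconds since_unchoked;
};

constexpr std::int64_t two_blocks = 2 * 16 * 1024; // 32 kiB

double estimate_download_rate(peer_stats const& p, double swarm_average_rate)
{
    using std::chrono::seconds;

    // Too little history with a freshly unchoked peer: use the average
    // rate of the peers that have outstanding requests.
    if (p.payload_received < two_blocks && p.since_unchoked < seconds(5))
        return swarm_average_rate;

    // Idle for 30 seconds: the current rate is stale, so use the peak rate
    // if the peer has ever sent payload.
    if (p.since_last_payload >= seconds(30))
    {
        if (p.payload_received > 0 && p.peak_rate > 0.0) return p.peak_rate;
        return swarm_average_rate; // assumption: no better estimate exists
    }

    return p.current_rate;
}
```

The point of the fall-backs is that a peer with an empty request queue is not
treated as having a rate of zero, which would give it an infinite download
queue time.
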
timeouts
--------

An observation that is useful to keep in mind when streaming is that your
download capacity is likely to be saturated by your peers. In this case, if the
swarm is well seeded, most peers will send data to you at close to the same
rate. This makes it important to support streaming from many slow peers. For
instance, this means you can't make assumptions about the download time of a
block being less than some absolute time. You may be downloading at well above
the bitrate of the video, but each individual peer only transfers at 5 kiB/s.

In this state, your download rate is a zero-sum game. Any block you request
that is not urgent will take away from the bandwidth available for blocks that
are urgent. Make sure to limit requests to useful blocks only.

Some requests will stall. It appears to be very hard to have enough accuracy in
the prediction of download queue time such that all requests come back within a
reasonable amount of time.

To support adaptive timeouts, each torrent maintains a running average of how
long it takes to complete a piece. There is also a running average of the
deviation from the mean download time.

This download time is used as the benchmark to determine when blocks have
timed out and should be re-requested from another peer.

If any time-critical piece has taken more than the average piece download
time plus half the average deviation from it, the piece is considered to have
timed out. This means we are allowed to double-request blocks. Subsequent
passes over this piece will make sure that any blocks we don't already have
are requested one more time.

In fact, this scales to multiple time-outs. The time since a download was
started is divided by (average download time + average deviation / 2).
The resulting integer is the number of *times* the piece has timed out.

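The calculation can be sketched as a single integer division. The function
name ``num_timeouts()`` and its parameters are hypothetical, and returning
zero when no averages are available yet is an assumption of the sketch.

```cpp
#include <chrono>

using milliseconds = std::chrono::milliseconds;

// Number of times a piece has timed out, given how long it has been
// downloading and the torrent's running averages.
int num_timeouts(milliseconds elapsed, milliseconds avg_piece_time
    , milliseconds avg_deviation)
{
    milliseconds const timeout = avg_piece_time + avg_deviation / 2;

    // No statistics yet (assumption: treat as "never timed out").
    if (timeout.count() <= 0) return 0;

    // Dividing two durations yields a plain integer, rounded toward zero.
    return static_cast<int>(elapsed / timeout);
}
```

With an average piece time of 4 seconds and an average deviation of 2 seconds,
the timeout is 5 seconds, so a piece that has been downloading for 12 seconds
has timed out twice.
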
Each time a piece times out, another *busy request* is allowed to try to make
it complete sooner. A busy request is where a block is requested from a peer
even though it has already been requested from another peer.

This has the effect of getting more and more aggressive in requesting blocks
the longer it takes to complete the piece. If this mechanism is too aggressive,
a significant amount of bandwidth may be lost to redundant downloads (keep in
mind the zero-sum game).

It never makes sense to request a block twice from the same peer. There is logic
in place to prevent this.

optimizations
-------------

One optimization is to buffer all piece requests while looping over the
time-critical pieces and not send them until one round is complete. This
increases the chances that the request messages are coalesced into the same
packet. This in turn lowers the number of system calls and network overhead.