==================
fuzzing libtorrent
==================

.. include:: header.rst

.. contents:: Table of contents
  :depth: 1
  :backlinks: none


overview
========

Libtorrent comes with a set of fuzzers. They are not included in the distribution
tarball; to get them, download the `repository snapshot`_ or clone the repository_.

The fuzzers can be found in the `fuzzers` subdirectory and come with a `Jamfile`
to build them, and a `run.sh` bash script to run them.

.. _`repository snapshot`: https://github.com/arvidn/libtorrent/releases
.. _repository: https://github.com/arvidn/libtorrent


building
--------

The fuzzers use clang's libFuzzer, which means they can only be built with clang.
Clang must be configured in your `user-config.jam`, for example::

    using clang : 7 : clang++-7 ;

When building, you most likely want to stage the resulting binaries into a
well-known location. Invoke `b2` like this::

    b2 clang stage -j$(nproc)

This will build and stage all fuzzers into the `fuzzers/fuzzers` directory.


corpus
------

Fuzzers work best if they have a relevant seed corpus of example inputs. You
can either generate one using `fuzzers/tools/generate_initial_corpus.py` or download
the `corpus.zip` from the GitHub `releases page`_.

To generate the initial corpus, run the script with `fuzzers` as the
current working directory, like this::

    python tools/generate_initial_corpus.py

The corpus should be placed in the `fuzzers` directory, which should also be the
current working directory when invoking the fuzzer binaries.

.. _`releases page`: https://github.com/arvidn/libtorrent/releases


running fuzzers
---------------

The `run.sh` script will run all fuzzers in parallel for 48 hours. It can easily
be tweaked and mostly serves as an example of how to invoke them.

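
As a rough illustration of what such a script can look like, the following
sketch starts every staged fuzzer in the background and waits for all of them
to finish. It is not the contents of `run.sh`: the `corpus/<name>` layout and
the `<name>-` prefix for crash files are assumptions made for this example,
while `-max_total_time` and `-artifact_prefix` are standard libFuzzer options::

    #!/usr/bin/env bash
    # Sketch: run all staged fuzzers in parallel for a fixed time.
    # Assumed layout (not taken from run.sh): one corpus directory per
    # fuzzer in ./corpus/<name>, binaries staged in ./fuzzers/<name>.
    set -u

    duration=$((48 * 60 * 60))  # 48 hours, in seconds

    run_fuzzer() {
        local name=$1
        mkdir -p "corpus/${name}"
        # -max_total_time stops the fuzzer after the given number of seconds,
        # -artifact_prefix sets the prefix of the crash files it writes.
        "./fuzzers/${name}" -max_total_time="${duration}" \
            -artifact_prefix="./${name}-" "corpus/${name}"
    }

    for bin in fuzzers/*; do
        # skip directories and anything that is not executable
        if [ -f "${bin}" ] && [ -x "${bin}" ]; then
            run_fuzzer "$(basename "${bin}")" &
        fi
    done

    # block until every background fuzzer has exited
    wait

Lowering `duration` is a simple way to do a short trial run before committing
CPU time to a long one.
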

large and small fuzzers
-----------------------

Since APIs can have different complexity, fuzz targets will also explore
code of varying complexity. Some fuzzers cover a very small amount of code
(e.g. `parse_int`), whereas other fuzz targets cover a very large amount of code and
can potentially go very deep into call stacks (e.g. `torrent_info`).

Small fuzz targets can fairly quickly exhaust all possible code paths and have
quite limited utility after that, other than as regression tests. When putting
a lot of CPU into long-running fuzzing, it is better spent on large fuzz targets.

For this reason, there's another alias in the `Jamfile` to only build and stage
large fuzz targets. Call `b2` like this::

    b2 clang stage-large -j$(nproc)


fast+slow
---------

When building an initial corpus, it can be useful to quickly build a corpus with
a large code coverage. To speed up this process, you can build the fuzzers
without sanitizers, asserts and invariant checks. This won't find as many errors,
but it builds a good corpus, which can then be run against a fully instrumented
fuzzer.

To build the fuzzers in this "fast" mode, there's a build variant `build_coverage`.
Invoke `b2` like this::

    b2 clang stage build_coverage -j$(nproc)

For more details on "fast + slow" see `Paul Dreik's talk`_.

.. _`Paul Dreik's talk`: https://youtu.be/e_Oc9SkCo5s?t=1679

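
The two phases can be strung together roughly as follows. This is a sketch, not
a script shipped with libtorrent: the `run` and `fast_then_slow` helpers and the
`DRY_RUN` variable are hypothetical, and it assumes that it is started from the
`fuzzers` directory and that restaging replaces the fast binaries with the
instrumented ones::

    #!/usr/bin/env bash
    set -u

    # Hypothetical helper: with DRY_RUN=1, print the command instead of
    # running it.
    run() {
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    }

    fast_then_slow() {
        # phase 1 ("fast"): no sanitizers, asserts or invariant checks
        run b2 clang stage build_coverage "-j$(nproc)" || return 1
        run ./run.sh
        # phase 2 ("slow"): fully instrumented build, run on the same corpus
        run b2 clang stage "-j$(nproc)" || return 1
        run ./run.sh
    }

Running `DRY_RUN=1 fast_then_slow` prints the four commands without executing
them. In practice the first fuzzing run would likely be given a much shorter
time limit than the second.
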

sharing corpora
---------------

Before sharing your fuzz corpus, it should be minimized. There is a script
called `minimize.sh` which moves `corpus` to `prev-corpus` and copies over
a minimized set of inputs to a new `corpus` directory.

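
For illustration, minimization of this kind can be built on libFuzzer's
`-merge=1` option, which copies into the first directory only those inputs from
the following directories that add coverage. The sketch below is not the
contents of `minimize.sh`: the function name `minimize_corpus` and the layout
with one directory per fuzzer under `corpus` are assumptions::

    #!/usr/bin/env bash
    set -u

    # Hypothetical helper, meant to be run from the fuzzers directory.
    minimize_corpus() {
        if [ ! -d corpus ]; then
            echo "no corpus directory found" >&2
            return 1
        fi
        if [ -e prev-corpus ]; then
            echo "prev-corpus already exists, refusing to overwrite" >&2
            return 1
        fi

        # keep the full corpus around and start a fresh, empty one
        mv corpus prev-corpus
        mkdir corpus

        local dir name
        for dir in prev-corpus/*/; do
            [ -d "${dir}" ] || continue
            name=$(basename "${dir}")
            mkdir "corpus/${name}"
            if [ -x "fuzzers/${name}" ]; then
                # -merge=1 adds only coverage-increasing inputs to the
                # first directory
                "fuzzers/${name}" -merge=1 "corpus/${name}" "${dir}"
            else
                echo "no fuzzer binary for ${name}, leaving its corpus empty" >&2
            fi
        done
    }

Keeping `prev-corpus` in place means that the full set of inputs can be
restored if the minimized corpus turns out to be missing something.
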