initial documentation for bundled support for fuzzing

2019-04-25 11:02:11 +02:00 · bb0ae48a75
parent ba731ef0cf
commit bb0ae48a75
7 changed files with 139 additions and 3 deletions
--- a/Makefile.am
+++ b/Makefile.am
@@ -59,6 +59,7 @@ DOCS_PAGES = \
  docs/projects.html              \
  docs/python_binding.html        \
  docs/tuning.html                \
+  docs/fuzzing.html               \
  docs/settings.rst               \
  docs/stats_counters.rst         \
  docs/troubleshooting.html       \
@@ -80,6 +81,7 @@ DOCS_PAGES = \
  docs/projects.rst               \
  docs/python_binding.rst         \
  docs/tuning.rst                 \
+  docs/fuzzing.rst                \
  docs/troubleshooting.rst        \
  docs/udp_tracker_protocol.rst   \
  docs/utp.rst                    \
--- a/docs/fuzzing.rst
+++ b/docs/fuzzing.rst
@@ -0,0 +1,77 @@
+==================
+fuzzing libtorrent
+==================
+
+.. include:: header.rst
+
+.. contents:: Table of contents
+  :depth: 1
+  :backlinks: none
+
+overview
+========
+
+Libtorrent comes with a set of fuzzers. They are not included in the distribution
+tarball. Instead, download the `repository snapshot`_ or clone the repository_.
+
+The fuzzers can be found in the `fuzzers` subdirectory and come with a `Jamfile`
+to build them, and a `run.sh` bash script to run them.
+
+.. _`repository snapshot`: https://github.com/arvidn/libtorrent/releases
+.. _repository: https://github.com/arvidn/libtorrent
+
+building
+--------
+
+The fuzzers use clang's libFuzzer, which means they can only be built with clang.
+Clang must be configured in your `user-config.jam`, for example::
+
+	using clang : 7 : clang++-7 ;
+
+When building, you most likely want to stage the resulting binaries into a
+well-known location. Invoke `b2` like this::
+
+	b2 clang stage -j$(nproc)
+
+This will build and stage all fuzzers into the `fuzzers/fuzzers` directory.
+
+corpus
+------
+
+Fuzzers work best if they have a relevant seed corpus of example inputs. You
+can either generate one using `fuzzers/tools/generate_initial_corpus.py` or download
+the `corpus.zip` from the GitHub `releases page`_.
+
+To generate an initial corpus, run the script with `fuzzers` as the
+current working directory, like this::
+
+	python tools/generate_initial_corpus.py
+
+The corpus should be placed in the `fuzzers` directory, which should also be the
+current working directory when invoking the fuzzer binaries.
+
+.. _`releases page`: https://github.com/arvidn/libtorrent/releases
+
+running fuzzers
+---------------
+
+The `run.sh` script will run all fuzzers in parallel for 48 hours. It is easy
+to tweak and mostly serves as an example of how to invoke them.
+
+large and small fuzzers
+-----------------------
+
+Since APIs differ in complexity, fuzz targets also explore code of varying
+complexity. Some fuzzers cover a very small amount of code (e.g. `parse_int`),
+whereas other fuzz targets cover a very large amount of code and can
+potentially go very deep into call stacks (e.g. `torrent_info`).
+
+Small fuzz targets can fairly quickly exhaust all possible code paths and have
+quite limited utility after that, other than as regression tests. CPU time for
+long-running fuzzing is better spent on large fuzz targets.
+
+For this reason, there's another alias in the `Jamfile` to only build and stage
+large fuzz targets. Call `b2` like this::
+
+	b2 clang stage-large -j$(nproc)
+
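The "running fuzzers" section of docs/fuzzing.rst above says `run.sh` mainly serves as an example of how to invoke the binaries. The following is a minimal sketch of what such an invocation might look like. It is not the contents of `run.sh`, which this commit does not show. It assumes the working directory is `fuzzers`, that the binaries were staged into `fuzzers/fuzzers`, and that each target has a seed directory under `corpus/`. The helper name `fuzzer_command` is hypothetical. `-max_total_time` is libFuzzer's option for limiting the total run time in seconds.

```python
# Assumption: the large targets, copied from LARGE_TARGETS in fuzzers/Jamfile.
LARGE_TARGETS = [
    'torrent_info', 'lazy_bdecode', 'bdecode_node', 'http_parser',
    'dht_node', 'utp', 'resume_data', 'file_storage_add_file',
    'sanitize_path']


def fuzzer_command(target, hours=48, bin_dir='fuzzers', corpus_root='corpus'):
    """Build the argument list for one libFuzzer binary (hypothetical helper).

    The paths are relative to the `fuzzers` directory of the repository.
    """
    return [
        './%s/%s' % (bin_dir, target),            # staged fuzzer binary
        '-max_total_time=%d' % (hours * 3600),    # libFuzzer time limit, seconds
        '%s/%s' % (corpus_root, target),          # seed corpus directory
    ]


if __name__ == '__main__':
    # Print one command line per large target instead of launching anything.
    for t in LARGE_TARGETS:
        print(' '.join(fuzzer_command(t)))
```

The sketch only prints the command lines, so they can be inspected before starting a long run. Launching them in parallel (for example with `subprocess.Popen`) is left out on purpose.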
--- a/docs/hunspell/libtorrent.dic
+++ b/docs/hunspell/libtorrent.dic
@@ -431,8 +431,6 @@ leechers
 printability
 podcasts
 todo
-
-
 0x10
 0x41727101980
 0x7fffffffffffffff
@@ -466,3 +464,7 @@ txt
 un
 v12
 v2
+fuzzers
+fuzzer
+libFuzzer
+clang's
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -15,6 +15,7 @@
 * building_
 * troubleshooting_
 * `tuning`_
+* fuzzing_
 * screenshot_
 * `mailing list`_ (archive_)
 * `who's using libtorrent?`_
@@ -70,6 +71,7 @@ libtorrent
 .. _`libtorrent 1.2`: upgrade_to_1.2-ref.html
 .. _troubleshooting: troubleshooting.html
 .. _`tuning`: tuning.html
+.. _fuzzing: fuzzing.html
 .. _screenshot: client_test.png
 .. _`uTP`: utp.html
 .. _`extensions protocol`: extension_protocol.html
--- a/docs/makefile
+++ b/docs/makefile
@@ -50,7 +50,8 @@ MANUAL_TARGETS = index \
 	tuning \
 	hacking \
 	streaming \
-	tutorial
+	tutorial \
+	fuzzing

 TARGETS = single-page-ref \
 	$(MANUAL_TARGETS) \
--- a/fuzzers/Jamfile
+++ b/fuzzers/Jamfile
@@ -66,7 +66,21 @@ fuzzer dht_node ;
 fuzzer utp ;
 fuzzer resume_data ;

+local LARGE_TARGETS =
+	torrent_info
+	lazy_bdecode
+	bdecode_node
+	http_parser
+	dht_node
+	utp
+	resume_data
+	file_storage_add_file
+	sanitize_path
+	;
+
 install stage : $(TARGETS) : <install-type>EXE <location>fuzzers ;
+install stage-large : $(LARGE_TARGETS) : <install-type>EXE <location>fuzzers ;

 explicit stage ;
+explicit stage-large ;

--- a/fuzzers/tools/generate_initial_corpus.py
+++ b/fuzzers/tools/generate_initial_corpus.py
@@ -0,0 +1,38 @@
+import os
+import shutil
+import hashlib
+
+corpus_dirs = [
+    'base32decode', 'base32encode', 'base64encode', 'bdecode_node',
+    'convert_from_native', 'convert_to_native', 'dht_node', 'escape_path',
+    'escape_string', 'file_storage_add_file', 'gzip', 'http_parser',
+    'lazy_bdecode', 'parse_int', 'parse_magnet_uri', 'resume_data',
+    'sanitize_path', 'torrent_info', 'upnp', 'utf8_codepoint', 'utf8_wchar',
+    'utp', 'verify_encoding', 'wchar_utf8']
+
+for p in corpus_dirs:
+    try:
+        os.makedirs(os.path.join('corpus', p))
+    except Exception as e:
+        print(e)
+
+torrent_dir = '../test/test_torrents'
+for f in os.listdir(torrent_dir):
+    shutil.copy(os.path.join(torrent_dir, f), os.path.join('corpus', 'torrent_info'))
+
+xml_tests = [
+    '<a blah="b"></a>', '<a b=c></a>', '<a b"c"></a>', '<a b="c></a>',
+    '<![CDATA[<sender>John Smith</sender>]]>', '<![CDATA[<sender>John S',
+    '<!-- comment -->', '<empty></empty>', '<tag',
+    '''<?xml version="1.0" encoding="ISO-8859-1" ?>
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"></xs:schema>''',
+    '<selfclosing />']
+
+for x in xml_tests:
+    name = hashlib.sha1(x.encode('utf-8')).hexdigest()
+    with open(os.path.join('corpus', 'upnp', name), 'w+') as f:
+        f.write(x)
+
+gzip_dir = '../test'
+for f in ['zeroes.gz', 'corrupt.gz', 'invalid1.gz']:
+    shutil.copy(os.path.join(gzip_dir, f), os.path.join('corpus', 'gzip'))
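The script above names each `upnp` seed file after the SHA-1 of its content. Below is a minimal sketch of that naming scheme as a reusable helper, assuming Python 3. The name `write_seed` is hypothetical and not part of the commit. Hashing and writing the same bytes keeps the file name consistent with the file content, and identical seeds collapse into a single file.

```python
import hashlib
import os


def write_seed(corpus_dir, data):
    """Write one seed into corpus_dir, named by the SHA-1 of its bytes.

    Hypothetical helper; returns the path of the file written.
    """
    if isinstance(data, str):
        # hashlib only accepts bytes on Python 3, so encode text first
        data = data.encode('utf-8')
    os.makedirs(corpus_dir, exist_ok=True)  # no error if it already exists
    name = hashlib.sha1(data).hexdigest()
    path = os.path.join(corpus_dir, name)
    with open(path, 'wb') as f:
        f.write(data)
    return path
```

Called as `write_seed(os.path.join('corpus', 'upnp'), '<tag')`, it would create the directory if needed and write the seed under its 40-character hex digest.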