Move program utility functions that are used only by "test programs"
(i.e. not by gzip/gunzip) from prog_util.{c,h} into test_util.{c,h}.
This reduces the code that is compiled for the default build target,
which excludes the test programs.
Another build_decode_table() optimization: rather than filling all the
entries for each codeword using strided stores, just fill one initially
and fill the rest by memcpy()s as the table is incrementally expanded.
Also make some other cleanups and small optimizations.
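
The fill-once-then-memcpy() idea described above can be sketched as
follows. This is a minimal illustration, not libdeflate's actual
build_decode_table(): the names build_table() and reverse_bits(), the
entry encoding, and the restriction to codeword lengths that fit in the
table are all assumptions made for the example. It relies on table
indices being bit-reversed codewords, so that doubling the table by
copying its lower half replicates every entry stored so far.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the low 'len' bits of 'code'. */
static unsigned reverse_bits(unsigned code, unsigned len)
{
	unsigned r = 0;

	for (unsigned i = 0; i < len; i++) {
		r = (r << 1) | (code & 1);
		code >>= 1;
	}
	return r;
}

/*
 * Hypothetical sketch: fill table[0 .. (1 << table_bits) - 1] for a
 * canonical prefix code.  lens[sym] is the codeword length of each symbol
 * (0 = unused).  Assumes a valid (not oversubscribed) code whose lengths
 * are all <= table_bits.  Illustrative entry format: ((sym + 1) << 4) | len,
 * with 0 meaning "no symbol".
 */
static void build_table(uint16_t *table, unsigned table_bits,
			const uint8_t *lens, unsigned num_syms)
{
	unsigned code = 0;	/* next canonical codeword, MSB first */
	size_t cur_size = 1;	/* number of table entries valid so far */

	assert(table_bits <= 12);
	table[0] = 0;
	for (unsigned len = 1; len <= table_bits; len++) {
		/* Expand the table: the upper half repeats the lower half. */
		memcpy(&table[cur_size], table, cur_size * sizeof(table[0]));
		cur_size <<= 1;
		code <<= 1;
		for (unsigned sym = 0; sym < num_syms; sym++) {
			if (lens[sym] != len)
				continue;
			/* One store per codeword; later memcpy()s replicate it. */
			table[reverse_bits(code, len)] =
				(uint16_t)(((sym + 1) << 4) | len);
			code++;
		}
	}
}
```

Each codeword costs a single store regardless of how much shorter it is
than table_bits; the strided replication is replaced by one contiguous
memcpy() per codeword length.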
Improve libdeflate's worst-case performance decompressing malicious
DEFLATE streams by about 14x, bringing it within about a factor of 2 of
zlib, by skipping the rebuild of the decode tables for the static
Huffman codes when they're already loaded into the decompressor.
This improves performance decompressing a stream of all empty static
Huffman blocks from about 0.36 MB/s to 175 MB/s, and the original
reproducer given in the GitHub issue from about 3.3 MB/s to 219 MB/s.
A regression test is added for these cases, as well as for the empty
dynamic Huffman blocks case, to verify that worst-case performance is
comparable to zlib's.
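
The static-code skip described above can be sketched roughly as below.
This is only an illustration under stated assumptions: the struct
layout, the static_codes_loaded flag name, and the helper functions are
hypothetical and are not taken from libdeflate's source. The only part
that comes from a specification is the set of fixed codeword lengths,
which are those given for static Huffman blocks in RFC 1951.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LITLEN_SYMS	288
#define NUM_OFFSET_SYMS	32

/* Hypothetical decompressor state; all names here are illustrative. */
struct decompressor {
	uint8_t lens[NUM_LITLEN_SYMS + NUM_OFFSET_SYMS];
	bool static_codes_loaded;
	unsigned table_builds;	/* counts rebuilds, for demonstration only */
};

/* Stand-in for the (expensive) decode table construction. */
static void build_tables(struct decompressor *d)
{
	d->table_builds++;
}

/* Called at the start of each static Huffman block. */
static void prepare_static_block(struct decompressor *d)
{
	unsigned i = 0;

	assert(d != NULL);
	if (d->static_codes_loaded)
		return;		/* tables are already the static ones */

	/* Fixed codeword lengths from RFC 1951, section 3.2.6 */
	for (; i < 144; i++)
		d->lens[i] = 8;
	for (; i < 256; i++)
		d->lens[i] = 9;
	for (; i < 280; i++)
		d->lens[i] = 7;
	for (; i < NUM_LITLEN_SYMS; i++)
		d->lens[i] = 8;
	for (; i < NUM_LITLEN_SYMS + NUM_OFFSET_SYMS; i++)
		d->lens[i] = 5;

	build_tables(d);
	d->static_codes_loaded = true;
}

/* Called at the start of each dynamic Huffman block. */
static void prepare_dynamic_block(struct decompressor *d)
{
	/* The codeword lengths would be read from the stream here. */
	d->static_codes_loaded = false;	/* tables no longer match */
	build_tables(d);
}
```

With this arrangement, a stream consisting only of empty static Huffman
blocks triggers a single table build instead of one per block.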
Resolves https://github.com/ebiggers/libdeflate/issues/33