Improve libdeflate's worst-case performance when decompressing malicious
DEFLATE streams by about 14x, bringing it to within about 2x of zlib, by
not rebuilding the decode tables for the static Huffman codes when they
are already loaded into the decompressor.
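
The idea can be sketched in C roughly as follows. This is a minimal
illustration, not libdeflate's actual code: the struct, its fields, and
every toy_* name are hypothetical, and the table build is replaced by a
counter so the effect of the check is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified decompressor state (not libdeflate's struct). */
struct toy_decompressor {
    bool static_codes_loaded; /* decode tables currently hold the static codes */
    unsigned table_builds;    /* number of (expensive) table builds performed */
};

enum toy_block_type { TOY_BLOCK_STATIC, TOY_BLOCK_DYNAMIC };

/* Stand-in for the expensive step of building the Huffman decode tables. */
static void toy_build_decode_tables(struct toy_decompressor *d)
{
    d->table_builds++;
}

/* Prepare the decode tables for the next block. */
static void toy_prepare_block(struct toy_decompressor *d,
                              enum toy_block_type type)
{
    assert(d != NULL);
    if (type == TOY_BLOCK_STATIC) {
        if (d->static_codes_loaded)
            return; /* tables are already valid: skip the rebuild */
        toy_build_decode_tables(d);
        d->static_codes_loaded = true;
    } else {
        /* Dynamic codes differ per block, so always rebuild, and
         * remember that the static tables have been overwritten. */
        toy_build_decode_tables(d);
        d->static_codes_loaded = false;
    }
}

/* Number of table builds needed for n consecutive static blocks. */
static unsigned toy_count_builds_for_static_blocks(size_t n)
{
    struct toy_decompressor d = { false, 0 };
    for (size_t i = 0; i < n; i++)
        toy_prepare_block(&d, TOY_BLOCK_STATIC);
    return d.table_builds;
}
```

With this check, a stream consisting only of static Huffman blocks pays
for a single table build rather than one per block, which is the case a
malicious stream of empty static blocks exploits.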
This improves performance decompressing a stream of all empty static
Huffman blocks from about 0.36 MB/s to 175 MB/s, and the original
reproducer given in the GitHub issue from about 3.3 MB/s to 219 MB/s.
A regression test is added for these cases, as well as for the empty
dynamic Huffman blocks case, to verify that worst-case performance is
comparable to zlib's.
Resolves https://github.com/ebiggers/libdeflate/issues/33

When the block splitting algorithm was implemented, it became possible
for the compressor to use longer blocks, up to ~300 KB. Unfortunately it
was overlooked that this can allow literal runs longer than 65535 bytes,
while in one place the length of a literal run was still being stored in
a u16.
To overflow the litrunlen and hit the bug, the data would have had to
have far fewer matches than random data, which is possible but very
unusual. Fix the bug by reserving more space to hold the litrunlen, and
add a test for it.
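
The failure mode can be illustrated with a small C sketch. It is not the
compressor's real data structure: the toy_* structs and functions are
hypothetical, and the 32-bit field is only an assumed stand-in for
"more space", since the exact layout used by the fix is not described
here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sequence records: one with the old 16-bit field, one with
 * an assumed wider field. */
struct toy_sequence_narrow { uint16_t litrunlen; };
struct toy_sequence_wide   { uint32_t litrunlen; };

/* Count a run of literals one at a time using the 16-bit field.
 * The stored value wraps modulo 65536 once the run exceeds 65535. */
static uint32_t toy_count_literals_narrow(uint32_t nliterals)
{
    struct toy_sequence_narrow seq = { 0 };
    for (uint32_t i = 0; i < nliterals; i++)
        seq.litrunlen++;
    return seq.litrunlen;
}

/* Same loop with a 32-bit field: runs of a few hundred KB fit easily. */
static uint32_t toy_count_literals_wide(uint32_t nliterals)
{
    struct toy_sequence_wide seq = { 0 };
    for (uint32_t i = 0; i < nliterals; i++)
        seq.litrunlen++;
    assert(seq.litrunlen == nliterals); /* no wraparound occurred */
    return seq.litrunlen;
}
```

For a run of 70000 literals, the narrow version records 4464
(70000 - 65536) while the wide version records 70000, so any code that
later trusts the narrow field would emit the wrong number of literals.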