For fastest, greedy, lazy, and lazy2: save memory by reducing the length
of the sequence store, and forcing a split if it is filled.
For fastest: increase the max block length, but use a relatively short
sequence store that will cause shorter blocks to be used often.
For all: allow the final block to exceed the soft maximum length if it
avoids having to create a block below the minimum length.
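A minimal sketch of the block-termination policy described above
(constant values and names are illustrative, not the actual libdeflate
code):

    #include <stdbool.h>
    #include <stddef.h>

    #define SEQ_STORE_CAPACITY     8192    /* shorter store => more frequent forced splits */
    #define SOFT_MAX_BLOCK_LENGTH  300000
    #define MIN_BLOCK_LENGTH       10000

    struct seq_store {
        size_t num_seqs;
        /* ...stored (litrunlen, length, offset) sequences... */
    };

    static bool
    should_end_block(const struct seq_store *store,
                     size_t block_length, size_t bytes_remaining_after_block)
    {
        if (store->num_seqs >= SEQ_STORE_CAPACITY)
            return true;                /* sequence store full: force a split */
        if (block_length < SOFT_MAX_BLOCK_LENGTH)
            return false;               /* soft maximum not reached yet */
        /* Soft maximum reached: let the block keep growing only if ending it
         * here would leave a final block shorter than the minimum. */
        if (bytes_remaining_after_block > 0 &&
            bytes_remaining_after_block < MIN_BLOCK_LENGTH)
            return false;
        return true;
    }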
Introduce deflate_compress_fastest(), and use it for level 1. It uses
the ht_matchfinder instead of the hc_matchfinder, and it skips the block
splitting algorithm. This speeds up level 1, without changing the
compression ratio too much (relative to what is expected for level 1).
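Rough sketch of the overall strategy (hash function, table size, and
helper names are made up; libdeflate's actual ht_matchfinder differs in
detail): a single greedy pass using a small hash table of verified
match candidates, with no lazy evaluation and no block splitting.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define HT_ORDER      13
    #define HT_SIZE       (1u << HT_ORDER)
    #define MIN_MATCH     4            /* ht-style matchfinder: 4-byte minimum match */
    #define MAX_MATCH     258          /* DEFLATE's maximum match length */
    #define MAX_DISTANCE  32768        /* DEFLATE's maximum match distance */

    static uint32_t
    hash4(const uint8_t *p)
    {
        uint32_t v;

        memcpy(&v, p, 4);
        return (v * 0x9E3779B1u) >> (32 - HT_ORDER);
    }

    static void
    fastest_parse(const uint8_t *in, size_t in_nbytes)
    {
        static uint32_t ht[HT_SIZE];    /* one candidate position per bucket */
        size_t pos = 0;

        memset(ht, 0, sizeof(ht));
        while (pos + MIN_MATCH <= in_nbytes) {
            uint32_t h = hash4(&in[pos]);
            size_t cand = ht[h];
            size_t len = 0;

            ht[h] = (uint32_t)pos;
            if (cand < pos && pos - cand <= MAX_DISTANCE &&
                memcmp(&in[cand], &in[pos], MIN_MATCH) == 0) {
                len = MIN_MATCH;
                while (len < MAX_MATCH && pos + len < in_nbytes &&
                       in[cand + len] == in[pos + len])
                    len++;
            }
            if (len != 0) {
                /* emit_match(len, pos - cand); */
                pos += len;
            } else {
                /* emit_literal(in[pos]); */
                pos++;
            }
        }
        /* ...emit any remaining tail bytes as literals... */
    }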
Further improve the way the near-optimal parser estimates symbol costs:
- When setting a block's initial costs, weigh the default costs and
previous block's costs differently, depending on how different the
current block seems to be from the previous block.
- When determining the "default" costs, take into account how many
literals appear in the block and how frequent matches seem to be.
- Increase BIT_COST from 8 to 16, to increase precision in calculations.
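An illustrative sketch of the weighting idea only (constants and names
are made up): costs are kept in units of 1/16 bit, and a block's
initial cost for each symbol blends a default cost with the previous
block's cost, weighted by how similar the two blocks appear to be.

    #include <stdint.h>

    #define BIT_COST 16    /* fixed-point cost unit: 16 == one bit */

    static uint32_t
    initial_cost(uint32_t default_cost, uint32_t prev_block_cost,
                 unsigned similarity /* 0 = very different ... 4 = very similar */)
    {
        /* The more similar the new block looks to the previous one, the more
         * weight the previous block's final costs get. */
        return (prev_block_cost * similarity +
                default_cost * (4 - similarity)) / 4;
    }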
When the near-optimal parser sets the initial costs for a block, it
takes into account the costs of the previous block (if there is one).
However, the previous block's costs were not being updated after its
final codes were built, so the costs from the last optimization pass
were being used instead of the final costs.
Make it so that the final costs are used, as intended.
Also, include the DEFLATE_END_OF_BLOCK symbol in the non-final codes.
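A sketch of the fix (simplified types and names): once a block's final
Huffman codes have been built, refresh the saved cost model from the
final codeword lengths, so the next block's initial costs reflect what
was actually emitted rather than the last optimization pass.

    #include <stdint.h>

    #define BIT_COST 16    /* same fixed-point cost unit as above */

    static void
    set_costs_from_codeword_lens(uint32_t costs[], const uint8_t lens[],
                                 unsigned num_syms)
    {
        for (unsigned sym = 0; sym < num_syms; sym++) {
            /* A length of 0 means the symbol was unused in this block; give
             * it a rough fallback cost instead of a cost of 0. */
            costs[sym] = (uint32_t)(lens[sym] ? lens[sym] : 9) * BIT_COST;
        }
    }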
In general, these changes improve the compression ratio slightly.
With deflate_compress_near_optimal(), some data benefits more than
originally thought from larger values of max_search_depth and
nice_match_length. Some data even needs these parameters to be fairly
high for deflate_compress_near_optimal() to compress better than
deflate_compress_lazy2(). Bump these parameters up a bit.
In the greedy and lazy compressors, automatically increase the minimum
match length from the default of 3 if the data doesn't contain many
different literals. This greatly improves the compression ratio of
levels 1-9 on certain types of data, such as DNA sequencing data, while
not worsening the ratio on other types of data.
The near-optimal compressor (used by compression levels 10-12) continues
to use a minimum match length of 3, since it already does a better job
of deciding when short matches are worthwhile. (Its method for setting
the initial costs still needs improvement; later commits address that.)
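A sketch of the heuristic (the thresholds and return values here are
illustrative, not the actual tuning): count how many distinct literal
values occur in the data, and raise the minimum match length above 3
when the alphabet is small, since literals then code very cheaply and
short matches rarely pay off.

    #include <stddef.h>
    #include <stdint.h>

    static unsigned
    choose_min_match_len(const uint8_t *data, size_t len)
    {
        uint8_t seen[256] = { 0 };
        unsigned num_distinct = 0;

        for (size_t i = 0; i < len; i++) {
            if (!seen[data[i]]) {
                seen[data[i]] = 1;
                num_distinct++;
            }
        }
        if (num_distinct <= 8)
            return 5;    /* e.g. DNA-like data with a tiny literal alphabet */
        if (num_distinct <= 32)
            return 4;
        return 3;        /* the DEFLATE default minimum match length */
    }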
Resolves https://github.com/ebiggers/libdeflate/issues/57
Instead of switching directly from the lazy compressor at level 7 to the
near-optimal compressor at level 8, use the lazy2 compressor at levels
8-9 and don't switch to near-optimal until level 10.
This avoids a poor compression ratio and bad performance (both
significantly worse than level 7, and significantly worse than zlib) at
levels 8-9 on data where the near-optimal compressor doesn't do well
until the parameters are cranked up.
On data where the near-optimal compressor *does* do well, this change
worsens the compression ratio of levels 8-9, but also speeds them up a
lot, thus positioning them relative to zlib similarly to the lower levels (i.e.
much faster and slightly stronger, rather than slightly faster and much
stronger). The difference between levels 9 and 10 is increased, but
that's perhaps the least bad place to have a discontinuity.
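The resulting mapping of the upper levels, as a sketch (the function
and enum names are made up; the selection for levels 7 and below is
unchanged and not shown in detail):

    typedef enum { PARSER_LAZY, PARSER_LAZY2, PARSER_NEAR_OPTIMAL } parser_t;

    static parser_t
    choose_parser_for_upper_levels(int compression_level)
    {
        if (compression_level >= 10)
            return PARSER_NEAR_OPTIMAL;    /* near-optimal now starts at level 10 */
        if (compression_level >= 8)
            return PARSER_LAZY2;           /* levels 8-9 now use lazy2 */
        return PARSER_LAZY;                /* level 7 keeps using the lazy parser */
    }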
Resolves https://github.com/ebiggers/libdeflate/issues/85
Use a lower max_search_depth but a higher nice_match_length. This seems
to turn out a bit better, on average. This is consistent with what the
other compression levels do; level 4 was the only one that had
nice_match_length <= max_search_depth.
The new match scoring method in the lazy compressor has improved the
compression ratio slightly. Therefore, for levels 5-6 decrease
max_search_depth slightly to get a bit more performance.
Remove the ability of matchfinder_init() and matchfinder_rebase() to
fail due to the matchfinder memory size being misaligned. Instead,
require that the size always be 128-byte aligned -- which is already the
case. Also, make the matchfinder memory always be 32-byte aligned --
which doesn't really have any downside.
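A sketch of the new invariants (names are illustrative): the
matchfinder size is a multiple of 128 bytes by construction, so
init/rebase no longer need a failure path, and the backing memory is
allocated with 32-byte alignment.

    #include <assert.h>
    #include <stdlib.h>

    #define MATCHFINDER_SIZE_ALIGNMENT  128
    #define MATCHFINDER_MEM_ALIGNMENT   32

    static void *
    alloc_matchfinder(size_t size)
    {
        /* The size is a compile-time property of the matchfinder structures,
         * so assert the invariant instead of returning an error at runtime. */
        assert(size % MATCHFINDER_SIZE_ALIGNMENT == 0);
        return aligned_alloc(MATCHFINDER_MEM_ALIGNMENT, size);
    }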
The cutoff for outputting uncompressed data is currently < 16 bytes for
all compression levels. That isn't ideal: the higher the compression
level, the more worthwhile it is to try compressing very small inputs,
and the lower the compression level, the less worthwhile it is.
Use a formula that produces the following cutoffs (a sketch of the
formula follows the table):
Level   Cutoff
-----   ------
  0       56
  1       52
  2       48
  3       44
  4       40
  5       36
  6       32
  7       28
  8       24
  9       20
 10       16
 11       12
 12        8
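As a sketch, a formula consistent with the table (the function name is
illustrative):

    static unsigned
    min_size_to_attempt_compression(int compression_level)
    {
        /* Inputs shorter than this are emitted as uncompressed data.
         * Level 0 -> 56 bytes, decreasing by 4 per level down to 8 at 12. */
        return 56 - 4 * (unsigned)compression_level;
    }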
Update https://github.com/ebiggers/libdeflate/issues/67
Some users may require a valid DEFLATE, zlib, or gzip stream but know
ahead of time that particular inputs are not compressible. zlib
supports "level 0" for this use case. Support this in libdeflate too.
Resolves https://github.com/ebiggers/libdeflate/issues/86
Allow building libdeflate without linking to any libc functions by using
'make FREESTANDING=1'. When using such a library build, the user will
need to call libdeflate_set_memory_allocator() before anything else,
since malloc() and free() will be unavailable.
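Example of the required setup in such a build (my_alloc/my_free stand
in for whatever allocator the freestanding environment provides;
libdeflate_set_memory_allocator() is the real API):

    #include <stddef.h>
    #include <libdeflate.h>

    extern void *my_alloc(size_t size);
    extern void my_free(void *ptr);

    void
    init_compression(void)
    {
        /* Must run before any compressor/decompressor is allocated, since
         * malloc() and free() are unavailable in a freestanding build. */
        libdeflate_set_memory_allocator(my_alloc, my_free);
    }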
[Folded in fix from Ingvar Stepanyan to use -nostdlib, and made
freestanding_tests() check that no libs are linked to.]
Update https://github.com/ebiggers/libdeflate/issues/62
In preparation for adding custom memory allocator support, don't call
the standard memory allocation functions directly but rather wrap them
with libdeflate_malloc() and libdeflate_free().
Unfortunately, MSVC only accepts __stdcall after the return type, while
gcc only accepts __attribute__((visibility("default"))) before the
return type. So we need a macro in each location.
Also, MSVC doesn't define __i386__; that's gcc-specific. So instead use
'_WIN32 && !_WIN64' to detect 32-bit Windows.
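A sketch of the resulting declaration shape (macro names are
illustrative): one macro before the return type carries the
visibility/dllexport attribute, one after it carries the calling
convention, and 32-bit Windows is detected via _WIN32 && !_WIN64.

    #include <stddef.h>

    #if defined(_WIN32) && !defined(_WIN64)
    #  define MYLIB_CALLCONV  __stdcall    /* 32-bit Windows only */
    #else
    #  define MYLIB_CALLCONV
    #endif

    #ifdef _MSC_VER
    #  define MYLIB_DECL  __declspec(dllexport)
    #else
    #  define MYLIB_DECL  __attribute__((visibility("default")))
    #endif

    MYLIB_DECL size_t MYLIB_CALLCONV
    mylib_compress(const void *in, size_t in_nbytes, void *out, size_t out_nbytes);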
The API returns the compressed size as a size_t, so
deflate_flush_output() needs to return size_t as well, not u32.
Otherwise sizes >= 4 GiB are truncated.
This bug has been there since the beginning.
(This only matters if you compress a buffer that large in a single go,
which obviously is not a good idea, but no matter -- it's a bug.)
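For illustration, the failure mode of the narrowing (types simplified):

    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t u32;

    static u32    /* BUG: must be size_t, or outputs >= 4 GiB are truncated */
    flush_output_truncating(size_t out_nbytes)
    {
        return (u32)out_nbytes;    /* 4 GiB -> 0, 5 GiB -> 1 GiB, ... */
    }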
Resolves https://github.com/ebiggers/libdeflate/issues/44
When the block splitting algorithm was implemented, it became possible
for the compressor to use longer blocks, up to ~300KB. Unfortunately it
was overlooked that this can allow literal runs > 65535 bytes, while in
one place the length of a literal run was still being stored in a u16.
To overflow the litrunlen and hit the bug, the data would have to
contain far fewer matches than random data, which is possible but very
unusual. Fix the bug by reserving more space to hold the litrunlen, and
add a test for it.
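A simplified illustration of the fix (the real sequence layout is more
compact): with blocks of up to ~300 KB, a literal run can exceed 65535
bytes, so the run length needs more than 16 bits.

    #include <stdint.h>

    struct sequence {
        uint32_t litrunlen;     /* was a u16, which overflowed on runs > 65535 literals */
        uint16_t match_len;
        uint16_t match_offset;
    };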
I've decided to simplify and standardize the licensing status for the
library by using the MIT license instead of CC0 (a.k.a. "public
domain"). This eliminates the somewhat controversial 4(a) clause in
CC0, and, for this and other reasons, should (somewhat ironically) make
it easier for some people to use and contribute to the project.
Note: copyright will apply to new changes and to new versions of the
work as a whole. Of course, versions previously released as public
domain remain public domain where legally recognized.
It was reported that API symbols were being "exported" from the static
library built with MSVC, causing them to remain exported after being
linked into another program. It turns out this was actually a problem
outside of MSVC as well. The solution is to always build the static and
shared libraries from different object files, where the API symbols are
exported from the shared library object files but not from the static
library object files.
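A sketch of the mechanism (macro and define names are illustrative):
compile the static and shared libraries from separate object files,
and define the export attribute only when building the shared-library
objects.

    #ifdef BUILDING_SHARED_LIB          /* defined only for shared-library objects */
    #  ifdef _WIN32
    #    define MYLIB_EXPORT  __declspec(dllexport)
    #  else
    #    define MYLIB_EXPORT  __attribute__((visibility("default")))
    #  endif
    #else
    #  define MYLIB_EXPORT              /* static-library objects: no exports */
    #endif

    MYLIB_EXPORT int mylib_api_function(void);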
Reported-by: Joergen Ibsen <ji@ibse.dk>