418 Commits

Author SHA1 Message Date
Eric Biggers
ea536bcce2 bt_matchfinder: remove best_len_ret parameter
It doesn't seem worthwhile to have bt_matchfinder_get_matches() return
the best_len separately anymore, especially since it doesn't work as
expected due to it not handling length 3 matches.
2022-01-04 21:25:56 -08:00
Eric Biggers
3675136c39 deflate_compress: observe correct items in lazy compressor
Ensure that whenever deflate_choose_*() is called to choose a literal or
match, observe_*() is also called to tally it in the block split
statistics (except in deflate_compress_fastest(), which doesn't use the
block split statistics).  Previously, the lazy compressor contained an
optimization that made the block split statistics differ from the actual
lazy parse.  But that optimization no longer seems to be worthwhile.
2022-01-04 21:25:56 -08:00
Eric Biggers
6559b86a5a deflate_compress: rearrange compressor fields slightly
Put some fields in a more logical order.
2022-01-04 21:15:30 -08:00
Eric Biggers
60f38b4598 deflate_compress: clean up coding style
Use a consistent comment style, consistently limit lines to 80 columns,
consistently use a blank line after declarations, and other cleanups.
2022-01-04 21:15:30 -08:00
Eric Biggers
08692b8696 deflate_compress: refactor writing literals into separate function
Avoid too many levels of indentation.
2022-01-04 21:15:30 -08:00
leleliu008
f6e7593cfc add GitHubActions to build for android
Reference:
https://developer.android.com/ndk/guides/other_build_systems
2022-01-03 15:23:34 -06:00
Eric Biggers
d352945791 deflate_compress: fix checks for sequence store filled
These were off by 1.
2022-01-02 19:44:10 -06:00
Eric Biggers
4dd63ea272 deflate_compress: misc cleanups for new code
2022-01-02 19:44:10 -06:00
Eric Biggers
71db68b27f deflate_compress: adjust block splitting conditions
For fastest, greedy, lazy, and lazy2: save memory by reducing the length
of the sequence store, and forcing a split if it is filled.

For fastest: increase the max block length, but use a relatively short
sequence store that will cause shorter blocks to be used often.

For all: allow the final block to exceed the soft maximum length if it
avoids having to create a block below the minimum length.
2022-01-02 12:30:35 -06:00
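The rule for the final block described in 71db68b27f can be sketched as below. This is only an illustration under assumed values: `SOFT_MAX_BLOCK_LENGTH`, `MIN_BLOCK_LENGTH`, and `next_block_length()` are hypothetical names and numbers, not taken from the commit, and the real compressor also splits blocks based on statistics and on the sequence store filling up.

```c
#include <stddef.h>

/* Hypothetical limits, chosen only for illustration. */
#define SOFT_MAX_BLOCK_LENGTH ((size_t)300000)
#define MIN_BLOCK_LENGTH      ((size_t)5000)

/*
 * Return the length of the next block, given the number of input bytes left.
 * If ending the block at the soft maximum would leave a final block shorter
 * than MIN_BLOCK_LENGTH, take all remaining bytes instead, letting the final
 * block exceed the soft maximum.
 */
size_t next_block_length(size_t remaining)
{
	if (remaining < SOFT_MAX_BLOCK_LENGTH + MIN_BLOCK_LENGTH)
		return remaining;
	return SOFT_MAX_BLOCK_LENGTH;
}
```

With these assumed limits, 302000 remaining bytes become one block rather than a 300000-byte block followed by a 2000-byte one.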
Eric Biggers
7c60c4cdaf Makefile: avoid redundant invocations of "$(CC) -dumpmachine"
2022-01-02 11:54:02 -06:00
leleliu008
fcafa11201 buildsys: android apk do not support soversion
2022-01-02 11:23:14 -06:00
Eric Biggers
88d45c7e1c deflate_compress: only use full offset slot map when useful
Reduce the memory usage of compression levels 1-9 by using the condensed
offset slot map instead of the full one.
2022-01-01 20:15:27 -06:00
Eric Biggers
5e9226fff8 deflate_compress: optimize level 1
Introduce deflate_compress_fastest(), and use it for level 1.  It uses
the ht_matchfinder instead of the hc_matchfinder, and it skips the block
splitting algorithm.  This speeds up level 1, without changing the
compression ratio too much (relative to what is expected for level 1).
2022-01-01 20:15:27 -06:00
Eric Biggers
19816c5e26 matchfinder: introduce the ht_matchfinder
Add a new matchfinder optimized for very fast compression.
2022-01-01 20:15:27 -06:00
Eric Biggers
8012927541 matchfinder: rename skip_positions() to skip_bytes()
This is a bit shorter, and perhaps clearer.
2022-01-01 20:15:27 -06:00
Eric Biggers
320c306db3 hc_matchfinder: make skip_positions() return void
2022-01-01 20:15:27 -06:00
Eric Biggers
c16ba46008 deflate_compress: remove unneeded litrunlen variables
Just update deflate_sequence::litrunlen_and_length directly.
2022-01-01 20:15:27 -06:00
Eric Biggers
41685b0dac matchfinder: add MATCHFINDER_ALIGNED macro
Avoid some code duplication.
2022-01-01 20:15:27 -06:00
leleliu008
52b61d98d8 buildsys: change verify macOS platform condition
to support cross-compile for Android on macOS

Fixes #147
2022-01-01 19:48:39 -06:00
Eric Biggers
6ca065f9f8 hc_matchfinder: fix some comments
2022-01-01 09:36:11 -06:00
Eric Biggers
93a06e313e scripts: improve benchmark table script
2022-01-01 08:57:27 -06:00
Eric Biggers
3dca7de4bd deflate_compress: improve costs for near-optimal parsing
Further improve the way the near-optimal parser estimates symbol costs:

- When setting a block's initial costs, weigh the default costs and
  previous block's costs differently, depending on how different the
  current block seems to be from the previous block.

- When determining the "default" costs, take into account how many
  literals appear in the block and how frequent matches seem to be.

- Increase BIT_COST from 8 to 16, to increase precision in calculations.
2021-12-31 17:00:05 -06:00
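The first point in 3dca7de4bd, weighing default costs against the previous block's costs, can be pictured as a weighted average. The sketch below is an assumption for illustration only: `blend_cost()` and its 0 to 4 weight scale are hypothetical, and the commit does not say how the weight is derived from the difference between blocks.

```c
#include <stdint.h>

#define BIT_COST 16	/* cost units per bit, the value named in the commit */

/*
 * Blend a symbol's default cost with its cost in the previous block.
 * prev_weight is in [0, 4]: 0 ignores the previous block, 4 uses only the
 * previous block.  The "+ 2" rounds to the nearest cost unit.
 */
uint32_t blend_cost(uint32_t default_cost, uint32_t prev_cost,
		    unsigned prev_weight)
{
	return (default_cost * (4 - prev_weight) +
		prev_cost * prev_weight + 2) / 4;
}
```

Because costs are expressed in units of 1/BIT_COST of a bit, a larger BIT_COST lets averages like this keep fractional-bit precision.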
Eric Biggers
bf3e032f71 deflate_compress: use correct previous costs
When the near-optimal parser sets the initial costs for a block, it
takes into account the costs of the previous block (if there is one).
However, the costs for the previous block were not being updated
following the final codes being built, so the costs from the last
optimization pass were being used instead of the final costs.

Make it so that the final costs are used, as intended.

Also, include the DEFLATE_END_OF_BLOCK symbol in the non-final codes.

In general, these changes improve the compression ratio slightly.
2021-12-31 17:00:05 -06:00
Eric Biggers
8d4a5ae15c deflate_compress: strengthen levels 10-12 slightly
With deflate_compress_near_optimal(), some data benefits more than
originally thought from larger values of max_search_depth and
nice_match_length.  Some data even needs these parameters to be fairly
high for deflate_compress_near_optimal() to compress more than
deflate_compress_lazy2().  Bump these parameters up a bit.
2021-12-31 17:00:05 -06:00
Eric Biggers
69a7ca07fd deflate_compress: automatically select minimum match length
In the greedy and lazy compressors, automatically increase the minimum
match length from the default of 3 if the data doesn't contain many
different literals.  This greatly improves the compression ratio of
levels 1-9 on certain types of data, such as DNA sequencing data, while
not worsening the ratio on other types of data.

The near-optimal compressor (used by compression levels 10-12) continues
to use a minimum match length of 3, since it already did a better job at
deciding when short matches are worthwhile.  (The method for setting the
initial costs needs improvement; later commits address that.)

Resolves https://github.com/ebiggers/libdeflate/issues/57
2021-12-31 17:00:05 -06:00
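A rough sketch of the idea in 69a7ca07fd: count how many distinct byte values the data uses, and require longer matches when that count is small. The function names and the thresholds below are hypothetical, chosen only to show the shape of the heuristic; the commit message does not give the actual values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count how many distinct byte values occur in the input. */
unsigned count_distinct_literals(const unsigned char *in, size_t n)
{
	bool seen[256] = { false };
	unsigned count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!seen[in[i]]) {
			seen[in[i]] = true;
			count++;
		}
	}
	return count;
}

/*
 * Pick a minimum match length from the number of distinct literals.
 * Thresholds are assumptions for illustration: the fewer distinct
 * literals there are, the cheaper each literal is to encode, so short
 * matches become less worthwhile.
 */
unsigned choose_min_match_len(unsigned num_distinct)
{
	if (num_distinct <= 4)
		return 6;
	if (num_distinct <= 16)
		return 5;
	if (num_distinct <= 64)
		return 4;
	return 3;	/* the default named in the commit message */
}
```

DNA sequencing data, for example, consists mostly of the four letters A, C, G, and T, so under these assumed thresholds it would get the longest minimum match length.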
Eric Biggers
3bc42e23d6 deflate_compress: improve parameter comments and ordering
Make it clearer what each #define does, and rearrange them into a
more logical order.
2021-12-31 17:00:05 -06:00
Eric Biggers
3f706a69bd deflate_compress: use MAX_PRE_CODEWORD_LEN constant
deflate_write_huffman_header() should use MAX_PRE_CODEWORD_LEN, not
DEFLATE_MAX_PRE_CODEWORD_LEN (though these are currently the same).
2021-12-31 17:00:05 -06:00
Eric Biggers
be5aefe42f deflate_compress: rename CACHE_LENGTH to MATCH_CACHE_LENGTH
2021-12-31 17:00:05 -06:00
Eric Biggers
1f45d0b36a deflate_constants: define constant for window order
2021-12-31 17:00:05 -06:00
Eric Biggers
dd42d1a001 deflate_constants: define constant for first len sym
2021-12-31 17:00:05 -06:00
Eric Biggers
0b6127da2d deflate_constants: remove unused constants
2021-12-31 17:00:05 -06:00
Eric Biggers
e5132579a4 deflate_compress: replace COST_SHIFT with BIT_COST
This is a bit easier to understand, and the compiler will optimize the
multiplications to shifts anyway.
2021-12-31 17:00:05 -06:00
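The equivalence that e5132579a4 relies on can be shown in a few lines. The values here are assumptions: a `COST_SHIFT` of 3 is inferred from the `BIT_COST` of 8 that a later commit mentions as the old value, and both function names are hypothetical.

```c
#include <stdint.h>

#define COST_SHIFT 3			/* assumed old form: shift count */
#define BIT_COST   (1U << COST_SHIFT)	/* new form: cost units per bit (8) */

/* Old style: scale a bit count to cost units with a shift. */
uint32_t cost_with_shift(uint32_t nbits)
{
	return nbits << COST_SHIFT;
}

/*
 * New style: multiply by the per-bit cost.  For a power-of-two BIT_COST,
 * compilers typically emit the same shift, so readability is the only
 * difference.
 */
uint32_t cost_with_mul(uint32_t nbits)
{
	return nbits * BIT_COST;
}
```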
Eric Biggers
f7d3a70d4c deflate_compress: use lazy2 compressor for levels 8-9
Instead of switching directly from the lazy compressor at level 7 to the
near-optimal compressor at level 8, use the lazy2 compressor at levels
8-9 and don't switch to near-optimal until level 10.

This avoids poor compression ratio and bad performance (both
significantly worse than level 7, and significantly worse than zlib) at
levels 8-9 on data where the near-optimal compressor doesn't do well
until the parameters are cranked up.

On data where the near-optimal compressor *does* do well, this change
worsens the compression ratio of levels 8-9, but also speeds them up a
lot, thus positioning them similarly vs. zlib as the lower levels (i.e.
much faster and slightly stronger, rather than slightly faster and much
stronger).  The difference between levels 9 and 10 is increased, but
that's perhaps the least bad place to have a discontinuity.

Resolves https://github.com/ebiggers/libdeflate/issues/85
2021-12-31 17:00:05 -06:00
Eric Biggers
1b3eaf2f13 deflate_compress: introduce the lazy2 compressor
Add deflate_compress_lazy2(), which is a slightly stronger variant of
deflate_compress_lazy().  It looks ahead 2 positions instead of 1.
2021-12-31 17:00:05 -06:00
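The two-position lookahead in 1b3eaf2f13 can be pictured as a small decision function. This is a simplified sketch, not the commit's code: `lazy2_literals_before_match()` and its comparison rule are assumptions, and a real lazy parser would repeat the decision from the new position after deferring.

```c
/*
 * Given the best match lengths found at the current position (len0), the
 * next position (len1), and the position after that (len2), return how
 * many literals to emit before taking a match: 0, 1, or 2.
 *
 * Assumed rule: a match one position ahead must be longer than the
 * current one; a match two positions ahead must be longer by at least 2,
 * since deferring to it costs two literals.
 */
unsigned lazy2_literals_before_match(unsigned len0, unsigned len1,
				     unsigned len2)
{
	if (len1 > len0)
		return 1;
	if (len2 > len0 + 1)
		return 2;
	return 0;
}
```

A plain lazy compressor corresponds to the first check alone; the second check is what the extra lookahead position adds.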
Eric Biggers
f699b697d6 deflate_compress: tweak level 4 parameters
Use a lower max_search_depth but a higher nice_match_length.  This seems
to turn out a bit better, on average.  This is consistent with what the
other compression levels do; level 4 was the only one that had
nice_match_length <= max_search_depth.
2021-12-31 17:00:05 -06:00
Eric Biggers
057cb92782 deflate_compress: slightly decrease max_search_depth for levels 5-6
The new match scoring method in the lazy compressor has improved the
compression ratio slightly.  Therefore, for levels 5-6 decrease
max_search_depth slightly to get a bit more performance.
2021-12-31 17:00:05 -06:00
Eric Biggers
193dedc73f deflate_compress: improve match scoring in lazy compressor
In the lazy compressor, it's usually worthwhile to (quickly) consider
the match offset too, not just the match length.
2021-12-31 17:00:05 -06:00
Eric Biggers
4b7e9029d1 deflate_compress: don't use far len 3 matches in lazy compressor
It's usually not worth using length 3 matches with a large offset.
2021-12-31 17:00:05 -06:00
Eric Biggers
7e0242d04f deflate_compress: don't use far len 3 matches in greedy compressor
It's usually not worth using length 3 matches with a large offset.
2021-12-31 17:00:05 -06:00
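The check described in 7e0242d04f and 4b7e9029d1 amounts to a filter on candidate matches. In the sketch below, `FAR_OFFSET_THRESHOLD` and `match_is_worthwhile()` are hypothetical; the commit messages do not state the cutoff. The reasoning is that a DEFLATE match costs a length symbol plus an offset symbol plus up to 13 extra offset bits, which for a far offset can exceed the cost of three literals.

```c
#include <stdbool.h>

#define MIN_MATCH_LEN        3
#define FAR_OFFSET_THRESHOLD 4096	/* hypothetical cutoff */

/*
 * Decide whether a candidate match should be used at all.
 * Matches shorter than the DEFLATE minimum are never usable, and
 * minimum-length matches are rejected when the offset is far.
 */
bool match_is_worthwhile(unsigned len, unsigned offset)
{
	if (len < MIN_MATCH_LEN)
		return false;
	if (len == MIN_MATCH_LEN && offset >= FAR_OFFSET_THRESHOLD)
		return false;
	return true;
}
```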
Eric Biggers
4f7fb20776 deflate_compress: skip unneeded work in do_end_block_check()
Summing ->new_observations[] is unnecessary, since
->num_new_observations already contains the sum.
2021-12-31 17:00:05 -06:00
Eric Biggers
dd5b9693cb deflate_compress: clean up deflate_compress_{greedy,lazy,near_optimal}()
Various minor cleanups, such as adjusting the coding style and
refactoring some logic into a helper function.  No "real" changes.
2021-12-31 17:00:05 -06:00
Eric Biggers
12e72cf936 deflate_compress: use DEFLATE_END_OF_BLOCK constant
2021-12-31 17:00:05 -06:00
Eric Biggers
804c6c74f6 scripts: improve afl-fuzz support
Add a proper script which builds the fuzzed programs and runs the
fuzzer.  Also make all compression levels get fuzzed.
2021-12-31 12:19:10 -06:00
Eric Biggers
eafe829b4d Remove "originally public domain" comments
These comments are unnecessary, and they might cause confusion since
they could be misunderstood as being part of the license.
2021-12-31 12:19:10 -06:00
Eric Biggers
a62d3610f0 Makefile: add missing chmod
2021-11-24 19:41:30 -08:00
Eric Biggers
2d2bc2cc8a Makefile: fix up coding style
2021-11-24 19:41:30 -08:00
nick black
0a2b40203d Generate a pkg-config support file #140
We pull the version out of libdeflate.h into the
Makefile, and then sed a few macros out of the new
file libdeflate.pc.in. Set the necessary Cflags
and Libs (CFLAGS and LFLAGS) based off compile-time
definitions. Depend on the Makefile to pick up
version changes. Update uninstall target.

Signed-off-by: nick black <dankamongmen@gmail.com>
2021-11-23 23:28:38 -08:00
Eric Biggers
22c0dd7afd ci.yml: remove ubuntu-16.04
This is no longer supported by GitHub Actions.
2021-11-03 23:07:44 -07:00
Dmitry Bogatov
ee4d18872b Add environment variable to disable building shared library
Environment variable DISABLE_SHARED (following convention of --disable-shared
of ./configure script) disables building of shared library and shared lib
symlink. It makes life of downstream maintainer easier when maintaining package
for environment that supports only static libraries.

See https://github.com/NixOS/nixpkgs/pull/144438
2021-11-03 22:25:22 -07:00
Eric Biggers
047aa84e01 v1.8
2021-07-15 09:31:09 -05:00