216 Commits

Author SHA1 Message Date
Eric Biggers
333eff73b2 tools/run_tests.sh: run all test programs in exec_tests.sh 2018-12-28 10:25:44 -06:00
Eric Biggers
d3878bc8ae programs: new test program - test_incomplete_codes 2018-12-28 10:25:44 -06:00
Eric Biggers
c398e237b6 programs: move output_bitstream to test_util 2018-12-28 10:25:44 -06:00
Eric Biggers
ce6a95f47b programs: add test_util
Move program utility functions that are used only by "test programs"
(i.e. not by gzip/gunzip) from prog_util.{c,h} into test_util.{c,h}.
This reduces the code that is compiled for the default build target,
which excludes the test programs.
2018-12-28 10:25:44 -06:00
Eric Biggers
a64bd1e830 lib/deflate_decompress: optimize build_decode_table() via table doubling
Another build_decode_table() optimization: rather than filling all the
entries for each codeword using strided stores, just fill one initially
and fill the rest by memcpy()s as the table is incrementally expanded.

Also make some other cleanups and small optimizations.
2018-12-27 17:10:23 -06:00
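The table-doubling idea can be illustrated with a small, simplified sketch. It is not libdeflate's build_decode_table(). It assumes a complete canonical code whose lengths all fit in a hypothetical 4-bit table, an entry layout of (symbol << 8) | length, and indexing by the bit-reversed codeword, since DEFLATE reads bits LSB-first. Each codeword is stored once. Before each longer length is processed, the table is doubled with memcpy(), which replicates the shorter codewords' entries.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_BITS 4  /* hypothetical table size for illustration */

/* Reverse the low 'len' bits of 'x'. */
static unsigned reverse_bits(unsigned x, unsigned len)
{
    unsigned r = 0;
    for (unsigned i = 0; i < len; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/*
 * Build a direct-lookup table of 2^TABLE_BITS entries for a complete canonical
 * prefix code. lens[sym] is the codeword length (0 = unused). The caller
 * should zero 'table' so that not-yet-filled slots copied while doubling are
 * defined; they are all overwritten before the function returns.
 */
static void build_decode_table(const uint8_t *lens, unsigned num_syms,
                               uint16_t *table)
{
    unsigned code = 0; /* next canonical codeword, MSB-first */

    for (unsigned len = 1; len <= TABLE_BITS; len++) {
        if (len > 1) {
            /* Double the table: shorter codewords now cover both halves. */
            unsigned half = 1u << (len - 1);
            memcpy(&table[half], &table[0], half * sizeof(table[0]));
        }
        /* Fill exactly one entry per codeword of this length. */
        for (unsigned sym = 0; sym < num_syms; sym++) {
            if (lens[sym] != len)
                continue;
            table[reverse_bits(code, len)] = (uint16_t)((sym << 8) | len);
            code++;
        }
        code <<= 1; /* canonical code: move to the next length */
    }
}
```

For lengths {2, 1, 3, 3}, symbol 1 gets codeword "0", so every even index decodes to symbol 1, while symbol 0 ("10") lands at indices 1, 5, 9 and 13.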
dawg
954b59041a Include stdlib.h for _byteswap_* on MSVC. 2018-12-26 10:23:14 -06:00
Eric Biggers
bfc3f610e1 lib/deflate_decompress: build subtables separately
Further improve build_decode_table() performance by splitting the "fill
direct entries" and "fill subtable pointers and subtables" steps into
separate loops and making some other optimizations.
2018-12-25 23:57:43 -06:00
Eric Biggers
515b7ad15c lib/deflate_decompress: move len_counts[] and offsets[] to stack
This improves performance, and these arrays are not very large.
2018-12-25 22:15:10 -06:00
Eric Biggers
1a3f34eab9 lib/deflate_decompress: optimize codeword incrementing 2018-12-25 21:29:13 -06:00
Eric Biggers
a25f3b86d7 lib/deflate_decompress: further optimize match copying 2018-12-25 18:14:32 -06:00
Orivej Desh
6750e4f19d Makefile: make the installation prefix configurable 2018-12-25 14:40:48 -06:00
Eric Biggers
170c24190a lib/deflate_decompress: further optimize refilling the bitbuffer 2018-12-25 14:16:38 -06:00
Eric Biggers
1c3609da7b lib/deflate_decompress: store decode results pre-shifted
This slightly speeds up decode table building, since now the decode
results don't need to be shifted at runtime when building the tables.
2018-12-25 14:16:38 -06:00
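A minimal sketch of what "pre-shifted" means, assuming a hypothetical entry layout with the decode result in bits 8 and up and the codeword length in the low bits. The shift is applied once, when the static result array is defined, so building each entry is only an OR. The names and values here are illustrative, not libdeflate's.

```c
#include <stdint.h>

#define RESULT_SHIFT 8 /* hypothetical position of the result in an entry */
#define PRESHIFT(x) ((uint32_t)(x) << RESULT_SHIFT)

/* Hypothetical per-symbol decode results, plain and pre-shifted. */
static const uint32_t results[4] = { 3, 4, 5, 6 };
static const uint32_t results_preshifted[4] = {
    PRESHIFT(3), PRESHIFT(4), PRESHIFT(5), PRESHIFT(6)
};

/* Before: shift at table-build time for every entry. */
static uint32_t make_entry_shifting(unsigned sym, unsigned len)
{
    return (results[sym] << RESULT_SHIFT) | len;
}

/* After: the shift was done at compile time. */
static uint32_t make_entry_preshifted(unsigned sym, unsigned len)
{
    return results_preshifted[sym] | len;
}
```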
Eric Biggers
eed4829c16 lib/deflate_decompress: fix a comment 2018-12-25 14:16:38 -06:00
Eric Biggers
73017f08e5 lib/x86/adler32: add an AVX-512BW optimized Adler32 implementation 2018-12-24 17:36:07 -06:00
Eric Biggers
5c80decb26 common/x86: detect AVX-512BW intrinsics support 2018-12-24 17:36:07 -06:00
Eric Biggers
4548033845 lib/x86/cpu_features: detect AVX-512BW support 2018-12-24 17:36:07 -06:00
Eric Biggers
6a05e63bbb v1.1 v1.1 2018-12-23 13:13:28 -06:00
Eric Biggers
6e7813e8fa Makefile: support user-specified CPPFLAGS 2018-12-23 13:13:28 -06:00
Eric Biggers
dfd839df4e test_checksums: test with guard page 2018-12-23 12:34:50 -06:00
Eric Biggers
a5a4822e2a prog_util: add guarded buffer allocator 2018-12-23 12:34:50 -06:00
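One common way to build a guarded buffer allocator on POSIX systems is to map an extra page, make it inaccessible with mprotect(), and place the buffer so that it ends exactly at that page. Any read past the end then faults immediately. This is a hypothetical sketch (guarded_alloc() and struct guarded_buf are illustrative names), not necessarily the layout prog_util uses.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

struct guarded_buf {
    unsigned char *data; /* usable bytes; data + size is the guard page */
    void *map;           /* start of the whole mapping */
    size_t map_size;     /* data pages plus one guard page */
};

/* Returns 0 on success, -1 on failure. */
static int guarded_alloc(struct guarded_buf *gb, size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (size + page - 1) / page;
    size_t map_size = (data_pages + 1) * page;
    unsigned char *base = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (base == MAP_FAILED)
        return -1;
    unsigned char *guard = base + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, map_size);
        return -1;
    }
    gb->data = guard - size; /* buffer ends right at the guard page */
    gb->map = base;
    gb->map_size = map_size;
    return 0;
}

static void guarded_free(struct guarded_buf *gb)
{
    munmap(gb->map, gb->map_size);
}
```

A checksum test can then pass gb.data with a length that runs right up to the guard page, so an implementation that over-reads its input crashes instead of silently reading adjacent memory.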
Eric Biggers
57cab078f1 lib: optimize decompressing repeated static Huffman blocks
Improve libdeflate's worst-case performance decompressing malicious
DEFLATE streams by about 14x, bringing it within a factor of about 2x of
zlib, by skipping rebuilding the decode tables for the static Huffman
codes when they're already loaded into the decompressor.

This improves performance decompressing a stream of all empty static
Huffman blocks from about 0.36 MB/s to 175 MB/s, and decompressing the
original reproducer from the GitHub issue from about 3.3 MB/s to 219 MB/s.
A regression test is added for these cases, as well as for the case of
empty dynamic Huffman blocks, to verify worst-case performance comparable to zlib.

Resolves https://github.com/ebiggers/libdeflate/issues/33
2018-12-23 12:03:00 -06:00
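The optimization amounts to remembering whether the decode tables currently hold the static codes. Below is a minimal sketch with a hypothetical decompressor struct, where a counter stands in for the expensive table build. The static code lengths are the ones defined in RFC 1951, section 3.2.6.

```c
#include <stdbool.h>

/* Hypothetical decompressor state; field names are illustrative. */
struct decompressor {
    bool static_codes_loaded; /* tables currently hold the static codes */
    unsigned table_builds;    /* counts table builds, for demonstration */
    unsigned char litlen_lens[288];
    unsigned char offset_lens[32];
};

static void build_tables(struct decompressor *d)
{
    d->table_builds++; /* stands in for building the decode tables */
}

static void load_static_codes(struct decompressor *d)
{
    if (d->static_codes_loaded)
        return; /* already loaded: skip the rebuild entirely */

    unsigned i = 0;
    for (; i < 144; i++) d->litlen_lens[i] = 8;
    for (; i < 256; i++) d->litlen_lens[i] = 9;
    for (; i < 280; i++) d->litlen_lens[i] = 7;
    for (; i < 288; i++) d->litlen_lens[i] = 8;
    for (i = 0; i < 32; i++) d->offset_lens[i] = 5;

    build_tables(d);
    d->static_codes_loaded = true;
}

static void load_dynamic_codes(struct decompressor *d)
{
    /* ...code lengths would be read from the stream here... */
    build_tables(d);
    d->static_codes_loaded = false; /* tables no longer hold static codes */
}
```

A stream of many consecutive empty static blocks now costs one table build instead of one per block. A dynamic block invalidates the flag, so the next static block rebuilds once.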
Eric Biggers
dd1c157750 prog_util: add timer_KB_per_s() 2018-12-23 12:03:00 -06:00
Eric Biggers
6c26eb18ea prog_util: add ASSERT() macro 2018-12-23 12:03:00 -06:00
Eric Biggers
becd91bb63 lib/arm: NEON intrinsics require hardware floating point support
NEON intrinsics cannot be used when compiling for an ARM CPU without
hardware floating point support, e.g. the Debian armel port.  In this
case arm_neon.h cannot even be included as it causes an #error.

[Based on a patch by Adrian Bunk <bunk@debian.org>, but changed to check
 for __ARM_FP instead of !__SOFTFP__ to be consistent with arm_neon.h,
 and added a comment.]
2018-12-22 00:00:04 -06:00
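A sketch of the kind of preprocessor guard the commit describes. NEON code is enabled only when the target advertises NEON and hardware floating point (__ARM_FP), which matches the condition under which GCC's arm_neon.h can be included. __ARM_NEON is the ACLE macro, and older GCC versions define __ARM_NEON__. HAVE_NEON_INTRINSICS and neon_intrinsics_usable() are hypothetical names.

```c
/*
 * Use NEON intrinsics only with NEON *and* hardware FP; on soft-float
 * targets such as Debian armel, including arm_neon.h causes an #error.
 */
#if (defined(__ARM_NEON) || defined(__ARM_NEON__)) && defined(__ARM_FP)
#  include <arm_neon.h>
#  define HAVE_NEON_INTRINSICS 1
#else
#  define HAVE_NEON_INTRINSICS 0
#endif

static int neon_intrinsics_usable(void)
{
    return HAVE_NEON_INTRINSICS;
}
```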
Eric Biggers
d6d50c6955 Fix stack alignment in 32-bit Windows builds
Resolves https://github.com/ebiggers/libdeflate/issues/35
2018-12-08 10:11:11 -08:00
Eric Biggers
906c54c16f Makefile: make changing libdeflate.h trigger a rebuild 2018-12-08 10:11:11 -08:00
Eric Biggers
65fd37d987 Add soname to shared library
To match common shared library packaging conventions: name the shared
library libdeflate.so.0, with matching soname, and make libdeflate.so
a symlink that points to it.
2018-12-06 21:43:08 -08:00
Eric Biggers
7fad94b8c9 Include import library in Windows binary releases
Previously:
	- libdeflate.dll: the dynamic library
	- libdeflate.lib: the static library

Now:
	- libdeflate.dll: the dynamic library
	- libdeflate.lib: the import library
	- libdeflatestatic.lib: the static library
2018-12-06 20:13:18 -08:00
Eric Biggers
2b6689d8aa Support 'make install' and 'make uninstall' 2018-06-14 22:58:46 -07:00
Eric Biggers
89b2d68aac README updates 2018-05-18 19:33:51 -07:00
ebiggers
203c1a8989 Merge pull request #32 from antonblanchard/ppc64_unaligned
Set UNALIGNED_ACCESS_IS_FAST on powerpc64
2018-05-12 21:49:24 -07:00
Anton Blanchard
9205845a16 Set UNALIGNED_ACCESS_IS_FAST on powerpc64
All 64-bit PowerPC CPUs handle unaligned accesses reasonably fast, so
set UNALIGNED_ACCESS_IS_FAST.

Decompression of the snappy html test case is almost 50% faster on
POWER9 with this patch applied.
2018-05-13 07:48:40 +10:00
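A sketch of how a flag like UNALIGNED_ACCESS_IS_FAST is typically used. When it is set, a possibly unaligned load is written as a memcpy(), which optimizing compilers usually lower to a single load instruction. Otherwise the value is assembled byte by byte. The platform list is an assumption for illustration, and the fast path here is limited to little-endian targets so that the byte order is correct without a swap.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical platform check, for illustration only. */
#if defined(__powerpc64__) || defined(__x86_64__) || defined(__i386__)
#  define UNALIGNED_ACCESS_IS_FAST 1
#else
#  define UNALIGNED_ACCESS_IS_FAST 0
#endif

/* Load a little-endian 32-bit value from a possibly unaligned address. */
static uint32_t load_u32_le(const unsigned char *p)
{
#if UNALIGNED_ACCESS_IS_FAST && defined(__BYTE_ORDER__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    uint32_t v;
    memcpy(&v, p, sizeof(v)); /* usually a single unaligned load */
    return v;
#else
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
#endif
}
```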
Eric Biggers
9a327aae41 v1.0 v1.0 2018-04-13 22:46:08 -07:00
Eric Biggers
e9d1014161 tools/checksum_benchmarks.sh: fix detecting/disabling NEON on AArch64 2018-03-03 13:04:13 -08:00
Eric Biggers
6eef15d6f3 lib/arm: fix PMULL detection on AArch64 2018-03-03 12:47:50 -08:00
Eric Biggers
fc2ea22b44 lib/arm: add ARM PMULL implementation of CRC-32
Add an ARM PMULL implementation of CRC-32.  This is based on a patch by
Jun He <jun.he@linaro.org> as well as the x86 PCLMUL implementation.
2018-02-18 23:03:26 -08:00
Eric Biggers
1fb34f86b5 lib: add template for vectorized CRC-32 implementations 2018-02-18 23:03:26 -08:00
Eric Biggers
0c62e25464 tools/run_tests.sh: detect gcc without multilib support 2018-02-18 23:03:26 -08:00
Eric Biggers
5f3afad793 tools/run_tests.sh: run checksum benchmarks 2018-02-18 23:03:26 -08:00
Eric Biggers
2f4315c21c tools/run_tests.sh: more Android tests 2018-02-18 23:03:26 -08:00
Eric Biggers
794a40401d tools/android_build.sh: move -pie to LDFLAGS 2018-02-18 23:03:26 -08:00
Eric Biggers
4282583b9b tools/android_build.sh: support crypto extensions 2018-02-18 23:03:26 -08:00
Eric Biggers
e7aa4666e0 tools/checksum_benchmarks.sh: various improvements
Make it compatible with the new code organization, make it run the
test_checksums program for each implementation, and run each
implementation in both 64-bit and 32-bit modes.
2018-02-18 23:03:26 -08:00
Eric Biggers
bf0797e666 programs/test_checksums: test Adler-32 overflow cases 2018-02-18 23:03:26 -08:00
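Overflow matters because fast Adler-32 implementations defer the modulo. The sketch below follows zlib's well-known approach rather than libdeflate's exact code. It reduces only every NMAX = 5552 bytes, the largest n for which 255n(n+1)/2 + (n+1)(65521-1) still fits in 32 bits. A long run of 0xFF bytes is the worst case, so comparing against a per-byte-modulo reference on such input detects a bound that is too large.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u
#define ADLER_NMAX 5552 /* zlib's bound for deferring the modulo */

/* Reference version: reduce after every byte. */
static uint32_t adler32_slow(const uint8_t *p, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + p[i]) % ADLER_MOD;
        s2 = (s2 + s1) % ADLER_MOD;
    }
    return (s2 << 16) | s1;
}

/* Deferred-modulo version: reduce only once per ADLER_NMAX-byte chunk. */
static uint32_t adler32_fast(const uint8_t *p, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            s1 += *p++;
            s2 += s1;
        }
        s1 %= ADLER_MOD;
        s2 %= ADLER_MOD;
    }
    return (s2 << 16) | s1;
}
```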
Eric Biggers
fb5c6a8c85 lib/arm: allow choosing adler32_neon() at runtime
Now that we detect CPU features on ARM, allow the NEON implementation of
Adler-32 to be selected at runtime based on the presence of the NEON
feature.
2018-02-18 23:03:26 -08:00
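Runtime selection usually comes down to choosing a function pointer from a set of detected CPU feature bits. In this hypothetical sketch, FEATURE_NEON and adler32_neon_stub() are placeholders (the stub simply calls the portable code). In a real build, the feature bits would come from CPU feature detection; on Linux, one common source is getauxval(AT_HWCAP).

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*adler32_func_t)(uint32_t adler, const uint8_t *p,
                                   size_t len);

/* Portable Adler-32, reducing after every byte. */
static uint32_t adler32_generic(uint32_t adler, const uint8_t *p, size_t len)
{
    uint32_t s1 = adler & 0xFFFF, s2 = adler >> 16;
    while (len--) {
        s1 = (s1 + *p++) % 65521;
        s2 = (s2 + s1) % 65521;
    }
    return (s2 << 16) | s1;
}

/* Placeholder for a NEON implementation. */
static uint32_t adler32_neon_stub(uint32_t adler, const uint8_t *p, size_t len)
{
    return adler32_generic(adler, p, len);
}

#define FEATURE_NEON 0x1u /* hypothetical feature bit */

/* Pick the best implementation the detected features allow. */
static adler32_func_t select_adler32(unsigned features)
{
    if (features & FEATURE_NEON)
        return adler32_neon_stub;
    return adler32_generic;
}
```

The same pattern applies to the SSE2 selection on 32-bit x86 described in the commits below; only the feature bit and the candidate functions change.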
Eric Biggers
2575ede5ff lib/arm: add ARM CPU feature detection (Linux only for now) 2018-02-18 23:03:26 -08:00
Eric Biggers
8d58d51160 common: detect ARM NEON and PMULL target intrinsics 2018-02-18 23:03:26 -08:00
Eric Biggers
1617206086 lib/x86: allow choosing adler32_sse2() at runtime
Now that we detect CPU features on 32-bit x86, allow the SSE2
implementation of Adler-32 to be selected at runtime based on the
presence of the SSE2 feature.
2018-02-18 23:03:26 -08:00
Eric Biggers
0d1260be99 lib/x86: allow CPU feature detection on 32-bit x86
The SSE2, AVX2, BMI2, etc. code actually works on 32-bit x86 if the CPU
has those features.  So there is no need to restrict it to x86_64-only.
2018-02-18 23:03:26 -08:00