410 Commits

Author SHA1 Message Date
Eric Biggers
bfc3f610e1 lib/deflate_decompress: build subtables separately
Further improve build_decode_table() performance by splitting the "fill
direct entries" and "fill subtable pointers and subtables" steps into
separate loops and making some other optimizations.
2018-12-25 23:57:43 -06:00
Eric Biggers
515b7ad15c lib/deflate_decompress: move len_counts[] and offsets[] to stack
This improves performance, and these arrays are not very large.
2018-12-25 22:15:10 -06:00
Eric Biggers
1a3f34eab9 lib/deflate_decompress: optimize codeword incrementing 2018-12-25 21:29:13 -06:00
Eric Biggers
a25f3b86d7 lib/deflate_decompress: further optimize match copying 2018-12-25 18:14:32 -06:00
Orivej Desh
6750e4f19d Makefile: make the installation prefix configurable 2018-12-25 14:40:48 -06:00
Eric Biggers
170c24190a lib/deflate_decompress: further optimize refilling the bitbuffer 2018-12-25 14:16:38 -06:00
Eric Biggers
1c3609da7b lib/deflate_decompress: store decode results pre-shifted
This slightly speeds up decode table building, since now the decode
results don't need to be shifted at runtime when building the tables.
2018-12-25 14:16:38 -06:00
Eric Biggers
eed4829c16 lib/deflate_decompress: fix a comment 2018-12-25 14:16:38 -06:00
Eric Biggers
73017f08e5 lib/x86/adler32: add an AVX-512BW optimized Adler32 implementation 2018-12-24 17:36:07 -06:00
Eric Biggers
5c80decb26 common/x86: detect AVX-512BW intrinsics support 2018-12-24 17:36:07 -06:00
Eric Biggers
4548033845 lib/x86/cpu_features: detect AVX-512BW support 2018-12-24 17:36:07 -06:00
Eric Biggers
6a05e63bbb v1.1 (tag: v1.1) 2018-12-23 13:13:28 -06:00
Eric Biggers
6e7813e8fa Makefile: support user-specified CPPFLAGS 2018-12-23 13:13:28 -06:00
Eric Biggers
dfd839df4e test_checksums: test with guard page 2018-12-23 12:34:50 -06:00
Eric Biggers
a5a4822e2a prog_util: add guarded buffer allocator 2018-12-23 12:34:50 -06:00
Eric Biggers
57cab078f1 lib: optimize decompressing repeated static Huffman blocks
Improve libdeflate's worst-case performance decompressing malicious
DEFLATE streams by about 14x, bringing it within a factor of about 2x of
zlib, by skipping rebuilding the decode tables for the static Huffman
codes when they're already loaded into the decompressor.

This improves performance decompressing a stream of all empty static
Huffman blocks from about 0.36 MB/s to 175 MB/s, or the original
reproducer given on the GitHub issue from about 3.3 MB/s to 219 MB/s.
A regression test is added for these cases as well as the empty dynamic
Huffman blocks case to verify worst-case performance comparable to zlib.

Resolves https://github.com/ebiggers/libdeflate/issues/33
2018-12-23 12:03:00 -06:00
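The caching idea described in 57cab078f1 can be sketched roughly as below. This is only an illustration under stated assumptions, not libdeflate's actual code: `struct decompressor`, `static_codes_loaded`, `table_builds`, `build_tables()` and `prepare_block()` are hypothetical names, and the expensive table construction is replaced by a counter so the effect is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block types; the values mirror DEFLATE's BTYPE field. */
enum block_type { BLOCK_STATIC_HUFFMAN = 1, BLOCK_DYNAMIC_HUFFMAN = 2 };

/* Hypothetical decompressor state (not libdeflate's real struct). */
struct decompressor {
	bool static_codes_loaded; /* tables currently hold the static codes */
	unsigned table_builds;    /* counts rebuilds, for illustration only */
};

/* Stand-in for the expensive decode table construction. */
static void build_tables(struct decompressor *d)
{
	d->table_builds++;
}

/* Prepare the decode tables for one block, skipping redundant rebuilds. */
static void prepare_block(struct decompressor *d, enum block_type type)
{
	assert(d != NULL);
	if (type == BLOCK_STATIC_HUFFMAN) {
		if (d->static_codes_loaded)
			return; /* static tables already loaded: nothing to do */
		build_tables(d);
		d->static_codes_loaded = true;
	} else {
		/* Dynamic codes overwrite the tables, so the cache is invalid. */
		build_tables(d);
		d->static_codes_loaded = false;
	}
}
```

With this shape, a stream consisting of many consecutive static Huffman blocks pays for table construction once rather than once per block, which is the worst case the commit message targets.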
Eric Biggers
dd1c157750 prog_util: add timer_KB_per_s() 2018-12-23 12:03:00 -06:00
Eric Biggers
6c26eb18ea prog_util: add ASSERT() macro 2018-12-23 12:03:00 -06:00
Eric Biggers
becd91bb63 lib/arm: NEON intrinsics require hardware floating point support
NEON intrinsics cannot be used when compiling for an ARM CPU without
hardware floating point support, e.g. the Debian armel port.  In this
case arm_neon.h cannot even be included as it causes an #error.

[Based on a patch by Adrian Bunk <bunk@debian.org>, but changed to check
 for __ARM_FP instead of !__SOFTFP__ to be consistent with arm_neon.h,
 and added a comment.]
2018-12-22 00:00:04 -06:00
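A guard of the kind becd91bb63 describes might look like the sketch below. `HAVE_NEON_INTRINSICS` and `sum_bytes()` are hypothetical names chosen for illustration; only `__ARM_NEON`, `__ARM_FP` and `arm_neon.h` come from the toolchain. The point is that `arm_neon.h` is included only when both macros are defined, so a soft-float build never reaches the header's #error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Include arm_neon.h only when NEON and hardware floating point are both
 * available; HAVE_NEON_INTRINSICS is a hypothetical macro name. */
#if defined(__ARM_NEON) && defined(__ARM_FP)
#  include <arm_neon.h>
#  define HAVE_NEON_INTRINSICS 1
#else
#  define HAVE_NEON_INTRINSICS 0
#endif

/* Sum the bytes of a buffer, using NEON for 16-byte chunks when allowed. */
static uint32_t sum_bytes(const uint8_t *p, size_t n)
{
	uint32_t sum = 0;

	assert(p != NULL || n == 0);
#if HAVE_NEON_INTRINSICS
	while (n >= 16) {
		/* Widening pairwise adds: 16 x u8 -> 8 x u16 -> 4 x u32 -> 2 x u64 */
		uint64x2_t s = vpaddlq_u32(vpaddlq_u16(vpaddlq_u8(vld1q_u8(p))));

		sum += (uint32_t)(vgetq_lane_u64(s, 0) + vgetq_lane_u64(s, 1));
		p += 16;
		n -= 16;
	}
#endif
	/* Portable path; also handles the tail left by the NEON loop. */
	for (size_t i = 0; i < n; i++)
		sum += p[i];
	return sum;
}
```

On targets without NEON or without hardware floating point, the function compiles to the portable loop only.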
Eric Biggers
d6d50c6955 Fix stack alignment in 32-bit Windows builds
Resolves https://github.com/ebiggers/libdeflate/issues/35
2018-12-08 10:11:11 -08:00
Eric Biggers
906c54c16f Makefile: make changing libdeflate.h trigger a rebuild 2018-12-08 10:11:11 -08:00
Eric Biggers
65fd37d987 Add soname to shared library
To match common shared library packaging conventions: name the shared
library libdeflate.so.0, with matching soname, and make libdeflate.so
a symlink that points to it.
2018-12-06 21:43:08 -08:00
Eric Biggers
7fad94b8c9 Include import library in Windows binary releases
Previously:
	- libdeflate.dll: the dynamic library
	- libdeflate.lib: the static library

Now:
	- libdeflate.dll: the dynamic library
	- libdeflate.lib: the import library
	- libdeflatestatic.lib: the static library
2018-12-06 20:13:18 -08:00
Eric Biggers
2b6689d8aa Support 'make install' and 'make uninstall' 2018-06-14 22:58:46 -07:00
Eric Biggers
89b2d68aac README updates 2018-05-18 19:33:51 -07:00
ebiggers
203c1a8989 Merge pull request #32 from antonblanchard/ppc64_unaligned
Set UNALIGNED_ACCESS_IS_FAST on powerpc64
2018-05-12 21:49:24 -07:00
Anton Blanchard
9205845a16 Set UNALIGNED_ACCESS_IS_FAST on powerpc64
All 64-bit PowerPC CPUs handle unaligned accesses reasonably fast, so
set UNALIGNED_ACCESS_IS_FAST.

Decompression of the snappy html test case is almost 50% faster on
POWER9 with this patch applied.
2018-05-13 07:48:40 +10:00
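As a rough illustration of what a flag like UNALIGNED_ACCESS_IS_FAST can enable, the sketch below switches a copy loop to word-sized accesses when the flag is set. The platform list and the `copy_bytes()` helper are assumptions made for this example and are not taken from libdeflate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed platform list, for illustration only; the real list may differ. */
#if defined(__x86_64__) || defined(__i386__) || defined(__powerpc64__)
#  define UNALIGNED_ACCESS_IS_FAST 1
#else
#  define UNALIGNED_ACCESS_IS_FAST 0
#endif

/* Copy n bytes between non-overlapping buffers (hypothetical helper). */
static void copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
	assert(dst != NULL && src != NULL);
#if UNALIGNED_ACCESS_IS_FAST
	/* Word at a time: a fixed-size memcpy is typically compiled to a
	 * single unaligned load or store on CPUs that handle them well. */
	while (n >= sizeof(uint64_t)) {
		uint64_t w;

		memcpy(&w, src, sizeof(w));
		memcpy(dst, &w, sizeof(w));
		src += sizeof(w);
		dst += sizeof(w);
		n -= sizeof(w);
	}
#endif
	/* Byte at a time: the fallback path and the tail of the fast path. */
	while (n--)
		*dst++ = *src++;
}
```

Both paths produce the same bytes; the flag only changes how many memory operations are needed to get there.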
Eric Biggers
9a327aae41 v1.0 (tag: v1.0) 2018-04-13 22:46:08 -07:00
Eric Biggers
e9d1014161 tools/checksum_benchmarks.sh: fix detecting/disabling NEON on AArch64 2018-03-03 13:04:13 -08:00
Eric Biggers
6eef15d6f3 lib/arm: fix PMULL detection on AArch64 2018-03-03 12:47:50 -08:00
Eric Biggers
fc2ea22b44 lib/arm: add ARM PMULL implementation of CRC-32
Add an ARM PMULL implementation of CRC-32.  This is based on a patch by
Jun He <jun.he@linaro.org> as well as the x86 PCLMUL implementation.
2018-02-18 23:03:26 -08:00
Eric Biggers
1fb34f86b5 lib: add template for vectorized CRC-32 implementations 2018-02-18 23:03:26 -08:00
Eric Biggers
0c62e25464 tools/run_tests.sh: detect gcc without multilib support 2018-02-18 23:03:26 -08:00
Eric Biggers
5f3afad793 tools/run_tests.sh: run checksum benchmarks 2018-02-18 23:03:26 -08:00
Eric Biggers
2f4315c21c tools/run_tests.sh: more Android tests 2018-02-18 23:03:26 -08:00
Eric Biggers
794a40401d tools/android_build.sh: move -pie to LDFLAGS 2018-02-18 23:03:26 -08:00
Eric Biggers
4282583b9b tools/android_build.sh: support crypto extensions 2018-02-18 23:03:26 -08:00
Eric Biggers
e7aa4666e0 tools/checksum_benchmarks.sh: various improvements
Make it compatible with the new code organization, make it run the
test_checksums program for each implementation, and run each
implementation in both 64-bit and 32-bit modes.
2018-02-18 23:03:26 -08:00
Eric Biggers
bf0797e666 programs/test_checksums: test Adler-32 overflow cases 2018-02-18 23:03:26 -08:00
Eric Biggers
fb5c6a8c85 lib/arm: allow choosing adler32_neon() at runtime
Now that we detect CPU features on ARM, allow the NEON implementation of
Adler-32 to be selected at runtime based on the presence of the NEON
feature.
2018-02-18 23:03:26 -08:00
Eric Biggers
2575ede5ff lib/arm: add ARM CPU feature detection (Linux only for now) 2018-02-18 23:03:26 -08:00
Eric Biggers
8d58d51160 common: detect ARM NEON and PMULL target intrinsics 2018-02-18 23:03:26 -08:00
Eric Biggers
1617206086 lib/x86: allow choosing adler32_sse2() at runtime
Now that we detect CPU features on 32-bit x86, allow the SSE2
implementation of Adler-32 to be selected at runtime based on the
presence of the SSE2 feature.
2018-02-18 23:03:26 -08:00
Eric Biggers
0d1260be99 lib/x86: allow CPU feature detection on 32-bit x86
The SSE2, AVX2, BMI2, etc. code actually works on 32-bit x86 if the CPU
has those features.  So there is no need to restrict it to x86_64-only.
2018-02-18 23:03:26 -08:00
Eric Biggers
58978af429 lib: make CPU feature masks and dispatch pointers volatile
Use 'volatile' for the CPU feature masks and dispatched function
pointers.  We don't need memory barriers for them, so 'volatile' is good
enough to stop the compiler from inserting bogus reads/writes.
2018-02-18 23:03:26 -08:00
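The dispatch pattern that 58978af429 refers to can be sketched as follows. This is a minimal example under assumptions: `adler32_impl`, `dispatch()` and `adler32_generic()` are hypothetical names, and the CPU feature check is replaced by an unconditional choice of the portable implementation. Like the commit, the sketch relies on `volatile` only to keep the compiler from adding or removing accesses to the pointer; it does not provide memory barriers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*adler32_func_t)(uint32_t adler, const uint8_t *p, size_t n);

/* Portable Adler-32, reducing modulo 65521 after every byte for clarity. */
static uint32_t adler32_generic(uint32_t adler, const uint8_t *p, size_t n)
{
	uint32_t s1 = adler & 0xFFFF, s2 = adler >> 16;

	assert(p != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		s1 = (s1 + p[i]) % 65521;
		s2 = (s2 + s1) % 65521;
	}
	return (s2 << 16) | s1;
}

static uint32_t dispatch(uint32_t adler, const uint8_t *p, size_t n);

/* The pointer itself is volatile, so each call performs exactly one load
 * of it and the store in dispatch() is emitted as written. */
static volatile adler32_func_t adler32_impl = dispatch;

/* First call lands here: choose an implementation, cache it, run it. */
static uint32_t dispatch(uint32_t adler, const uint8_t *p, size_t n)
{
	/* A real dispatcher would consult a CPU feature mask here. */
	adler32_func_t f = adler32_generic;

	adler32_impl = f;
	return f(adler, p, n);
}

/* Public entry point: always calls through the cached pointer. */
static uint32_t adler32(uint32_t adler, const uint8_t *p, size_t n)
{
	return adler32_impl(adler, p, n);
}
```

After the first call, the pointer refers directly to the chosen implementation, so later calls skip the selection step.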
Eric Biggers
4829a5add2 lib: refactor architecture-specific code
Move the x86 and ARM-specific code into their own directories to prevent
it from cluttering up the main library.  This will make it a bit easier
to add new architecture-specific code.

But to avoid complicating things too much for people who aren't using
the provided Makefile, we still just compile all .c files for all
architectures (irrelevant ones end up #ifdef'ed out), and the headers
are included explicitly for each architecture so that an
architecture-specific include path isn't needed.  So, now people just
need to compile both lib/*.c and lib/*/*.c instead of only lib/*.c.
2018-02-18 23:03:26 -08:00
Eric Biggers
0191c6bc26 lib: remove unused x86_cpu_features functionality
Remove the unused CPU features as well as the DEBUG code.
2018-02-18 23:03:26 -08:00
Eric Biggers
f76dcd5ee1 common: replace COMPILER_SUPPORTS_TARGET_INTRINSICS
Replace COMPILER_SUPPORTS_TARGET_INTRINSICS with macros for the
individual features, since COMPILER_SUPPORTS_TARGET_INTRINSICS was
x86-specific and would cause confusion when we try to use intrinsics in
'target' functions for other architectures.
2018-02-18 23:03:26 -08:00
Eric Biggers
5a9d25a892 Support multi-member gzip files 2017-11-20 00:35:24 -08:00
Eric Biggers
3d96a83ef9 v0.8 (tag: v0.8) 2017-07-29 14:38:03 -07:00