libdeflate

mirror of https://github.com/cuberite/libdeflate.git synced 2025-09-08 11:50:00 -04:00

Author	SHA1	Message	Date
Eric Biggers	333eff73b2	tools/run_tests.sh: run all test programs in exec_tests.sh	2018-12-28 10:25:44 -06:00
Eric Biggers	d3878bc8ae	programs: new test program - test_incomplete_codes	2018-12-28 10:25:44 -06:00
Eric Biggers	c398e237b6	programs: move output_bitstream to test_util	2018-12-28 10:25:44 -06:00
Eric Biggers	ce6a95f47b	programs: add test_util Move program utility functions that are used only by "test programs" (i.e. not by gzip/gunzip) from prog_util.{c,h} into test_util.{c,h}. This reduces the code that is compiled for the default build target, which excludes the test programs.	2018-12-28 10:25:44 -06:00
Eric Biggers	a64bd1e830	lib/deflate_decompress: optimize build_decode_table() via table doubling Another build_decode_table() optimization: rather than filling all the entries for each codeword using strided stores, just fill one initially and fill the rest by memcpy()s as the table is incrementally expanded. Also make some other cleanups and small optimizations.	2018-12-27 17:10:23 -06:00
dawg	954b59041a	Include stdlib.h for _byteswap_* on MSVC.	2018-12-26 10:23:14 -06:00
Eric Biggers	bfc3f610e1	lib/deflate_decompress: build subtables separately Further improve build_decode_table() performance by splitting the "fill direct entries" and "fill subtable pointers and subtables" steps into separate loops and making some other optimizations.	2018-12-25 23:57:43 -06:00
Eric Biggers	515b7ad15c	lib/deflate_decompress: move len_counts[] and offsets[] to stack This improves performance, and these arrays are not very large.	2018-12-25 22:15:10 -06:00
Eric Biggers	1a3f34eab9	lib/deflate_decompress: optimize codeword incrementing	2018-12-25 21:29:13 -06:00
Eric Biggers	a25f3b86d7	lib/deflate_decompress: further optimize match copying	2018-12-25 18:14:32 -06:00
Orivej Desh	6750e4f19d	Makefile: make the installation prefix configurable	2018-12-25 14:40:48 -06:00
Eric Biggers	170c24190a	lib/deflate_decompress: further optimize refilling the bitbuffer	2018-12-25 14:16:38 -06:00
Eric Biggers	1c3609da7b	lib/deflate_decompress: store decode results pre-shifted This slightly speeds up decode table building, since now the decode results don't need to be shifted at runtime when building the tables.	2018-12-25 14:16:38 -06:00
Eric Biggers	eed4829c16	lib/deflate_decompress: fix a comment	2018-12-25 14:16:38 -06:00
Eric Biggers	73017f08e5	lib/x86/adler32: add an AVX-512BW optimized Adler32 implementation	2018-12-24 17:36:07 -06:00
Eric Biggers	5c80decb26	common/x86: detect AVX-512BW intrinsics support	2018-12-24 17:36:07 -06:00
Eric Biggers	4548033845	lib/x86/cpu_features: detect AVX-512BW support	2018-12-24 17:36:07 -06:00
Eric Biggers	6a05e63bbb	v1.1	2018-12-23 13:13:28 -06:00
Eric Biggers	6e7813e8fa	Makefile: support user-specified CPPFLAGS	2018-12-23 13:13:28 -06:00
Eric Biggers	dfd839df4e	test_checksums: test with guard page	2018-12-23 12:34:50 -06:00
Eric Biggers	a5a4822e2a	prog_util: add guarded buffer allocator	2018-12-23 12:34:50 -06:00
Eric Biggers	57cab078f1	lib: optimize decompressing repeated static Huffman blocks Improve libdeflate's worst-case performance decompressing malicious DEFLATE streams by about 14x, bringing it within a factor of about 2x of zlib, by skipping rebuilding the decode tables for the static Huffman codes when they're already loaded into the decompressor. This improves performance decompressing a stream of all empty static Huffman blocks from about 0.36 MB/s to 175 MB/s, or the original reproducer given on the Github issue from about 3.3 MB/s to 219 MB/s. A regression test is added for these cases as well as the empty dynamic Huffman blocks case to verify worst-case performance comparable to zlib. Resolves https://github.com/ebiggers/libdeflate/issues/33	2018-12-23 12:03:00 -06:00
Eric Biggers	dd1c157750	prog_util: add timer_KB_per_s()	2018-12-23 12:03:00 -06:00
Eric Biggers	6c26eb18ea	prog_util: add ASSERT() macro	2018-12-23 12:03:00 -06:00
Eric Biggers	becd91bb63	lib/arm: NEON intrinsics require hardware floating point support NEON intrinsics cannot be used when compiling for an ARM CPU without hardware floating point support, e.g. the Debian armel port. In this case arm_neon.h cannot even be included as it causes an #error. [Based on a patch by Adrian Bunk <bunk@debian.org>, but changed to check for __ARM_FP instead of !__SOFTFP__ to be consistent with arm_neon.h, and added a comment.]	2018-12-22 00:00:04 -06:00
Eric Biggers	d6d50c6955	Fix stack alignment in 32-bit Windows builds Resolves https://github.com/ebiggers/libdeflate/issues/35	2018-12-08 10:11:11 -08:00
Eric Biggers	906c54c16f	Makefile: make changing libdeflate.h trigger a rebuild	2018-12-08 10:11:11 -08:00
Eric Biggers	65fd37d987	Add soname to shared library To match common shared library packaging conventions: name the shared library libdeflate.so.0, with matching soname, and make libdeflate.so a symlink that points to it.	2018-12-06 21:43:08 -08:00
Eric Biggers	7fad94b8c9	Include import library in Windows binary releases. Previously: libdeflate.dll (the dynamic library), libdeflate.lib (the static library). Now: libdeflate.dll (the dynamic library), libdeflate.lib (the import library), libdeflatestatic.lib (the static library).	2018-12-06 20:13:18 -08:00
Eric Biggers	2b6689d8aa	Support 'make install' and 'make uninstall'	2018-06-14 22:58:46 -07:00
Eric Biggers	89b2d68aac	README updates	2018-05-18 19:33:51 -07:00
ebiggers	203c1a8989	Merge pull request #32 from antonblanchard/ppc64_unaligned Set UNALIGNED_ACCESS_IS_FAST on powerpc64	2018-05-12 21:49:24 -07:00
Anton Blanchard	9205845a16	Set UNALIGNED_ACCESS_IS_FAST on powerpc64 All 64bit PowerPC CPUs handle unaligned accesses reasonably fast, so set UNALIGNED_ACCESS_IS_FAST. Decompression of the snappy html test case is almost 50% faster on POWER9 with this patch applied.	2018-05-13 07:48:40 +10:00
Eric Biggers	9a327aae41	v1.0	2018-04-13 22:46:08 -07:00
Eric Biggers	e9d1014161	tools/checksum_benchmarks.sh: fix detecting/disabling NEON on AArch64	2018-03-03 13:04:13 -08:00
Eric Biggers	6eef15d6f3	lib/arm: fix PMULL detection on AArch64	2018-03-03 12:47:50 -08:00
Eric Biggers	fc2ea22b44	lib/arm: add ARM PMULL implementation of CRC-32 Add an ARM PMULL implementation of CRC-32. This is based on a patch by Jun He <jun.he@linaro.org> as well as the x86 PCLMUL implementation.	2018-02-18 23:03:26 -08:00
Eric Biggers	1fb34f86b5	lib: add template for vectorized CRC-32 implementations	2018-02-18 23:03:26 -08:00
Eric Biggers	0c62e25464	tools/run_tests.sh: detect gcc without multilib support	2018-02-18 23:03:26 -08:00
Eric Biggers	5f3afad793	tools/run_tests.sh: run checksum benchmarks	2018-02-18 23:03:26 -08:00
Eric Biggers	2f4315c21c	tools/run_tests.sh: more Android tests	2018-02-18 23:03:26 -08:00
Eric Biggers	794a40401d	tools/android_build.sh: move -pie to LDFLAGS	2018-02-18 23:03:26 -08:00
Eric Biggers	4282583b9b	tools/android_build.sh: support crypto extensions	2018-02-18 23:03:26 -08:00
Eric Biggers	e7aa4666e0	tools/checksum_benchmarks.sh: various improvements Make it compatible with the new code organization, make it run the test_checksums program for each implementation, and run each implementation in both 64-bit and 32-bit modes.	2018-02-18 23:03:26 -08:00
Eric Biggers	bf0797e666	programs/test_checksums: test Adler-32 overflow cases	2018-02-18 23:03:26 -08:00
Eric Biggers	fb5c6a8c85	lib/arm: allow choosing adler32_neon() at runtime Now that we detect CPU features on ARM, allow the NEON implementation of Adler-32 to be selected at runtime based on the presence of the NEON feature.	2018-02-18 23:03:26 -08:00
Eric Biggers	2575ede5ff	lib/arm: add ARM CPU feature detection (Linux only for now)	2018-02-18 23:03:26 -08:00
Eric Biggers	8d58d51160	common: detect ARM NEON and PMULL target intrinsics	2018-02-18 23:03:26 -08:00
Eric Biggers	1617206086	lib/x86: allow choosing adler32_sse2() at runtime Now that we detect CPU features on 32-bit x86, allow the SSE2 implementation of Adler-32 to be selected at runtime based on the presence of the SSE2 feature.	2018-02-18 23:03:26 -08:00
Eric Biggers	0d1260be99	lib/x86: allow CPU feature detection on 32-bit x86 The SSE2, AVX2, BMI2, etc. code actually works on 32-bit x86 if the CPU has those features. So there is no need to restrict it to x86_64-only.	2018-02-18 23:03:26 -08:00
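
Commit a64bd1e830 above describes filling one decode-table entry per codeword and producing the remaining entries with memcpy() as the table is incrementally expanded, instead of using strided stores. The following is a minimal sketch of that general idea only, not libdeflate's actual build_decode_table(). The function name build_table_doubling(), the entry layout (symbol << 4 | length), and the restrictions (a complete canonical code, every length at most table_bits, no subtables) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the low `len` bits of `code` (DEFLATE reads codewords LSB-first). */
static unsigned reverse_bits(unsigned code, unsigned len)
{
    unsigned r = 0;
    for (unsigned i = 0; i < len; i++) {
        r = (r << 1) | (code & 1);
        code >>= 1;
    }
    return r;
}

/*
 * Hypothetical helper, not part of libdeflate.
 *
 * lens[sym] is the codeword length of symbol `sym` (0 = unused).
 * Assumes a complete canonical Huffman code whose lengths are all
 * <= table_bits. `table` must have room for 1 << table_bits entries.
 * Each entry is encoded as (symbol << 4) | length (illustrative layout).
 */
static void build_table_doubling(const uint8_t *lens, unsigned num_syms,
                                 unsigned table_bits, uint16_t *table)
{
    unsigned code = 0; /* next canonical codeword of the current length */

    memset(table, 0, sizeof(table[0]) << table_bits);

    for (unsigned len = 1; len <= table_bits; len++) {
        /*
         * Grow the table from 2^(len-1) to 2^len entries by copying the
         * lower half into the upper half. Entries of shorter codewords
         * are replicated here instead of by strided stores.
         */
        if (len > 1)
            memcpy(&table[1u << (len - 1)], table,
                   sizeof(table[0]) << (len - 1));

        /* Each codeword of this length needs exactly one store. */
        for (unsigned sym = 0; sym < num_syms; sym++) {
            assert(lens[sym] <= table_bits);
            if (lens[sym] != len)
                continue;
            table[reverse_bits(code, len)] = (uint16_t)((sym << 4) | len);
            code++;
        }
        code <<= 1; /* canonical code: first codeword of the next length */
    }
}
```

With lengths {1, 2, 3, 3} and table_bits = 3, the symbol with the 1-bit codeword is stored once and ends up in four entries purely through the two doubling copies. A real decoder would also have to handle incomplete codes and codewords longer than the table, which this sketch leaves out.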


216 Commits