libdeflate

mirror of https://github.com/cuberite/libdeflate.git synced 2025-09-22 02:39:45 -04:00

Author	SHA1	Message	Date
Eric Biggers	eafe829b4d	Remove "originally public domain" comments These comments are unnecessary, and they might cause confusion since they could be misunderstood as being part of the license.	2021-12-31 12:19:10 -06:00
cielavenir	9b565afd99	Fix ICC compilation - crc32: On ICC, __v2di is defined in immintrin.h - adler32.c: __v64qi etc are not available on ICC	2021-05-06 23:10:58 -07:00
Eric Biggers	83a1bbf1d3	lib: consistently use include guards A lot of the internal library headers don't have include guards because they aren't needed. It might look like a bug, though, and it doesn't hurt to add them. So do this. Update https://github.com/ebiggers/libdeflate/issues/117	2021-03-12 00:07:30 -08:00
Eric Biggers	4a0bb736c9	lib: make freestanding memset() et al. symbols "weak" This allows these symbols to be overridden by another definition of these symbols somewhere else in the binary. Resolves https://github.com/ebiggers/libdeflate/issues/107	2020-11-23 18:47:15 -08:00
Eric Biggers	f8057e8805	lib/matchfinder: document matchfinder_rebase() preconditions	2020-10-28 19:20:30 -07:00
Eric Biggers	ff8634427b	lib/matchfinder: simplify init and rebase Remove the ability of matchfinder_init() and matchfinder_rebase() to fail due to the matchfinder memory size being misaligned. Instead, require that the size always be 128-byte aligned -- which is already the case. Also, make the matchfinder memory always be 32-byte aligned -- which doesn't really have any downside.	2020-10-25 22:42:25 -07:00
Eric Biggers	166084acaa	lib/deflate_compress: select min_size_to_compress based on level The cutoff for outputting uncompressed data is currently < 16 bytes for all compression levels. That isn't ideal, since the higher the compression level, the more we should bother with very small inputs; and the lower the compression level, the less we should bother. Use a formula that produces the following cutoffs (level: cutoff in bytes): 0: 56, 1: 52, 2: 48, 3: 44, 4: 40, 5: 36, 6: 32, 7: 28, 8: 24, 9: 20, 10: 16, 11: 12, 12: 8. Update https://github.com/ebiggers/libdeflate/issues/67	2020-10-18 18:37:51 -07:00
Eric Biggers	ef936b6521	lib/x86/adler32: use unsigned vector types This is needed to avoid the following error when using -fsanitize=undefined with gcc: lib/x86/adler32_impl.h:214:2: runtime error: signed integer overflow: 1951294680 + 1956941400 cannot be represented in type 'int' Note that this isn't seen when using -fsanitize=undefined with clang. Old compilers don't have unsigned vector types, so work around that.	2020-10-18 15:14:15 -07:00
Eric Biggers	ea88fa822f	lib/arm/crc32: add support for ARM CRC32 instructions Add a CRC32 implementation that uses the ARM CRC32 instructions. This is simpler and faster than the PMULL implementation. On AWS Graviton2, the performance improvement is about 70%. On Hikey960, the performance improvement is about 30% for the Cortex-A53 cores or about 5% for the Cortex-A73 cores. Based on work by Greg V <greg@unrelenting.technology> (https://github.com/ebiggers/libdeflate/pull/45) and Andrew Steinborn <git@steinborn.me> (https://github.com/ebiggers/libdeflate/pull/76).	2020-10-10 23:03:50 -07:00
Eric Biggers	2eeaa9282e	lib/arm/cpu_features: recognize the crc32 feature If support for CRC32 instructions is detected, set ARM_CPU_FEATURE_CRC32. Also define COMPILER_SUPPORTS_CRC32_TARGET_INTRINSICS when appropriate, and update run_tests.sh to toggle the crc32 feature for testing.	2020-10-10 23:03:50 -07:00
Eric Biggers	7373bdc9ff	lib/arm/cpu_features: reorganize arm feature macros Reorganize some confusing logic.	2020-10-10 23:03:50 -07:00
Eric Biggers	4c92394eaa	Support level 0, "no compression" Some users may require a valid DEFLATE, zlib, or gzip stream but know ahead of time that particular inputs are not compressible. zlib supports "level 0" for this use case. Support this in libdeflate too. Resolves https://github.com/ebiggers/libdeflate/issues/86	2020-10-10 22:31:15 -07:00
Eric Biggers	5729095d2d	lib/cpu_features: support disabling CPU features for testing Make test-only builds of libdeflate support an environment variable LIBDEFLATE_DISABLE_CPU_FEATURES that contains a list of CPU features to disable like "avx512bw,avx2,sse2". This makes it possible to test all the variants of dynamically dispatched code without editing the source code. Note, this environment variable is not a stable interface, so put the support for it behind a scary-looking option TEST_SUPPORT__DO_NOT_USE.	2020-10-05 00:35:19 -07:00
Eric Biggers	f23fd6ca7f	lib/x86/cpu_features: rename PCLMULQDQ feature bit to PCLMUL This is less unwieldy and is consistent with "DISPATCH_PCLMUL" and with the "-mno-pclmul" compiler flag.	2020-10-05 00:35:19 -07:00
Eric Biggers	82037908c7	lib/x86/cpu_features: add missing earlyclobber constraint for cpuid on i386 In cpuid() in the '__i386__ && __PIC__' case, the second output operand is written to before the input operands are used. So the second output operand needs the earlyclobber constraint.	2020-10-04 23:17:56 -07:00
Eric Biggers	303aecb074	lib/utils.c: improve header include order Don't assume that lib_common.h and libdeflate.h don't include <stdlib.h>. Currently this change doesn't matter unless someone uses -DFREESTANDING for a Windows build, which isn't supported anyway, but we might as well clean this up. Update https://github.com/ebiggers/libdeflate/pull/68	2020-10-04 09:42:05 -07:00
Eric Biggers	0f5238f0ad	lib: remove the "packed struct" approach to unaligned memory access gcc 10 is miscompiling libdeflate on x86_64 at -O3 due to a regression in how packed structs are handled (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994). Work around this by just always using memcpy() for unaligned accesses. It's unclear that the "packed struct" approach is worthwhile to maintain anymore. Currently I'm only aware that it's useful with old gcc's on arm32. Hopefully, compilers are good enough now that we can simply use memcpy() everywhere. Update https://github.com/ebiggers/libdeflate/issues/64	2020-05-08 23:03:58 -07:00
Eric Biggers	14be043724	lib: fix memcpy() performance with freestanding library builds With -ffreestanding, for memcpy() to be optimized properly when used for unaligned accesses, we need to use __builtin_memcpy().	2020-05-08 23:03:58 -07:00
Eric Biggers	3dfd93e365	lib, programs: include common_defs.h by relative path It's better to use a relative path, so that people not using the Makefile don't have to put -Icommon on their compiler command line.	2020-05-08 23:03:58 -07:00
Eric Biggers	d92c601bdc	lib: use "in_nbytes", not "in_size" For consistency, make the implementations of libdeflate_gzip_compress() and libdeflate_zlib_compress() use the same parameter name that their declarations and everywhere else use.	2020-04-17 23:04:36 -07:00
Eric Biggers	9bf2e9f270	lib/gzip_constants.h: fix misspelling	2020-04-17 22:58:25 -07:00
Eric Biggers	27d5a74f03	lib: add freestanding support Allow building libdeflate without linking to any libc functions by using 'make FREESTANDING=1'. When using such a library build, the user will need to call libdeflate_set_memory_allocator() before anything else, since malloc() and free() will be unavailable. [Folded in fix from Ingvar Stepanyan to use -nostdlib, and made freestanding_tests() check that no libs are linked to.] Update https://github.com/ebiggers/libdeflate/issues/62	2020-04-17 22:32:49 -07:00
Eric Biggers	0ded4c6f52	lib: add libdeflate_set_memory_allocator() Add an API function to install a custom memory allocator. Resolves https://github.com/ebiggers/libdeflate/issues/62	2020-04-17 21:28:49 -07:00
Eric Biggers	944500af9f	lib: wrap the memory allocation functions In preparation for adding custom memory allocator support, don't call the standard memory allocation functions directly but rather wrap them with libdeflate_malloc() and libdeflate_free().	2020-04-17 21:28:49 -07:00
Eric Biggers	66bd59c4be	lib: rename the aligned allocation functions In preparation for adding libdeflate_malloc() and libdeflate_free(), rename the aligned allocation functions to match.	2020-04-17 21:28:49 -07:00
Eric Biggers	64b4e8191e	lib: rename aligned_malloc.c to utils.c Prepare to use this file for more utility functions.	2020-04-17 21:28:49 -07:00
Eric Biggers	d1b6a825ab	lib: merge aligned_malloc.h into lib_common.h It's simpler to declare the library utility functions in lib_common.h rather than use a separate header.	2020-04-17 21:28:49 -07:00
Eric Biggers	a735fa830f	lib, programs: remove all unnecessary 'extern' keywords 'extern' on function declarations is redundant.	2020-04-17 21:27:56 -07:00
Izumi Raine	5ed95f48d4	Add function libdeflate_zlib_decompress_ex Additionally, libdeflate_zlib_decompress now returns successfully in case there are additional trailing bytes in the input buffer after the compressed stream.	2020-04-16 18:15:11 +02:00
Eric Biggers	3acda56db0	Declare __stdcall correctly for MSVC Unfortunately, MSVC only accepts __stdcall after the return type, while gcc only accepts __attribute__((visibility("default"))) before the return type. So we need a macro in each location. Also, MSVC doesn't define __i386__; that's gcc specific. So instead use '_WIN32 && !_WIN64' to detect 32-bit Windows.	2019-12-28 13:20:50 -06:00
Eric Biggers	2a2e24dc8b	lib: fix some typos in comments	2019-08-24 17:38:50 -07:00
Eric Biggers	5038748d61	lib/deflate_compress: fix return value for output >= 4 GiB The API returns the compressed size as a size_t, so deflate_flush_output() needs to return size_t as well, not u32. Otherwise sizes >= 4 GiB are truncated. This bug has been there since the beginning. (This only matters if you compress a buffer that large in a single go, which obviously is not a good idea, but no matter -- it's a bug.) Resolves https://github.com/ebiggers/libdeflate/issues/44	2019-05-21 21:13:59 -07:00
Eric Biggers	449b5adc16	lib/deflate_decompress: slight simplification in build_decode_table()	2019-01-14 21:37:48 -08:00
Eric Biggers	a64bd1e830	lib/deflate_decompress: optimize build_decode_table() via table doubling Another build_decode_table() optimization: rather than filling all the entries for each codeword using strided stores, just fill one initially and fill the rest by memcpy()s as the table is incrementally expanded. Also make some other cleanups and small optimizations.	2018-12-27 17:10:23 -06:00
Eric Biggers	bfc3f610e1	lib/deflate_decompress: build subtables separately Further improve build_decode_table() performance by splitting the "fill direct entries" and "fill subtable pointers and subtables" steps into separate loops and making some other optimizations.	2018-12-25 23:57:43 -06:00
Eric Biggers	515b7ad15c	lib/deflate_decompress: move len_counts[] and offsets[] to stack This improves performance, and these arrays are not very large.	2018-12-25 22:15:10 -06:00
Eric Biggers	1a3f34eab9	lib/deflate_decompress: optimize codeword incrementing	2018-12-25 21:29:13 -06:00
Eric Biggers	a25f3b86d7	lib/deflate_decompress: further optimize match copying	2018-12-25 18:14:32 -06:00
Eric Biggers	170c24190a	lib/deflate_decompress: further optimize refilling the bitbuffer	2018-12-25 14:16:38 -06:00
Eric Biggers	1c3609da7b	lib/deflate_decompress: store decode results pre-shifted This slightly speeds up decode table building, since now the decode results don't need to be shifted at runtime when building the tables.	2018-12-25 14:16:38 -06:00
Eric Biggers	eed4829c16	lib/deflate_decompress: fix a comment	2018-12-25 14:16:38 -06:00
Eric Biggers	73017f08e5	lib/x86/adler32: add an AVX-512BW optimized Adler32 implementation	2018-12-24 17:36:07 -06:00
Eric Biggers	4548033845	lib/x86/cpu_features: detect AVX-512BW support	2018-12-24 17:36:07 -06:00
Eric Biggers	57cab078f1	lib: optimize decompressing repeated static Huffman blocks Improve libdeflate's worst-case performance decompressing malicious DEFLATE streams by about 14x, bringing it within a factor of about 2x of zlib, by skipping rebuilding the decode tables for the static Huffman codes when they're already loaded into the decompressor. This improves performance decompressing a stream of all empty static Huffman blocks from about 0.36 MB/s to 175 MB/s, or the original reproducer given on the Github issue from about 3.3 MB/s to 219 MB/s. A regression test is added for these cases as well as the empty dynamic Huffman blocks case to verify worst-case performance comparable to zlib. Resolves https://github.com/ebiggers/libdeflate/issues/33	2018-12-23 12:03:00 -06:00
Eric Biggers	6eef15d6f3	lib/arm: fix PMULL detection on AArch64	2018-03-03 12:47:50 -08:00
Eric Biggers	fc2ea22b44	lib/arm: add ARM PMULL implementation of CRC-32 Add an ARM PMULL implementation of CRC-32. This is based on a patch by Jun He <jun.he@linaro.org> as well as the x86 PCLMUL implementation.	2018-02-18 23:03:26 -08:00
Eric Biggers	1fb34f86b5	lib: add template for vectorized CRC-32 implementations	2018-02-18 23:03:26 -08:00
Eric Biggers	fb5c6a8c85	lib/arm: allow choosing adler32_neon() at runtime Now that we detect CPU features on ARM, allow the NEON implementation of Adler-32 to be selected at runtime based on the presence of the NEON feature.	2018-02-18 23:03:26 -08:00
Eric Biggers	2575ede5ff	lib/arm: add ARM CPU feature detection (Linux only for now)	2018-02-18 23:03:26 -08:00
Eric Biggers	1617206086	lib/x86: allow choosing adler32_sse2() at runtime Now that we detect CPU features on 32-bit x86, allow the SSE2 implementation of Adler-32 to be selected at runtime based on the presence of the SSE2 feature.	2018-02-18 23:03:26 -08:00
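
Commit 166084acaa above lists a per-level cutoff below which input is emitted uncompressed. The commit message gives only the resulting table, not the formula, so the following is a minimal sketch of one linear formula that reproduces those numbers. The function name `min_size_to_compress` here is a hypothetical helper borrowed from the commit title, not necessarily how the library spells it internally.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: cutoff (in bytes) for a given compression level.
 * Assumption: a linear formula, 56 - 4 * level, which matches the table
 * in commit 166084acaa (56 at level 0 down to 8 at level 12).
 */
static size_t min_size_to_compress(int level)
{
	/* The table in the commit covers levels 0 through 12 only. */
	assert(level >= 0 && level <= 12);
	return (size_t)(56 - 4 * level);
}
```

The cutoff shrinks by 4 bytes per level, so higher levels attempt compression on smaller inputs, which is the behavior the commit message argues for.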

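
Commit 0f5238f0ad in the log replaces the "packed struct" approach to unaligned memory access with memcpy(). As a rough illustration of that technique (not the library's actual helpers), the sketch below reads and writes a 32-bit value at an arbitrary byte offset. The names `load_u32_unaligned` and `store_u32_unaligned` are assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a 32-bit value from a possibly misaligned
 * address. memcpy() into a local avoids the undefined behavior of
 * dereferencing a misaligned uint32_t pointer; optimizing compilers
 * typically turn this into a single load where the target allows it.
 */
static uint32_t load_u32_unaligned(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical helper: write a 32-bit value to a possibly misaligned address. */
static void store_u32_unaligned(uint32_t v, void *p)
{
	assert(p != NULL);
	memcpy(p, &v, sizeof(v));
}
```

The value is stored and loaded in native byte order, so a round trip through these two helpers returns the original value regardless of endianness. Commit 14be043724 notes that freestanding builds need `__builtin_memcpy()` for this pattern to be optimized properly.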

144 Commits
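
Commits 944500af9f and 0ded4c6f52 in the log describe wrapping the memory allocation functions and then adding libdeflate_set_memory_allocator() to install a custom allocator. The sketch below shows the general pattern of routing allocations through replaceable function pointers. It is an illustration under assumptions, not the library's source: every name in it (`my_set_memory_allocator`, `my_malloc`, `my_free`, and the counting allocator) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Current allocator hooks; assumed to default to the C library. */
static void *(*malloc_func)(size_t) = malloc;
static void (*free_func)(void *) = free;

/* Hypothetical setter, modeled on the idea in commit 0ded4c6f52. */
static void my_set_memory_allocator(void *(*m)(size_t), void (*f)(void *))
{
	assert(m != NULL && f != NULL);
	malloc_func = m;
	free_func = f;
}

/* Wrappers that the rest of a library would call instead of malloc()/free(). */
static void *my_malloc(size_t size) { return malloc_func(size); }
static void my_free(void *p) { free_func(p); }

/* Example custom allocator that counts calls before delegating. */
static size_t alloc_count, free_count;

static void *counting_malloc(size_t size)
{
	alloc_count++;
	return malloc(size);
}

static void counting_free(void *p)
{
	free_count++;
	free(p);
}
```

A caller would install the hooks once, for example `my_set_memory_allocator(counting_malloc, counting_free);`, before any allocation takes place. Per commit 27d5a74f03, a freestanding build has no malloc() or free() to fall back on, so there the real setter must be called before anything else; the defaults in this sketch apply only to a hosted build.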