144 Commits

Author SHA1 Message Date
Eric Biggers
eafe829b4d Remove "originally public domain" comments
These comments are unnecessary, and they might cause confusion since
they could be misunderstood as being part of the license.
2021-12-31 12:19:10 -06:00
cielavenir
9b565afd99 Fix ICC compilation
- crc32: On ICC, __v2di is defined in immintrin.h
- adler32.c: __v64qi etc are not available on ICC
2021-05-06 23:10:58 -07:00
Eric Biggers
83a1bbf1d3 lib: consistently use include guards
A lot of the internal library headers don't have include guards because
they aren't needed.  It might look like a bug, though, and it doesn't
hurt to add them.  So do this.

Update https://github.com/ebiggers/libdeflate/issues/117
2021-03-12 00:07:30 -08:00
Eric Biggers
4a0bb736c9 lib: make freestanding memset() et al. symbols "weak"
This allows these symbols to be overridden by another definition of
these symbols somewhere else in the binary.

Resolves https://github.com/ebiggers/libdeflate/issues/107
2020-11-23 18:47:15 -08:00
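As an illustration of the "weak" symbol mechanism this commit relies on, here is a minimal sketch using the GCC/Clang `weak` attribute. The name `fallback_memset` is hypothetical and is not the library's symbol; it only shows how a fallback can be declared so that a strong definition elsewhere in the binary takes precedence at link time.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical fallback routine.  The "weak" attribute lets the linker
 * prefer any strong (ordinary) definition of fallback_memset() found in
 * another object file; this one is used only if no such definition exists.
 */
__attribute__((weak)) void *
fallback_memset(void *s, int c, size_t n)
{
	unsigned char *p = s;

	assert(s != NULL || n == 0);
	while (n--)
		*p++ = (unsigned char)c;
	return s;
}
```

If another translation unit defines `fallback_memset()` without the attribute, the linker picks that definition and silently drops this one, rather than reporting a duplicate symbol.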
Eric Biggers
f8057e8805 lib/matchfinder: document matchfinder_rebase() preconditions
2020-10-28 19:20:30 -07:00
Eric Biggers
ff8634427b lib/matchfinder: simplify init and rebase
Remove the ability of matchfinder_init() and matchfinder_rebase() to
fail due to the matchfinder memory size being misaligned.  Instead,
require that the size always be 128-byte aligned -- which is already the
case.  Also, make the matchfinder memory always be 32-byte aligned --
which doesn't really have any downside.
2020-10-25 22:42:25 -07:00
Eric Biggers
166084acaa lib/deflate_compress: select min_size_to_compress based on level
The cutoff for outputting uncompressed data is currently < 16 bytes for
all compression levels.  That isn't ideal, since the higher the
compression level, the more we should bother with very small inputs; and
the lower the compression level, the less we should bother.

Use a formula that produces the following cutoffs:

        Level  Cutoff
        -----  ------
        0      56
        1      52
        2      48
        3      44
        4      40
        5      36
        6      32
        7      28
        8      24
        9      20
        10     16
        11     12
        12     8

Update https://github.com/ebiggers/libdeflate/issues/67
2020-10-18 18:37:51 -07:00
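The commit message does not spell out the formula. One linear formula that reproduces every row of the table above is `56 - 4 * level`; the sketch below uses it under that assumption, and the function name `cutoff_for_level` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns the input size (in bytes) below which data
 * would be stored uncompressed.  The formula is inferred from the table in
 * the commit message, not taken from the library source.
 */
size_t
cutoff_for_level(int level)
{
	assert(level >= 0 && level <= 12);
	return (size_t)(56 - 4 * level);
}
```

For example, `cutoff_for_level(6)` evaluates to 32 and `cutoff_for_level(12)` to 8, matching the corresponding table rows.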
Eric Biggers
ef936b6521 lib/x86/adler32: use unsigned vector types
This is needed to avoid the following error when using
-fsanitize=undefined with gcc:

    lib/x86/adler32_impl.h:214:2: runtime error: signed integer overflow:
    1951294680 + 1956941400 cannot be represented in type 'int'

Note that this isn't seen when using -fsanitize=undefined with clang.

Old compilers don't have unsigned vector types, so work around that.
2020-10-18 15:14:15 -07:00
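To illustrate why unsigned lanes avoid the sanitizer report, here is a minimal sketch using the GCC/Clang generic vector extension. The type `v4u32` and the function `add_lanes_u32` are hypothetical names and are not the identifiers used in lib/x86/adler32_impl.h. With unsigned lanes the addition is defined to wrap modulo 2^32, so the sum quoted in the error message is representable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension: four unsigned 32-bit lanes (hypothetical name). */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/*
 * Add two groups of four 32-bit lanes.  Because the lanes are unsigned,
 * each lane wraps modulo 2^32 instead of triggering signed overflow.
 */
void
add_lanes_u32(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
	v4u32 va, vb, sum;

	assert(a != NULL && b != NULL && out != NULL);
	memcpy(&va, a, sizeof(va));
	memcpy(&vb, b, sizeof(vb));
	sum = va + vb;
	memcpy(out, &sum, sizeof(sum));
}
```

The two operands from the error message, 1951294680 and 1956941400, add up to 3908236080, which exceeds INT_MAX but fits in a 32-bit unsigned lane.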
Eric Biggers
ea88fa822f lib/arm/crc32: add support for ARM CRC32 instructions
Add a CRC32 implementation that uses the ARM CRC32 instructions.

This is simpler and faster than the PMULL implementation.  On AWS
Graviton2, the performance improvement is about 70%.  On Hikey960, the
performance improvement is about 30% for the Cortex-A53 cores or about
5% for the Cortex-A73 cores.

Based on work by Greg V <greg@unrelenting.technology>
(https://github.com/ebiggers/libdeflate/pull/45)
and Andrew Steinborn <git@steinborn.me>
(https://github.com/ebiggers/libdeflate/pull/76).
2020-10-10 23:03:50 -07:00
Eric Biggers
2eeaa9282e lib/arm/cpu_features: recognize the crc32 feature
If support for CRC32 instructions is detected, set
ARM_CPU_FEATURE_CRC32.  Also define
COMPILER_SUPPORTS_CRC32_TARGET_INTRINSICS when appropriate, and update
run_tests.sh to toggle the crc32 feature for testing.
2020-10-10 23:03:50 -07:00
Eric Biggers
7373bdc9ff lib/arm/cpu_features: reorganize arm feature macros
Reorganize some confusing logic.
2020-10-10 23:03:50 -07:00
Eric Biggers
4c92394eaa Support level 0, "no compression"
Some users may require a valid DEFLATE, zlib, or gzip stream but know
ahead of time that particular inputs are not compressible.  zlib
supports "level 0" for this use case.  Support this in libdeflate too.

Resolves https://github.com/ebiggers/libdeflate/issues/86
2020-10-10 22:31:15 -07:00
Eric Biggers
5729095d2d lib/cpu_features: support disabling CPU features for testing
Make test-only builds of libdeflate support an environment variable
LIBDEFLATE_DISABLE_CPU_FEATURES that contains a list of CPU features to
disable like "avx512bw,avx2,sse2".

This makes it possible to test all the variants of dynamically
dispatched code without editing the source code.

Note, this environment variable is not a stable interface, so put the
support for it behind a scary-looking option TEST_SUPPORT__DO_NOT_USE.
2020-10-05 00:35:19 -07:00
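A comma-separated list such as "avx512bw,avx2,sse2" could be turned into a bitmask roughly as follows. This is a sketch under assumptions: the table `feature_table`, its bit values, and the function `parse_disabled_features` are all hypothetical and do not reflect the library's actual feature bits or parsing code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name-to-bit table; the real feature bits may differ. */
struct feature_name {
	const char *name;
	uint32_t bit;
};

static const struct feature_name feature_table[] = {
	{ "sse2",     1u << 0 },
	{ "avx2",     1u << 1 },
	{ "avx512bw", 1u << 2 },
};

/* Return the OR of the bits for every recognized name in 'list'. */
uint32_t
parse_disabled_features(const char *list)
{
	uint32_t mask = 0;

	assert(list != NULL);
	while (*list != '\0') {
		size_t len = strcspn(list, ",");	/* length of this token */
		size_t i;

		for (i = 0; i < sizeof(feature_table) / sizeof(feature_table[0]); i++) {
			const char *name = feature_table[i].name;

			if (strlen(name) == len && memcmp(name, list, len) == 0)
				mask |= feature_table[i].bit;
		}
		list += len;
		if (*list == ',')
			list++;				/* skip the separator */
	}
	return mask;
}
```

A caller would read the variable with `getenv()`, check the result for NULL, and clear the returned bits from the detected feature set. Unrecognized names are ignored in this sketch.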
Eric Biggers
f23fd6ca7f lib/x86/cpu_features: rename PCLMULQDQ feature bit to PCLMUL
This is less unwieldy and is consistent with "DISPATCH_PCLMUL" and with
the "-mno-pclmul" compiler flag.
2020-10-05 00:35:19 -07:00
Eric Biggers
82037908c7 lib/x86/cpu_features: add missing earlyclobber constraint for cpuid on i386
In cpuid() in the '__i386__ && __PIC__' case, the second output operand
is written to before the input operands are used.  So the second output
operand needs the earlyclobber constraint.
2020-10-04 23:17:56 -07:00
Eric Biggers
303aecb074 lib/utils.c: improve header include order
Don't assume that lib_common.h and libdeflate.h don't include
<stdlib.h>.  Currently this change doesn't matter unless someone uses
-DFREESTANDING for a Windows build, which isn't supported anyway, but we
might as well clean this up.

Update https://github.com/ebiggers/libdeflate/pull/68
2020-10-04 09:42:05 -07:00
Eric Biggers
0f5238f0ad lib: remove the "packed struct" approach to unaligned memory access
gcc 10 is miscompiling libdeflate on x86_64 at -O3 due to a regression
in how packed structs are handled
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994).

Work around this by just always using memcpy() for unaligned accesses.
It's unclear that the "packed struct" approach is worthwhile to maintain
anymore.  Currently I'm only aware that it's useful with old gcc's on
arm32.  Hopefully, compilers are good enough now that we can simply use
memcpy() everywhere.

Update https://github.com/ebiggers/libdeflate/issues/64
2020-05-08 23:03:58 -07:00
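The memcpy() approach to unaligned access can be sketched as below. The helper names `load_u32_unaligned` and `store_u32_unaligned` are illustrative assumptions, not necessarily the library's identifiers. Modern compilers typically lower a fixed-size memcpy() like this to a single load or store on targets that permit unaligned access.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly misaligned address. */
uint32_t
load_u32_unaligned(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Write a 32-bit value to a possibly misaligned address. */
void
store_u32_unaligned(uint32_t v, void *p)
{
	assert(p != NULL);
	memcpy(p, &v, sizeof(v));
}
```

Unlike a cast to `uint32_t *` followed by a dereference, this has no alignment or strict-aliasing undefined behavior, and it does not depend on how the compiler treats packed structs.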
Eric Biggers
14be043724 lib: fix memcpy() performance with freestanding library builds
With -ffreestanding, for memcpy() to be optimized properly when used for
unaligned accesses, we need to use __builtin_memcpy().
2020-05-08 23:03:58 -07:00
Eric Biggers
3dfd93e365 lib, programs: include common_defs.h by relative path
It's better to use a relative path, so that people not using the
Makefile don't have to put -Icommon on their compiler command line.
2020-05-08 23:03:58 -07:00
Eric Biggers
d92c601bdc lib: use "in_nbytes", not "in_size"
For consistency, make the implementations of libdeflate_gzip_compress()
and libdeflate_zlib_compress() use the same parameter name that their
declarations and everywhere else use.
2020-04-17 23:04:36 -07:00
Eric Biggers
9bf2e9f270 lib/gzip_constants.h: fix misspelling
2020-04-17 22:58:25 -07:00
Eric Biggers
27d5a74f03 lib: add freestanding support
Allow building libdeflate without linking to any libc functions by using
'make FREESTANDING=1'.  When using such a library build, the user will
need to call libdeflate_set_memory_allocator() before anything else,
since malloc() and free() will be unavailable.

[Folded in fix from Ingvar Stepanyan to use -nostdlib, and made
 freestanding_tests() check that no libs are linked to.]

Update https://github.com/ebiggers/libdeflate/issues/62
2020-04-17 22:32:49 -07:00
Eric Biggers
0ded4c6f52 lib: add libdeflate_set_memory_allocator()
Add an API function to install a custom memory allocator.

Resolves https://github.com/ebiggers/libdeflate/issues/62
2020-04-17 21:28:49 -07:00
Eric Biggers
944500af9f lib: wrap the memory allocation functions
In preparation for adding custom memory allocator support, don't call
the standard memory allocation functions directly but rather wrap them
with libdeflate_malloc() and libdeflate_free().
2020-04-17 21:28:49 -07:00
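The wrapping pattern described in this commit can be sketched as a pair of function pointers that default to the standard allocator. All `demo_*` and `counting_*` names below are hypothetical stand-ins; the sketch does not reproduce the library's internal code.

```c
#include <assert.h>
#include <stdlib.h>

/* Current allocator; defaults to the C library's malloc() and free(). */
static void *(*demo_malloc_func)(size_t) = malloc;
static void (*demo_free_func)(void *) = free;

/* Install a custom allocator (hypothetical analogue of the public setter). */
void
demo_set_memory_allocator(void *(*malloc_func)(size_t),
			  void (*free_func)(void *))
{
	assert(malloc_func != NULL && free_func != NULL);
	demo_malloc_func = malloc_func;
	demo_free_func = free_func;
}

/* Wrappers that the rest of the code would call instead of malloc()/free(). */
void *demo_malloc(size_t size) { return demo_malloc_func(size); }
void demo_free(void *ptr) { demo_free_func(ptr); }

/* Example custom allocator that counts calls before delegating. */
unsigned counted_mallocs, counted_frees;

void *counting_malloc(size_t size) { counted_mallocs++; return malloc(size); }
void counting_free(void *ptr) { counted_frees++; free(ptr); }
```

After `demo_set_memory_allocator(counting_malloc, counting_free)`, every `demo_malloc()` and `demo_free()` call goes through the counting versions. Routing all allocations through one indirection point is what makes a later public setter a small change.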
Eric Biggers
66bd59c4be lib: rename the aligned allocation functions
In preparation for adding libdeflate_malloc() and libdeflate_free(),
rename the aligned allocation functions to match.
2020-04-17 21:28:49 -07:00
Eric Biggers
64b4e8191e lib: rename aligned_malloc.c to utils.c
Prepare to use this file for more utility functions.
2020-04-17 21:28:49 -07:00
Eric Biggers
d1b6a825ab lib: merge aligned_malloc.h into lib_common.h
It's simpler to declare the library utility functions in lib_common.h
rather than use a separate header.
2020-04-17 21:28:49 -07:00
Eric Biggers
a735fa830f lib, programs: remove all unnecessary 'extern' keywords
'extern' on function declarations is redundant.
2020-04-17 21:27:56 -07:00
Izumi Raine
5ed95f48d4 Add function libdeflate_zlib_decompress_ex
Additionally, libdeflate_zlib_decompress now returns successfully in
case there are additional trailing bytes in the input buffer after the
compressed stream.
2020-04-16 18:15:11 +02:00
Eric Biggers
3acda56db0 Declare __stdcall correctly for MSVC
Unfortunately, MSVC only accepts __stdcall after the return type, while
gcc only accepts __attribute__((visibility("default"))) before the
return type.  So we need a macro in each location.

Also, MSVC doesn't define __i386__; that's gcc specific.  So instead use
'_WIN32 && !_WIN64' to detect 32-bit Windows.
2019-12-28 13:20:50 -06:00
Eric Biggers
2a2e24dc8b lib: fix some typos in comments
2019-08-24 17:38:50 -07:00
Eric Biggers
5038748d61 lib/deflate_compress: fix return value for output >= 4 GiB
The API returns the compressed size as a size_t, so
deflate_flush_output() needs to return size_t as well, not u32.
Otherwise sizes >= 4 GiB are truncated.

This bug has been there since the beginning.

(This only matters if you compress a buffer that large in a single go,
which obviously is not a good idea, but no matter -- it's a bug.)

Resolves https://github.com/ebiggers/libdeflate/issues/44
2019-05-21 21:13:59 -07:00
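The truncation described in this commit can be reproduced in isolation. In the sketch below, `uint64_t` stands in for a 64-bit `size_t`, and the function names are hypothetical; they only model the return-type narrowing, not the real deflate_flush_output().

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Buggy shape: a 64-bit byte count is narrowed to 32 bits on return. */
u32 flush_output_u32(uint64_t nbytes) { return (u32)nbytes; }

/* Fixed shape: the return type is as wide as the count itself. */
uint64_t flush_output_wide(uint64_t nbytes) { return nbytes; }

void
truncation_example(void)
{
	const uint64_t five_gib = 5ull << 30;

	/* 5 GiB modulo 4 GiB leaves 1 GiB: the caller sees the wrong size. */
	assert(flush_output_u32(five_gib) == (1ull << 30));
	assert(flush_output_wide(five_gib) == five_gib);
}
```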
Eric Biggers
449b5adc16 lib/deflate_decompress: slight simplification in build_decode_table()
2019-01-14 21:37:48 -08:00
Eric Biggers
a64bd1e830 lib/deflate_decompress: optimize build_decode_table() via table doubling
Another build_decode_table() optimization: rather than filling all the
entries for each codeword using strided stores, just fill one initially
and fill the rest by memcpy()s as the table is incrementally expanded.

Also make some other cleanups and small optimizations.
2018-12-27 17:10:23 -06:00
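The difference between strided stores and table doubling can be shown on a toy decode table. This is a simplified sketch, not the library's build_decode_table(): the struct, the 3-bit table size, and the function names are assumptions. Codewords are given already bit-reversed (as needed for LSB-first lookup) and sorted by increasing length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_BITS 3
#define TABLE_SIZE (1u << TABLE_BITS)

struct codeword {
	uint8_t sym;	/* decoded symbol */
	uint8_t len;	/* codeword length in bits, 1..TABLE_BITS */
	uint16_t rev;	/* bit-reversed codeword, less than 1 << len */
};

/* Baseline: write every entry for each codeword with a stride of 2^len. */
void
fill_strided(uint8_t table[TABLE_SIZE], const struct codeword *cw, size_t n)
{
	size_t i;
	unsigned j;

	for (i = 0; i < n; i++)
		for (j = cw[i].rev; j < TABLE_SIZE; j += 1u << cw[i].len)
			table[j] = cw[i].sym;
}

/*
 * Table doubling: write one entry per codeword, and whenever the codeword
 * length grows, copy the filled prefix onto the next same-sized region.
 */
void
fill_doubling(uint8_t table[TABLE_SIZE], const struct codeword *cw, size_t n)
{
	unsigned cur_len = 1;
	unsigned cur_end = 2;	/* always 1u << cur_len */
	size_t i;

	for (i = 0; i < n; i++) {
		assert(cw[i].len >= cur_len && cw[i].len <= TABLE_BITS);
		while (cur_len < cw[i].len) {
			memcpy(&table[cur_end], table, cur_end);
			cur_end <<= 1;
			cur_len++;
		}
		table[cw[i].rev] = cw[i].sym;
	}
	while (cur_end < TABLE_SIZE) {	/* expand to the full table size */
		memcpy(&table[cur_end], table, cur_end);
		cur_end <<= 1;
	}
}
```

For a complete prefix code, both functions produce the same table. The doubling version replaces many scattered single-entry stores with a few contiguous copies, which is the effect the commit message describes.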
Eric Biggers
bfc3f610e1 lib/deflate_decompress: build subtables separately
Further improve build_decode_table() performance by splitting the "fill
direct entries" and "fill subtable pointers and subtables" steps into
separate loops and making some other optimizations.
2018-12-25 23:57:43 -06:00
Eric Biggers
515b7ad15c lib/deflate_decompress: move len_counts[] and offsets[] to stack
This improves performance, and these arrays are not very large.
2018-12-25 22:15:10 -06:00
Eric Biggers
1a3f34eab9 lib/deflate_decompress: optimize codeword incrementing
2018-12-25 21:29:13 -06:00
Eric Biggers
a25f3b86d7 lib/deflate_decompress: further optimize match copying
2018-12-25 18:14:32 -06:00
Eric Biggers
170c24190a lib/deflate_decompress: further optimize refilling the bitbuffer
2018-12-25 14:16:38 -06:00
Eric Biggers
1c3609da7b lib/deflate_decompress: store decode results pre-shifted
This slightly speeds up decode table building, since now the decode
results don't need to be shifted at runtime when building the tables.
2018-12-25 14:16:38 -06:00
Eric Biggers
eed4829c16 lib/deflate_decompress: fix a comment
2018-12-25 14:16:38 -06:00
Eric Biggers
73017f08e5 lib/x86/adler32: add an AVX-512BW optimized Adler32 implementation
2018-12-24 17:36:07 -06:00
Eric Biggers
4548033845 lib/x86/cpu_features: detect AVX-512BW support
2018-12-24 17:36:07 -06:00
Eric Biggers
57cab078f1 lib: optimize decompressing repeated static Huffman blocks
Improve libdeflate's worst-case performance decompressing malicious
DEFLATE streams by about 14x, bringing it within a factor of about 2x of
zlib, by skipping rebuilding the decode tables for the static Huffman
codes when they're already loaded into the decompressor.

This improves performance decompressing a stream of all empty static
Huffman blocks from about 0.36 MB/s to 175 MB/s, or the original
reproducer given on the GitHub issue from about 3.3 MB/s to 219 MB/s.
A regression test is added for these cases as well as the empty dynamic
Huffman blocks case to verify worst-case performance comparable to zlib.

Resolves https://github.com/ebiggers/libdeflate/issues/33
2018-12-23 12:03:00 -06:00
Eric Biggers
6eef15d6f3 lib/arm: fix PMULL detection on AArch64
2018-03-03 12:47:50 -08:00
Eric Biggers
fc2ea22b44 lib/arm: add ARM PMULL implementation of CRC-32
Add an ARM PMULL implementation of CRC-32.  This is based on a patch by
Jun He <jun.he@linaro.org> as well as the x86 PCLMUL implementation.
2018-02-18 23:03:26 -08:00
Eric Biggers
1fb34f86b5 lib: add template for vectorized CRC-32 implementations
2018-02-18 23:03:26 -08:00
Eric Biggers
fb5c6a8c85 lib/arm: allow choosing adler32_neon() at runtime
Now that we detect CPU features on ARM, allow the NEON implementation of
Adler-32 to be selected at runtime based on the presence of the NEON
feature.
2018-02-18 23:03:26 -08:00
Eric Biggers
2575ede5ff lib/arm: add ARM CPU feature detection (Linux only for now)
2018-02-18 23:03:26 -08:00
Eric Biggers
1617206086 lib/x86: allow choosing adler32_sse2() at runtime
Now that we detect CPU features on 32-bit x86, allow the SSE2
implementation of Adler-32 to be selected at runtime based on the
presence of the SSE2 feature.
2018-02-18 23:03:26 -08:00