Many of the internal library headers lack include guards because
they aren't needed. Their absence might look like a bug, though, and
adding them does no harm, so add them.
Update https://github.com/ebiggers/libdeflate/issues/117
Remove the ability of matchfinder_init() and matchfinder_rebase() to
fail due to the matchfinder memory size being misaligned. Instead,
require that the size always be 128-byte aligned -- which is already the
case. Also, make the matchfinder memory always be 32-byte aligned --
which doesn't really have any downside.
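
The alignment requirements above can be illustrated with a minimal sketch.
It assumes a C11 toolchain that provides aligned_alloc(). The names
SKETCH_MEM_ALIGNMENT, SKETCH_SIZE_ALIGNMENT, round_up_size(),
alloc_matchfinder_mem() and init_sketch() are hypothetical and are not
taken from the library; the point is only that alignment becomes a
precondition checked by assertion rather than a runtime failure path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical constants mirroring the numbers in the commit message. */
#define SKETCH_MEM_ALIGNMENT   32   /* required alignment of the address */
#define SKETCH_SIZE_ALIGNMENT  128  /* required alignment of the size */

/* Round a size up to the next multiple of 128 bytes. */
static size_t round_up_size(size_t size)
{
	return (size + SKETCH_SIZE_ALIGNMENT - 1) &
	       ~(size_t)(SKETCH_SIZE_ALIGNMENT - 1);
}

/*
 * Hypothetical allocator: the rounded size is a multiple of 128 and
 * therefore also of 32, which is what aligned_alloc() expects.
 * The caller releases the memory with free().
 */
static void *alloc_matchfinder_mem(size_t size)
{
	return aligned_alloc(SKETCH_MEM_ALIGNMENT, round_up_size(size));
}

/*
 * Hypothetical init routine: misalignment is a caller bug, so it is
 * asserted instead of being reported through a return value.
 */
static void init_sketch(void *mem, size_t size)
{
	assert((uintptr_t)mem % SKETCH_MEM_ALIGNMENT == 0);
	assert(size % SKETCH_SIZE_ALIGNMENT == 0);
	memset(mem, 0, size);
}
```

Because the init routine can no longer fail, callers in this sketch do not
need an error path for it.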
This is needed to avoid the following error when using
-fsanitize=undefined with gcc:
lib/x86/adler32_impl.h:214:2: runtime error: signed integer overflow:
1951294680 + 1956941400 cannot be represented in type 'int'
Note that this isn't seen when using -fsanitize=undefined with clang.
Old compilers don't have unsigned vector types, so work around that.
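
For illustration, one common way to avoid this class of report is to do
the addition in an unsigned type, where wraparound is well defined. The
scalar sketch below is an assumption about the general technique, not the
vector code actually changed in adler32_impl.h; wrapping_add_s32() and
wrapping_add_selfcheck() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Add two 32-bit values with wraparound. The addition happens on
 * uint32_t, so it is defined modulo 2^32 and does not trigger the
 * signed-overflow check of -fsanitize=undefined.
 */
static int32_t wrapping_add_s32(int32_t a, int32_t b)
{
	uint32_t sum = (uint32_t)a + (uint32_t)b;

	/*
	 * Converting an out-of-range value back to int32_t is
	 * implementation-defined rather than undefined; gcc and clang
	 * both reduce it modulo 2^32.
	 */
	return (int32_t)sum;
}

/* Self-check using the two operands from the sanitizer report. */
static void wrapping_add_selfcheck(void)
{
	/* 1951294680 + 1956941400 = 3908236080, which wraps to -386731216. */
	assert(wrapping_add_s32(1951294680, 1956941400) == -386731216);
}
```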
Make test-only builds of libdeflate support an environment variable
LIBDEFLATE_DISABLE_CPU_FEATURES that contains a comma-separated list of
CPU features to disable, such as "avx512bw,avx2,sse2".
This makes it possible to test all the variants of dynamically
dispatched code without editing the source code.
Note that this environment variable is not a stable interface, so put the
support for it behind a scary-looking option, TEST_SUPPORT__DO_NOT_USE.
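
A minimal sketch of how such a list could be turned into a feature mask
follows. The FEATURE_* bit values, feature_table, parse_disabled_features()
and apply_disable_list() are hypothetical names chosen for illustration;
only the variable name, the option name and the example feature names come
from the text above. Unknown names are silently ignored here, which is a
choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical feature bits; the library's real values may differ. */
#define FEATURE_SSE2      0x1u
#define FEATURE_AVX2      0x2u
#define FEATURE_AVX512BW  0x4u

struct feature_name {
	const char *name;
	uint32_t bit;
};

static const struct feature_name feature_table[] = {
	{ "sse2",     FEATURE_SSE2 },
	{ "avx2",     FEATURE_AVX2 },
	{ "avx512bw", FEATURE_AVX512BW },
};

/* Parse a list such as "avx512bw,avx2" into a mask of features to disable. */
static uint32_t parse_disabled_features(const char *list)
{
	uint32_t mask = 0;

	while (list != NULL && *list != '\0') {
		size_t len = strcspn(list, ",");  /* length of this item */
		size_t i;

		for (i = 0; i < sizeof(feature_table) / sizeof(feature_table[0]); i++) {
			const char *name = feature_table[i].name;

			if (strlen(name) == len && memcmp(list, name, len) == 0)
				mask |= feature_table[i].bit;
		}
		list += len;
		if (*list == ',')
			list++;  /* skip the separator */
	}
	return mask;
}

/* Clear the disabled bits, but only in builds that opt in to test support. */
static uint32_t apply_disable_list(uint32_t detected)
{
#ifdef TEST_SUPPORT__DO_NOT_USE
	detected &= ~parse_disabled_features(
			getenv("LIBDEFLATE_DISABLE_CPU_FEATURES"));
#endif
	return detected;
}
```

In a build without TEST_SUPPORT__DO_NOT_USE, apply_disable_list() returns
its argument unchanged, so the environment variable has no effect.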
In cpuid() in the '__i386__ && __PIC__' case, the second output operand
is written to before the input operands are used. So the second output
operand needs the earlyclobber constraint.
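
The constraint change can be sketched as follows, assuming a GNU-style
inline assembler (gcc or clang) on x86. The function name cpuid_sketch(),
the EBX_CONSTRAINT macro and the int return value are assumptions of this
sketch, and the template is an illustration rather than a copy of the
library's code. In the '__i386__ && __PIC__' case the template first saves
EBX into operand 1, which is a write to an output before the inputs in
EAX and ECX have been consumed by cpuid, hence the "&".

```c
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) && defined(__PIC__)
   /* EBX may hold the GOT pointer: use another register, marked earlyclobber. */
#  define EBX_CONSTRAINT "=&r"
#else
#  define EBX_CONSTRAINT "=b"
#endif

/* Returns 1 if cpuid was executed, 0 on non-x86 targets (outputs zeroed). */
static int cpuid_sketch(uint32_t leaf, uint32_t subleaf,
			uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
	__asm__ volatile(
		/* Save EBX into operand 1, unless operand 1 is EBX itself. */
		".ifnc %%ebx, %1; mov  %%ebx, %1; .endif\n"
		"cpuid\n"
		/* Restore EBX and leave cpuid's EBX result in operand 1. */
		".ifnc %%ebx, %1; xchg %%ebx, %1; .endif\n"
		: "=a" (*a), EBX_CONSTRAINT (*b), "=c" (*c), "=d" (*d)
		: "a" (leaf), "c" (subleaf));
	return 1;
#else
	(void)leaf;
	(void)subleaf;
	*a = *b = *c = *d = 0;
	return 0;
#endif
}
```

Without the "&", the compiler is allowed to assume that inputs are dead
before any output is written, and may place operand 1 in a register that
still holds an input.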
Now that we detect CPU features on 32-bit x86, allow the SSE2
implementation of Adler-32 to be selected at runtime based on the
presence of the SSE2 feature.
Use 'volatile' for the CPU feature masks and dispatched function
pointers. We don't need memory barriers for them, so 'volatile' is good
enough to stop the compiler from inserting bogus reads/writes.
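
The dispatch pattern described in the two changes above can be sketched
roughly as follows. Everything here is illustrative: FEATURE_SSE2,
cpu_features, get_cpu_features(), adler32_sse2_standin(), dispatch() and
adler32_impl are hypothetical names, the "SSE2" function is a stand-in
that contains no SIMD code, and the detection stub never reports any
feature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FEATURE_SSE2 0x1u  /* hypothetical feature bit */

typedef uint32_t (*adler32_func_t)(uint32_t adler, const uint8_t *p, size_t n);

/* Portable byte-at-a-time Adler-32. */
static uint32_t adler32_generic(uint32_t adler, const uint8_t *p, size_t n)
{
	uint32_t s1 = adler & 0xFFFF;
	uint32_t s2 = adler >> 16;
	size_t i;

	for (i = 0; i < n; i++) {
		s1 = (s1 + p[i]) % 65521;
		s2 = (s2 + s1) % 65521;
	}
	return (s2 << 16) | s1;
}

/* Stand-in for an SSE2 implementation; it just forwards to the generic one. */
static uint32_t adler32_sse2_standin(uint32_t adler, const uint8_t *p, size_t n)
{
	return adler32_generic(adler, p, n);
}

/* Hypothetical feature mask; a real version would fill it in from cpuid. */
static volatile uint32_t cpu_features;

static uint32_t get_cpu_features(void)
{
	return cpu_features;
}

static uint32_t dispatch(uint32_t adler, const uint8_t *p, size_t n);

/* Starts out pointing at dispatch(), which replaces it on first use. */
static volatile adler32_func_t adler32_impl = dispatch;

static uint32_t dispatch(uint32_t adler, const uint8_t *p, size_t n)
{
	adler32_func_t f = adler32_generic;

	if (get_cpu_features() & FEATURE_SSE2)
		f = adler32_sse2_standin;
	adler32_impl = f;  /* single store through the volatile pointer */
	return f(adler, p, n);
}

static uint32_t adler32(uint32_t adler, const uint8_t *p, size_t n)
{
	return adler32_impl(adler, p, n);  /* single load, then an indirect call */
}
```

If two threads race through dispatch() in this sketch, both compute the
same choice and store the same pointer, so the repeated store does not
change the outcome.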
Move the x86 and ARM-specific code into their own directories to prevent
it from cluttering up the main library. This will make it a bit easier
to add new architecture-specific code.
But to avoid complicating things too much for people who aren't using
the provided Makefile, we still just compile all .c files for all
architectures (irrelevant ones end up #ifdef'ed out), and the headers
are included explicitly for each architecture so that an
architecture-specific include path isn't needed. So, now people just
need to compile both lib/*.c and lib/*/*.c instead of only lib/*.c.