All 64-bit PowerPC CPUs handle unaligned accesses reasonably fast, so
set UNALIGNED_ACCESS_IS_FAST.
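As a rough illustration of why this flag matters, here is a minimal
sketch of a word-at-a-time comparison gated on it. The helper name
match_len() and the default value of the macro are assumptions made for
this example; this is not the library's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the build defines this to 1 on targets where unaligned loads
 * are cheap; it is defaulted here only so the sketch compiles on its own. */
#ifndef UNALIGNED_ACCESS_IS_FAST
#  define UNALIGNED_ACCESS_IS_FAST 1
#endif

/* Hypothetical helper: number of leading bytes at which a and b agree,
 * looking at no more than max bytes. */
static size_t match_len(const uint8_t *a, const uint8_t *b, size_t max)
{
	size_t n = 0;

#if UNALIGNED_ACCESS_IS_FAST
	/* Compare a word at a time.  memcpy() expresses the unaligned load
	 * portably; compilers usually lower it to a single load. */
	while (n + sizeof(uint64_t) <= max) {
		uint64_t x, y;

		memcpy(&x, a + n, sizeof(x));
		memcpy(&y, b + n, sizeof(y));
		if (x != y)
			break;
		n += sizeof(uint64_t);
	}
#endif
	/* Byte-at-a-time tail; this is the whole loop when the flag is 0. */
	while (n < max && a[n] == b[n])
		n++;
	return n;
}
```

With the flag set to 0 the function still returns the same result, only
one byte at a time.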
Decompression of the snappy html test case is almost 50% faster on
POWER9 with this patch applied.
Make it compatible with the new code organization, make it run the
test_checksums program for each implementation, and run each
implementation in both 64-bit and 32-bit modes.
Now that we detect CPU features on 32-bit x86, allow the SSE2
implementation of Adler-32 to be selected at runtime based on the
presence of the SSE2 feature.
Use 'volatile' for the CPU feature masks and dispatched function
pointers. We don't need memory barriers for them, so 'volatile' is good
enough to stop the compiler from inserting bogus reads/writes.
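The following is a minimal sketch of the pattern described above, not
the library's actual code. The names CPU_FEATURE_SSE2,
CPU_FEATURES_KNOWN, get_cpu_features(), adler32_dispatch() and
adler32_sse2_standin() are hypothetical, and the "SSE2" function is only
a stand-in that calls the portable code. Feature detection uses the
GCC/clang builtin __builtin_cpu_supports() and is skipped elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*adler32_func_t)(uint32_t adler, const uint8_t *p, size_t n);

/* Portable reference implementation of Adler-32. */
static uint32_t adler32_generic(uint32_t adler, const uint8_t *p, size_t n)
{
	uint32_t s1 = adler & 0xFFFF, s2 = adler >> 16;

	while (n--) {
		s1 = (s1 + *p++) % 65521;
		s2 = (s2 + s1) % 65521;
	}
	return (s2 << 16) | s1;
}

/* Stand-in only: a real SSE2 implementation would go here. */
static uint32_t adler32_sse2_standin(uint32_t adler, const uint8_t *p, size_t n)
{
	return adler32_generic(adler, p, n);
}

#define CPU_FEATURE_SSE2	0x00000001u	/* hypothetical flag */
#define CPU_FEATURES_KNOWN	0x80000000u	/* set once detection has run */

/* 'volatile' makes each access a single real load or store.  Racing threads
 * all compute and store the same value, so no barrier is needed. */
static volatile uint32_t cpu_features;

static uint32_t get_cpu_features(void)
{
	uint32_t f = cpu_features;

	if (f == 0) {
		f = CPU_FEATURES_KNOWN;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
		if (__builtin_cpu_supports("sse2"))
			f |= CPU_FEATURE_SSE2;
#endif
		cpu_features = f;
	}
	return f;
}

static uint32_t adler32_dispatch(uint32_t adler, const uint8_t *p, size_t n);

/* Dispatched function pointer; initially points at the selector. */
static volatile adler32_func_t adler32_impl = adler32_dispatch;

/* First call: pick an implementation, cache it, then run it. */
static uint32_t adler32_dispatch(uint32_t adler, const uint8_t *p, size_t n)
{
	adler32_func_t f = adler32_generic;

	if (get_cpu_features() & CPU_FEATURE_SSE2)
		f = adler32_sse2_standin;
	adler32_impl = f;
	return f(adler, p, n);
}

uint32_t adler32(uint32_t adler, const uint8_t *p, size_t n)
{
	return adler32_impl(adler, p, n);
}
```

For example, adler32(1, (const uint8_t *)"Wikipedia", 9) returns
0x11E60398 regardless of which implementation is selected.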
Move the x86 and ARM-specific code into their own directories to prevent
it from cluttering up the main library. This will make it a bit easier
to add new architecture-specific code.
To avoid complicating things too much for people who aren't using
the provided Makefile, we still just compile all .c files for all
architectures (irrelevant ones end up #ifdef'ed out), and the headers
are included explicitly for each architecture so that an
architecture-specific include path isn't needed. The only change for
such builds is to compile both lib/*.c and lib/*/*.c instead of only
lib/*.c.
Replace COMPILER_SUPPORTS_TARGET_INTRINSICS with macros for the
individual features, since COMPILER_SUPPORTS_TARGET_INTRINSICS was
x86-specific and would cause confusion when we try to use intrinsics in
'target' functions for other architectures.
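A rough sketch of what a per-feature macro can look like follows. The
macro name, the compiler version check and the sum_bytes*() functions
are assumptions made for this example rather than the names used in the
tree. Another architecture would get its own macro of the same shape
instead of sharing a single x86-flavored one.

```c
#include <stddef.h>
#include <stdint.h>

/* One macro per feature: can this compiler build SSE2 intrinsics inside a
 * function marked target("sse2")?  (Assumed: clang, or GCC 5 and later.) */
#if (defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 5)) && \
    (defined(__i386__) || defined(__x86_64__))
#  define COMPILER_SUPPORTS_SSE2_TARGET_INTRINSICS 1
#else
#  define COMPILER_SUPPORTS_SSE2_TARGET_INTRINSICS 0
#endif

/* Portable fallback: sum of all bytes. */
static uint32_t sum_bytes_generic(const uint8_t *p, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += p[i];
	return sum;
}

#if COMPILER_SUPPORTS_SSE2_TARGET_INTRINSICS
#include <emmintrin.h>

/* SSE2 is enabled for this one function only, not for the whole file. */
__attribute__((target("sse2")))
static uint32_t sum_bytes_sse2(const uint8_t *p, size_t n)
{
	__m128i acc = _mm_setzero_si128();
	uint32_t sum;
	size_t i = 0;

	for (; i + 16 <= n; i += 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)(p + i));

		/* Two 64-bit lanes, each holding the sum of 8 bytes. */
		acc = _mm_add_epi64(acc, _mm_sad_epu8(v, _mm_setzero_si128()));
	}
	sum = (uint32_t)_mm_cvtsi128_si32(acc) +
	      (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
	for (; i < n; i++)
		sum += p[i];
	return sum;
}
#endif

uint32_t sum_bytes(const uint8_t *p, size_t n)
{
#if COMPILER_SUPPORTS_SSE2_TARGET_INTRINSICS
	/* Only call the target function when the CPU really has SSE2. */
	if (__builtin_cpu_supports("sse2"))
		return sum_bytes_sse2(p, n);
#endif
	return sum_bytes_generic(p, n);
}
```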
The _DEFAULT_SOURCE feature test macro is only supported by glibc 2.19
and later. As a result, various things were not being defined when
building with an older glibc version, causing compile errors. Instead,
_POSIX_C_SOURCE=200809L should expose everything we need.
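For illustration, a minimal sketch of the feature test macro in use. It
is defined in the source here, although passing
-D_POSIX_C_SOURCE=200809L on the compiler command line has the same
effect. The commit does not say which declarations were missing;
strnlen() is used only as an example of a POSIX.1-2008 function, and
bounded_len() is a hypothetical wrapper.

```c
/* Must come before the first #include so the libc headers see it. */
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around strnlen(), which POSIX.1-2008 declares in
 * <string.h> but ISO C does not. */
static size_t bounded_len(const char *s, size_t max)
{
	return strnlen(s, max);
}
```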
When the block splitting algorithm was implemented, it became possible
for the compressor to use longer blocks, up to ~300KB. Unfortunately, it
was overlooked that this allows literal runs longer than 65535 bytes,
while in one place the length of a literal run was still being stored in
a u16. To overflow the litrunlen and hit the bug, the data would have
had to have far fewer matches than random data, which is possible but
very unusual. Fix the bug by reserving more space to hold the litrunlen,
and add a test for it.
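The truncation can be sketched as follows. The packed layout shown
(23 bits of literal run length plus 9 bits of match length in one
32-bit word) is an assumption chosen for illustration, as are all the
names; the commit only says that more space was reserved.

```c
#include <stdint.h>

/* What a 16-bit field does to a long literal run: it silently wraps. */
static uint32_t litrunlen_via_u16(uint32_t litrunlen)
{
	uint16_t field = (uint16_t)litrunlen;	/* keeps only the low 16 bits */

	return field;
}

/* Hypothetical wider layout: low 23 bits hold the literal run length, the
 * high 9 bits hold the match length (DEFLATE matches are at most 258). */
#define LENGTH_SHIFT	23
#define LITRUNLEN_MASK	((UINT32_C(1) << LENGTH_SHIFT) - 1)

static uint32_t pack_seq(uint32_t litrunlen, uint32_t match_len)
{
	return (match_len << LENGTH_SHIFT) | (litrunlen & LITRUNLEN_MASK);
}

static uint32_t seq_litrunlen(uint32_t packed)
{
	return packed & LITRUNLEN_MASK;
}

static uint32_t seq_match_len(uint32_t packed)
{
	return packed >> LENGTH_SHIFT;
}
```

A run of 70000 literals comes back as 4464 from the u16 version, but
round-trips intact through the wider field.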
With clang 3.9:

    warning: macro expansion producing 'defined' has undefined
    behavior [-Wexpansion-to-defined]

Just eliminate the tests for clang and icc; they shouldn't be necessary.
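For context, the warning fires when 'defined' appears in the expansion
of a macro that is then used in #if. The sketch below shows a general
way to avoid that, with a hypothetical GCC_PREREQ() macro and helper
function. It is a different remedy from the one taken in this commit,
which simply drops the extra compiler tests.

```c
/*
 * Problematic shape (kept in a comment so this file stays warning-free):
 *
 *	#define GCC_PREREQ(major, minor) (defined(__GNUC__) && ...)
 *	#if GCC_PREREQ(4, 9)
 *
 * Here 'defined' is produced by macro expansion inside #if, which is
 * undefined behavior.
 */

/* Safer shape: evaluate 'defined' directly in the directive, and let the
 * macro expand to a plain integer expression. */
#ifdef __GNUC__
#  define GCC_PREREQ(major, minor) \
	(__GNUC__ > (major) || \
	 (__GNUC__ == (major) && __GNUC_MINOR__ >= (minor)))
#else
#  define GCC_PREREQ(major, minor) 0
#endif

/* Hypothetical helper: 1 if the compiler reports GCC 4.9 or later. */
static int built_with_gcc_4_9_or_later(void)
{
#if GCC_PREREQ(4, 9)
	return 1;
#else
	return 0;
#endif
}
```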