19 Commits

Author SHA1 Message Date
Ingvar Stepanyan
a07ed5824a Assume fast unaligned access on WebAssembly
I saw this tweet claiming this flag makes libdeflate run 20% faster on
WebAssembly: https://twitter.com/Algunenano/status/1317098341377900550.

Indeed, even in a complex PNG compression benchmark I observed a
10-15% improvement with this flag enabled.

Even though WebAssembly might be running on top of a variety of
underlying platforms, the spec requires it to support unaligned access,
and on the majority of platforms this will translate to faster code.

Hence, I think it makes sense to enable this flag by default.
2021-01-19 10:31:25 -08:00
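To illustrate what a "fast unaligned access" flag typically buys, here is a minimal sketch, not the library's actual code. The macro name UNALIGNED_ACCESS_IS_FAST appears elsewhere in this log, but how it is defined here and the match_length() helper are assumptions made for illustration. When the flag is set, the loop compares four bytes per step; otherwise it compares one byte at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: __wasm__ is the compiler-defined macro for WebAssembly targets
 * (clang defines it). The real header may test something different. */
#if defined(__wasm__)
#  define UNALIGNED_ACCESS_IS_FAST 1
#endif
#ifndef UNALIGNED_ACCESS_IS_FAST
#  define UNALIGNED_ACCESS_IS_FAST 0
#endif

/* Hypothetical helper: length of the common prefix of a and b, up to max_len. */
static size_t match_length(const unsigned char *a, const unsigned char *b,
                           size_t max_len)
{
    size_t len = 0;

#if UNALIGNED_ACCESS_IS_FAST
    /* Word-at-a-time comparison; the loads may be unaligned. */
    while (len + sizeof(uint32_t) <= max_len) {
        uint32_t wa, wb;

        memcpy(&wa, a + len, sizeof(wa));
        memcpy(&wb, b + len, sizeof(wb));
        if (wa != wb)
            break;
        len += sizeof(uint32_t);
    }
#endif
    /* Byte-at-a-time tail, and the whole loop when the flag is off. */
    while (len < max_len && a[len] == b[len])
        len++;
    return len;
}
```

Both paths return the same result, so the flag only changes speed, never behavior.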
Eric Biggers
ef936b6521 lib/x86/adler32: use unsigned vector types
This is needed to avoid the following error when using
-fsanitize=undefined with gcc:

    lib/x86/adler32_impl.h:214:2: runtime error: signed integer overflow:
    1951294680 + 1956941400 cannot be represented in type 'int'

Note that this isn't seen when using -fsanitize=undefined with clang.

Old compilers don't have unsigned vector types, so work around that.
2020-10-18 15:14:15 -07:00
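The overflow reported above can be avoided by doing the lane arithmetic in unsigned types, where wraparound is well defined. The following is a minimal sketch using the GCC/clang vector extension; the v4u32 type and add_lane0() are hypothetical names, and the library's actual fix (and its workaround for old compilers) may look different.

```c
#include <stdint.h>

/* Assumption: a compiler that supports the GCC vector extension. */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Hypothetical helper: add two values in lane 0 of unsigned vectors. */
static uint32_t add_lane0(uint32_t x, uint32_t y)
{
    v4u32 a = { x, 0, 0, 0 };
    v4u32 b = { y, 0, 0, 0 };
    v4u32 sum = a + b;  /* per-lane unsigned addition, wraps modulo 2^32 */

    return sum[0];
}
```

With the operands from the error message, add_lane0(1951294680u, 1956941400u) yields 3908236080, which does not fit in 'int' but is a valid 32-bit unsigned value.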
Eric Biggers
2eeaa9282e lib/arm/cpu_features: recognize the crc32 feature
If support for CRC32 instructions is detected, set
ARM_CPU_FEATURE_CRC32.  Also define
COMPILER_SUPPORTS_CRC32_TARGET_INTRINSICS when appropriate, and update
run_tests.sh to toggle the crc32 feature for testing.
2020-10-10 23:03:50 -07:00
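One common way to detect CRC32 instruction support at runtime is to query the kernel's hardware capability bits. The sketch below assumes Linux on AArch64 and a libc that provides getauxval() and HWCAP_CRC32 through <sys/auxv.h>. The flag value and function name are hypothetical, and this is not necessarily how the library performs the check.

```c
#include <stdint.h>

#if defined(__linux__) && defined(__aarch64__)
#  include <sys/auxv.h>  /* getauxval(), AT_HWCAP, HWCAP_CRC32 (via libc) */
#endif

/* Hypothetical bit value; the library's real constant is not shown here. */
#define EXAMPLE_ARM_CPU_FEATURE_CRC32 0x00000004u

/* Hypothetical helper: return a bitmask of the features detected. */
static uint32_t example_arm_cpu_features(void)
{
    uint32_t features = 0;

#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_CRC32)
    if (getauxval(AT_HWCAP) & HWCAP_CRC32)
        features |= EXAMPLE_ARM_CPU_FEATURE_CRC32;
#endif
    /* On other platforms this sketch reports no features. */
    return features;
}
```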
Eric Biggers
7373bdc9ff lib/arm/cpu_features: reorganize arm feature macros
Reorganize some confusing logic.
2020-10-10 23:03:50 -07:00
Eric Biggers
0f5238f0ad lib: remove the "packed struct" approach to unaligned memory access
gcc 10 is miscompiling libdeflate on x86_64 at -O3 due to a regression
in how packed structs are handled
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994).

Work around this by just always using memcpy() for unaligned accesses.
It's unclear whether the "packed struct" approach is still worth
maintaining.  As far as I know, it's only useful with old versions of gcc
on arm32.  Hopefully, compilers are good enough now that we can simply use
memcpy() everywhere.

Update https://github.com/ebiggers/libdeflate/issues/64
2020-05-08 23:03:58 -07:00
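The memcpy() approach to unaligned access described in this commit can be sketched as follows. The helper names are hypothetical. Modern compilers usually turn these fixed-size memcpy() calls into a single load or store on targets that allow it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from a possibly unaligned address. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical helper: write a 32-bit value to a possibly unaligned address. */
static void store_u32_unaligned(uint32_t v, void *p)
{
    memcpy(p, &v, sizeof(v));
}
```

Unlike casting the pointer to uint32_t * or reading through a packed struct, this form does not depend on compiler-specific handling of misaligned objects.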
Eric Biggers
5c80decb26 common/x86: detect AVX-512BW intrinsics support 2018-12-24 17:36:07 -06:00
Eric Biggers
becd91bb63 lib/arm: NEON intrinsics require hardware floating point support
NEON intrinsics cannot be used when compiling for an ARM CPU without
hardware floating point support, e.g. the Debian armel port.  In this
case arm_neon.h cannot even be included as it causes an #error.

[Based on a patch by Adrian Bunk <bunk@debian.org>, but changed to check
 for __ARM_FP instead of !__SOFTFP__ to be consistent with arm_neon.h,
 and added a comment.]
2018-12-22 00:00:04 -06:00
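A guard of the kind described above might look like the following sketch. The HAVE_NEON_INTRINSICS macro and sum_bytes() are hypothetical names. arm_neon.h is only included when both __ARM_NEON and __ARM_FP are defined, and the code falls back to a scalar loop otherwise.

```c
#include <stddef.h>
#include <stdint.h>

/* Only pull in arm_neon.h when the target has NEON and hardware FP. */
#if defined(__ARM_NEON) && defined(__ARM_FP)
#  include <arm_neon.h>
#  define HAVE_NEON_INTRINSICS 1
#else
#  define HAVE_NEON_INTRINSICS 0
#endif

/* Hypothetical helper: sum of n bytes, modulo 2^32. */
static uint32_t sum_bytes(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    size_t i = 0;

#if HAVE_NEON_INTRINSICS
    /* Process 16 bytes per step with pairwise widening adds. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t v = vld1q_u8(p + i);
        uint64x2_t s = vpaddlq_u32(vpaddlq_u16(vpaddlq_u8(v)));

        sum += (uint32_t)(vgetq_lane_u64(s, 0) + vgetq_lane_u64(s, 1));
    }
#endif
    /* Scalar tail, and the only path on targets without NEON. */
    for (; i < n; i++)
        sum += p[i];
    return sum;
}
```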
Anton Blanchard
9205845a16 Set UNALIGNED_ACCESS_IS_FAST on powerpc64
All 64-bit PowerPC CPUs handle unaligned accesses reasonably fast, so
set UNALIGNED_ACCESS_IS_FAST.

Decompression of the snappy html test case is almost 50% faster on
POWER9 with this patch applied.
2018-05-13 07:48:40 +10:00
Eric Biggers
8d58d51160 common: detect ARM NEON and PMULL target intrinsics 2018-02-18 23:03:26 -08:00
Eric Biggers
1617206086 lib/x86: allow choosing adler32_sse2() at runtime
Now that we detect CPU features on 32-bit x86, allow the SSE2
implementation of Adler-32 to be selected at runtime based on the
presence of the SSE2 feature.
2018-02-18 23:03:26 -08:00
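Runtime selection of an implementation, as described for adler32_sse2(), is commonly done with a function pointer chosen on first use. The sketch below is an assumption-laden illustration: all names are hypothetical, the "SSE2" routine is only a placeholder that calls the generic code, and CPU detection uses the GCC/clang builtin __builtin_cpu_supports() rather than whatever the library uses.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*adler32_func_t)(uint32_t adler, const uint8_t *p, size_t n);

/* Portable Adler-32; pass 1 as the initial value. */
static uint32_t adler32_generic(uint32_t adler, const uint8_t *p, size_t n)
{
    uint32_t s1 = adler & 0xFFFF;
    uint32_t s2 = adler >> 16;

    for (size_t i = 0; i < n; i++) {
        s1 = (s1 + p[i]) % 65521;
        s2 = (s2 + s1) % 65521;
    }
    return (s2 << 16) | s1;
}

/* Placeholder standing in for a real SSE2 routine (NOT vectorized here). */
static uint32_t adler32_sse2_placeholder(uint32_t adler, const uint8_t *p,
                                         size_t n)
{
    return adler32_generic(adler, p, n);
}

/* Pick an implementation based on the CPU features seen at runtime. */
static adler32_func_t choose_adler32(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return adler32_sse2_placeholder;
#endif
    return adler32_generic;
}

/* Entry point: resolves the implementation once, then reuses it.
 * Note: this simple caching is not synchronized across threads. */
static uint32_t adler32_dispatch(uint32_t adler, const uint8_t *p, size_t n)
{
    static adler32_func_t impl;

    if (!impl)
        impl = choose_adler32();
    return impl(adler, p, n);
}
```

For example, adler32_dispatch(1, (const uint8_t *)"Wikipedia", 9) returns 0x11E60398 regardless of which implementation is selected.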
Eric Biggers
f76dcd5ee1 common: replace COMPILER_SUPPORTS_TARGET_INTRINSICS
Replace COMPILER_SUPPORTS_TARGET_INTRINSICS with macros for the
individual features, since COMPILER_SUPPORTS_TARGET_INTRINSICS was
x86-specific and would cause confusion when we try to use intrinsics in
'target' functions for other architectures.
2018-02-18 23:03:26 -08:00
Eric Biggers
e79444be27 Fix compilation with icc 2016-11-07 19:45:37 -08:00
Eric Biggers
3a3d2da7c2 Fix compilation with clang 3.7 2016-11-04 21:24:44 -07:00
Eric Biggers
2ea8ddae66 Don't use 'defined' in macro expansion
With clang 3.9:
	warning: macro expansion producing 'defined' has undefined
		 behavior [-Wexpansion-to-defined]

Just eliminate the tests for clang and icc; they shouldn't be necessary.
2016-10-30 12:40:47 -07:00
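For context, the warning is triggered when a macro whose expansion contains 'defined' is used in an #if. A general way to avoid it, sketched below with the hypothetical macro EXAMPLE_GCC_PREREQ, is to evaluate 'defined' directly in an #if and define the macro to a plain 0 or 1. The commit itself took the simpler route of removing the affected tests.

```c
/*
 * Problematic pattern (kept in a comment on purpose):
 *
 *     #define EXAMPLE_GCC_PREREQ (defined(__GNUC__) && __GNUC__ >= 4)
 *     #if EXAMPLE_GCC_PREREQ      // expansion produces 'defined'
 *
 * Well-defined alternative: test 'defined' directly, then store the result.
 */
#if defined(__GNUC__) && __GNUC__ >= 4
#  define EXAMPLE_GCC_PREREQ 1
#else
#  define EXAMPLE_GCC_PREREQ 0
#endif

/* Hypothetical helper that makes the result observable at runtime. */
static int example_gcc_prereq(void)
{
#if EXAMPLE_GCC_PREREQ
    return 1;
#else
    return 0;
#endif
}
```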
Eric Biggers
62cc3d71b4 Add PCLMUL/AVX-optimized CRC32 2016-10-27 20:33:35 -07:00
Eric Biggers
0c043dd602 Add PCLMUL-accelerated CRC-32 2016-10-15 11:01:18 -07:00
Eric Biggers
9c5c05201a AVX2 Adler-32 actually requires gcc 4.9+ 2016-09-09 21:53:08 -07:00
Eric Biggers
81e45b86e2 Add SSE2 and AVX2 accelerated Adler-32 2016-08-31 23:53:25 -07:00
Eric Biggers
f2c3a5b4e9 Various reorganization and cleanups
* Bring in common headers and program code from xpack project
* Move program code to programs/
* Move library code to lib/
* GNU89 and MSVC2010 compatibility
* Other changes
2016-05-21 15:38:15 -05:00