mirror of
https://github.com/cuberite/libdeflate.git
synced 2025-09-12 13:58:35 -04:00

Currently the optimized implementations of matchfinder_init() and matchfinder_rebase() are chosen via static dispatch, i.e. at compile time. That means the AVX2 implementations usually aren't used, since they are only selected when AVX2 is enabled in the compiler flags. Fix this by using dynamic dispatch, as libdeflate already does for the Adler-32 and CRC-32 checksums and for DEFLATE decompression.

Based on work by Andrew Steinborn <git@steinborn.me> (https://github.com/ebiggers/libdeflate/pull/77).

He wrote:

"The main impact is on x86: the AVX2 matchfinder can now be properly dynamically dispatched at runtime and if -mavx2 is included in CFLAGS (or -march set to any platform with AVX2 support). On my Ryzen 9 3900X, I got an approximately 1% boost in deflate time (measured with a uncompressed tarball of the Silesia corpus) using just the changes in this PR and the regular CFLAGS, and a 2.7% boost when specifying -mavx2 as CFLAGS. (I also tested with an Intel Xeon Skylake c5.large EC2 instance, and did not see any performance regression)."
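
For readers unfamiliar with the pattern, here is a minimal sketch of dynamic dispatch through a function pointer. It is not libdeflate's actual code: every name in it (demo_matchfinder_init, init_impl, dispatch_init, DEMO_INITVAL) is hypothetical, the fill value is a placeholder, and it assumes GCC or Clang for the target attribute and __builtin_cpu_supports(). On other compilers or architectures it falls back to the generic loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this only illustrates the pattern. */
typedef void (*init_func_t)(int16_t *data, size_t count);

/* Placeholder fill value, not necessarily the one libdeflate uses. */
#define DEMO_INITVAL ((int16_t)-1)

/* Portable fallback: works on every CPU. */
static void init_generic(int16_t *data, size_t count)
{
    for (size_t i = 0; i < count; i++)
        data[i] = DEMO_INITVAL;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define DEMO_HAVE_AVX2_TARGET 1
#include <immintrin.h>

/* Built with AVX2 enabled for this function only, so the rest of the
 * file can still be compiled without -mavx2. */
__attribute__((target("avx2")))
static void init_avx2(int16_t *data, size_t count)
{
    const __m256i v = _mm256_set1_epi16(DEMO_INITVAL);
    size_t i = 0;

    /* 16 int16_t values per 256-bit store, then a scalar tail. */
    for (; i + 16 <= count; i += 16)
        _mm256_storeu_si256((__m256i *)&data[i], v);
    for (; i < count; i++)
        data[i] = DEMO_INITVAL;
}
#endif

static void dispatch_init(int16_t *data, size_t count);

/* Starts out pointing at the dispatcher; replaced on the first call. */
static init_func_t init_impl = dispatch_init;

/* First call lands here: probe the CPU once, cache the choice, run it.
 * Note: this sketch does not address concurrent first calls. */
static void dispatch_init(int16_t *data, size_t count)
{
    init_func_t f = init_generic;

#ifdef DEMO_HAVE_AVX2_TARGET
    if (__builtin_cpu_supports("avx2"))
        f = init_avx2;
#endif
    init_impl = f;
    f(data, count);
}

/* Public entry point: later calls cost one indirect call. */
void demo_matchfinder_init(int16_t *data, size_t count)
{
    init_impl(data, count);
}
```

With static dispatch, the choice between init_generic and init_avx2 would instead be made by the preprocessor from the build flags, which is why a binary built with default CFLAGS would never run the AVX2 path even on a CPU that supports it.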