I saw this tweet claiming this flag makes libdeflate run 20% faster on
WebAssembly: https://twitter.com/Algunenano/status/1317098341377900550.
Indeed, even in a complex PNG compression benchmark I observed a 10-15%
improvement when this flag is enabled.
Even though WebAssembly might be running on top of a variety of
underlying platforms, the spec requires it to support unaligned access,
and on the majority of platforms this translates to faster code.
Hence, I think it makes sense to enable this flag by default.
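As a rough illustration of why fast unaligned access matters for a
compressor, the sketch below compares four bytes at arbitrary offsets
either with whole-word loads or one byte at a time. The macro
EXAMPLE_UNALIGNED_ACCESS_IS_FAST and the function example_equal4() are
hypothetical names chosen for this sketch; they are not the actual flag
or code from libdeflate.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical switch for this sketch; not the real flag name. */
#ifndef EXAMPLE_UNALIGNED_ACCESS_IS_FAST
#  define EXAMPLE_UNALIGNED_ACCESS_IS_FAST 1
#endif

/*
 * Return 1 if the 4 bytes at 'a' equal the 4 bytes at 'b', else 0.
 * Neither pointer needs to be aligned.
 */
int example_equal4(const unsigned char *a, const unsigned char *b)
{
#if EXAMPLE_UNALIGNED_ACCESS_IS_FAST
	uint32_t x, y;

	/*
	 * A fixed-size memcpy() is the portable way to express an
	 * unaligned load; compilers typically lower it to a single
	 * load when the target handles unaligned access well.
	 */
	memcpy(&x, a, sizeof(x));
	memcpy(&y, b, sizeof(y));
	return x == y;
#else
	/* Conservative path: one byte at a time. */
	return a[0] == b[0] && a[1] == b[1] &&
	       a[2] == b[2] && a[3] == b[3];
#endif
}
```

Both paths return the same result; only the number of loads differs.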
Include sys/types.h to avoid the following build failure on uclibc:
    In file included from programs/gzip.c:28:0:
    programs/prog_util.h:159:1: error: unknown type name ‘ssize_t’
    ssize_t xread(struct file_stream *strm, void *buf, size_t count);
    ^
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
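A minimal sketch of the shape of this fix, assuming a POSIX system: code
that uses ssize_t in a declaration includes <sys/types.h> itself rather
than relying on another header to pull it in. example_xread() is a
hypothetical stand-in that takes a plain file descriptor instead of the
struct file_stream used by the real xread().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t; included explicitly, not assumed */
#include <unistd.h>	/* read() */

/*
 * Hypothetical helper: read up to 'count' bytes from 'fd', retrying on
 * short reads.  Returns the number of bytes read (less than 'count'
 * only at end of file), or -1 on error.
 */
ssize_t example_xread(int fd, void *buf, size_t count)
{
	unsigned char *p = buf;
	size_t done = 0;

	while (done < count) {
		ssize_t n = read(fd, p + done, count - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted; try again */
			return -1;
		}
		if (n == 0)
			break;			/* end of file */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```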
Remove the ability of matchfinder_init() and matchfinder_rebase() to
fail due to the matchfinder memory size being misaligned. Instead,
require that the size always be 128-byte aligned -- which is already the
case. Also, make the matchfinder memory always be 32-byte aligned --
which doesn't really have any downside.
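The idea can be sketched as follows: the size requirement becomes a
compile-time check, the allocation guarantees the memory alignment, and
initialization no longer has a failure path. All names below
(example_matchfinder, EXAMPLE_MEM_ALIGNMENT, and so on) are hypothetical,
and C11 aligned_alloc() is used here only for brevity; it is not
necessarily how the library allocates its memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names; the alignment values come from the text above. */
#define EXAMPLE_MEM_ALIGNMENT	32
#define EXAMPLE_SIZE_ALIGNMENT	128
#define EXAMPLE_INITVAL		INT16_MIN

struct example_matchfinder {
	int16_t hash_tab[1024];	/* 2048 bytes, a multiple of 128 */
};

/* Enforce the size requirement at build time, not at run time. */
_Static_assert(sizeof(struct example_matchfinder) %
	       EXAMPLE_SIZE_ALIGNMENT == 0,
	       "matchfinder size must be 128-byte aligned");

/* Allocate matchfinder memory that is always 32-byte aligned. */
struct example_matchfinder *example_matchfinder_alloc(void)
{
	/* The size is a multiple of the alignment, as C11 expects. */
	return aligned_alloc(EXAMPLE_MEM_ALIGNMENT,
			     sizeof(struct example_matchfinder));
}

/* Cannot fail: alignment is a precondition, not a reported error. */
void example_matchfinder_init(struct example_matchfinder *mf)
{
	size_t i;

	assert((uintptr_t)mf % EXAMPLE_MEM_ALIGNMENT == 0);
	for (i = 0; i < sizeof(mf->hash_tab) / sizeof(mf->hash_tab[0]); i++)
		mf->hash_tab[i] = EXAMPLE_INITVAL;
}
```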
The Makefile didn't trigger a rebuild if some settings changed, e.g.
LDFLAGS or DECOMPRESSION_ONLY. Fix this.
Also simplify the rebuild logic by not handling the library and programs
separately, as this optimization doesn't seem to be worthwhile.
Avoid confusion with the GNU extension 'program_invocation_name', which
is described by 'man 3 program_invocation_name'. The GNU version isn't
supposed to be exposed without defining _GNU_SOURCE, which we don't in
any of the relevant files, but it's best to avoid any confusion.
The cutoff for outputting uncompressed data is currently < 16 bytes for
all compression levels. That isn't ideal, since the higher the
compression level, the more we should bother with very small inputs; and
the lower the compression level, the less we should bother.
Use a formula that produces the following cutoffs:
    Level  Cutoff
    -----  ------
        0      56
        1      52
        2      48
        3      44
        4      40
        5      36
        6      32
        7      28
        8      24
        9      20
       10      16
       11      12
       12       8
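The formula itself is not spelled out here. One linear formula that
reproduces every row of the table is 56 - 4 * level, sketched below with
hypothetical function names; it assumes levels 0 through 12 only.

```c
#include <stddef.h>

/*
 * Hypothetical helper: smallest input size worth compressing at the
 * given level (0..12).  Matches the table: 56 at level 0, 8 at 12.
 */
size_t example_min_size_to_compress(int level)
{
	return (size_t)(56 - 4 * level);
}

/*
 * Return 1 if an input of 'in_nbytes' bytes falls below the cutoff and
 * should be output uncompressed, else 0.
 */
int example_should_store_uncompressed(int level, size_t in_nbytes)
{
	return in_nbytes < example_min_size_to_compress(level);
}
```

For example, a 30-byte input would be stored uncompressed at level 6
(cutoff 32) but compressed at level 9 (cutoff 20).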
Update https://github.com/ebiggers/libdeflate/issues/67
Improve the list of Travis CI jobs by using the build matrix feature to
test all combinations of compilers and architectures on the latest
version of Ubuntu, and by adding more jobs for older versions of Ubuntu.
Don't try to detect and use different compilers, since it's better to
specify this via the environment (e.g. via the Travis CI build matrix).
While doing this, also deduplicate the logic for testing with valgrind
and UBSAN, improve the log messages, and add a test with -O3.
Now that run_tests.sh has been cleaned up to remove (or move) test
groups that weren't very useful, remove the concept of test groups and
just run all the tests.
The reason that run_tests.sh supported running checksum_benchmarks.sh is
that as a side effect, checksum_benchmarks.sh runs the 'test_checksums'
program with all combinations of CPU features.
However, commit ec60cb48d11c ("tools/run_tests.sh: test different
combinations of CPU features") made run_tests.sh handle this elsewhere.
So having run_tests.sh run checksum_benchmarks.sh is no longer useful.
Keep checksum_benchmarks.sh around for manual benchmark runs, however.
android_tests is only useful for local testing, and it wasn't being run
in Travis CI. Move it into a separate script to avoid complicating
run_tests.sh.
This was only useful to me for local testing. I no longer have the
needed MIPS router available, and its main purpose was to test a
big-endian system, which is now covered by testing s390x with Travis CI.
This script only worked for me to do local testing and wasn't otherwise
used. In particular, the Windows build tests in Travis CI don't use
this script, nor does the make-windows-releases script use it.