diff --git a/README.md b/README.md index 3a351420..dfd9aeae 100644 --- a/README.md +++ b/README.md @@ -163,53 +163,53 @@ A good starting point for apt-based systems is probably: You can pick either `clang` or `g++`, but at least recent `clang` versions will produce substantially faster code: - $ hyperfine ./dwarfs_test-* - Benchmark #1: ./dwarfs_test-clang-O2 - Time (mean ± σ): 9.425 s ± 0.049 s [User: 15.724 s, System: 0.773 s] - Range (min … max): 9.373 s … 9.523 s 10 runs + $ hyperfine -L prog $(echo build-*/mkdwarfs | tr ' ' ,) '{prog} --no-progress --log-level warn -i /usr/include -o /dev/null -C null' + Benchmark #1: build-clang-10/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null + Time (mean ± σ): 6.403 s ± 0.178 s [User: 12.039 s, System: 1.963 s] + Range (min … max): 6.250 s … 6.701 s 10 runs - Benchmark #2: ./dwarfs_test-clang-O3 - Time (mean ± σ): 9.328 s ± 0.045 s [User: 15.593 s, System: 0.791 s] - Range (min … max): 9.277 s … 9.418 s 10 runs + Benchmark #2: build-clang-11/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null + Time (mean ± σ): 6.408 s ± 0.143 s [User: 12.109 s, System: 1.974 s] + Range (min … max): 6.231 s … 6.617 s 10 runs - Benchmark #3: ./dwarfs_test-gcc-O2 - Time (mean ± σ): 13.798 s ± 0.035 s [User: 20.161 s, System: 0.767 s] - Range (min … max): 13.731 s … 13.852 s 10 runs + Benchmark #3: build-gcc-10/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null + Time (mean ± σ): 11.484 s ± 0.245 s [User: 18.487 s, System: 2.071 s] + Range (min … max): 11.119 s … 11.779 s 10 runs - Benchmark #4: ./dwarfs_test-gcc-O3 - Time (mean ± σ): 13.223 s ± 0.034 s [User: 19.576 s, System: 0.769 s] - Range (min … max): 13.176 s … 13.278 s 10 runs + Benchmark #4: build-gcc-9/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null + Time (mean ± σ): 11.443 s ± 0.242 s [User: 18.419 s, System: 2.067 s] + Range (min … max): 11.177 s … 11.742 s 10 runs Summary - './dwarfs_test-clang-O3' ran - 1.01 ± 0.01 times faster than './dwarfs_test-clang-O2' - 1.42 ± 0.01 times faster than './dwarfs_test-gcc-O3' - 1.48 ± 0.01 times faster than './dwarfs_test-gcc-O2' + 'build-clang-10/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null' ran + 1.00 ± 0.04 times faster than 'build-clang-11/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null' + 1.79 ± 0.06 times faster than 'build-gcc-9/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null' + 1.79 ± 0.06 times faster than 'build-gcc-10/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null' - $ hyperfine -L prog $(echo mkdwarfs-* | tr ' ' ,) '{prog} --no-progress --log-level warn -i tree -o /dev/null -C null' - Benchmark #1: mkdwarfs-clang-O2 --no-progress --log-level warn -i tree -o /dev/null -C null - Time (mean ± σ): 4.358 s ± 0.033 s [User: 6.364 s, System: 0.622 s] - Range (min … max): 4.321 s … 4.408 s 10 runs + $ hyperfine build-*/dwarfs_test + Benchmark #1: build-clang-10/dwarfs_test + Time (mean ± σ): 1.789 s ± 0.008 s [User: 2.049 s, System: 0.636 s] + Range (min … max): 1.775 s … 1.808 s 10 runs - Benchmark #2: mkdwarfs-clang-O3 --no-progress --log-level warn -i tree -o /dev/null -C null - Time (mean ± σ): 4.282 s ± 0.035 s [User: 6.249 s, System: 0.623 s] - Range (min … max): 4.244 s … 4.349 s 10 runs + Benchmark #2: build-clang-11/dwarfs_test + Time (mean ± σ): 1.806 s ± 0.011 s [User: 2.053 s, System: 0.660 s] + Range (min … max): 1.791 s … 
1.820 s 10 runs - Benchmark #3: mkdwarfs-gcc-O2 --no-progress --log-level warn -i tree -o /dev/null -C null - Time (mean ± σ): 6.212 s ± 0.031 s [User: 8.185 s, System: 0.638 s] - Range (min … max): 6.159 s … 6.250 s 10 runs + Benchmark #3: build-gcc-10/dwarfs_test + Time (mean ± σ): 2.027 s ± 0.004 s [User: 2.270 s, System: 0.797 s] + Range (min … max): 2.023 s … 2.032 s 10 runs - Benchmark #4: mkdwarfs-gcc-O3 --no-progress --log-level warn -i tree -o /dev/null -C null - Time (mean ± σ): 5.740 s ± 0.037 s [User: 7.742 s, System: 0.645 s] - Range (min … max): 5.685 s … 5.796 s 10 runs + Benchmark #4: build-gcc-9/dwarfs_test + Time (mean ± σ): 2.033 s ± 0.005 s [User: 2.278 s, System: 0.796 s] + Range (min … max): 2.027 s … 2.040 s 10 runs Summary - 'mkdwarfs-clang-O3 --no-progress --log-level warn -i tree -o /dev/null -C null' ran - 1.02 ± 0.01 times faster than 'mkdwarfs-clang-O2 --no-progress --log-level warn -i tree -o /dev/null -C null' - 1.34 ± 0.01 times faster than 'mkdwarfs-gcc-O3 --no-progress --log-level warn -i tree -o /dev/null -C null' - 1.45 ± 0.01 times faster than 'mkdwarfs-gcc-O2 --no-progress --log-level warn -i tree -o /dev/null -C null' + 'build-clang-10/dwarfs_test' ran + 1.01 ± 0.01 times faster than 'build-clang-11/dwarfs_test' + 1.13 ± 0.01 times faster than 'build-gcc-10/dwarfs_test' + 1.14 ± 0.01 times faster than 'build-gcc-9/dwarfs_test' -These measurements were made with gcc-9.3.0 and clang-10.0.1. +These measurements were made with gcc-9.3.0, gcc-10.2.0, clang-10.0.1 and clang-11.0.1. ### Building @@ -274,9 +274,10 @@ what can currently be done with the interface. ## Usage -Please check out the man pages for [mkdwarfs](doc/mkdwarfs.md) -and [dwarfs](doc/dwarfs.md). `dwarfsck` will be built and installed -as well, but it's still work in progress. +Please check out the man pages for [mkdwarfs](doc/mkdwarfs.md), +[dwarfs](doc/dwarfs.md) and [dwarfsextract](doc/dwarfsextract.md). +`dwarfsck` will be built and installed as well, but it's still +work in progress. The [dwarfs](doc/dwarfs.md) man page also shows an example for setting up DwarFS with [overlayfs](https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt) @@ -285,11 +286,16 @@ DwarFS image. ## Comparison -### With SquashFS +The SquashFS and `xz` tests were all done on an 8 core Intel(R) Xeon(R) +E-2286M CPU @ 2.40GHz with 64 GiB of RAM. -These tests were done on an Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz -6 core CPU with 64 GiB of RAM. The system was mostly idle during -all of the tests. +The wimlib, Cromfs and EROFS tests were done with an older version of +DwarFS on a 6 core Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz with 64 GiB +of RAM. + +The systems were mostly idle during all of the tests. + +### With SquashFS The source directory contained **1139 different Perl installations** from 284 distinct releases, a total of 47.65 GiB of data in 1,927,501 @@ -301,9 +307,9 @@ I'm using the same compression type and compression level for SquashFS that is the default setting for DwarFS: $ time mksquashfs install perl-install.squashfs -comp zstd -Xcompression-level 22 - Parallel mksquashfs: Using 12 processors - Creating 4.0 filesystem on perl-install.squashfs, block size 131072. - [=====================================================================-] 2107401/2107401 100% + Parallel mksquashfs: Using 16 processors + Creating 4.0 filesystem on perl-install-zstd.squashfs, block size 131072. 
+ [=========================================================/] 2107401/2107401 100% Exportable Squashfs 4.0 filesystem, zstd compressed, data block size 131072 compressed data, compressed metadata, compressed fragments, @@ -330,84 +336,121 @@ SquashFS that is the default setting for DwarFS: Number of gids 1 users (100) - real 69m18.427s - user 817m15.199s - sys 1m38.237s + real 32m54.713s + user 501m46.382s + sys 0m58.528s For DwarFS, I'm sticking to the defaults: $ time mkdwarfs -i install -o perl-install.dwarfs - I 23:54:10.119098 scanning install - I 23:54:32.317096 waiting for background scanners... - I 23:55:04.131755 assigning directory and link inodes... - I 23:55:04.457824 finding duplicate files... - I 23:55:22.041760 saved 28.2 GiB / 47.65 GiB in 1782826/1927501 duplicate files - I 23:55:22.041862 waiting for inode scanners... - I 23:55:55.365591 assigning device inodes... - I 23:55:55.423564 assigning pipe/socket inodes... - I 23:55:55.479380 building metadata... - I 23:55:55.479468 building blocks... - I 23:55:55.479538 saving names and links... - I 23:55:55.479615 ordering 144675 inodes using nilsimsa similarity... - I 23:55:55.488995 nilsimsa: depth=20000 (1000), limit=255 - I 23:55:55.928990 updating name and link indices... - I 23:55:56.186375 pre-sorted index (659296 name, 366624 path lookups) [697.3ms] - I 00:02:11.239104 144675 inodes ordered [375.8s] - I 00:02:11.239224 waiting for segmenting/blockifying to finish... - I 00:06:45.599953 saving chunks... - I 00:06:45.639297 saving directories... - I 00:06:49.937160 waiting for compression to finish... - I 00:08:07.873055 compressed 47.65 GiB to 471.6 MiB (ratio=0.0096655) - I 00:08:08.512030 filesystem created without errors [838.4s] - ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ + I 20:22:43.450128 scanning install + I 20:22:49.254365 waiting for background scanners... + I 20:23:01.612414 assigning directory and link inodes... + I 20:23:01.821212 finding duplicate files... + I 20:23:11.331823 saved 28.2 GiB / 47.65 GiB in 1782826/1927501 duplicate files + I 20:23:11.332706 waiting for inode scanners... + I 20:23:23.426440 assigning device inodes... + I 20:23:23.462760 assigning pipe/socket inodes... + I 20:23:23.497944 building metadata... + I 20:23:23.497991 building blocks... + I 20:23:23.498060 saving names and links... + I 20:23:23.498127 ordering 144675 inodes using nilsimsa similarity... + I 20:23:23.502987 nilsimsa: depth=20000 (1000), limit=255 + I 20:23:23.688388 updating name and link indices... + I 20:23:23.851168 pre-sorted index (660176 name, 366179 path lookups) [348.1ms] + I 20:25:50.432870 144675 inodes ordered [146.9s] + I 20:25:50.433477 waiting for segmenting/blockifying to finish... + I 20:27:06.432510 segmentation matches: good=459247, bad=6685, total=469664 + I 20:27:06.432582 segmentation collisions: L1=0.007%, L2=0.001% [2233350 hashes] + I 20:27:06.432616 saving chunks... + I 20:27:06.457463 saving directories... + I 20:27:09.324708 waiting for compression to finish... 
+ I 20:28:18.778353 compressed 47.65 GiB to 426.5 MiB (ratio=0.00874147) + I 20:28:19.044470 filesystem created without errors [335.6s] + ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ waiting for block compression to finish 330733 dirs, 0/2440 soft/hard links, 1927501/1927501 files, 0 other - original size: 47.65 GiB, dedupe: 28.2 GiB (1782826 files), segment: 12.6 GiB - filesystem: 6.846 GiB in 439 blocks (471482 chunks, 144675/144675 inodes) - compressed filesystem: 439 blocks/471.6 MiB written [depth: 20000] - ████████████████████████████████████████████████████████████████████████▏100% \ + original size: 47.65 GiB, dedupe: 28.2 GiB (1782826 files), segment: 15.19 GiB + filesystem: 4.261 GiB in 273 blocks (318041 chunks, 144675/144675 inodes) + compressed filesystem: 273 blocks/426.5 MiB written [depth: 20000] + █████████████████████████████████████████████████████████████████████████▏100% \ - real 13m58.556s - user 135m40.567s - sys 2m59.853s + real 5m35.631s + user 80m45.564s + sys 1m54.045s -So in this comparison, `mkdwarfs` is almost 5 times faster than `mksquashfs`. -In total CPU time, it actually uses 6 times less CPU resources. +So in this comparison, `mkdwarfs` is almost 6 times faster than `mksquashfs`. +In total CPU time, it actually uses more than 6 times less CPU resources. - $ ls -l perl-install.*fs - -rw-r--r-- 1 mhx users 494505722 Dec 30 00:22 perl-install.dwarfs - -rw-r--r-- 1 mhx users 4748902400 Nov 25 00:37 perl-install.squashfs + $ ll perl-install.*fs + -rw-r--r-- 1 mhx users 447230618 Mar 3 20:28 perl-install.dwarfs + -rw-r--r-- 1 mhx users 4748902400 Mar 3 20:10 perl-install.squashfs -In terms of compression ratio, the **DwarFS file system is almost 10 times +In terms of compression ratio, the **DwarFS file system is more than 10 times smaller than the SquashFS file system**. With DwarFS, the content has been -**compressed down to less than 1% (!) of its original size**. This compression +**compressed down to less than 0.9% (!) of its original size**. This compression ratio only considers the data stored in the individual files, not the actual -disk space used. On the original EXT4 file system, according to `du`, the -source folder uses 54 GiB, so **the DwarFS image actually only uses 0.85% of +disk space used. On the original XFS file system, according to `du`, the +source folder uses 52 GiB, so **the DwarFS image actually only uses 0.8% of the original space**. 
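+
+If you want to re-derive these percentages yourself, here's a minimal sketch
+(file names as in the listing above; the `du` figure will of course depend on
+your source file system):
+
+    $ du -sh install                                  # ~52 GiB allocated on the source XFS
+    $ ls -l perl-install.dwarfs                       # 447230618 bytes, as shown above
+    $ echo "scale=4; 447230618 / (52 * 1024^3)" | bc  # prints .0080, i.e. roughly 0.8%
+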
-When using identical block sizes for both file systems, the difference, -quite expectedly, becomes a lot less dramatic: - $ time mksquashfs install perl-install-1M.squashfs -comp zstd -Xcompression-level 22 -b 1M + $ time mksquashfs install perl-install-lzma.squashfs -comp lzma - real 41m55.004s - user 340m30.012s - sys 1m47.945s + real 13m42.825s + user 205m40.851s + sys 3m29.088s - $ time mkdwarfs -i install -o perl-install-1M.dwarfs -S 20 + $ time mkdwarfs -i install -o perl-install-lzma.dwarfs -l9 - real 24m38.027s - user 282m37.305s - sys 2m37.558s + real 3m51.163s + user 50m30.446s + sys 1m46.203s - $ ls -l perl-install-1M.* - -rw-r--r-- 1 mhx users 2953052798 Dec 10 18:47 perl-install-1M.dwarfs - -rw-r--r-- 1 mhx users 4198944768 Nov 30 10:05 perl-install-1M.squashfs + $ ll perl-install-lzma.*fs + -rw-r--r-- 1 mhx users 315482627 Mar 3 21:23 perl-install-lzma.dwarfs + -rw-r--r-- 1 mhx users 3838406656 Mar 3 20:50 perl-install-lzma.squashfs + +It's immediately obvious that the runs are significantly faster and the +resulting images are significantly smaller. Still, `mkdwarfs` is about +**4 times faster** and produces an image that's **12 times smaller** than +the SquashFS image. The DwarFS image is only 0.6% of the original file size. + +So why not use `lzma` instead of `zstd` by default? The reason is that `lzma` +is about an order of magnitude slower to decompress than `zstd`. If you're +only accessing data on your compressed filesystem occasionally, this might +not be a big deal, but if you use it extensively, `zstd` will result in +better performance. + +The comparisons above are not completely fair. `mksquashfs` by default +uses a block size of 128KiB, whereas `mkdwarfs` uses 16MiB blocks by default, +or even 64MiB blocks with `-l9`. When using identical block sizes for both +file systems, the difference, quite expectedly, becomes a lot less dramatic: + + $ time mksquashfs install perl-install-lzma-1M.squashfs -comp lzma -b 1M + + real 15m43.319s + user 139m24.533s + sys 0m45.132s + + $ time mkdwarfs -i install -o perl-install-lzma-1M.dwarfs -l9 -S20 -B3 + + real 4m52.694s + user 49m20.611s + sys 6m46.285s + + $ ll perl-install*.*fs + -rw-r--r-- 1 mhx users 947407018 Mar 3 22:23 perl-install-lzma-1M.dwarfs + -rw-r--r-- 1 mhx users 3407474688 Mar 3 21:54 perl-install-lzma-1M.squashfs + +Even this is *still* not entirely fair, as it uses a feature (`-B3`) that allows +DwarFS to reference file chunks from up to two previous filesystem blocks. But the point is that this is really where SquashFS tops out, as it doesn't -support larger block sizes. And as you'll see below, the larger blocks that -DwarFS is using don't necessarily negatively impact performance. +support larger block sizes or back-referencing. And as you'll see below, the +larger blocks that DwarFS is using by default don't necessarily negatively +impact performance. DwarFS also features an option to recompress an existing file system with a different compression algorithm. This can be useful as it allows relatively fast experimentation with different algorithms and options without requiring a full rebuild of the file system.
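+
+Because recompression reuses the already-built blocks, it's also a cheap way
+to experiment. As a hypothetical sketch (the output name is made up, and it
+assumes the compression options can be combined with `--recompress` just like
+`-l` above), you could rewrite the image without any compression at all to see
+how much of the final size is due to compression in the first place:
+
+    # Only the block compression is redone; scanning, deduplication and
+    # segmentation of the source data are not repeated
+    $ mkdwarfs --recompress -i perl-install.dwarfs -o perl-install-null.dwarfs -C null
+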
For example, recompressing the above file system with the best possible compression (`-l 9`): - $ time mkdwarfs --recompress -i perl-install.dwarfs -o perl-lzma.dwarfs -l 9 - 20:44:43.738823 filesystem rewritten [385.4s] - ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ - filesystem: 6.832 GiB in 438 blocks (0 chunks, 0 inodes) - compressed filesystem: 438/438 blocks/408.9 MiB written - █████████████████████████████████████████████████████████████████████▏100% | + $ time mkdwarfs --recompress -i perl-install.dwarfs -o perl-lzma-re.dwarfs -l9 + I 20:28:03.246534 filesystem rewritten without errors [148.3s] + ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ + filesystem: 4.261 GiB in 273 blocks (0 chunks, 0 inodes) + compressed filesystem: 273/273 blocks/372.7 MiB written + ████████████████████████████████████████████████████████████████████▏100% \ - real 6m25.474s - user 73m0.298s - sys 1m37.701s + real 2m28.279s + user 37m8.825s + sys 0m43.256s - $ ls -l perl-*.dwarfs - -rw-r--r-- 1 mhx users 494602224 Dec 10 18:20 perl-install.dwarfs - -rw-r--r-- 1 mhx users 428802416 Dec 10 20:44 perl-lzma.dwarfs + $ ll perl-*.dwarfs + -rw-r--r-- 1 mhx users 447230618 Mar 3 20:28 perl-install.dwarfs + -rw-r--r-- 1 mhx users 390845518 Mar 4 20:28 perl-lzma-re.dwarfs + -rw-r--r-- 1 mhx users 315482627 Mar 3 21:23 perl-install-lzma.dwarfs -This reduces the file system size by another 13%, pushing the total -compression ratio to 0.84% (or 0.74% when considering disk usage). +Note that while the recompressed filesystem is smaller than the original image, +it is still a lot bigger than the filesystem we previously built with `-l9`. +The reason is that the recompressed image still uses the same block size, and +the block size cannot be changed by recompressing. In terms of how fast the file system is when using it, a quick test I've done is to freshly mount the filesystem created above and run
$ hyperfine -c "umount mnt" -p "umount mnt; dwarfs perl-install.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1" -P procs 5 20 -D 5 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'" Benchmark #1: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P5 sh -c '$0 -v >/dev/null' - Time (mean ± σ): 4.092 s ± 0.031 s [User: 2.183 s, System: 4.355 s] - Range (min … max): 4.022 s … 4.122 s 10 runs + Time (mean ± σ): 1.810 s ± 0.013 s [User: 1.847 s, System: 0.623 s] + Range (min … max): 1.788 s … 1.825 s 10 runs Benchmark #2: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null' - Time (mean ± σ): 2.698 s ± 0.027 s [User: 1.979 s, System: 3.977 s] - Range (min … max): 2.657 s … 2.732 s 10 runs + Time (mean ± σ): 1.333 s ± 0.009 s [User: 1.993 s, System: 0.656 s] + Range (min … max): 1.321 s … 1.354 s 10 runs Benchmark #3: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P15 sh -c '$0 -v >/dev/null' - Time (mean ± σ): 2.341 s ± 0.029 s [User: 1.883 s, System: 3.794 s] - Range (min … max): 2.303 s … 2.397 s 10 runs + Time (mean ± σ): 1.181 s ± 0.018 s [User: 2.086 s, System: 0.712 s] + Range (min … max): 1.165 s … 1.214 s 10 runs Benchmark #4: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null' - Time (mean ± σ): 2.207 s ± 0.037 s [User: 1.818 s, System: 3.673 s] - Range (min … max): 2.163 s … 2.278 s 10 runs + Time (mean ± σ): 1.149 s ± 0.015 s [User: 2.128 s, System: 0.781 s] + Range (min … max): 1.136 s … 1.186 s 10 runs These timings are for *initial* runs on a freshly mounted file system, -running 5, 10, 15 and 20 processes in parallel. 2.2 seconds means that -it takes only about 2 milliseconds per Perl binary. +running 5, 10, 15 and 20 processes in parallel. 1.1 seconds means that +it takes only about 1 millisecond per Perl binary. Following are timings for *subsequent* runs, both on DwarFS (at `mnt`) -and the original EXT4 (at `install`). DwarFS is around 15% slower here: +and the original XFS (at `install`). 
DwarFS is around 15% slower here: $ hyperfine -P procs 10 20 -D 10 -w1 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'" "ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'" Benchmark #1: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null' - Time (mean ± σ): 655.8 ms ± 5.5 ms [User: 1.716 s, System: 2.784 s] - Range (min … max): 647.6 ms … 664.3 ms 10 runs + Time (mean ± σ): 347.0 ms ± 7.2 ms [User: 1.755 s, System: 0.452 s] + Range (min … max): 341.3 ms … 365.2 ms 10 runs Benchmark #2: ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null' - Time (mean ± σ): 583.9 ms ± 5.0 ms [User: 1.715 s, System: 2.773 s] - Range (min … max): 577.0 ms … 592.0 ms 10 runs + Time (mean ± σ): 302.5 ms ± 3.3 ms [User: 1.656 s, System: 0.377 s] + Range (min … max): 297.1 ms … 308.7 ms 10 runs Benchmark #3: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null' - Time (mean ± σ): 638.2 ms ± 10.7 ms [User: 1.667 s, System: 2.736 s] - Range (min … max): 629.1 ms … 658.4 ms 10 runs + Time (mean ± σ): 342.2 ms ± 4.1 ms [User: 1.766 s, System: 0.451 s] + Range (min … max): 336.0 ms … 349.7 ms 10 runs Benchmark #4: ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null' - Time (mean ± σ): 567.0 ms ± 3.2 ms [User: 1.684 s, System: 2.719 s] - Range (min … max): 561.5 ms … 570.5 ms 10 runs + Time (mean ± σ): 302.0 ms ± 3.0 ms [User: 1.659 s, System: 0.374 s] + Range (min … max): 297.0 ms … 305.4 ms 10 runs + + Summary + 'ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'' ran + 1.00 ± 0.01 times faster than 'ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'' + 1.13 ± 0.02 times faster than 'ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'' + 1.15 ± 0.03 times faster than 'ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'' Using the lzma-compressed file system, the metrics for *initial* runs look -considerably worse: +considerably worse (about an order of magnitude): - $ hyperfine -c "umount mnt" -p "umount mnt; dwarfs perl-lzma.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1" -P procs 5 20 -D 5 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'" + $ hyperfine -c "umount mnt" -p "umount mnt; dwarfs perl-install-lzma.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1" -P procs 5 20 -D 5 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'" Benchmark #1: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P5 sh -c '$0 -v >/dev/null' - Time (mean ± σ): 20.170 s ± 0.138 s [User: 2.379 s, System: 4.713 s] - Range (min … max): 19.968 s … 20.390 s 10 runs + Time (mean ± σ): 10.660 s ± 0.057 s [User: 1.952 s, System: 0.729 s] + Range (min … max): 10.615 s … 10.811 s 10 runs Benchmark #2: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null' - Time (mean ± σ): 12.960 s ± 0.105 s [User: 2.130 s, System: 4.272 s] - Range (min … max): 12.807 s … 13.122 s 10 runs + Time (mean ± σ): 9.092 s ± 0.021 s [User: 1.979 s, System: 0.680 s] + Range (min … max): 9.059 s … 9.126 s 10 runs Benchmark #3: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P15 sh -c '$0 -v >/dev/null' - Time (mean ± σ): 11.495 s ± 0.141 s [User: 1.995 s, System: 4.100 s] - Range (min … max): 11.324 s … 11.782 s 10 runs + Time (mean ± σ): 9.012 s ± 0.188 s [User: 2.077 s, System: 0.702 s] + Range (min … max): 8.839 s … 
9.277 s 10 runs Benchmark #4: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null' - Time (mean ± σ): 11.238 s ± 0.100 s [User: 1.894 s, System: 3.855 s] - Range (min … max): 11.119 s … 11.371 s 10 runs + Time (mean ± σ): 9.004 s ± 0.298 s [User: 2.134 s, System: 0.736 s] + Range (min … max): 8.611 s … 9.555 s 10 runs -So you might want to consider using zstd instead of lzma if you'd +So you might want to consider using `zstd` instead of `lzma` if you'd like to optimize for file system performance. It's also the default compression used by `mkdwarfs`. -On a different system, Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, -with 4 cores, I did more tests with both SquashFS and DwarFS -(just because on the 6 core box my kernel didn't have support -for zstd in SquashFS): +Now here's a comparison with the SquashFS filesystem: - hyperfine -c 'sudo umount /tmp/perl/install' -p 'umount /tmp/perl/install; dwarfs perl-install.dwarfs /tmp/perl/install -o cachesize=1g -o workers=4; sleep 1' -n dwarfs-zstd "ls -1 /tmp/perl/install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'" -p 'sudo umount /tmp/perl/install; sudo mount -t squashfs perl-install.squashfs /tmp/perl/install; sleep 1' -n squashfs-zstd "ls -1 /tmp/perl/install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'" + $ hyperfine -c 'sudo umount mnt' -p 'umount mnt; dwarfs perl-install.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1' -n dwarfs-zstd "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'" -p 'sudo umount mnt; sudo mount -t squashfs perl-install.squashfs mnt; sleep 1' -n squashfs-zstd "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'" Benchmark #1: dwarfs-zstd - Time (mean ± σ): 2.071 s ± 0.372 s [User: 1.727 s, System: 2.866 s] - Range (min … max): 1.711 s … 2.532 s 10 runs + Time (mean ± σ): 1.151 s ± 0.015 s [User: 2.147 s, System: 0.769 s] + Range (min … max): 1.118 s … 1.174 s 10 runs Benchmark #2: squashfs-zstd - Time (mean ± σ): 3.668 s ± 0.070 s [User: 2.173 s, System: 21.287 s] - Range (min … max): 3.616 s … 3.846 s 10 runs + Time (mean ± σ): 6.733 s ± 0.007 s [User: 3.188 s, System: 17.015 s] + Range (min … max): 6.721 s … 6.743 s 10 runs Summary 'dwarfs-zstd' ran - 1.77 ± 0.32 times faster than 'squashfs-zstd' + 5.85 ± 0.08 times faster than 'squashfs-zstd' -So DwarFS is almost twice as fast as SquashFS. But what's more, +So DwarFS is almost six times faster than SquashFS. But what's more, SquashFS also uses significantly more CPU power. However, the numbers shown above for DwarFS obviously don't include the time spent in the `dwarfs` process, so I repeated the test outside of hyperfine: - $ time dwarfs perl-install.dwarfs /tmp/perl/install -o cachesize=1g -o workers=4 -f + $ time dwarfs perl-install.dwarfs mnt -o cachesize=1g -o workers=4 -f - real 0m8.463s - user 0m3.821s - sys 0m2.117s + real 0m4.569s + user 0m2.154s + sys 0m1.846s -So in total, DwarFS was using 10.5 seconds of CPU time, whereas -SquashFS was using 23.5 seconds, more than twice as much. Ignore +So in total, DwarFS was using 5.7 seconds of CPU time, whereas +SquashFS was using 20.2 seconds, almost four times as much. Ignore the 'real' time, this is only how long it took me to unmount the file system again after mounting it. 
@@ -557,7 +606,7 @@ I wrote a little script to be able to run multiple builds in parallel: set -eu perl=$1 dir=$(echo "$perl" | cut -d/ --output-delimiter=- -f5,6) -rsync -a Tie-Hash-Indexed-0.08/ $dir/ +rsync -a Tie-Hash-Indexed/ $dir/ cd $dir $1 Makefile.PL >/dev/null 2>&1 make test >/dev/null 2>&1 @@ -566,34 +615,35 @@ rm -rf $dir echo $perl ``` -The following command will run up to 8 builds in parallel on the 4 core -i7 CPU, including debug, optimized and threaded versions of all Perl +The following command will run up to 16 builds in parallel on the 8 core +Xeon CPU, including debug, optimized and threaded versions of all Perl releases between 5.10.0 and 5.33.3, a total of 624 `perl` installations: - $ time ls -1 /tmp/perl/install/*/perl-5.??.?/bin/perl5* | sort -t / -k 8 | xargs -d $'\n' -P 8 -n 1 ./build.sh + $ time ls -1 /tmp/perl/install/*/perl-5.??.?/bin/perl5* | sort -t / -k 8 | xargs -d $'\n' -P 16 -n 1 ./build.sh Tests were done with a cleanly mounted file system to make sure the caches were empty. `ccache` was primed to make sure all compiler runs could be satisfied from the cache. With SquashFS, the timing was: - real 3m17.182s - user 20m54.064s - sys 4m16.907s + real 0m52.385s + user 8m10.333s + sys 4m10.056s And with DwarFS: - real 3m14.402s - user 19m42.984s - sys 2m49.292s + real 0m50.469s + user 9m22.597s + sys 1m18.469s -So, frankly, not much of a difference. The `dwarfs` process itself used: +So, frankly, not much of a difference, with DwarFS being just a bit faster. +The `dwarfs` process itself used: - real 4m23.151s - user 0m25.036s - sys 0m35.216s + real 0m56.686s + user 0m18.857s + sys 0m21.058s -So again, DwarFS used less raw CPU power, but in terms of wallclock time, -the difference is really marginal. +So again, DwarFS used less raw CPU power overall, but in terms of wallclock +time, the difference is really marginal. ### With SquashFS & xz @@ -602,60 +652,60 @@ a recent Raspberry Pi OS release. This file system also contains device inodes, so in order to preserve those, we pass `--with-devices` to `mkdwarfs`: $ time sudo mkdwarfs -i raspbian -o raspbian.dwarfs --with-devices - 20:49:45.099221 scanning raspbian - 20:49:45.395243 waiting for background scanners... - 20:49:46.019979 assigning directory and link inodes... - 20:49:46.035099 finding duplicate files... - 20:49:46.148490 saved 31.05 MiB / 1007 MiB in 1617/34582 duplicate files - 20:49:46.149221 waiting for inode scanners... - 20:49:48.518179 assigning device inodes... - 20:49:48.519512 assigning pipe/socket inodes... - 20:49:48.520322 building metadata... - 20:49:48.520425 building blocks... - 20:49:48.520473 saving names and links... - 20:49:48.520568 ordering 32965 inodes using nilsimsa similarity... - 20:49:48.522323 nilsimsa: depth=20000, limit=255 - 20:49:48.554803 updating name and link indices... - 20:49:48.577389 pre-sorted index (55243 name, 26489 path lookups) [54.95ms] - 20:50:55.921085 32965 inodes ordered [67.4s] - 20:50:55.921179 waiting for segmenting/blockifying to finish... - 20:51:02.372233 saving chunks... - 20:51:02.376389 saving directories... - 20:51:02.492263 waiting for compression to finish... - 20:51:31.098179 compressed 1007 MiB to 286.6 MiB (ratio=0.284714) - 20:51:31.140186 filesystem created without errors [106s] - ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ + I 21:30:29.812562 scanning raspbian + I 21:30:29.908984 waiting for background scanners... + I 21:30:30.217446 assigning directory and link inodes... 
+ I 21:30:30.221941 finding duplicate files... + I 21:30:30.288099 saved 31.05 MiB / 1007 MiB in 1617/34582 duplicate files + I 21:30:30.288143 waiting for inode scanners... + I 21:30:31.393710 assigning device inodes... + I 21:30:31.394481 assigning pipe/socket inodes... + I 21:30:31.395196 building metadata... + I 21:30:31.395230 building blocks... + I 21:30:31.395291 saving names and links... + I 21:30:31.395374 ordering 32965 inodes using nilsimsa similarity... + I 21:30:31.396254 nilsimsa: depth=20000 (1000), limit=255 + I 21:30:31.407967 pre-sorted index (46431 name, 2206 path lookups) [11.66ms] + I 21:30:31.410089 updating name and link indices... + I 21:30:38.178505 32965 inodes ordered [6.783s] + I 21:30:38.179417 waiting for segmenting/blockifying to finish... + I 21:31:06.248304 saving chunks... + I 21:31:06.251998 saving directories... + I 21:31:06.402559 waiting for compression to finish... + I 21:31:16.425563 compressed 1007 MiB to 287 MiB (ratio=0.285036) + I 21:31:16.464772 filesystem created without errors [46.65s] + ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ waiting for block compression to finish - 4435 dirs, 5908/473 soft/hard links, 34582/34582 files, 7 other - original size: 1007 MiB, dedupe: 31.05 MiB (1617 files), segment: 46.9 MiB - filesystem: 928.7 MiB in 59 blocks (39117 chunks, 32965/32965 inodes) - compressed filesystem: 59 blocks/286.6 MiB written - ███████████████████████████████████████████████████████████████████▏100% | + 4435 dirs, 5908/0 soft/hard links, 34582/34582 files, 7 other + original size: 1007 MiB, dedupe: 31.05 MiB (1617 files), segment: 47.23 MiB + filesystem: 928.4 MiB in 59 blocks (38890 chunks, 32965/32965 inodes) + compressed filesystem: 59 blocks/287 MiB written [depth: 20000] + ████████████████████████████████████████████████████████████████████▏100% | - real 1m46.153s - user 18m7.973s - sys 0m16.013s + real 0m46.711s + user 10m39.038s + sys 0m8.123s Again, SquashFS uses the same compression options: $ time sudo mksquashfs raspbian raspbian.squashfs -comp zstd -Xcompression-level 22 - Parallel mksquashfs: Using 12 processors + Parallel mksquashfs: Using 16 processors Creating 4.0 filesystem on raspbian.squashfs, block size 131072. 
- [====================================================================-] 38644/38644 100% + [===============================================================\] 39232/39232 100% Exportable Squashfs 4.0 filesystem, zstd compressed, data block size 131072 compressed data, compressed metadata, compressed fragments, compressed xattrs, compressed ids duplicates are removed - Filesystem size 371931.65 Kbytes (363.21 Mbytes) - 36.89% of uncompressed filesystem size (1008353.15 Kbytes) - Inode table size 398565 bytes (389.22 Kbytes) - 26.61% of uncompressed inode table size (1497593 bytes) - Directory table size 408794 bytes (399.21 Kbytes) - 42.28% of uncompressed directory table size (966980 bytes) - Number of duplicate files found 1145 - Number of inodes 44459 - Number of files 34109 + Filesystem size 371934.50 Kbytes (363.22 Mbytes) + 35.98% of uncompressed filesystem size (1033650.60 Kbytes) + Inode table size 399913 bytes (390.54 Kbytes) + 26.53% of uncompressed inode table size (1507581 bytes) + Directory table size 408749 bytes (399.17 Kbytes) + 42.31% of uncompressed directory table size (966174 bytes) + Number of duplicate files found 1618 + Number of inodes 44932 + Number of files 34582 Number of fragments 3290 Number of symbolic links 5908 Number of device nodes 7 @@ -666,9 +716,9 @@ Again, SquashFS uses the same compression options: Number of uids 5 root (0) mhx (1000) - logitechmediaserver (103) + unknown (103) shutdown (6) - x2goprint (106) + unknown (106) Number of gids 15 root (0) unknown (109) @@ -686,61 +736,53 @@ Again, SquashFS uses the same compression options: adm (4) mem (8) - real 1m54.997s - user 18m32.386s - sys 0m2.627s + real 0m50.124s + user 9m41.708s + sys 0m1.727s The difference in speed is almost negligible. SquashFS is just a bit slower here. In terms of compression, the difference also isn't huge: $ ls -lh raspbian.* *.xz - -rw-r--r-- 1 root root 287M Dec 10 20:51 raspbian.dwarfs - -rw-r--r-- 1 root root 364M Dec 9 22:31 raspbian.squashfs - -rw-r--r-- 1 mhx users 297M Aug 20 12:47 2020-08-20-raspios-buster-armhf-lite.img.xz + -rw-r--r-- 1 mhx users 297M Mar 4 21:32 2020-08-20-raspios-buster-armhf-lite.img.xz + -rw-r--r-- 1 root root 287M Mar 4 21:31 raspbian.dwarfs + -rw-r--r-- 1 root root 364M Mar 4 21:33 raspbian.squashfs Interestingly, `xz` actually can't compress the whole original image better than DwarFS. 
We can even again try to increase the DwarFS compression level: - $ time mkdwarfs -i raspbian.dwarfs -o raspbian-9.dwarfs -l 9 --recompress - 20:55:34.416488 filesystem rewritten [69.79s] - ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ - filesystem: 928.7 MiB in 59 blocks (0 chunks, 0 inodes) - compressed filesystem: 59/59 blocks/257.7 MiB written - ██████████████████████████████████████████████████████████████████▏100% \ + $ time sudo mkdwarfs -i raspbian -o raspbian-9.dwarfs --with-devices -l9 - real 1m9.879s - user 12m52.376s - sys 0m14.315s + real 0m54.161s + user 8m40.109s + sys 0m7.101s Now that actually gets the DwarFS image size well below that of the `xz` archive: $ ls -lh raspbian-9.dwarfs *.xz - -rw-r--r-- 1 mhx users 258M Dec 10 20:55 raspbian-9.dwarfs - -rw-r--r-- 1 mhx users 297M Aug 20 12:47 2020-08-20-raspios-buster-armhf-lite.img.xz + -rw-r--r-- 1 root root 244M Mar 4 21:36 raspbian-9.dwarfs + -rw-r--r-- 1 mhx users 297M Mar 4 21:32 2020-08-20-raspios-buster-armhf-lite.img.xz -However, if you actually build a tarball and compress that (instead of -compressing the EXT4 file system), `xz` is, unsurprisingly, able to take -the lead again: +Even if you actually build a tarball and compress that (instead of +compressing the EXT4 file system itself), `xz` isn't quite able to +match the DwarFS image size: - $ time sudo tar cf - raspbian | xz -9e -vT 0 >raspbian.tar.xz - 100 % 245.9 MiB / 1,012.3 MiB = 0.243 5.4 MiB/s 3:07 + $ time sudo tar cf - raspbian | xz -9 -vT 0 >raspbian.tar.xz + 100 % 246.9 MiB / 1,037.2 MiB = 0.238 13 MiB/s 1:18 - real 3m8.088s - user 14m16.519s - sys 0m5.843s + real 1m18.226s + user 6m35.381s + sys 0m2.205s $ ls -lh raspbian.tar.xz - -rw-r--r-- 1 mhx users 246M Nov 30 00:16 raspbian.tar.xz + -rw-r--r-- 1 mhx users 247M Mar 4 21:40 raspbian.tar.xz -In summary, DwarFS can get pretty close to an `xz` compressed tarball -in terms of size. It's also almsot three times faster to build the file -system than to build the tarball. At the same time, SquashFS really -isn't that much worse. It's really the cases where you *know* upfront -that your data is highly redundant where DwarFS can play out its full -strength. +In summary, DwarFS can even outperform an `xz` compressed tarball in +terms of size. It's also significantly faster to build the file +system than to build the tarball. ### With wimlib