Update README

Marcus Holland-Moritz 2021-03-04 23:26:27 +01:00
parent 1535ffc85a
commit 635f6840e7

README.md

@@ -163,53 +163,53 @@ A good starting point for apt-based systems is probably:
You can pick either `clang` or `g++`, but at least recent `clang`
versions will produce substantially faster code:
$ hyperfine ./dwarfs_test-*
Benchmark #1: ./dwarfs_test-clang-O2
Time (mean ± σ): 9.425 s ± 0.049 s [User: 15.724 s, System: 0.773 s]
Range (min … max): 9.373 s … 9.523 s 10 runs
$ hyperfine -L prog $(echo build-*/mkdwarfs | tr ' ' ,) '{prog} --no-progress --log-level warn -i /usr/include -o /dev/null -C null'
Benchmark #1: build-clang-10/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null
Time (mean ± σ): 6.403 s ± 0.178 s [User: 12.039 s, System: 1.963 s]
Range (min … max): 6.250 s … 6.701 s 10 runs
Benchmark #2: ./dwarfs_test-clang-O3
Time (mean ± σ): 9.328 s ± 0.045 s [User: 15.593 s, System: 0.791 s]
Range (min … max): 9.277 s … 9.418 s 10 runs
Benchmark #2: build-clang-11/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null
Time (mean ± σ): 6.408 s ± 0.143 s [User: 12.109 s, System: 1.974 s]
Range (min … max): 6.231 s … 6.617 s 10 runs
Benchmark #3: ./dwarfs_test-gcc-O2
Time (mean ± σ): 13.798 s ± 0.035 s [User: 20.161 s, System: 0.767 s]
Range (min … max): 13.731 s … 13.852 s 10 runs
Benchmark #3: build-gcc-10/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null
Time (mean ± σ): 11.484 s ± 0.245 s [User: 18.487 s, System: 2.071 s]
Range (min … max): 11.119 s … 11.779 s 10 runs
Benchmark #4: ./dwarfs_test-gcc-O3
Time (mean ± σ): 13.223 s ± 0.034 s [User: 19.576 s, System: 0.769 s]
Range (min … max): 13.176 s … 13.278 s 10 runs
Benchmark #4: build-gcc-9/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null
Time (mean ± σ): 11.443 s ± 0.242 s [User: 18.419 s, System: 2.067 s]
Range (min … max): 11.177 s … 11.742 s 10 runs
Summary
'./dwarfs_test-clang-O3' ran
1.01 ± 0.01 times faster than './dwarfs_test-clang-O2'
1.42 ± 0.01 times faster than './dwarfs_test-gcc-O3'
1.48 ± 0.01 times faster than './dwarfs_test-gcc-O2'
'build-clang-10/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null' ran
1.00 ± 0.04 times faster than 'build-clang-11/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null'
1.79 ± 0.06 times faster than 'build-gcc-9/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null'
1.79 ± 0.06 times faster than 'build-gcc-10/mkdwarfs --no-progress --log-level warn -i /usr/include -o /dev/null -C null'
$ hyperfine -L prog $(echo mkdwarfs-* | tr ' ' ,) '{prog} --no-progress --log-level warn -i tree -o /dev/null -C null'
Benchmark #1: mkdwarfs-clang-O2 --no-progress --log-level warn -i tree -o /dev/null -C null
Time (mean ± σ): 4.358 s ± 0.033 s [User: 6.364 s, System: 0.622 s]
Range (min … max): 4.321 s … 4.408 s 10 runs
$ hyperfine build-*/dwarfs_test
Benchmark #1: build-clang-10/dwarfs_test
Time (mean ± σ): 1.789 s ± 0.008 s [User: 2.049 s, System: 0.636 s]
Range (min … max): 1.775 s … 1.808 s 10 runs
Benchmark #2: mkdwarfs-clang-O3 --no-progress --log-level warn -i tree -o /dev/null -C null
Time (mean ± σ): 4.282 s ± 0.035 s [User: 6.249 s, System: 0.623 s]
Range (min … max): 4.244 s … 4.349 s 10 runs
Benchmark #2: build-clang-11/dwarfs_test
Time (mean ± σ): 1.806 s ± 0.011 s [User: 2.053 s, System: 0.660 s]
Range (min … max): 1.791 s … 1.820 s 10 runs
Benchmark #3: mkdwarfs-gcc-O2 --no-progress --log-level warn -i tree -o /dev/null -C null
Time (mean ± σ): 6.212 s ± 0.031 s [User: 8.185 s, System: 0.638 s]
Range (min … max): 6.159 s … 6.250 s 10 runs
Benchmark #3: build-gcc-10/dwarfs_test
Time (mean ± σ): 2.027 s ± 0.004 s [User: 2.270 s, System: 0.797 s]
Range (min … max): 2.023 s … 2.032 s 10 runs
Benchmark #4: mkdwarfs-gcc-O3 --no-progress --log-level warn -i tree -o /dev/null -C null
Time (mean ± σ): 5.740 s ± 0.037 s [User: 7.742 s, System: 0.645 s]
Range (min … max): 5.685 s … 5.796 s 10 runs
Benchmark #4: build-gcc-9/dwarfs_test
Time (mean ± σ): 2.033 s ± 0.005 s [User: 2.278 s, System: 0.796 s]
Range (min … max): 2.027 s … 2.040 s 10 runs
Summary
'mkdwarfs-clang-O3 --no-progress --log-level warn -i tree -o /dev/null -C null' ran
1.02 ± 0.01 times faster than 'mkdwarfs-clang-O2 --no-progress --log-level warn -i tree -o /dev/null -C null'
1.34 ± 0.01 times faster than 'mkdwarfs-gcc-O3 --no-progress --log-level warn -i tree -o /dev/null -C null'
1.45 ± 0.01 times faster than 'mkdwarfs-gcc-O2 --no-progress --log-level warn -i tree -o /dev/null -C null'
'build-clang-10/dwarfs_test' ran
1.01 ± 0.01 times faster than 'build-clang-11/dwarfs_test'
1.13 ± 0.01 times faster than 'build-gcc-10/dwarfs_test'
1.14 ± 0.01 times faster than 'build-gcc-9/dwarfs_test'
These measurements were made with gcc-9.3.0 and clang-10.0.1.
These measurements were made with gcc-9.3.0, gcc-10.2.0, clang-10.0.1 and clang-11.0.1.
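
For reference, here is a minimal sketch of how per-compiler build trees like
`build-clang-11` can be produced; the CMake invocation and directory names are
illustrative, not the exact commands used for the numbers above:

```
# Illustrative sketch: one out-of-source build tree per compiler,
# selected via CC/CXX (plus whatever options the "Building" section
# below calls for), then compared with hyperfine as shown above.
mkdir build-clang-11 && cd build-clang-11
CC=clang-11 CXX=clang++-11 cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
cd ..
hyperfine build-*/dwarfs_test
```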
### Building
@@ -274,9 +274,10 @@ what can currently be done with the interface.
## Usage
Please check out the man pages for [mkdwarfs](doc/mkdwarfs.md)
and [dwarfs](doc/dwarfs.md). `dwarfsck` will be built and installed
as well, but it's still work in progress.
Please check out the man pages for [mkdwarfs](doc/mkdwarfs.md),
[dwarfs](doc/dwarfs.md) and [dwarfsextract](doc/dwarfsextract.md).
`dwarfsck` will be built and installed as well, but it's still
work in progress.
The [dwarfs](doc/dwarfs.md) man page also shows an example for setting
up DwarFS with [overlayfs](https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt)
@@ -285,11 +286,16 @@ DwarFS image.
## Comparison
### With SquashFS
The SquashFS and `xz` tests were all done on an 8 core Intel(R) Xeon(R)
E-2286M CPU @ 2.40GHz with 64 GiB of RAM.
These tests were done on an Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz
6 core CPU with 64 GiB of RAM. The system was mostly idle during
all of the tests.
The wimlib, Cromfs and EROFS tests were done with an older version of
DwarFS on a 6 core Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz with 64 GiB
of RAM.
The systems were mostly idle during all of the tests.
### With SquashFS
The source directory contained **1139 different Perl installations**
from 284 distinct releases, a total of 47.65 GiB of data in 1,927,501
@@ -301,9 +307,9 @@ I'm using the same compression type and compression level for
SquashFS that is the default setting for DwarFS:
$ time mksquashfs install perl-install.squashfs -comp zstd -Xcompression-level 22
Parallel mksquashfs: Using 12 processors
Creating 4.0 filesystem on perl-install.squashfs, block size 131072.
[=====================================================================-] 2107401/2107401 100%
Parallel mksquashfs: Using 16 processors
Creating 4.0 filesystem on perl-install-zstd.squashfs, block size 131072.
[=========================================================/] 2107401/2107401 100%
Exportable Squashfs 4.0 filesystem, zstd compressed, data block size 131072
compressed data, compressed metadata, compressed fragments,
@@ -330,84 +336,121 @@ SquashFS that is the default setting for DwarFS:
Number of gids 1
users (100)
real 69m18.427s
user 817m15.199s
sys 1m38.237s
real 32m54.713s
user 501m46.382s
sys 0m58.528s
For DwarFS, I'm sticking to the defaults:
$ time mkdwarfs -i install -o perl-install.dwarfs
I 23:54:10.119098 scanning install
I 23:54:32.317096 waiting for background scanners...
I 23:55:04.131755 assigning directory and link inodes...
I 23:55:04.457824 finding duplicate files...
I 23:55:22.041760 saved 28.2 GiB / 47.65 GiB in 1782826/1927501 duplicate files
I 23:55:22.041862 waiting for inode scanners...
I 23:55:55.365591 assigning device inodes...
I 23:55:55.423564 assigning pipe/socket inodes...
I 23:55:55.479380 building metadata...
I 23:55:55.479468 building blocks...
I 23:55:55.479538 saving names and links...
I 23:55:55.479615 ordering 144675 inodes using nilsimsa similarity...
I 23:55:55.488995 nilsimsa: depth=20000 (1000), limit=255
I 23:55:55.928990 updating name and link indices...
I 23:55:56.186375 pre-sorted index (659296 name, 366624 path lookups) [697.3ms]
I 00:02:11.239104 144675 inodes ordered [375.8s]
I 00:02:11.239224 waiting for segmenting/blockifying to finish...
I 00:06:45.599953 saving chunks...
I 00:06:45.639297 saving directories...
I 00:06:49.937160 waiting for compression to finish...
I 00:08:07.873055 compressed 47.65 GiB to 471.6 MiB (ratio=0.0096655)
I 00:08:08.512030 filesystem created without errors [838.4s]
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
I 20:22:43.450128 scanning install
I 20:22:49.254365 waiting for background scanners...
I 20:23:01.612414 assigning directory and link inodes...
I 20:23:01.821212 finding duplicate files...
I 20:23:11.331823 saved 28.2 GiB / 47.65 GiB in 1782826/1927501 duplicate files
I 20:23:11.332706 waiting for inode scanners...
I 20:23:23.426440 assigning device inodes...
I 20:23:23.462760 assigning pipe/socket inodes...
I 20:23:23.497944 building metadata...
I 20:23:23.497991 building blocks...
I 20:23:23.498060 saving names and links...
I 20:23:23.498127 ordering 144675 inodes using nilsimsa similarity...
I 20:23:23.502987 nilsimsa: depth=20000 (1000), limit=255
I 20:23:23.688388 updating name and link indices...
I 20:23:23.851168 pre-sorted index (660176 name, 366179 path lookups) [348.1ms]
I 20:25:50.432870 144675 inodes ordered [146.9s]
I 20:25:50.433477 waiting for segmenting/blockifying to finish...
I 20:27:06.432510 segmentation matches: good=459247, bad=6685, total=469664
I 20:27:06.432582 segmentation collisions: L1=0.007%, L2=0.001% [2233350 hashes]
I 20:27:06.432616 saving chunks...
I 20:27:06.457463 saving directories...
I 20:27:09.324708 waiting for compression to finish...
I 20:28:18.778353 compressed 47.65 GiB to 426.5 MiB (ratio=0.00874147)
I 20:28:19.044470 filesystem created without errors [335.6s]
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
waiting for block compression to finish
330733 dirs, 0/2440 soft/hard links, 1927501/1927501 files, 0 other
original size: 47.65 GiB, dedupe: 28.2 GiB (1782826 files), segment: 12.6 GiB
filesystem: 6.846 GiB in 439 blocks (471482 chunks, 144675/144675 inodes)
compressed filesystem: 439 blocks/471.6 MiB written [depth: 20000]
████████████████████████████████████████████████████████████████████████▏100% \
original size: 47.65 GiB, dedupe: 28.2 GiB (1782826 files), segment: 15.19 GiB
filesystem: 4.261 GiB in 273 blocks (318041 chunks, 144675/144675 inodes)
compressed filesystem: 273 blocks/426.5 MiB written [depth: 20000]
████████████████████████████████████████████████████████████████████████▏100% \
real 13m58.556s
user 135m40.567s
sys 2m59.853s
real 5m35.631s
user 80m45.564s
sys 1m54.045s
So in this comparison, `mkdwarfs` is almost 5 times faster than `mksquashfs`.
In total CPU time, it actually uses 6 times less CPU resources.
So in this comparison, `mkdwarfs` is almost 6 times faster than `mksquashfs`.
In total CPU time, it actually uses less than one sixth of the CPU resources.
$ ls -l perl-install.*fs
-rw-r--r-- 1 mhx users 494505722 Dec 30 00:22 perl-install.dwarfs
-rw-r--r-- 1 mhx users 4748902400 Nov 25 00:37 perl-install.squashfs
$ ll perl-install.*fs
-rw-r--r-- 1 mhx users 447230618 Mar 3 20:28 perl-install.dwarfs
-rw-r--r-- 1 mhx users 4748902400 Mar 3 20:10 perl-install.squashfs
In terms of compression ratio, the **DwarFS file system is almost 10 times
In terms of compression ratio, the **DwarFS file system is more than 10 times
smaller than the SquashFS file system**. With DwarFS, the content has been
**compressed down to less than 1% (!) of its original size**. This compression
**compressed down to less than 0.9% (!) of its original size**. This compression
ratio only considers the data stored in the individual files, not the actual
disk space used. On the original EXT4 file system, according to `du`, the
source folder uses 54 GiB, so **the DwarFS image actually only uses 0.85% of
disk space used. On the original XFS file system, according to `du`, the
source folder uses 52 GiB, so **the DwarFS image actually only uses 0.8% of
the original space**.
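
As a quick sanity check of those two percentages (426.5 MiB of image versus
47.65 GiB of file data and roughly 52 GiB of disk usage, all taken from the
output above):

```
$ awk 'BEGIN { printf "%.2f%% / %.2f%%\n", 100*426.5/(47.65*1024), 100*426.5/(52*1024) }'
0.87% / 0.80%
```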
When using identical block sizes for both file systems, the difference,
quite expectedly, becomes a lot less dramatic:
Here's another comparison using `lzma` compression instead of `zstd`:
$ time mksquashfs install perl-install-1M.squashfs -comp zstd -Xcompression-level 22 -b 1M
$ time mksquashfs install perl-install-lzma.squashfs -comp lzma
real 41m55.004s
user 340m30.012s
sys 1m47.945s
real 13m42.825s
user 205m40.851s
sys 3m29.088s
$ time mkdwarfs -i install -o perl-install-1M.dwarfs -S 20
$ time mkdwarfs -i install -o perl-install-lzma.dwarfs -l9
real 24m38.027s
user 282m37.305s
sys 2m37.558s
real 3m51.163s
user 50m30.446s
sys 1m46.203s
$ ls -l perl-install-1M.*
-rw-r--r-- 1 mhx users 2953052798 Dec 10 18:47 perl-install-1M.dwarfs
-rw-r--r-- 1 mhx users 4198944768 Nov 30 10:05 perl-install-1M.squashfs
$ ll perl-install-lzma.*fs
-rw-r--r-- 1 mhx users 315482627 Mar 3 21:23 perl-install-lzma.dwarfs
-rw-r--r-- 1 mhx users 3838406656 Mar 3 20:50 perl-install-lzma.squashfs
It's immediately obvious that the runs are significantly faster and the
resulting images are significantly smaller. Still, `mkdwarfs` is about
**4 times faster** and produces an image that's **12 times smaller** than
the SquashFS image. The DwarFS image is only 0.6% of the original file size.
So why not use `lzma` instead of `zstd` by default? The reason is that `lzma`
is about an order of magnitude slower to decompress than `zstd`. If you're
only accessing data on your compressed filesystem occasionally, this might
not be a big deal, but if you use it extensively, `zstd` will result in
better performance.
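
To get a feel for that decompression gap independently of DwarFS, a simple,
purely illustrative microbenchmark with the standalone `zstd` and `xz` tools on
any sufficiently large file (`somefile` below is a placeholder) shows the same
pattern:

```
# Illustrative only: compress the same input once with each algorithm,
# then time decompression to /dev/null.
zstd -19 somefile -o somefile.zst
xz -9 -k somefile
hyperfine 'zstd -dc somefile.zst > /dev/null' 'xz -dc somefile.xz > /dev/null'
```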
The comparisons above are not completely fair. `mksquashfs` by default
uses a block size of 128KiB, whereas `mkdwarfs` uses 16MiB blocks by default,
or even 64MiB blocks with `-l9`. When using identical block sizes for both
file systems, the difference, quite expectedly, becomes a lot less dramatic:
$ time mksquashfs install perl-install-lzma-1M.squashfs -comp lzma -b 1M
real 15m43.319s
user 139m24.533s
sys 0m45.132s
$ time mkdwarfs -i install -o perl-install-lzma-1M.dwarfs -l9 -S20 -B3
real 4m52.694s
user 49m20.611s
sys 6m46.285s
$ ll perl-install*.*fs
-rw-r--r-- 1 mhx users 947407018 Mar 3 22:23 perl-install-lzma-1M.dwarfs
-rw-r--r-- 1 mhx users 3407474688 Mar 3 21:54 perl-install-lzma-1M.squashfs
Even this is *still* not entirely fair, as it uses a feature (`-B3`) that allows
DwarFS to reference file chunks from up to two previous filesystem blocks.
But the point is that this is really where SquashFS tops out, as it doesn't
support larger block sizes. And as you'll see below, the larger blocks that
DwarFS is using don't necessarily negatively impact performance.
support larger block sizes or back-referencing. And as you'll see below, the
larger blocks that DwarFS is using by default don't necessarily negatively
impact performance.
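
A hedged sketch of how one might explore that block-size trade-off, assuming
(as the `-S20`/1 MiB pairing above suggests) that `-S` is the base-2 logarithm
of the block size; the loop values are illustrative:

```
# Hypothetical sweep from 1 MiB (-S20) to 64 MiB (-S26) blocks, keeping
# -B3 from the command above; output goes to /dev/null since only the
# timing and the reported compression ratio are of interest.
for S in 20 22 24 26; do
    echo "=== block size 2^$S bytes ==="
    time mkdwarfs -i install -o /dev/null -l9 -S$S -B3 --no-progress --log-level warn
done
```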
DwarFS also features an option to recompress an existing file system with
a different compression algorithm. This can be useful as it allows relatively
@@ -415,23 +458,26 @@ fast experimentation with different algorithms and options without requiring
a full rebuild of the file system. For example, recompressing the above file
system with the best possible compression (`-l 9`):
$ time mkdwarfs --recompress -i perl-install.dwarfs -o perl-lzma.dwarfs -l 9
20:44:43.738823 filesystem rewritten [385.4s]
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
filesystem: 6.832 GiB in 438 blocks (0 chunks, 0 inodes)
compressed filesystem: 438/438 blocks/408.9 MiB written
█████████████████████████████████████████████████████████████████████▏100% |
$ time mkdwarfs --recompress -i perl-install.dwarfs -o perl-lzma-re.dwarfs -l9
I 20:28:03.246534 filesystem rewritten without errors [148.3s]
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
filesystem: 4.261 GiB in 273 blocks (0 chunks, 0 inodes)
compressed filesystem: 273/273 blocks/372.7 MiB written
████████████████████████████████████████████████████████████████████▏100% \
real 6m25.474s
user 73m0.298s
sys 1m37.701s
real 2m28.279s
user 37m8.825s
sys 0m43.256s
$ ls -l perl-*.dwarfs
-rw-r--r-- 1 mhx users 494602224 Dec 10 18:20 perl-install.dwarfs
-rw-r--r-- 1 mhx users 428802416 Dec 10 20:44 perl-lzma.dwarfs
$ ll perl-*.dwarfs
-rw-r--r-- 1 mhx users 447230618 Mar 3 20:28 perl-install.dwarfs
-rw-r--r-- 1 mhx users 390845518 Mar 4 20:28 perl-lzma-re.dwarfs
-rw-r--r-- 1 mhx users 315482627 Mar 3 21:23 perl-install-lzma.dwarfs
This reduces the file system size by another 13%, pushing the total
compression ratio to 0.84% (or 0.74% when considering disk usage).
Note that while the recompressed filesystem is smaller than the original image,
it is still a lot bigger than the filesystem we previously built with `-l9`.
The reason is that the recompressed image still uses the same block size, and
the block size cannot be changed by recompressing.
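
In other words, using the two commands from above: recompression re-encodes the
existing blocks, while changing the block size requires a full rebuild:

```
# Re-encode the existing blocks with stronger compression; the block
# size stays at the original 16 MiB default:
mkdwarfs --recompress -i perl-install.dwarfs -o perl-lzma-re.dwarfs -l9

# Rebuild from the source directory instead; with -l9 this also uses
# 64 MiB blocks, which is why the resulting image is smaller:
mkdwarfs -i install -o perl-install-lzma.dwarfs -l9
```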
In terms of how fast the file system is when using it, a quick test
I've done is to freshly mount the filesystem created above and run
@@ -439,100 +485,103 @@ each of the 1139 `perl` executables to print their version.
$ hyperfine -c "umount mnt" -p "umount mnt; dwarfs perl-install.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1" -P procs 5 20 -D 5 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'"
Benchmark #1: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P5 sh -c '$0 -v >/dev/null'
Time (mean ± σ): 4.092 s ± 0.031 s [User: 2.183 s, System: 4.355 s]
Range (min … max): 4.022 s … 4.122 s 10 runs
Time (mean ± σ): 1.810 s ± 0.013 s [User: 1.847 s, System: 0.623 s]
Range (min … max): 1.788 s … 1.825 s 10 runs
Benchmark #2: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'
Time (mean ± σ): 2.698 s ± 0.027 s [User: 1.979 s, System: 3.977 s]
Range (min … max): 2.657 s … 2.732 s 10 runs
Time (mean ± σ): 1.333 s ± 0.009 s [User: 1.993 s, System: 0.656 s]
Range (min … max): 1.321 s … 1.354 s 10 runs
Benchmark #3: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P15 sh -c '$0 -v >/dev/null'
Time (mean ± σ): 2.341 s ± 0.029 s [User: 1.883 s, System: 3.794 s]
Range (min … max): 2.303 s … 2.397 s 10 runs
Time (mean ± σ): 1.181 s ± 0.018 s [User: 2.086 s, System: 0.712 s]
Range (min … max): 1.165 s … 1.214 s 10 runs
Benchmark #4: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
Time (mean ± σ): 2.207 s ± 0.037 s [User: 1.818 s, System: 3.673 s]
Range (min … max): 2.163 s … 2.278 s 10 runs
Time (mean ± σ): 1.149 s ± 0.015 s [User: 2.128 s, System: 0.781 s]
Range (min … max): 1.136 s … 1.186 s 10 runs
These timings are for *initial* runs on a freshly mounted file system,
running 5, 10, 15 and 20 processes in parallel. 2.2 seconds means that
it takes only about 2 milliseconds per Perl binary.
running 5, 10, 15 and 20 processes in parallel. 1.1 seconds means that
it takes only about 1 millisecond per Perl binary.
Following are timings for *subsequent* runs, both on DwarFS (at `mnt`)
and the original EXT4 (at `install`). DwarFS is around 15% slower here:
and the original XFS (at `install`). DwarFS is around 15% slower here:
$ hyperfine -P procs 10 20 -D 10 -w1 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'" "ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'"
Benchmark #1: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'
Time (mean ± σ): 655.8 ms ± 5.5 ms [User: 1.716 s, System: 2.784 s]
Range (min … max): 647.6 ms … 664.3 ms 10 runs
Time (mean ± σ): 347.0 ms ± 7.2 ms [User: 1.755 s, System: 0.452 s]
Range (min … max): 341.3 ms … 365.2 ms 10 runs
Benchmark #2: ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'
Time (mean ± σ): 583.9 ms ± 5.0 ms [User: 1.715 s, System: 2.773 s]
Range (min … max): 577.0 ms … 592.0 ms 10 runs
Time (mean ± σ): 302.5 ms ± 3.3 ms [User: 1.656 s, System: 0.377 s]
Range (min … max): 297.1 ms … 308.7 ms 10 runs
Benchmark #3: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
Time (mean ± σ): 638.2 ms ± 10.7 ms [User: 1.667 s, System: 2.736 s]
Range (min … max): 629.1 ms … 658.4 ms 10 runs
Time (mean ± σ): 342.2 ms ± 4.1 ms [User: 1.766 s, System: 0.451 s]
Range (min … max): 336.0 ms … 349.7 ms 10 runs
Benchmark #4: ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
Time (mean ± σ): 567.0 ms ± 3.2 ms [User: 1.684 s, System: 2.719 s]
Range (min … max): 561.5 ms … 570.5 ms 10 runs
Time (mean ± σ): 302.0 ms ± 3.0 ms [User: 1.659 s, System: 0.374 s]
Range (min … max): 297.0 ms … 305.4 ms 10 runs
Summary
'ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'' ran
1.00 ± 0.01 times faster than 'ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null''
1.13 ± 0.02 times faster than 'ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null''
1.15 ± 0.03 times faster than 'ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null''
Using the lzma-compressed file system, the metrics for *initial* runs look
considerably worse:
considerably worse (about an order of magnitude):
$ hyperfine -c "umount mnt" -p "umount mnt; dwarfs perl-lzma.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1" -P procs 5 20 -D 5 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'"
$ hyperfine -c "umount mnt" -p "umount mnt; dwarfs perl-install-lzma.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1" -P procs 5 20 -D 5 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'"
Benchmark #1: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P5 sh -c '$0 -v >/dev/null'
Time (mean ± σ): 20.170 s ± 0.138 s [User: 2.379 s, System: 4.713 s]
Range (min … max): 19.968 s … 20.390 s 10 runs
Time (mean ± σ): 10.660 s ± 0.057 s [User: 1.952 s, System: 0.729 s]
Range (min … max): 10.615 s … 10.811 s 10 runs
Benchmark #2: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'
Time (mean ± σ): 12.960 s ± 0.105 s [User: 2.130 s, System: 4.272 s]
Range (min … max): 12.807 s … 13.122 s 10 runs
Time (mean ± σ): 9.092 s ± 0.021 s [User: 1.979 s, System: 0.680 s]
Range (min … max): 9.059 s … 9.126 s 10 runs
Benchmark #3: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P15 sh -c '$0 -v >/dev/null'
Time (mean ± σ): 11.495 s ± 0.141 s [User: 1.995 s, System: 4.100 s]
Range (min … max): 11.324 s … 11.782 s 10 runs
Time (mean ± σ): 9.012 s ± 0.188 s [User: 2.077 s, System: 0.702 s]
Range (min … max): 8.839 s … 9.277 s 10 runs
Benchmark #4: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
Time (mean ± σ): 11.238 s ± 0.100 s [User: 1.894 s, System: 3.855 s]
Range (min … max): 11.119 s … 11.371 s 10 runs
Time (mean ± σ): 9.004 s ± 0.298 s [User: 2.134 s, System: 0.736 s]
Range (min … max): 8.611 s … 9.555 s 10 runs
So you might want to consider using zstd instead of lzma if you'd
So you might want to consider using `zstd` instead of `lzma` if you'd
like to optimize for file system performance. It's also the default
compression used by `mkdwarfs`.
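
For readability, here is the mount invocation that is otherwise buried in the
`hyperfine` preparation strings above, shown on its own (the cache size and
worker count are simply the values used throughout these tests):

```
# Mount with a 1 GiB block cache and 4 worker threads, run the
# workload, then unmount again.
dwarfs perl-install.dwarfs mnt -o cachesize=1g -o workers=4
ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
umount mnt
```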
On a different system, Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz,
with 4 cores, I did more tests with both SquashFS and DwarFS
(just because on the 6 core box my kernel didn't have support
for zstd in SquashFS):
Now here's a comparison with the SquashFS filesystem:
hyperfine -c 'sudo umount /tmp/perl/install' -p 'umount /tmp/perl/install; dwarfs perl-install.dwarfs /tmp/perl/install -o cachesize=1g -o workers=4; sleep 1' -n dwarfs-zstd "ls -1 /tmp/perl/install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'" -p 'sudo umount /tmp/perl/install; sudo mount -t squashfs perl-install.squashfs /tmp/perl/install; sleep 1' -n squashfs-zstd "ls -1 /tmp/perl/install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'"
$ hyperfine -c 'sudo umount mnt' -p 'umount mnt; dwarfs perl-install.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1' -n dwarfs-zstd "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'" -p 'sudo umount mnt; sudo mount -t squashfs perl-install.squashfs mnt; sleep 1' -n squashfs-zstd "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'"
Benchmark #1: dwarfs-zstd
Time (mean ± σ): 2.071 s ± 0.372 s [User: 1.727 s, System: 2.866 s]
Range (min … max): 1.711 s … 2.532 s 10 runs
Time (mean ± σ): 1.151 s ± 0.015 s [User: 2.147 s, System: 0.769 s]
Range (min … max): 1.118 s … 1.174 s 10 runs
Benchmark #2: squashfs-zstd
Time (mean ± σ): 3.668 s ± 0.070 s [User: 2.173 s, System: 21.287 s]
Range (min … max): 3.616 s … 3.846 s 10 runs
Time (mean ± σ): 6.733 s ± 0.007 s [User: 3.188 s, System: 17.015 s]
Range (min … max): 6.721 s … 6.743 s 10 runs
Summary
'dwarfs-zstd' ran
1.77 ± 0.32 times faster than 'squashfs-zstd'
5.85 ± 0.08 times faster than 'squashfs-zstd'
So DwarFS is almost twice as fast as SquashFS. But what's more,
So DwarFS is almost six times faster than SquashFS. But what's more,
SquashFS also uses significantly more CPU power. However, the numbers
shown above for DwarFS obviously don't include the time spent in the
`dwarfs` process, so I repeated the test outside of hyperfine:
$ time dwarfs perl-install.dwarfs /tmp/perl/install -o cachesize=1g -o workers=4 -f
$ time dwarfs perl-install.dwarfs mnt -o cachesize=1g -o workers=4 -f
real 0m8.463s
user 0m3.821s
sys 0m2.117s
real 0m4.569s
user 0m2.154s
sys 0m1.846s
So in total, DwarFS was using 10.5 seconds of CPU time, whereas
SquashFS was using 23.5 seconds, more than twice as much. Ignore
So in total, DwarFS was using 5.7 seconds of CPU time, whereas
SquashFS was using 20.2 seconds, almost four times as much. Ignore
the 'real' time, this is only how long it took me to unmount the
file system again after mounting it.
@@ -557,7 +606,7 @@ I wrote a little script to be able to run multiple builds in parallel:
set -eu
perl=$1
dir=$(echo "$perl" | cut -d/ --output-delimiter=- -f5,6)
rsync -a Tie-Hash-Indexed-0.08/ $dir/
rsync -a Tie-Hash-Indexed/ $dir/
cd $dir
$1 Makefile.PL >/dev/null 2>&1
make test >/dev/null 2>&1
@@ -566,34 +615,35 @@ rm -rf $dir
echo $perl
```
The following command will run up to 8 builds in parallel on the 4 core
i7 CPU, including debug, optimized and threaded versions of all Perl
The following command will run up to 16 builds in parallel on the 8 core
Xeon CPU, including debug, optimized and threaded versions of all Perl
releases between 5.10.0 and 5.33.3, a total of 624 `perl` installations:
$ time ls -1 /tmp/perl/install/*/perl-5.??.?/bin/perl5* | sort -t / -k 8 | xargs -d $'\n' -P 8 -n 1 ./build.sh
$ time ls -1 /tmp/perl/install/*/perl-5.??.?/bin/perl5* | sort -t / -k 8 | xargs -d $'\n' -P 16 -n 1 ./build.sh
Tests were done with a cleanly mounted file system to make sure the caches
were empty. `ccache` was primed to make sure all compiler runs could be
satisfied from the cache. With SquashFS, the timing was:
real 3m17.182s
user 20m54.064s
sys 4m16.907s
real 0m52.385s
user 8m10.333s
sys 4m10.056s
And with DwarFS:
real 3m14.402s
user 19m42.984s
sys 2m49.292s
real 0m50.469s
user 9m22.597s
sys 1m18.469s
So, frankly, not much of a difference. The `dwarfs` process itself used:
So, frankly, not much of a difference, with DwarFS being just a bit faster.
The `dwarfs` process itself used:
real 4m23.151s
user 0m25.036s
sys 0m35.216s
real 0m56.686s
user 0m18.857s
sys 0m21.058s
So again, DwarFS used less raw CPU power, but in terms of wallclock time,
the difference is really marginal.
So again, DwarFS used less raw CPU power overall, but in terms of wallclock
time, the difference is really marginal.
### With SquashFS & xz
@@ -602,60 +652,60 @@ a recent Raspberry Pi OS release. This file system also contains device inodes,
so in order to preserve those, we pass `--with-devices` to `mkdwarfs`:
$ time sudo mkdwarfs -i raspbian -o raspbian.dwarfs --with-devices
20:49:45.099221 scanning raspbian
20:49:45.395243 waiting for background scanners...
20:49:46.019979 assigning directory and link inodes...
20:49:46.035099 finding duplicate files...
20:49:46.148490 saved 31.05 MiB / 1007 MiB in 1617/34582 duplicate files
20:49:46.149221 waiting for inode scanners...
20:49:48.518179 assigning device inodes...
20:49:48.519512 assigning pipe/socket inodes...
20:49:48.520322 building metadata...
20:49:48.520425 building blocks...
20:49:48.520473 saving names and links...
20:49:48.520568 ordering 32965 inodes using nilsimsa similarity...
20:49:48.522323 nilsimsa: depth=20000, limit=255
20:49:48.554803 updating name and link indices...
20:49:48.577389 pre-sorted index (55243 name, 26489 path lookups) [54.95ms]
20:50:55.921085 32965 inodes ordered [67.4s]
20:50:55.921179 waiting for segmenting/blockifying to finish...
20:51:02.372233 saving chunks...
20:51:02.376389 saving directories...
20:51:02.492263 waiting for compression to finish...
20:51:31.098179 compressed 1007 MiB to 286.6 MiB (ratio=0.284714)
20:51:31.140186 filesystem created without errors [106s]
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
I 21:30:29.812562 scanning raspbian
I 21:30:29.908984 waiting for background scanners...
I 21:30:30.217446 assigning directory and link inodes...
I 21:30:30.221941 finding duplicate files...
I 21:30:30.288099 saved 31.05 MiB / 1007 MiB in 1617/34582 duplicate files
I 21:30:30.288143 waiting for inode scanners...
I 21:30:31.393710 assigning device inodes...
I 21:30:31.394481 assigning pipe/socket inodes...
I 21:30:31.395196 building metadata...
I 21:30:31.395230 building blocks...
I 21:30:31.395291 saving names and links...
I 21:30:31.395374 ordering 32965 inodes using nilsimsa similarity...
I 21:30:31.396254 nilsimsa: depth=20000 (1000), limit=255
I 21:30:31.407967 pre-sorted index (46431 name, 2206 path lookups) [11.66ms]
I 21:30:31.410089 updating name and link indices...
I 21:30:38.178505 32965 inodes ordered [6.783s]
I 21:30:38.179417 waiting for segmenting/blockifying to finish...
I 21:31:06.248304 saving chunks...
I 21:31:06.251998 saving directories...
I 21:31:06.402559 waiting for compression to finish...
I 21:31:16.425563 compressed 1007 MiB to 287 MiB (ratio=0.285036)
I 21:31:16.464772 filesystem created without errors [46.65s]
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
waiting for block compression to finish
4435 dirs, 5908/473 soft/hard links, 34582/34582 files, 7 other
original size: 1007 MiB, dedupe: 31.05 MiB (1617 files), segment: 46.9 MiB
filesystem: 928.7 MiB in 59 blocks (39117 chunks, 32965/32965 inodes)
compressed filesystem: 59 blocks/286.6 MiB written
███████████████████████████████████████████████████████████████████▏100% |
4435 dirs, 5908/0 soft/hard links, 34582/34582 files, 7 other
original size: 1007 MiB, dedupe: 31.05 MiB (1617 files), segment: 47.23 MiB
filesystem: 928.4 MiB in 59 blocks (38890 chunks, 32965/32965 inodes)
compressed filesystem: 59 blocks/287 MiB written [depth: 20000]
███████████████████████████████████████████████████████████████████▏100% |
real 1m46.153s
user 18m7.973s
sys 0m16.013s
real 0m46.711s
user 10m39.038s
sys 0m8.123s
Again, SquashFS uses the same compression options:
$ time sudo mksquashfs raspbian raspbian.squashfs -comp zstd -Xcompression-level 22
Parallel mksquashfs: Using 12 processors
Parallel mksquashfs: Using 16 processors
Creating 4.0 filesystem on raspbian.squashfs, block size 131072.
[====================================================================-] 38644/38644 100%
[===============================================================\] 39232/39232 100%
Exportable Squashfs 4.0 filesystem, zstd compressed, data block size 131072
compressed data, compressed metadata, compressed fragments,
compressed xattrs, compressed ids
duplicates are removed
Filesystem size 371931.65 Kbytes (363.21 Mbytes)
36.89% of uncompressed filesystem size (1008353.15 Kbytes)
Inode table size 398565 bytes (389.22 Kbytes)
26.61% of uncompressed inode table size (1497593 bytes)
Directory table size 408794 bytes (399.21 Kbytes)
42.28% of uncompressed directory table size (966980 bytes)
Number of duplicate files found 1145
Number of inodes 44459
Number of files 34109
Filesystem size 371934.50 Kbytes (363.22 Mbytes)
35.98% of uncompressed filesystem size (1033650.60 Kbytes)
Inode table size 399913 bytes (390.54 Kbytes)
26.53% of uncompressed inode table size (1507581 bytes)
Directory table size 408749 bytes (399.17 Kbytes)
42.31% of uncompressed directory table size (966174 bytes)
Number of duplicate files found 1618
Number of inodes 44932
Number of files 34582
Number of fragments 3290
Number of symbolic links 5908
Number of device nodes 7
@@ -666,9 +716,9 @@ Again, SquashFS uses the same compression options:
Number of uids 5
root (0)
mhx (1000)
logitechmediaserver (103)
unknown (103)
shutdown (6)
x2goprint (106)
unknown (106)
Number of gids 15
root (0)
unknown (109)
@@ -686,61 +736,53 @@ Again, SquashFS uses the same compression options:
adm (4)
mem (8)
real 1m54.997s
user 18m32.386s
sys 0m2.627s
real 0m50.124s
user 9m41.708s
sys 0m1.727s
The difference in speed is almost negligible. SquashFS is just a bit
slower here. In terms of compression, the difference also isn't huge:
$ ls -lh raspbian.* *.xz
-rw-r--r-- 1 root root 287M Dec 10 20:51 raspbian.dwarfs
-rw-r--r-- 1 root root 364M Dec 9 22:31 raspbian.squashfs
-rw-r--r-- 1 mhx users 297M Aug 20 12:47 2020-08-20-raspios-buster-armhf-lite.img.xz
-rw-r--r-- 1 mhx users 297M Mar 4 21:32 2020-08-20-raspios-buster-armhf-lite.img.xz
-rw-r--r-- 1 root root 287M Mar 4 21:31 raspbian.dwarfs
-rw-r--r-- 1 root root 364M Mar 4 21:33 raspbian.squashfs
Interestingly, `xz` actually can't compress the whole original image
better than DwarFS.
We can even again try to increase the DwarFS compression level:
$ time mkdwarfs -i raspbian.dwarfs -o raspbian-9.dwarfs -l 9 --recompress
20:55:34.416488 filesystem rewritten [69.79s]
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
filesystem: 928.7 MiB in 59 blocks (0 chunks, 0 inodes)
compressed filesystem: 59/59 blocks/257.7 MiB written
██████████████████████████████████████████████████████████████████▏100% \
$ time sudo mkdwarfs -i raspbian -o raspbian-9.dwarfs --with-devices -l9
real 1m9.879s
user 12m52.376s
sys 0m14.315s
real 0m54.161s
user 8m40.109s
sys 0m7.101s
Now that actually gets the DwarFS image size well below that of the
`xz` archive:
$ ls -lh raspbian-9.dwarfs *.xz
-rw-r--r-- 1 mhx users 258M Dec 10 20:55 raspbian-9.dwarfs
-rw-r--r-- 1 mhx users 297M Aug 20 12:47 2020-08-20-raspios-buster-armhf-lite.img.xz
-rw-r--r-- 1 root root 244M Mar 4 21:36 raspbian-9.dwarfs
-rw-r--r-- 1 mhx users 297M Mar 4 21:32 2020-08-20-raspios-buster-armhf-lite.img.xz
However, if you actually build a tarball and compress that (instead of
compressing the EXT4 file system), `xz` is, unsurprisingly, able to take
the lead again:
Even if you actually build a tarball and compress that (instead of
compressing the EXT4 file system itself), `xz` isn't quite able to
match the DwarFS image size:
$ time sudo tar cf - raspbian | xz -9e -vT 0 >raspbian.tar.xz
100 % 245.9 MiB / 1,012.3 MiB = 0.243 5.4 MiB/s 3:07
$ time sudo tar cf - raspbian | xz -9 -vT 0 >raspbian.tar.xz
100 % 246.9 MiB / 1,037.2 MiB = 0.238 13 MiB/s 1:18
real 3m8.088s
user 14m16.519s
sys 0m5.843s
real 1m18.226s
user 6m35.381s
sys 0m2.205s
$ ls -lh raspbian.tar.xz
-rw-r--r-- 1 mhx users 246M Nov 30 00:16 raspbian.tar.xz
-rw-r--r-- 1 mhx users 247M Mar 4 21:40 raspbian.tar.xz
In summary, DwarFS can get pretty close to an `xz` compressed tarball
in terms of size. It's also almost three times faster to build the file
system than to build the tarball. At the same time, SquashFS really
isn't that much worse. It's really the cases where you *know* upfront
that your data is highly redundant where DwarFS can play out its full
strength.
In summary, DwarFS can even outperform an `xz` compressed tarball in
terms of size. It's also significantly faster to build the file
system than to build the tarball.
### With wimlib