mirror of https://github.com/mhx/dwarfs.git
synced 2025-09-14 14:59:52 -04:00

docs: README overhaul

parent d2a1c00f04
commit dbf84a290a

133 README.md
@@ -10,7 +10,7 @@
 
 The **D**eduplicating **W**arp-speed **A**dvanced **R**ead-only **F**ile **S**ystem.
 
-A fast high compression read-only file system for Linux and Windows.
+A fast high-compression read-only file system for Linux and Windows.
 
 ## Table of contents
 
@@ -59,7 +59,7 @@ A fast high compression read-only file system for Linux and Windows.
 
 
 DwarFS is a read-only file system with a focus on achieving **very
-high compression ratios** in particular for very redundant data.
+high compression ratios**, particularly for highly redundant data.
 
 This probably doesn't sound very exciting, because if it's redundant,
 it *should* compress well. However, I found that other read-only,
@@ -67,10 +67,10 @@ compressed file systems don't do a very good job at making use of
 this redundancy. See [here](#comparison) for a comparison with other
 compressed file systems.
 
-DwarFS also **doesn't compromise on speed** and for my use cases I've
-found it to be on par with or perform better than SquashFS. For my
-primary use case, **DwarFS compression is an order of magnitude better
-than SquashFS compression**, it's **6 times faster to build the file
+DwarFS also **doesn't compromise on speed**; in my use cases, it
+performs on par with, or better than, SquashFS. For my primary use
+case, **DwarFS compression is an order of magnitude better than
+SquashFS compression**, it's **6 times faster to build the file
 system**, it's typically faster to access files on DwarFS and it uses
 less CPU resources.
 
@@ -83,7 +83,7 @@ So there's redundancy in both the video and audio data, but as the streams
 are interleaved and identical blocks are typically very far apart, it's
 challenging to make use of that redundancy for compression. SquashFS
 essentially fails to compress the source data at all, whereas DwarFS is
-able to reduce the size by almost a factor of 3, which is close to the
+able to reduce the size to nearly one-third, which is close to the
 theoretical maximum:
 
 ```
@@ -177,21 +177,25 @@ some rudimentary docs as well.
 ### Note to Package Maintainers
 
 DwarFS should usually build fine with minimal changes out of the box.
-If it doesn't, please file a issue. I've set up
-[CI jobs](https://github.com/mhx/dwarfs/actions/workflows/build.yml)
-using Docker images for Ubuntu ([22.04](https://github.com/mhx/dwarfs/blob/main/.docker/Dockerfile.ubuntu-2204)
-and [24.04](https://github.com/mhx/dwarfs/blob/main/.docker/Dockerfile.ubuntu)),
-[Fedora Rawhide](https://github.com/mhx/dwarfs/blob/main/.docker/Dockerfile.fedora)
-and [Arch](https://github.com/mhx/dwarfs/blob/main/.docker/Dockerfile.arch)
+If it doesn't, please file an issue. I've set up
+[CI jobs](actions/workflows/build.yml)
+using Docker images for Ubuntu ([22.04](.docker/Dockerfile.ubuntu-2204)
+and [24.04](.docker/Dockerfile.ubuntu)),
+[Fedora Rawhide](.docker/Dockerfile.fedora),
+[Arch Linux](.docker/Dockerfile.arch), and
+[Debian](.docker/Dockerfile.debian),
+as well as a setup script for [FreeBSD](.github/scripts/freebsd_setup_base.sh),
 that can help with determining an up-to-date set of dependencies.
 Note that building from the release tarball requires less dependencies
 than building from the git repository, notably the `ronn` tool as well
 as Python and the `mistletoe` Python module are not required when
-building from the release tarball.
+building from the release tarball. Also, the release tarball build
+doesn't require to build the thrift compiler, which makes the build
+a lot faster.
 
 There are some things to be aware of:
 
-- There's a tendency to try and unbundle the [folly](https://github.com/facebook/folly/)
+- There's a tendency to try to unbundle the [folly](https://github.com/facebook/folly/)
 and [fbthrift](https://github.com/facebook/fbthrift) libraries that
 are included as submodules and are built along with DwarFS.
 While I agree with the sentiment, it's unfortunately a bad idea.
@@ -209,13 +213,13 @@ There are some things to be aware of:
 fbthrift headers are required to build against DwarFS' libraries.
 
 - Similar issues can arise when using a system-installed version
-of GoogleTest. GoogleTest itself recommends that it is being
-downloaded as part of the build. However, you can use the system
-installed version by passing `-DPREFER_SYSTEM_GTEST=ON` to the
-`cmake` call. Use at your own risk.
+of GoogleTest. GoogleTest recommends downloading it as part of
+the build. However, you can use the system-installed version by
+passing `-DPREFER_SYSTEM_GTEST=ON` to the `cmake` call. Use at
+your own risk.
 
 - For other bundled libraries (namely `fmt`, `parallel-hashmap`,
-`range-v3`), the system installed version is used as long as it
+`range-v3`), the system-installed version is used as long as it
 meets the minimum required version. Otherwise, the preferred
 version is fetched during the build.
 
@@ -233,18 +237,33 @@ In addition to the binary tarballs, there's a **universal binary**
 available for each architecture. These universal binaries contain
 *all* tools (`mkdwarfs`, `dwarfsck`, `dwarfsextract` and the `dwarfs`
 FUSE driver) in a single executable. These executables are compressed
-using [upx](https://github.com/upx/upx), so they are much smaller than
-the individual tools combined. However, it also means the binaries need
-to be decompressed each time they are run, which can have a significant
-overhead. If that is an issue, you can either stick to the "classic"
-individual binaries or you can decompress the universal binary, e.g.:
+using [upx](https://github.com/upx/upx) where possible, and using a
+custom self-extractor on all other platforms. This means they are much
+smaller than the individual tools combined. However, it also means the
+binaries need to be decompressed each time they are run, which can add
+significant overhead. If that is an issue, you can either stick to the
+"classic" individual binaries or you can decompress the universal binary.
+For upx compressed binaries, you can use:
 
 ```
-upx -d dwarfs-universal-0.7.0-Linux-aarch64
+$ upx -d dwarfs-universal-0.7.0-Linux-aarch64
 ```
 
-The universal binaries can be run through symbolic links named after
-the proper tool. e.g.:
+For the binaries that use the custom self-extractor, you can use:
+
+```
+$ ./dwarfs-universal-riscv64 --extract-wrapped-binary dwarfs-universal
+```
+
+Note that both self-extractors need at least Linux kernel 3.17 to work
+properly. If you want to use the FUSE driver, you'll need to install
+the fuse3 tools for your distribution. If you want to run the binaries
+on an older kernel, you can unpack the universal binary (unpacking does
+*not* require kernel 3.17). If you're stuck with fuse2, you must use the
+individual `dwarfs2` driver instead of the universal binary.
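A minimal sketch of how one might check the kernel requirement mentioned above before relying on a self-extracting universal binary. The function name `kernel_at_least_3_17` is made up for this illustration and is not part of DwarFS; the check only looks at the major and minor numbers reported by `uname -r`.

```sh
#!/bin/sh
# Hypothetical helper (not part of DwarFS): succeeds if the kernel release
# string in $1, e.g. "5.15.0-91-generic", is version 3.17 or newer.
kernel_at_least_3_17() {
    major=${1%%.*}             # text before the first dot
    rest=${1#*.}               # text after the first dot
    minor=${rest%%[!0-9]*}     # leading digits of the remainder
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 17 ]; }
}

if kernel_at_least_3_17 "$(uname -r)"; then
    echo "kernel is new enough for the self-extracting universal binary"
else
    echo "kernel is older than 3.17; unpack the universal binary first"
fi
```

On a kernel older than 3.17, the unpacking commands shown earlier still apply, since unpacking itself does not need 3.17.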
+
+You can run the universal binaries via symbolic links named after
+the tool. For example:
 
 ```
 $ ln -s dwarfs-universal-0.7.0-Linux-aarch64 mkdwarfs
@@ -289,10 +308,13 @@ space-efficient, memory-mappable and well defined format. It's also
 included as a submodule, and we only build the compiler and a very
 reduced library that contains just enough for DwarFS to work.
 
-Other than that, DwarFS really only depends on FUSE3 and on a set
-of compression libraries that Folly already depends on (namely
-[lz4](https://github.com/lz4/lz4), [zstd](https://github.com/facebook/zstd)
-and [liblzma](https://github.com/kobolabs/liblzma)).
+Beyond that, DwarFS depends on FUSE3 and a set of compression
+libraries (namely [lz4](https://github.com/lz4/lz4),
+[zstd](https://github.com/facebook/zstd),
+[brotli](https://github.com/google/brotli),
+[xz](https://github.com/tukaani-project/xz), and
+[flac](https://github.com/xiph/flac)). Except for `zstd`, these
+are all optional.
 
 The dependency on [googletest](https://github.com/google/googletest)
 will be automatically resolved if you build with tests.
@@ -392,7 +414,7 @@ $ ctest -j
 ```
 
 All binaries use [jemalloc](https://github.com/jemalloc/jemalloc)
-as a memory allocator by default, as it is typically uses much less
+as a memory allocator by default, as it typically uses much less
 system memory compared to the `glibc` or `tcmalloc` allocators.
 To disable the use of `jemalloc`, pass `-DUSE_JEMALLOC=0` on the
 `cmake` command line.
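As a rough sketch, a configure step without jemalloc might look like the following. Only `-DUSE_JEMALLOC=0` comes from the text above; the `..` source path is an assumption about running from a build directory directly below the source tree. The snippet prints the command rather than running it.

```sh
#!/bin/sh
# Assumption: the build directory sits directly below the source tree.
src_dir=".."
# -DUSE_JEMALLOC=0 is the option named in the README; it disables jemalloc.
cmake_cmd="cmake $src_dir -DUSE_JEMALLOC=0"
# Print the command for review instead of executing it.
echo "$cmake_cmd"
```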
@@ -484,10 +506,9 @@ pages using the `--man` option to each binary, e.g.:
 $ mkdwarfs --man
 ```
 
-The [dwarfs](doc/dwarfs.md) manual page also shows an example for setting
-up DwarFS with [overlayfs](https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt)
-in order to create a writable file system mount on top a read-only
-DwarFS image.
+The [dwarfs](doc/dwarfs.md) manual page also shows an example for setting up DwarFS
+with [overlayfs](https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html)
+in order to create a writable file system mount on top of a read-only DwarFS image.
 
 A description of the DwarFS file system format can be found in
 [dwarfs-format](doc/dwarfs-format.md).
@@ -511,7 +532,7 @@ There are five individual libraries:
 - `dwarfs_reader` contains all code required to read data from a
 DwarFS image. The interfaces are defined in [`dwarfs/reader/`](include/dwarfs/reader).
 
-- `dwarfs_extractor` contains the ccode required to extract a DwarFS
+- `dwarfs_extractor` contains the code required to extract a DwarFS
 image using [`libarchive`](https://libarchive.org/). The interfaces
 are defined in [`dwarfs/utility/filesystem_extractor.h`](include/dwarfs/utility/filesystem_extractor.h).
 
@@ -549,9 +570,9 @@ There are a few things worth pointing out, though:
 
 - DwarFS supports both hardlinks and symlinks on Windows, just as it
 does on Linux. However, creating hardlinks and symlinks seems to
-require admin privileges on Windows, so if you want to e.g. extract
-a DwarFS image that contains links of some sort, you might run into
-errors if you don't have the right privileges.
+require admin privileges on Windows, so if, for example, you want to
+extract a DwarFS image that contains links of some sort, you might
+run into errors if you don't have the right privileges.
 
 - Due to a [problem](https://github.com/winfsp/winfsp/issues/454) in
 WinFsp, symlinks cannot currently point outside of the mounted file
@@ -593,7 +614,7 @@ You'll need to install:
 if it's not, you'll need to set `WINFSP_PATH` when running CMake via
 `cmake/win.bat`.
 
-Now you need to clone `vcpkg` and `dwarfs`:
+Clone `vcpkg` and `dwarfs`:
 
 ```
 > cd %HOMEPATH%
@@ -639,8 +660,8 @@ $ brew test dwarfs
 ```
 
 The macOS version of the DwarFS file system driver relies on the awesome
-[macFUSE](https://osxfuse.github.io/) project and is available from
-gromgit's [homebrew-fuse tap](https://github.com/gromgit/homebrew-fuse):
+[macFUSE](https://macfuse.io) project and is available via gromgit's
+[homebrew-fuse tap](https://github.com/gromgit/homebrew-fuse):
 
 ```
 $ brew tap gromgit/homebrew-fuse
@@ -652,7 +673,7 @@ $ brew install dwarfs-fuse-mac
 ### Astrophotography
 
 Astrophotography can generate huge amounts of raw image data. During a
-single night, it's not unlikely to end up with a few dozens of gigabytes
+single night, it's not unlikely to end up with a few dozen gigabytes
 of data. With most dedicated astrophotography cameras, this data ends up
 in the form of FITS images. These are usually uncompressed, don't compress
 very well with standard compression algorithms, and while there are certain
@@ -861,7 +882,7 @@ The source directory contained **1139 different Perl installations**
 from 284 distinct releases, a total of 47.65 GiB of data in 1,927,501
 files and 330,733 directories. The source directory was freshly
 unpacked from a tar archive to an XFS partition on a 970 EVO Plus 2TB
-NVME drive, so most of its contents were likely cached.
+NVMe drive, so most of its contents were likely cached.
 
 I'm using the same compression type and compression level for
 SquashFS that is the default setting for DwarFS:
@@ -1959,7 +1980,7 @@ $ ls -l perl-install-small.*fs
 I noticed that the `blockifying` step that took ages for the full dataset
 with `mkcromfs` ran substantially faster (in terms of MiB/second) on the
 smaller dataset, which makes me wonder if there's some quadratic complexity
-behaviour that's slowing down `mkcromfs`.
+behavior that's slowing down `mkcromfs`.
 
 In order to be completely fair, I also ran `mkdwarfs` with `-l 9` to enable
 LZMA compression (which is what `mkcromfs` uses by default):
@@ -2017,8 +2038,8 @@ it crashed right upon trying to list the directory after mounting.
 
 ### With EROFS
 
-[EROFS](https://github.com/erofs/erofs-utils) is a read-only compressed
-file system that has been added to the Linux kernel recently.
+[EROFS](https://github.com/erofs/erofs-utils) is another read-only
+compressed file system included in the Linux kernel.
 Its goals are different from those of DwarFS, though. It is designed to
 be lightweight (which DwarFS is definitely not) and to run on constrained
 hardware like embedded devices or smartphones. It is not designed to provide
@@ -2251,7 +2272,7 @@ sys 0m0.610s
 ```
 
 Turns out that `tar --zstd` is easily winning the compression speed
-test. Looking at the file sizes did actually blow my mind just a bit:
+test. Looking at the file sizes did genuinely surprise me:
 
 ```
 $ ll zerotest.* --sort=size
@@ -2526,11 +2547,15 @@ typically want to run on your "performance" cores.
 
 ### Specifying file system offset and size
 
-You can specify the byte offset at which the filesystem is located in the file using the `-o offset=N` option.
-This can be useful when mounting images where there is some preceding data before the filesystem or when mounting merged/concatenated images.
-When combined with the `-o imagesize=N` option you can mount merged filesystems, i.e. multiple filesystems stored in a single file.
+You can specify the byte offset at which the file system is located in the
+file using the `-o offset=N` option. This can be useful when mounting images
+where there is some preceding data before the file system or when mounting
+merged/concatenated images. When combined with the `-o imagesize=N` option
+you can mount merged file systems, i.e. multiple file systems stored in a
+single file.
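A small sketch of where such `offset` and `imagesize` values can come from when two images are simply concatenated: the first image starts at byte 0, and the second starts at the size of the first. The file names `fs1.dwarfs` and `fs2.dwarfs` are hypothetical, and the script uses tiny stand-in files instead of real DwarFS images so the arithmetic can be tried anywhere. It prints the mount commands rather than running them.

```sh
#!/bin/sh
set -e
# Stand-in files (NOT real DwarFS images), just to demonstrate the arithmetic.
workdir=$(mktemp -d)
printf 'AAAAAAAAAA' > "$workdir/fs1.dwarfs"   # 10 bytes
printf 'BBBBBBB'    > "$workdir/fs2.dwarfs"   # 7 bytes

# Concatenate both files into a single merged file.
cat "$workdir/fs1.dwarfs" "$workdir/fs2.dwarfs" > "$workdir/merged.dwarfs"

# Sizes in bytes; tr strips the padding some wc implementations add.
size1=$(wc -c < "$workdir/fs1.dwarfs" | tr -d ' ')
size2=$(wc -c < "$workdir/fs2.dwarfs" | tr -d ' ')

# The second image begins right after the first one.
offset2=$size1

# Print the mount commands instead of executing them.
echo "dwarfs merged.dwarfs /mnt/fs1 -o imagesize=$size1"
echo "dwarfs merged.dwarfs /mnt/fs2 -o offset=$offset2,imagesize=$size2"

rm -rf "$workdir"
```

With real images, the same sizes could be read from the individual image files before they are merged.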
 
-Here is an example, you have two filesystems concatenated into a single file and you want to mount both of them, you can achieve this by running
+Here is an example, you have two file systems concatenated into a single
+file and you want to mount both of them, you can achieve this by running:
 ```sh
 dwarfs merged.dwarfs /mnt/fs1 -o imagesize=9231
 dwarfs merged.dwarfs /mnt/fs2 -o offset=9231,imagesize=7999