mkdwarfs(1) -- create highly compressed read-only file systems
==============================================================

## SYNOPSIS
`mkdwarfs` -i *path* -o *file* [*options*...]<br>
`mkdwarfs` -i *file* -o *file* --recompress [*options*...]

## DESCRIPTION
**mkdwarfs** allows you to create highly compressed, read-only file systems
in the DwarFS format. DwarFS is similar to file systems like SquashFS,
cramfs or CromFS, but it has some distinct features. For more details,
see dwarfs(1).

In its simplest form, you can create a file system containing the
full contents of `/path/dir` with:

    mkdwarfs -i /path/dir -o image.dwarfs

After that, you can mount it with dwarfs(1):

    dwarfs image.dwarfs /path/to/mountpoint

## OPTIONS
There are two mandatory options for specifying the input and output:

* `-i`, `--input=`*path*|*file*:
  Path to the root directory containing the files from which you want to
  build a filesystem. If the `--recompress` option is given, this argument
  is the source filesystem.

* `-o`, `--output=`*file*:
  File name of the output filesystem.
Most other options are concerned with compression tuning:

* `-l`, `--compress-level=`*value*:
  Compression level to use for the filesystem. **If you are unsure, please
  stick to the default level of 7.** This is intended to provide some
  sensible defaults and will depend on which compression libraries were
  available at build time. **The default level has been chosen to provide
  you with the best possible compression while still keeping the file
  system very fast to access.** Levels 8 and 9 will switch to LZMA
  compression (when available), which will likely reduce the file system
  image size, but will make it about an order of magnitude slower to
  access, so reserve these levels for cases where you only need to access
  the data infrequently. The `-l` option is meant to be the "easy"
  interface to configure `mkdwarfs`, and it will actually pick defaults
  for six distinct options: `--block-size-bits`, `--compression`,
  `--schema-compression`, `--metadata-compression`, `--window-size` and
  `--order`. See the output of `mkdwarfs --help` for a table listing the
  exact defaults used for each compression level.
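
  For example, to trade access speed for a smaller image by switching to
  the LZMA-based levels:

      mkdwarfs -i /path/dir -o image.dwarfs -l 9
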
* `-S`, `--block-size-bits=`*value*:
  The block size used for the compressed filesystem. The actual block size
  is two to the power of this value. The valid range of this option is from
  12 to 28, i.e. block sizes between 4 KiB and 256 MiB. Larger block sizes
  will offer better compression, but will be slower and consume more memory
  when actually using the filesystem, as blocks will have to be fully or at
  least partially decompressed into memory. Values between 20 and 24, i.e.
  between 1 MiB and 16 MiB, are usually a good compromise.
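
  For example, to use 16 MiB blocks (2^24 bytes) for better compression at
  the cost of more memory when using the filesystem:

      mkdwarfs -i /path/dir -o image.dwarfs -S 24
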
* `-N`, `--num-workers=`*value*:
  Number of worker threads used for building the filesystem. This defaults
  to the number of processors available on your system. Use this option if
  you want to limit the resources used by `mkdwarfs`.

* `-M`, `--max-scanner-workers=`*value*:
  Maximum number of worker threads used for scanning files while building
  the filesystem. This defaults to the number of processors available on
  your system, but the number of active workers will be automatically
  adjusted based on load. With fast SSDs, scanning multiple files
  concurrently is probably fine, but with older spinning disks, lower
  concurrency can improve overall speed.
* `-L`, `--memory-limit=`*value*:
  Approximately how much memory you want `mkdwarfs` to use during filesystem
  creation. Note that currently this will only affect the block manager
  component, i.e. the number of filesystem blocks that are in flight but
  haven't been compressed and written to the output file yet. So the memory
  used by `mkdwarfs` can certainly be larger than this limit, but it's a
  good option when building large filesystems with expensive compression
  algorithms.
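
  For example, to limit both concurrency and block memory when building on
  a busy machine (the `2g` size suffix is an assumption; check
  `mkdwarfs --help` for the exact value format your build accepts):

      mkdwarfs -i /path/dir -o image.dwarfs -N 4 -L 2g
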
* `-C`, `--compression=`*algorithm*[:*algopt*[=*value*]]...:
  The compression algorithm and configuration used for file system data.
  The value for this option is a colon-separated list. The first item is
  the compression algorithm, the remaining items are its options. Options
  can be either boolean or have a value. For details on which algorithms
  and options are available, see the output of `mkdwarfs --help`. `zstd`
  will give you the best compression while still keeping decompression
  *very* fast. `lzma` will compress even better, but decompression will
  be around ten times slower.
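
  For example, to select LZMA for the file data (a sketch; the available
  algorithm options, such as `level`, depend on your build, so check
  `mkdwarfs --help`):

      mkdwarfs -i /path/dir -o image.dwarfs -C lzma:level=9
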
* `--schema-compression=`*algorithm*[:*algopt*[=*value*]]...:
  The compression algorithm and configuration used for the metadata schema.
  Takes the same arguments as `--compression` above. The schema is *very*
  small, in the hundreds of bytes, so this is only relevant for extremely
  small file systems. The default (`zstd`) has been shown to give
  considerably better results than any other algorithm.
* `--metadata-compression=`*algorithm*[:*algopt*[=*value*]]...:
  The compression algorithm and configuration used for the metadata.
  Takes the same arguments as `--compression` above. The metadata has been
  optimized for very little redundancy, and leaving it uncompressed (the
  default for all levels below 7) has the benefit that it can be mapped
  into memory and used directly. This improves mount time for large file
  systems compared to e.g. an lzma-compressed metadata block. If you don't
  care about mount time, you can safely choose `lzma` compression here, as
  the data will only have to be decompressed once when mounting the image.
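
  For example, to leave the metadata uncompressed regardless of level,
  assuming your build offers a `null` (no compression) algorithm:

      mkdwarfs -i /path/dir -o image.dwarfs --metadata-compression=null
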
* `--recompress`[`=all|block|metadata|none`]:
  Take an existing DwarFS file system and recompress it using different
  compression algorithms. If no argument or `all` is given, all sections
  in the file system image will be recompressed. Note that *only* the
  compression algorithms, i.e. the `--compression`, `--schema-compression`
  and `--metadata-compression` options, have an impact on how the new file
  system is written. Other options, e.g. `--block-size-bits` or `--order`,
  have no impact. If `none` is given as an argument, none of the sections
  will be recompressed, but the file system is still rewritten in the
  latest file system format. This is an easy way of upgrading an old file
  system image to a new format. If `block` or `metadata` is given, only
  the block sections (i.e. the actual file data) or the metadata sections
  are recompressed. This can be useful if you want to switch from compressed
  metadata to uncompressed metadata without having to rebuild or recompress
  all the other data.
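
  For example, a sketch of rewriting only the metadata of an existing image
  so that it is stored uncompressed (again assuming a `null` algorithm):

      mkdwarfs -i old.dwarfs -o new.dwarfs --recompress=metadata \
               --metadata-compression=null
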
* `--set-owner=`*uid*:
  Set the owner for all entities in the file system. This can reduce the
  size of the file system. If the input only has a single owner already,
  setting this won't make any difference.

* `--set-group=`*gid*:
  Set the group for all entities in the file system. This can reduce the
  size of the file system. If the input only has a single group already,
  setting this won't make any difference.

* `--set-time=`*time*|`now`:
  Set the time stamps for all entities to this value. This can significantly
  reduce the size of the file system. You can pass either a Unix timestamp
  or `now`.
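
  For example, to normalize ownership and time stamps and save metadata
  space:

      mkdwarfs -i /path/dir -o image.dwarfs --set-owner=0 --set-group=0 \
               --set-time=now
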
* `--time-resolution=`*sec*|`min`|`hour`|`day`:
  Specify the resolution with which time stamps are stored. By default,
  time stamps are stored with second resolution. You can specify "odd"
  resolutions as well, e.g. something like 15 second resolution is
  entirely possible. Moving from second to minute resolution, for example,
  will save roughly 6 bits per file system entry in the metadata block.
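
  For example, to store time stamps with one-minute resolution:

      mkdwarfs -i /path/dir -o image.dwarfs --time-resolution=min
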
* `--keep-all-times`:
  As of release 0.3.0, by default, `mkdwarfs` will only save the contents of
  the `mtime` field in order to save metadata space. If you want to save
  `atime` and `ctime` as well, use this option.
* `--order=none`|`path`|`similarity`|`nilsimsa`[`:`*limit*[`:`*depth*[`:`*mindepth*]]]|`script`:
  The order in which inodes will be written to the file system. With `none`,
  the inodes will be stored in the order in which they are discovered. With
  `path`, they will be sorted asciibetically by path name of the first file
  representing this inode. With `similarity`, they will be ordered using a
  simple, yet fast and efficient, similarity hash function. `nilsimsa` ordering
  uses a more sophisticated similarity function that is typically better than
  `similarity`, but is significantly slower to compute. However, computation
  can happen in the background while the file system is already being built.
  `nilsimsa` ordering can be further tweaked by specifying a *limit* and
  *depth*. The *limit* determines how soon an inode is considered similar
  enough for adding. A *limit* of 255 means "essentially identical", whereas
  a *limit* of 0 means "not similar at all". The *depth* determines how
  many inodes can be checked at most while searching for a similar one.
  To keep nilsimsa ordering from becoming a bottleneck when ordering lots of
  small files, the *depth* is adjusted dynamically to keep the input queue
  to the segmentation/compression stages adequately filled. You can specify
  how much the *depth* can be adjusted by also specifying *mindepth*.
  The default if you omit these values is a *limit* of 255, a *depth*
  of 20000 and a *mindepth* of 1000. Note that if you want reproducible
  results, you need to set *depth* and *mindepth* to the same value.
  Last but not least, if scripting support is built into `mkdwarfs`, you can
  choose `script` to let the script determine the order.
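
  For example, to get reproducible nilsimsa ordering with the default
  *limit* and *depth*, pin *depth* and *mindepth* to the same value:

      mkdwarfs -i /path/dir -o image.dwarfs --order=nilsimsa:255:20000:20000
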
* `-W`, `--window-size=`*value*:
  Window size of cyclic hash used for segmenting. This is again an exponent
  to a base of two. Cyclic hashes are used by `mkdwarfs` for finding
  identical segments across multiple files. This is done on top of duplicate
  file detection. If a reasonable amount of duplicate segments is found,
  this means fewer blocks will be used in the filesystem and potentially
  less memory will be used when accessing the filesystem. It doesn't
  necessarily mean that the filesystem will be much smaller, as this removes
  redundancy that can then no longer be exploited by the block compression.
  But it shouldn't make the resulting filesystem any bigger. This option
  is used along with `--window-step` to determine how extensive this
  segment search will be. The smaller the window sizes, the more segments
  will obviously be found. However, this also means files will become more
  fragmented and thus the filesystem can be slower to use and metadata
  size will grow. Passing `-W0` will completely disable duplicate segment
  search.
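
  For example, to search for smaller duplicate segments using a 4 KiB
  (2^12 bytes) hash window:

      mkdwarfs -i /path/dir -o image.dwarfs -W12
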
* `--window-step=`*value*:
  This option specifies how often cyclic hash values are stored for lookup.
  It is specified relative to the window size, as a base-2 exponent that
  divides the window size. To give a concrete example, if `--window-size=16`
  and `--window-step=1`, then a cyclic hash across 65536 bytes will be stored
  at every 32768 bytes of input data. If `--window-step=2`, then a hash value
  will be stored at every 16384 bytes. This means that not every possible
  65536-byte duplicate segment will be detected, but it is guaranteed that
  all duplicate segments of (`window_size` + `window_step`) bytes or more
  will be detected (unless they span across block boundaries, of course).
  If you use a larger value for this option, the increments become *smaller*,
  and `mkdwarfs` will be slower and use more memory.
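
  Continuing the example above, to store a hash value every 16384 bytes:

      mkdwarfs -i /path/dir -o image.dwarfs --window-size=16 --window-step=2
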
* `-B`, `--max-lookback-blocks=`*value*:
  Specify how many of the most recent blocks to scan for duplicate segments.
  By default, only the current block will be scanned. The larger this number,
  the more duplicate segments will likely be found, which may further improve
  compression. However, it can also slow down compression and could cause the
  resulting filesystem to be less efficient to use, as single small files can
  now potentially span multiple filesystem blocks. Passing `-B0` will completely
  disable duplicate segment search.
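
  For example, to let the segment search look back across several recent
  blocks:

      mkdwarfs -i /path/dir -o image.dwarfs -B4
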
* `--remove-empty-dirs`:
  Removes all empty directories from the output file system, recursively.
  This is particularly useful when using scripts that filter out a lot of
  file system entries.

* `--with-devices`:
  Include character and block devices in the output file system. These are
  not included by default, and due to security measures in FUSE, they will
  never work in the mounted file system. However, they can still be copied
  out of the mounted file system, for example using `rsync`.

* `--with-specials`:
  Include named fifos and sockets in the output file system. These are not
  included by default.
* `--log-level=`*name*:
  Specify a logging level.

* `--no-progress`:
  Don't show progress output while building the filesystem.

* `--progress=none`|`simple`|`ascii`|`unicode`:
  Choosing `none` is equivalent to specifying `--no-progress`. `simple`
  will print a single line of progress information whenever the progress
  has significantly changed, but at most once every 2 seconds. This is
  also the default when the output is not a tty. `unicode` is the default
  behaviour, which shows a nice progress bar and lots of additional
  information. If your terminal cannot deal with unicode characters,
  you can switch to `ascii`, which is like `unicode`, but looks less
  fancy.
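
  For example, to force the single-line progress output even on a terminal:

      mkdwarfs -i /path/dir -o image.dwarfs --progress=simple
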
* `--help`:
  Show program help, including defaults, compression level detail and
  supported compression algorithms.

If experimental Python support was compiled into `mkdwarfs`, you can use the
following option to enable customizations via the scripting interface:

* `--script=`*file*[`:`*class*[`(`*arguments*`...)`]]:
  Specify the Python script to load. The class name is optional if there's
  a class named `mkdwarfs` in the script. It is also possible to pass
  arguments to the constructor.
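
  For example, loading a class from a script (`filter.py` and `myfilter`
  are made-up names for illustration):

      mkdwarfs -i /path/dir -o image.dwarfs --script=filter.py:myfilter
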
## TIPS & TRICKS

### Compression Ratio vs Decompression Speed

If a high compression ratio is your primary goal, definitely go for lzma
compression. However, I've found that it's only about 10% better than
zstd at the highest level. The big advantage of zstd over lzma is that
its decompression speed is about an order of magnitude faster. So if
you're extensively using the compressed file system, you'll probably
find that it's much faster with zstd.
### Block, Schema and Metadata Compression

DwarFS filesystems consist of three distinct parts of data: a potentially
large number of blocks, which store actual file data and are decompressed
on demand, as well as one schema and one metadata section. The schema is
tiny, typically less than 1000 bytes, and holds the details for how to
interpret the metadata. The schema needs to be read into memory once and
is subsequently never accessed again. The metadata itself is compressed
by default, but it doesn't have to be. Actually, if you drop the compression
level from 7 (the default) to 6, the only difference is that the metadata
is left uncompressed. This can be useful if mounting speed of the file
system is important, as the uncompressed metadata part of the file can
then simply be mapped into memory.
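
For example, a build whose metadata stays uncompressed and can thus be
mapped straight into memory at mount time:

    mkdwarfs -i /path/dir -o image.dwarfs -l 6
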
## AUTHOR

Written by Marcus Holland-Moritz.

## COPYRIGHT

Copyright (C) Marcus Holland-Moritz.

## SEE ALSO

dwarfs(1), dwarfsextract(1)