mirror of https://github.com/mhx/dwarfs.git
synced 2025-08-04 02:06:22 -04:00

Markdown cleanup

This commit is contained in:
parent 9ad4dd655f
commit 569966b752

@@ -1,11 +1,9 @@

# dwarfs-format(5) -- DwarFS File System Format v2.3

## DESCRIPTION

This document describes the DwarFS file system format, version 2.3.

## FILE STRUCTURE

A DwarFS file system image is just a sequence of blocks. Each block has the

@@ -65,25 +63,23 @@ A couple of notes:

  larger than the one it supports. However, a new program will still
  read all file systems with a smaller minor version number.

### Section Types

There are currently 3 different section types.

- `BLOCK` (0):
  A block of data. This is where all file data is stored. There can be
  an arbitrary number of blocks of this type.

- `METADATA_V2_SCHEMA` (7):
  The schema used to layout the `METADATA_V2` block contents. This is
  stored in "compact" thrift encoding.

- `METADATA_V2` (8):
  This section contains the bulk of the metadata. It's essentially just
  a collection of bit-packed arrays and structures. The exact layout of
  each list and structure depends on the actual data and is stored
  separately in `METADATA_V2_SCHEMA`.
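
A quick way to get an idea of which sections an existing image contains is the dwarfsck(1) tool documented further down in this commit; at higher detail levels it reports per-section information (the exact output depends on the tool version, and the image name below is just an illustration):

```
dwarfsck -i image.dwarfs -d 6
```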

## METADATA FORMAT

@@ -169,17 +165,12 @@ list. The index into this list is the `inode_num` from `dir_entries`,

but you can perform direct lookups based on the inode number as well.
The `inodes` list is strictly in the following order:

- directory inodes (`S_IFDIR`)
- symlink inodes (`S_IFLNK`)
- regular *unique* file inodes (`S_IREG`)
- regular *shared* file inodes (`S_IREG`)
- character/block device inodes (`S_IFCHR`, `S_IFBLK`)
- socket/pipe inodes (`S_IFSOCK`, `S_IFIFO`)

The offsets can thus be found by using a binary search with a
predicate on the inode mode. The shared file offset can be found

@@ -287,7 +278,7 @@ is true.

The `directories` table, when stored in packed format, omits
all `parent_entry` fields and uses delta compression for the
`first_entry` fields.

In order to unpack all information, you first have to delta-
decompress the `first_entry` fields, then traverse the whole

doc/dwarfs.md

@@ -1,5 +1,4 @@

# dwarfs(1) -- mount highly compressed read-only file system

## SYNOPSIS

@@ -14,103 +13,105 @@ but it has some distinct features.

Other than that, it's pretty straightforward to use. Once you've created a
file system image using mkdwarfs(1), you can mount it with:

```
dwarfs image.dwarfs /path/to/mountpoint
```
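
If the mounted file system doesn't behave as expected, you can keep the driver in the foreground using FUSE's `-f` option and raise the `debuglevel` described below (the combination shown here is only an illustration):

```
dwarfs image.dwarfs /path/to/mountpoint -f -o debuglevel=debug
```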

## OPTIONS

In addition to the regular FUSE options, `dwarfs` supports the following
options:

- `-o cachesize=`*value*:
  Size of the block cache, in bytes. You can append suffixes
  (`k`, `m`, `g`) to specify the size in KiB, MiB and GiB,
  respectively. Note that this is not the upper memory limit
  of the process, as there may be blocks in flight that are
  not stored in the cache. Also, each block that hasn't been
  fully decompressed yet will carry decompressor state along
  with it, which can use a significant amount of additional
  memory. For more details, see mkdwarfs(1).

- `-o workers=`*value*:
  Number of worker threads to use for decompressing blocks.
  If you have a lot of CPUs, increasing this number can help
  speed up access to files in the filesystem.

- `-o decratio=`*value*:
  The ratio over which a block is fully decompressed. Blocks
  are only decompressed partially, so each block has to carry
  the decompressor state with it until it is fully decompressed.
  However, if a certain fraction of the block has already been
  decompressed, it may be beneficial to just decompress the rest
  and free the decompressor state. This value determines the
  ratio at which we fully decompress the block rather than
  keeping a partially decompressed block. A value of 0.8 means
  that as long as we've decompressed less than 80% of the block,
  we keep the partially decompressed block, but if we've
  decompressed more than 80%, we'll fully decompress it.

- `-o offset=`*value*|`auto`:
  Specify the byte offset at which the filesystem is located in
  the image, or use `auto` to detect the offset automatically.
  This is only useful for images that have some header located
  before the actual filesystem data.

- `-o mlock=none`|`try`|`must`:
  Set this to `try` or `must` instead of the default `none` to
  try or require `mlock()`ing of the file system metadata into
  memory.

- `-o enable_nlink`:
  Set this option if you want correct hardlink counts for regular
  files. If this is not specified, the hardlink count will be 1.
  Enabling this will slow down the initialization of the fuse
  driver as the hardlink counts will be determined by a full
  file system scan (it only takes about a millisecond to scan
  through 100,000 files, so this isn't dramatic). The fuse driver
  will also consume more memory to hold the hardlink count table.
  This will be 4 bytes for every regular file inode.

- `-o readonly`:
  Show all file system entries as read-only. By default, DwarFS
  will preserve the original writeability, which is obviously a
  lie as it's a read-only file system. However, this is needed
  for overlays to work correctly, as otherwise directories are
  seen as read-only by the overlay and it'll be impossible to
  create new files even in a writeable overlay. If you don't use
  overlays and want the file system to reflect its read-only
  state, you can set this option.

- `-o (no_)cache_image`:
  By default, `dwarfs` tries to ensure that the compressed file
  system image will not be cached by the kernel (i.e. the default
  is `-o no_cache_image`). This will reduce the memory consumption
  of the FUSE driver to slightly more than the `cachesize`, plus
  the size of the metadata block. This usually isn't a problem,
  especially when the image is stored on an SSD, but if you want
  to maximize performance it can be beneficial to use
  `-o cache_image` to keep the compressed image data in the kernel
  cache.

- `-o (no_)cache_files`:
  By default, files in the mounted file system will be cached by
  the kernel (i.e. the default is `-o cache_files`). This will
  significantly improve performance when accessing the same files
  over and over again, especially if the data from these files has
  been (partially) evicted from the block cache. By setting the
  `-o no_cache_files` option, you can force the fuse driver to not
  use the kernel cache for file data. If you're short on memory and
  only infrequently accessing files, this can be worth trying, even
  though it's likely that the kernel will already do the right thing
  even when the cache is enabled.

- `-o debuglevel=`*name*:
  Use this for different levels of verbosity along with either
  the `-f` or `-d` FUSE options. This can give you some insight
  into what the file system driver is doing internally, but it's
  mainly meant for debugging and the `debug` and `trace` levels
  in particular will slow down the driver.

- `-o tidy_strategy=`*name*:
  Use one of the following strategies to tidy the block cache:

  - `none`:

@@ -128,14 +129,14 @@ options:

    cache is traversed and all blocks that have been fully or
    partially swapped out by the kernel will be removed.

- `-o tidy_interval=`*time*:
  Used only if `tidy_strategy` is not `none`. This is the interval
  at which the cache tidying thread wakes up to look for blocks
  that can be removed from the cache. This must be an integer value.
  Suffixes `ms`, `s`, `m`, `h` are supported. If no suffix is given,
  the value will be assumed to be in seconds.

- `-o tidy_max_age=`*time*:
  Used only if `tidy_strategy` is `time`. A block will be removed
  from the cache if it hasn't been used for this time span. This must
  be an integer value. Suffixes `ms`, `s`, `m`, `h` are supported.
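
As an illustration of how the cache-related options above combine in a single mount (mount point and values are made up):

```
dwarfs image.dwarfs /mnt/image -o cachesize=1g,workers=8,tidy_strategy=time,tidy_interval=5m,tidy_max_age=10m
```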

@@ -145,14 +146,14 @@ There's two particular FUSE options that you'll likely need at some

point, e.g. when trying to set up an `overlayfs` mount on top of
a DwarFS image:

- `-o allow_root` and `-o allow_other`:
  These will ensure that the mounted file system can be read by
  either `root` or any other user in addition to the user that
  started the fuse driver. So if you're running `dwarfs` as a
  non-privileged user, you want to use `-o allow_root` in case `root`
  needs access, for example when you're trying to use `overlayfs`
  along with `dwarfs`. If you're running `dwarfs` as `root`, you
  need `allow_other`.
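
For example (the mount point is illustrative), when the driver runs as `root` and other users need to read the mounted file system:

```
sudo dwarfs image.dwarfs /mnt/image -o allow_other
```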

## TIPS & TRICKS

@@ -193,28 +194,34 @@ set of Perl versions back.

Here's what you need to do:

- Create a set of directories. In my case, these are all located
  in `/tmp/perl` as this was the original install location.

  ```
  cd /tmp/perl
  mkdir install-ro
  mkdir install-rw
  mkdir install-work
  mkdir install
  ```

- Mount the DwarFS image. `-o allow_root` is needed to make sure
  `overlayfs` has access to the mounted file system. In order
  to use `-o allow_root`, you may have to uncomment or add
  `user_allow_other` in `/etc/fuse.conf`.

  ```
  dwarfs perl-install.dwarfs install-ro -o allow_root
  ```

- Now set up `overlayfs`.

  ```
  sudo mount -t overlay overlay -o lowerdir=install-ro,upperdir=install-rw,workdir=install-work install
  ```

- That's it. You should now be able to access a writeable version
  of your DwarFS image in `install`.

You can go even further than that. Say you have different sets of
modules that you regularly want to layer on top of the base DwarFS

@@ -223,7 +230,9 @@ the read-write directory after unmounting the `overlayfs`, and

selectively add this by passing a colon-separated list to the
`lowerdir` option when setting up the `overlayfs` mount:

```
sudo mount -t overlay overlay -o lowerdir=install-ro:install-modules install
```

If you want *this* merged overlay to be writable, just add in the
`upperdir` and `workdir` options from before again.
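
For instance, reusing the directory names from the walkthrough above, the writable variant of this merged overlay would look something like:

```
sudo mount -t overlay overlay -o lowerdir=install-ro:install-modules,upperdir=install-rw,workdir=install-work install
```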

@@ -1,5 +1,4 @@

# dwarfsck(1) -- check DwarFS image

## SYNOPSIS

@@ -15,43 +14,43 @@ with a non-zero exit code.

## OPTIONS

- `-i`, `--input=`*file*:
  Path to the filesystem image.

- `-d`, `--detail=`*value*:
  Level of filesystem information detail. The default is 2. Higher values
  mean more output. Values larger than 6 will currently not provide any
  further detail.

- `-O`, `--image-offset=`*value*|`auto`:
  Specify the byte offset at which the filesystem is located in the image.
  Use `auto` to detect the offset automatically. This is also the default.
  This is only useful for images that have some header located before the
  actual filesystem data.

- `-H`, `--print-header`:
  Print the header located before the filesystem image to stdout. If no
  header is present, the program will exit with a non-zero exit code.

- `-n`, `--num-workers=`*value*:
  Number of worker threads used for integrity checking.

- `--check-integrity`:
  In addition to performing a fast checksum check, also perform a (much
  slower) verification of the embedded SHA-512/256 hashes.

- `--json`:
  Print a simple JSON representation of the filesystem metadata. Please
  note that the format is *not* stable.

- `--export-metadata=`*file*:
  Export all filesystem metadata in JSON format.

- `--log-level=`*name*:
  Specify a logging level.

- `--help`:
  Show program help, including option defaults.
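
To illustrate how these options combine (image name and values are only examples), a thorough check of an image, and a dump of its header, might look like this:

```
dwarfsck -i image.dwarfs -d 4 -n 8 --check-integrity
dwarfsck -i image.dwarfs --print-header > header.bin
```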

## AUTHOR

@@ -1,9 +1,8 @@

# dwarfsextract(1) -- extract DwarFS image

## SYNOPSIS

`dwarfsextract` `-i` *image* [`-o` *dir*] [*options*...]
`dwarfsextract` `-i` *image* -f *format* [`-o` *file*] [*options*...]

## DESCRIPTION

@@ -35,44 +34,44 @@ to disk:

## OPTIONS

- `-i`, `--input=`*file*:
  Path to the source filesystem.

- `-o`, `--output=`*directory*|*file*:
  If no format is specified, this is the directory to which the contents
  of the filesystem should be extracted. If a format is specified, this
  is the name of the output archive. This option can be omitted, in which
  case the default is to extract the files to the current directory, or
  to write the archive data to stdout.

- `-O`, `--image-offset=`*value*|`auto`:
  Specify the byte offset at which the filesystem is located in the image.
  Use `auto` to detect the offset automatically. This is also the default.
  This is only useful for images that have some header located before the
  actual filesystem data.

- `-f`, `--format=`*format*:
  The archive format to produce. If this is left empty or unspecified,
  files will be extracted to the output directory (or the current directory
  if no output directory is specified). For a full list of supported formats,
  see libarchive-formats(5).

- `-n`, `--num-workers=`*value*:
  Number of worker threads used for extracting the filesystem.

- `-s`, `--cache-size=`*value*:
  Size of the block cache, in bytes. You can append suffixes (`k`, `m`, `g`)
  to specify the size in KiB, MiB and GiB, respectively. Note that this is
  not the upper memory limit of the process, as there may be blocks in
  flight that are not stored in the cache. Also, each block that hasn't been
  fully decompressed yet will carry decompressor state along with it, which
  can use a significant amount of additional memory.

- `--log-level=`*name*:
  Specify a logging level.

- `--help`:
  Show program help, including option defaults.
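
As a sketch of the two modes described above (image, directory and format names are placeholders; the format has to be one that libarchive can write):

```
# extract the filesystem contents into a directory
dwarfsextract -i image.dwarfs -o /tmp/extracted

# write an archive to stdout and compress it on the fly
dwarfsextract -i image.dwarfs -f gnutar | gzip > image.tar.gz
```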

## AUTHOR

doc/mkdwarfs.md

@@ -1,9 +1,8 @@

# mkdwarfs(1) -- create highly compressed read-only file systems

## SYNOPSIS

`mkdwarfs` `-i` *path* `-o` *file* [*options*...]
`mkdwarfs` `-i` *file* `-o` *file* `--recompress` [*options*...]

## DESCRIPTION

@@ -26,272 +25,272 @@ After that, you can mount it with dwarfs(1):

There are two mandatory options for specifying the input and output:

- `-i`, `--input=`*path*|*file*:
  Path to the root directory containing the files from which you want to
  build a filesystem. If the `--recompress` option is given, this argument
  is the source filesystem.

- `-o`, `--output=`*file*:
  File name of the output filesystem.

Most other options are concerned with compression tuning:

- `-l`, `--compress-level=`*value*:
  Compression level to use for the filesystem. **If you are unsure, please
  stick to the default level of 7.** This is intended to provide some
  sensible defaults and will depend on which compression libraries were
  available at build time. **The default level has been chosen to provide
  you with the best possible compression while still keeping the file
  system very fast to access.** Levels 8 and 9 will switch to LZMA
  compression (when available), which will likely reduce the file system
  image size, but will make it about an order of magnitude slower to
  access, so reserve these levels for cases where you only need to access
  the data infrequently. This `-l` option is meant to be the "easy"
  interface to configure `mkdwarfs`, and it will actually pick defaults
  for seven distinct options: `--block-size-bits`, `--compression`,
  `--schema-compression`, `--metadata-compression`, `--window-size`,
  `--window-step` and `--order`. See the output of `mkdwarfs --help` for
  a table listing the exact defaults used for each compression level.

- `-S`, `--block-size-bits=`*value*:
  The block size used for the compressed filesystem. The actual block size
  is two to the power of this value. Larger block sizes will offer better
  overall compression ratios, but will be slower and consume more memory
  when actually using the filesystem, as blocks will have to be fully or at
  least partially decompressed into memory. Values between 20 and 26, i.e.
  between 1MiB and 64MiB, usually work quite well.

- `-N`, `--num-workers=`*value*:
  Number of worker threads used for building the filesystem. This defaults
  to the number of processors available on your system. Use this option if
  you want to limit the resources used by `mkdwarfs`.
  This option affects both the scanning phase and the compression phase.
  In the scanning phase, the worker threads are used to scan files in the
  background as they are discovered. File scanning includes checksumming
  for de-duplication as well as (optionally) checksumming for similarity
  computation, depending on the `--order` option. File discovery itself
  is single-threaded and runs independently from the scanning threads.
  In the compression phase, the worker threads are used to compress the
  individual filesystem blocks in the background. Ordering, segmenting
  and block building are, again, single-threaded and run independently.

- `-B`, `--max-lookback-blocks=`*value*:
  Specify how many of the most recent blocks to scan for duplicate segments.
  By default, only the current block will be scanned. The larger this number,
  the more duplicate segments will likely be found, which may further improve
  compression. Impact on compression speed is minimal, but this could cause
  the resulting filesystem to be slightly less efficient to use, as single small
  files can now potentially span multiple filesystem blocks. Passing `-B0`
  will completely disable duplicate segment search.

- `-W`, `--window-size=`*value*:
  Window size of cyclic hash used for segmenting. This is again an exponent
  to a base of two. Cyclic hashes are used by `mkdwarfs` for finding
  identical segments across multiple files. This is done on top of duplicate
  file detection. If a reasonable amount of duplicate segments is found,
  this means fewer blocks will be used in the filesystem and potentially
  less memory will be used when accessing the filesystem. It doesn't
  necessarily mean that the filesystem will be much smaller, as this removes
  redundancy that cannot be exploited by the block compression any longer.
  But it shouldn't make the resulting filesystem any bigger. This option
  is used along with `--window-step` to determine how extensive this
  segment search will be. The smaller the window sizes, the more segments
  will obviously be found. However, this also means files will become more
  fragmented and thus the filesystem can be slower to use and metadata
  size will grow. Passing `-W0` will completely disable duplicate segment
  search.

- `-w`, `--window-step=`*value*:
  This option specifies how often cyclic hash values are stored for lookup.
  It is specified relative to the window size, as a base-2 exponent that
  divides the window size. To give a concrete example, if `--window-size=16`
  and `--window-step=1`, then a cyclic hash across 65536 bytes will be stored
  at every 32768 bytes of input data. If `--window-step=2`, then a hash value
  will be stored at every 16384 bytes. This means that not every possible
  65536-byte duplicate segment will be detected, but it is guaranteed that
  all duplicate segments of (`window_size` + `window_step`) bytes or more
  will be detected (unless they span across block boundaries, of course).
  If you use a larger value for this option, the increments become *smaller*,
  and `mkdwarfs` will be slightly slower and use more memory.

- `--bloom-filter-size`=*value*:
  The segmenting algorithm uses a bloom filter to determine quickly if
  there is *no* match at a given position. This will filter out more than
  90% of bad matches quickly with the default bloom filter size. The default
  is pretty much where the sweet spot lies. If you have copious amounts of
  RAM and CPU power, feel free to increase this by one or two and you *might*
  be able to see some improvement. If you're tight on memory, then decreasing
  this will potentially save a few MiBs.

- `-L`, `--memory-limit=`*value*:
  Approximately how much memory you want `mkdwarfs` to use during filesystem
  creation. Note that currently this will only affect the block manager
  component, i.e. the number of filesystem blocks that are in flight but
  haven't been compressed and written to the output file yet. So the memory
  used by `mkdwarfs` can certainly be larger than this limit, but it's a
  good option when building large filesystems with expensive compression
  algorithms. Also note that most memory is likely used by the compression
  algorithms, so if you're short on memory it might be worth tweaking the
  compression options.

- `-C`, `--compression=`*algorithm*[`:`*algopt*[`=`*value*][`,`...]]:
  The compression algorithm and configuration used for file system data.
  The value for this option is a colon-separated list. The first item is
  the compression algorithm, the remaining items are its options. Options
  can be either boolean or have a value. For details on which algorithms
  and options are available, see the output of `mkdwarfs --help`. `zstd`
  will give you the best compression while still keeping decompression
  *very* fast. `lzma` will compress even better, but decompression will
  be around ten times slower.

- `--schema-compression=`*algorithm*[`:`*algopt*[`=`*value*][`,`...]]:
  The compression algorithm and configuration used for the metadata schema.
  Takes the same arguments as `--compression` above. The schema is *very*
  small, in the hundreds of bytes, so this is only relevant for extremely
  small file systems. The default (`zstd`) has shown to give considerably
  better results than any other algorithms.

- `--metadata-compression=`*algorithm*[`:`*algopt*[`=`*value*][`,`...]]:
  The compression algorithm and configuration used for the metadata.
  Takes the same arguments as `--compression` above. The metadata has been
  optimized for very little redundancy and leaving it uncompressed, the
  default for all levels below 7, has the benefit that it can be mapped
  to memory and used directly. This improves mount time for large file
  systems compared to e.g. an lzma compressed metadata block. If you don't
  care about mount time, you can safely choose `lzma` compression here, as
  the data will only have to be decompressed once when mounting the image.

- `--recompress`[`=all`|`=block`|`=metadata`|`=none`]:
  Take an existing DwarFS file system and recompress it using different
  compression algorithms. If no argument or `all` is given, all sections
  in the file system image will be recompressed. Note that *only* the
  compression algorithms, i.e. the `--compression`, `--schema-compression`
  and `--metadata-compression` options, have an impact on how the new file
  system is written. Other options, e.g. `--block-size-bits` or `--order`,
  have no impact. If `none` is given as an argument, none of the sections
  will be recompressed, but the file system is still rewritten in the
  latest file system format. This is an easy way of upgrading an old file
  system image to a new format. If `block` or `metadata` is given, only
  the block sections (i.e. the actual file data) or the metadata sections
  are recompressed. This can be useful if you want to switch from compressed
  metadata to uncompressed metadata without having to rebuild or recompress
  all the other data.

- `-P`, `--pack-metadata=auto`|`none`|[`all`|`chunk_table`|`directories`|`shared_files`|`names`|`names_index`|`symlinks`|`symlinks_index`|`force`|`plain`[`,`...]]:
  Which metadata information to store in packed format. This is primarily
  useful when storing metadata uncompressed, as it allows for smaller
  metadata block size without having to turn on compression. Keep in mind,
  though, that *most* of the packed data must be unpacked into memory when
  reading the file system. If you want a purely memory-mappable metadata
  block, leave this at the default (`auto`), which will turn on `names` and
  `symlinks` packing if these actually help save data.
  Tweaking these options is mostly interesting when dealing with file
  systems that contain hundreds of thousands of files.
  See [Metadata Packing](#metadata-packing) for more details.

- `--set-owner=`*uid*:
  Set the owner for all entities in the file system. This can reduce the
  size of the file system. If the input only has a single owner already,
  setting this won't make any difference.

- `--set-group=`*gid*:
  Set the group for all entities in the file system. This can reduce the
  size of the file system. If the input only has a single group already,
  setting this won't make any difference.

- `--set-time=`*time*|`now`:
  Set the time stamps for all entities to this value. This can significantly
  reduce the size of the file system. You can pass either a unix time stamp
  or `now`.

- `--keep-all-times`:
  As of release 0.3.0, by default, `mkdwarfs` will only save the contents of
  the `mtime` field in order to save metadata space. If you want to save
  `atime` and `ctime` as well, use this option.

- `--time-resolution=`*sec*|`sec`|`min`|`hour`|`day`:
  Specify the resolution with which time stamps are stored. By default,
  time stamps are stored with second resolution. You can specify "odd"
  resolutions as well, e.g. something like 15 second resolution is
  entirely possible. Moving from second to minute resolution, for example,
  will save roughly 6 bits per file system entry in the metadata block.

- `--order=none`|`path`|`similarity`|`nilsimsa`[`:`*limit*[`:`*depth*[`:`*mindepth*]]]|`script`:
  The order in which inodes will be written to the file system. Choosing `none`,
  the inodes will be stored in the order in which they are discovered. With
  `path`, they will be sorted asciibetically by path name of the first file
  representing this inode. With `similarity`, they will be ordered using a
  simple, yet fast and efficient, similarity hash function. `nilsimsa` ordering
  uses a more sophisticated similarity function that is typically better than
  `similarity`, but is significantly slower to compute. However, computation
  can happen in the background while already building the file system.
  `nilsimsa` ordering can be further tweaked by specifying a *limit* and
  *depth*. The *limit* determines how soon an inode is considered similar
  enough for adding. A *limit* of 255 means "essentially identical", whereas
  a *limit* of 0 means "not similar at all". The *depth* determines up to
  how many inodes can be checked at most while searching for a similar one.
  To avoid `nilsimsa` ordering becoming a bottleneck when ordering lots of
  small files, the *depth* is adjusted dynamically to keep the input queue
  to the segmentation/compression stages adequately filled. You can specify
  how much the *depth* can be adjusted by also specifying *mindepth*.
  The default if you omit these values is a *limit* of 255, a *depth*
  of 20000 and a *mindepth* of 1000. Note that if you want reproducible
  results, you need to set *depth* and *mindepth* to the same value. Also
  note that when you're compressing lots (as in hundreds of thousands) of
  small files, ordering them by `similarity` instead of `nilsimsa` is likely
  going to speed things up significantly without impacting compression too much.
  Last but not least, if scripting support is built into `mkdwarfs`, you can
  choose `script` to let the script determine the order.
* `--remove-empty-dirs`:
|
||||
Removes all empty directories from the output file system, recursively.
|
||||
This is particularly useful when using scripts that filter out a lot of
|
||||
file system entries.
|
||||
- `--remove-empty-dirs`:
|
||||
Removes all empty directories from the output file system, recursively.
|
||||
This is particularly useful when using scripts that filter out a lot of
|
||||
file system entries.
|
||||
|
||||
* `--with-devices`:
|
||||
Include character and block devices in the output file system. These are
|
||||
not included by default, and due to security measures in FUSE, they will
|
||||
never work in the mounted file system. However, they can still be copied
|
||||
out of the mounted file system, for example using `rsync`.
|
||||
- `--with-devices`:
|
||||
Include character and block devices in the output file system. These are
|
||||
not included by default, and due to security measures in FUSE, they will
|
||||
never work in the mounted file system. However, they can still be copied
|
||||
out of the mounted file system, for example using `rsync`.
|
||||
|
||||
* `--with-specials`:
|
||||
Include named fifos and sockets in the output file system. These are not
|
||||
included by default.
|
||||
- `--with-specials`:
|
||||
Include named fifos and sockets in the output file system. These are not
|
||||
included by default.
|
||||
|
||||
- `--header=`*file*:
  Read a header from *file* and place it before the output filesystem image.
  Can be used with `--recompress` to add or replace a header. See the
  example after this option list.

- `--remove-header`:
  Remove the header from a filesystem image. Only useful with `--recompress`.

- `--log-level=`*name*:
  Specify a logging level.

- `--no-progress`:
  Don't show progress output while building the filesystem.

- `--progress=none`|`simple`|`ascii`|`unicode`:
  Choosing `none` is equivalent to specifying `--no-progress`. `simple`
  will print a single line of progress information whenever the progress
  has significantly changed, but at most once every 2 seconds. This is
  also the default when the output is not a tty. `unicode` is the default
  behaviour, which shows a nice progress bar and lots of additional
  information. If your terminal cannot deal with unicode characters,
  you can switch to `ascii`, which is like `unicode`, but looks less
  fancy.

- `--help`:
  Show program help, including defaults, compression level detail and
  supported compression algorithms.

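For illustration, here are two hypothetical invocations combining the options
above; all paths and the header file name are placeholders, and the usual
`-i`/`-o` input and output options are assumed:

    # Order inodes by nilsimsa similarity; depth and mindepth are set to
    # the same value so that the result is reproducible, while the limit
    # stays at its default of 255.
    mkdwarfs -i /path/to/tree -o tree.dwarfs --order=nilsimsa:255:20000:20000

    # Rewrite an existing image, placing the contents of header.bin in
    # front of the filesystem image.
    mkdwarfs -i tree.dwarfs -o tree-with-header.dwarfs --recompress \
             --header=header.bin
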
If experimental Python support was compiled into `mkdwarfs`, you can use the
following option to enable customizations via the scripting interface:

- `--script=`*file*[`:`*class*[`(`arguments`...)`]]:
  Specify the Python script to load. The class name is optional if there's
  a class named `mkdwarfs` in the script. It is also possible to pass
  arguments to the constructor, as shown in the sketch below.

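As a rough sketch of the syntax only (the script file and class name are
made up for this example; constructor arguments, if any, would be appended
in parentheses as shown in the syntax above):

    # Load the class named `mkdwarfs` from the script
    mkdwarfs -i /path/to/tree -o tree.dwarfs --script=my_rules.py

    # Load an explicitly named class instead
    mkdwarfs -i /path/to/tree -o tree.dwarfs --script=my_rules.py:MyRules
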
## TIPS & TRICKS

However, there are several options to choose from that allow you to
further reduce metadata size without having to compress the metadata.
These options are controlled by the `--pack-metadata` option; an example
invocation follows the list of values below.

- `auto`:
  This is the default. It will enable both `names` and `symlinks`.

- `none`:
  Don't enable any packing. However, string tables (i.e. names and
  symlinks) will still be stored in "compact" rather than "plain"
  format. In order to force storage in plain format, use `plain`.

- `all`:
  Enable all packing options. This does *not* force packing of
  string tables (i.e. names and symlinks) if the packing would
  actually increase the size, which can happen if the string tables
  are small. In order to force string table packing, use
  `all,force`.

- `chunk_table`:
  Delta-compress chunk tables. This can reduce the size of the
  chunk tables for large file systems and help compression; however,
  it will likely require a lot of memory when unpacking the tables
  again. Only use this if you know what you're doing.

- `directories`:
  Pack the directories table by storing first entry pointers
  delta-compressed and completely removing parent directory pointers.
  The parent directory pointers can be rebuilt by tree traversal
  when the filesystem is loaded. If you have a large number of
  directories, this can reduce the metadata size; however, it
  will likely require a lot of memory when unpacking the tables
  again. Only use this if you know what you're doing.

- `shared_files`:
  Pack the shared files table. This is only useful if the filesystem
  contains lots of non-hardlinked duplicates. It gets more efficient
  the more copies of a file are in the filesystem.

- `names`,`symlinks`:
  Compress the names and symlink targets using the
  [fsst](https://github.com/cwida/fsst) compression scheme. This
  compresses each individual entry separately using a small,
  custom symbol table, and it's surprisingly efficient. It is
  not uncommon for names to make up 50-70% of the metadata,
  and fsst compression typically reduces their size by a factor
  of two. The entries can be decompressed individually, so no
  extra memory is used when accessing the filesystem (except for
  the symbol table, which is only a few hundred bytes). This is
  turned on by default. For small filesystems, it's possible that
  the compressed strings plus the symbol table are actually larger
  than the uncompressed strings. If this is the case, the strings
  will be stored uncompressed, unless `force` is also specified.

- `names_index`,`symlinks_index`:
  Delta-compress the names and symlink target indices. The same
  caveats apply as for `chunk_table`.

- `force`:
  Forces the compression of the `names` and `symlinks` tables,
  even if that would make them use more memory than the
  uncompressed tables. This is really only useful for testing
  and development.

- `plain`:
  Store string tables in "plain" format. The plain format uses
  Frozen thrift arrays and was used in earlier metadata versions.
  It is useful for debugging, but wastes up to one byte per string.

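As an illustration of the syntax (the input and output paths are
placeholders), packing options are passed as a comma-separated list:

    # Pack everything, and force string table packing even if it does
    # not pay off; smallest metadata, most expensive to load
    mkdwarfs -i /path/to/tree -o tree.dwarfs --pack-metadata=all,force

    # Only pack selected tables
    mkdwarfs -i /path/to/tree -o tree.dwarfs \
             --pack-metadata=names,symlinks,shared_files
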
To give you an idea of the metadata size using different packing options,
here's the size of the metadata block for the Ubuntu 20.04.2.0 Desktop

further compress the block. So if you're really desperately trying
to reduce the image size, enabling `all` packing would be an option
at the cost of using a lot more memory when using the filesystem.

## INTERNAL OPERATION

Internally, `mkdwarfs` runs in two completely separate phases. The first