Mirror of https://github.com/mhx/dwarfs.git (synced 2025-08-04 10:16:34 -04:00)
Commit 569966b752 (parent 9ad4dd655f): Markdown cleanup
@@ -1,11 +1,9 @@
|
|||||||
dwarfs-format(5) -- DwarFS File System Format v2.3
|
# dwarfs-format(5) -- DwarFS File System Format v2.3
|
||||||
==================================================
|
|
||||||
|
|
||||||
## DESCRIPTION
|
## DESCRIPTION
|
||||||
|
|
||||||
This document describes the DwarFS file system format, version 2.3.
|
This document describes the DwarFS file system format, version 2.3.
|
||||||
|
|
||||||
|
|
||||||
## FILE STRUCTURE
|
## FILE STRUCTURE
|
||||||
|
|
||||||
A DwarFS file system image is just a sequence of blocks. Each block has the
|
A DwarFS file system image is just a sequence of blocks. Each block has the
|
||||||
@@ -65,25 +63,23 @@ A couple of notes:
|
|||||||
larger than the one it supports. However, a new program will still
|
larger than the one it supports. However, a new program will still
|
||||||
read all file systems with a smaller minor version number.
|
read all file systems with a smaller minor version number.
|
||||||
|
|
||||||
|
|
||||||
### Section Types
|
### Section Types
|
||||||
|
|
||||||
There are currently 3 different section types.
|
There are currently 3 different section types.
|
||||||
|
|
||||||
* `BLOCK` (0):
|
- `BLOCK` (0):
|
||||||
A block of data. This is where all file data is stored. There can be
|
A block of data. This is where all file data is stored. There can be
|
||||||
an arbitrary number of blocks of this type.
|
an arbitrary number of blocks of this type.
|
||||||
|
|
||||||
* `METADATA_V2_SCHEMA` (7):
|
- `METADATA_V2_SCHEMA` (7):
|
||||||
The schema used to lay out the `METADATA_V2` block contents. This is
|
The schema used to lay out the `METADATA_V2` block contents. This is
|
||||||
stored in "compact" thrift encoding.
|
stored in "compact" thrift encoding.
|
||||||
|
|
||||||
* `METADATA_V2` (8):
|
|
||||||
This section contains the bulk of the metadata. It's essentially just
|
|
||||||
a collection of bit-packed arrays and structures. The exact layout of
|
|
||||||
each list and structure depends on the actual data and is stored
|
|
||||||
separately in `METADATA_V2_SCHEMA`.
|
|
||||||
|
|
||||||
|
- `METADATA_V2` (8):
|
||||||
|
This section contains the bulk of the metadata. It's essentially just
|
||||||
|
a collection of bit-packed arrays and structures. The exact layout of
|
||||||
|
each list and structure depends on the actual data and is stored
|
||||||
|
separately in `METADATA_V2_SCHEMA`.
|
||||||
|
|
||||||
## METADATA FORMAT
|
## METADATA FORMAT
|
||||||
|
|
||||||
@@ -169,17 +165,12 @@ list. The index into this list is the `inode_num` from `dir_entries`,
|
|||||||
but you can perform direct lookups based on the inode number as well.
|
but you can perform direct lookups based on the inode number as well.
|
||||||
The `inodes` list is strictly in the following order:
|
The `inodes` list is strictly in the following order:
|
||||||
|
|
||||||
* directory inodes (`S_IFDIR`)
|
- directory inodes (`S_IFDIR`)
|
||||||
|
- symlink inodes (`S_IFLNK`)
|
||||||
* symlink inodes (`S_IFLNK`)
|
- regular *unique* file inodes (`S_IFREG`)
|
||||||
|
- regular *shared* file inodes (`S_IFREG`)
|
||||||
* regular *unique* file inodes (`S_IFREG`)
|
- character/block device inodes (`S_IFCHR`, `S_IFBLK`)
|
||||||
|
- socket/pipe inodes (`S_IFSOCK`, `S_IFIFO`)
|
||||||
* regular *shared* file inodes (`S_IFREG`)
|
|
||||||
|
|
||||||
* character/block device inodes (`S_IFCHR`, `S_IFBLK`)
|
|
||||||
|
|
||||||
* socket/pipe inodes (`S_IFSOCK`, `S_IFIFO`)
|
|
||||||
|
|
||||||
The offsets can thus be found by using a binary search with a
|
The offsets can thus be found by using a binary search with a
|
||||||
predicate on the inode mode. The shared file offset can be found
|
predicate on the inode mode. The shared file offset can be found
|
||||||
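The binary search described in the hunk above can be sketched as follows. This is a minimal Python illustration and not DwarFS code: `modes` is a hypothetical list holding the already-resolved mode value of each inode in `inodes` order, and `mode_rank` and `first_inode_with_rank` are names invented for this example. Unique and shared regular files have the same file type, so the mode alone only locates the start of the regular-file group as a whole.

```python
import stat

# Position of each file type in the documented ordering of the `inodes` list.
# Types that share a group (e.g. character and block devices) share a rank.
_RANK = {
    stat.S_IFDIR: 0,
    stat.S_IFLNK: 1,
    stat.S_IFREG: 2,
    stat.S_IFCHR: 3,
    stat.S_IFBLK: 3,
    stat.S_IFSOCK: 4,
    stat.S_IFIFO: 4,
}


def mode_rank(mode):
    """Map a full mode value to its group rank via the file type bits."""
    return _RANK[stat.S_IFMT(mode)]


def first_inode_with_rank(modes, rank):
    """Binary search for the first index whose rank is >= `rank`.

    Relies on `modes` being ordered by group, as the format requires.
    Returns len(modes) if no such inode exists.
    """
    lo, hi = 0, len(modes)
    while lo < hi:
        mid = (lo + hi) // 2
        if mode_rank(modes[mid]) < rank:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

Calling `first_inode_with_rank(modes, 1)` on such a list yields the offset of the first symlink inode, which equals the number of directory inodes.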
|
229
doc/dwarfs.md
@@ -1,5 +1,4 @@
|
|||||||
dwarfs(1) -- mount highly compressed read-only file system
|
# dwarfs(1) -- mount highly compressed read-only file system
|
||||||
==========================================================
|
|
||||||
|
|
||||||
## SYNOPSIS
|
## SYNOPSIS
|
||||||
|
|
||||||
@@ -14,103 +13,105 @@ but it has some distinct features.
|
|||||||
Other than that, it's pretty straightforward to use. Once you've created a
|
Other than that, it's pretty straightforward to use. Once you've created a
|
||||||
file system image using mkdwarfs(1), you can mount it with:
|
file system image using mkdwarfs(1), you can mount it with:
|
||||||
|
|
||||||
dwarfs image.dwarfs /path/to/mountpoint
|
```
|
||||||
|
dwarfs image.dwarfs /path/to/mountpoint
|
||||||
|
```
|
||||||
|
|
||||||
## OPTIONS
|
## OPTIONS
|
||||||
|
|
||||||
In addition to the regular FUSE options, `dwarfs` supports the following
|
In addition to the regular FUSE options, `dwarfs` supports the following
|
||||||
options:
|
options:
|
||||||
|
|
||||||
* `-o cachesize=`*value*:
|
- `-o cachesize=`*value*:
|
||||||
Size of the block cache, in bytes. You can append suffixes
|
Size of the block cache, in bytes. You can append suffixes
|
||||||
(`k`, `m`, `g`) to specify the size in KiB, MiB and GiB,
|
(`k`, `m`, `g`) to specify the size in KiB, MiB and GiB,
|
||||||
respectively. Note that this is not the upper memory limit
|
respectively. Note that this is not the upper memory limit
|
||||||
of the process, as there may be blocks in flight that are
|
of the process, as there may be blocks in flight that are
|
||||||
not stored in the cache. Also, each block that hasn't been
|
not stored in the cache. Also, each block that hasn't been
|
||||||
fully decompressed yet will carry decompressor state along
|
fully decompressed yet will carry decompressor state along
|
||||||
with it, which can use a significant amount of additional
|
with it, which can use a significant amount of additional
|
||||||
memory. For more details, see mkdwarfs(1).
|
memory. For more details, see mkdwarfs(1).
|
||||||
|
|
||||||
* `-o workers=`*value*:
|
- `-o workers=`*value*:
|
||||||
Number of worker threads to use for decompressing blocks.
|
Number of worker threads to use for decompressing blocks.
|
||||||
If you have a lot of CPUs, increasing this number can help
|
If you have a lot of CPUs, increasing this number can help
|
||||||
speed up access to files in the filesystem.
|
speed up access to files in the filesystem.
|
||||||
|
|
||||||
* `-o decratio=`*value*:
|
- `-o decratio=`*value*:
|
||||||
The ratio over which a block is fully decompressed. Blocks
|
The ratio over which a block is fully decompressed. Blocks
|
||||||
are only decompressed partially, so each block has to carry
|
are only decompressed partially, so each block has to carry
|
||||||
the decompressor state with it until it is fully decompressed.
|
the decompressor state with it until it is fully decompressed.
|
||||||
However, if a certain fraction of the block has already been
|
However, if a certain fraction of the block has already been
|
||||||
decompressed, it may be beneficial to just decompress the rest
|
decompressed, it may be beneficial to just decompress the rest
|
||||||
and free the decompressor state. This value determines the
|
and free the decompressor state. This value determines the
|
||||||
ratio at which we fully decompress the block rather than
|
ratio at which we fully decompress the block rather than
|
||||||
keeping a partially decompressed block. A value of 0.8 means
|
keeping a partially decompressed block. A value of 0.8 means
|
||||||
that as long as we've decompressed less than 80% of the block,
|
that as long as we've decompressed less than 80% of the block,
|
||||||
we keep the partially decompressed block, but if we've
|
we keep the partially decompressed block, but if we've
|
||||||
decompressed more than 80%, we'll fully decompress it.
|
decompressed more than 80%, we'll fully decompress it.
|
||||||
|
|
||||||
* `-o offset=`*value*|`auto`:
|
- `-o offset=`*value*|`auto`:
|
||||||
Specify the byte offset at which the filesystem is located in
|
Specify the byte offset at which the filesystem is located in
|
||||||
the image, or use `auto` to detect the offset automatically.
|
the image, or use `auto` to detect the offset automatically.
|
||||||
This is only useful for images that have some header located
|
This is only useful for images that have some header located
|
||||||
before the actual filesystem data.
|
before the actual filesystem data.
|
||||||
|
|
||||||
* `-o mlock=none`|`try`|`must`:
|
- `-o mlock=none`|`try`|`must`:
|
||||||
Set this to `try` or `must` instead of the default `none` to
|
Set this to `try` or `must` instead of the default `none` to
|
||||||
try or require `mlock()`ing of the file system metadata into
|
try or require `mlock()`ing of the file system metadata into
|
||||||
memory.
|
memory.
|
||||||
|
|
||||||
* `-o enable_nlink`:
|
- `-o enable_nlink`:
|
||||||
Set this option if you want correct hardlink counts for regular
|
Set this option if you want correct hardlink counts for regular
|
||||||
files. If this is not specified, the hardlink count will be 1.
|
files. If this is not specified, the hardlink count will be 1.
|
||||||
Enabling this will slow down the initialization of the fuse
|
Enabling this will slow down the initialization of the fuse
|
||||||
driver as the hardlink counts will be determined by a full
|
driver as the hardlink counts will be determined by a full
|
||||||
file system scan (it only takes about a millisecond to scan
|
file system scan (it only takes about a millisecond to scan
|
||||||
through 100,000 files, so this isn't dramatic). The fuse driver
|
through 100,000 files, so this isn't dramatic). The fuse driver
|
||||||
will also consume more memory to hold the hardlink count table.
|
will also consume more memory to hold the hardlink count table.
|
||||||
This will be 4 bytes for every regular file inode.
|
This will be 4 bytes for every regular file inode.
|
||||||
|
|
||||||
* `-o readonly`:
|
- `-o readonly`:
|
||||||
Show all file system entries as read-only. By default, DwarFS
|
Show all file system entries as read-only. By default, DwarFS
|
||||||
will preserve the original writeability, which is obviously a
|
will preserve the original writeability, which is obviously a
|
||||||
lie as it's a read-only file system. However, this is needed
|
lie as it's a read-only file system. However, this is needed
|
||||||
for overlays to work correctly, as otherwise directories are
|
for overlays to work correctly, as otherwise directories are
|
||||||
seen as read-only by the overlay and it'll be impossible to
|
seen as read-only by the overlay and it'll be impossible to
|
||||||
create new files even in a writeable overlay. If you don't use
|
create new files even in a writeable overlay. If you don't use
|
||||||
overlays and want the file system to reflect its read-only
|
overlays and want the file system to reflect its read-only
|
||||||
state, you can set this option.
|
state, you can set this option.
|
||||||
|
|
||||||
* `-o (no_)cache_image`:
|
- `-o (no_)cache_image`:
|
||||||
By default, `dwarfs` tries to ensure that the compressed file
|
By default, `dwarfs` tries to ensure that the compressed file
|
||||||
system image will not be cached by the kernel (i.e. the default
|
system image will not be cached by the kernel (i.e. the default
|
||||||
is `-o no_cache_image`). This will reduce the memory consumption
|
is `-o no_cache_image`). This will reduce the memory consumption
|
||||||
of the FUSE driver to slightly more than the `cachesize`, plus
|
of the FUSE driver to slightly more than the `cachesize`, plus
|
||||||
the size of the metadata block. This usually isn't a problem,
|
the size of the metadata block. This usually isn't a problem,
|
||||||
especially when the image is stored on an SSD, but if you want
|
especially when the image is stored on an SSD, but if you want
|
||||||
to maximize performance it can be beneficial to use
|
to maximize performance it can be beneficial to use
|
||||||
`-o cache_image` to keep the compressed image data in the kernel
|
`-o cache_image` to keep the compressed image data in the kernel
|
||||||
cache.
|
cache.
|
||||||
|
|
||||||
* `-o (no_)cache_files`:
|
- `-o (no_)cache_files`:
|
||||||
By default, files in the mounted file system will be cached by
|
By default, files in the mounted file system will be cached by
|
||||||
the kernel (i.e. the default is `-o cache_files`). This will
|
the kernel (i.e. the default is `-o cache_files`). This will
|
||||||
significantly improve performance when accessing the same files
|
significantly improve performance when accessing the same files
|
||||||
over and over again, especially if the data from these files has
|
over and over again, especially if the data from these files has
|
||||||
been (partially) evicted from the block cache. By setting the
|
been (partially) evicted from the block cache. By setting the
|
||||||
`-o no_cache_files` option, you can force the fuse driver to not
|
`-o no_cache_files` option, you can force the fuse driver to not
|
||||||
use the kernel cache for file data. If you're short on memory and
|
use the kernel cache for file data. If you're short on memory and
|
||||||
only infrequently accessing files, this can be worth trying, even
|
only infrequently accessing files, this can be worth trying, even
|
||||||
though it's likely that the kernel will already do the right thing
|
though it's likely that the kernel will already do the right thing
|
||||||
even when the cache is enabled.
|
even when the cache is enabled.
|
||||||
|
|
||||||
* `-o debuglevel=`*name*:
|
- `-o debuglevel=`*name*:
|
||||||
Use this for different levels of verbosity along with either
|
Use this for different levels of verbosity along with either
|
||||||
the `-f` or `-d` FUSE options. This can give you some insight
|
the `-f` or `-d` FUSE options. This can give you some insight
|
||||||
into what the file system driver is doing internally, but it's
|
into what the file system driver is doing internally, but it's
|
||||||
mainly meant for debugging and the `debug` and `trace` levels
|
mainly meant for debugging and the `debug` and `trace` levels
|
||||||
in particular will slow down the driver.
|
in particular will slow down the driver.
|
||||||
|
|
||||||
* `-o tidy_strategy=`*name*:
|
- `-o tidy_strategy=`*name*:
|
||||||
Use one of the following strategies to tidy the block cache:
|
Use one of the following strategies to tidy the block cache:
|
||||||
|
|
||||||
- `none`:
|
- `none`:
|
||||||
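The `decratio` rule from the option list above can be illustrated with a small Python sketch. It is not the driver's implementation: the function name and parameters are hypothetical, and the behaviour at exactly the threshold is an assumption, since the text only describes the "less than" and "more than" cases.

```python
def should_fully_decompress(decompressed_bytes, total_bytes, decratio):
    """Decide whether to finish decompressing a partially decompressed block.

    Returns True once the decompressed fraction exceeds `decratio`, at which
    point the decompressor state could be freed. Below the threshold the
    block stays partially decompressed.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return decompressed_bytes / total_bytes > decratio
```

With `decratio=0.8`, a block that is 70% decompressed is kept as is, while one that is 90% decompressed would be completed.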
@@ -128,14 +129,14 @@ options:
|
|||||||
cache is traversed and all blocks that have been fully or
|
cache is traversed and all blocks that have been fully or
|
||||||
partially swapped out by the kernel will be removed.
|
partially swapped out by the kernel will be removed.
|
||||||
|
|
||||||
* `-o tidy_interval=`*time*:
|
- `-o tidy_interval=`*time*:
|
||||||
Used only if `tidy_strategy` is not `none`. This is the interval
|
Used only if `tidy_strategy` is not `none`. This is the interval
|
||||||
at which the cache tidying thread wakes up to look for blocks
|
at which the cache tidying thread wakes up to look for blocks
|
||||||
that can be removed from the cache. This must be an integer value.
|
that can be removed from the cache. This must be an integer value.
|
||||||
Suffixes `ms`, `s`, `m`, `h` are supported. If no suffix is given,
|
Suffixes `ms`, `s`, `m`, `h` are supported. If no suffix is given,
|
||||||
the value will be assumed to be in seconds.
|
the value will be assumed to be in seconds.
|
||||||
|
|
||||||
* `-o tidy_max_age=`*time*:
|
- `-o tidy_max_age=`*time*:
|
||||||
Used only if `tidy_strategy` is `time`. A block will be removed
|
Used only if `tidy_strategy` is `time`. A block will be removed
|
||||||
from the cache if it hasn't been used for this time span. This must
|
from the cache if it hasn't been used for this time span. This must
|
||||||
be an integer value. Suffixes `ms`, `s`, `m`, `h` are supported.
|
be an integer value. Suffixes `ms`, `s`, `m`, `h` are supported.
|
||||||
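The time format accepted by `tidy_interval` and `tidy_max_age` (an integer with an optional `ms`, `s`, `m` or `h` suffix, defaulting to seconds) can be sketched in Python as below. This is only an illustration of the documented rule: `parse_time_ms` is a hypothetical helper, and rejecting anything else, such as fractional values, is an assumption based on the "must be an integer value" wording.

```python
import re

# Milliseconds per supported unit suffix.
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}


def parse_time_ms(text):
    """Parse values such as '500ms', '10', '5m' or '2h' into milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)?", text)
    if match is None:
        raise ValueError(f"invalid time value: {text!r}")
    value, unit = match.groups()
    # A missing suffix means the value is given in seconds.
    return int(value) * _UNIT_MS[unit or "s"]
```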
@@ -145,14 +146,14 @@ There's two particular FUSE options that you'll likely need at some
|
|||||||
point, e.g. when trying to set up an `overlayfs` mount on top of
|
point, e.g. when trying to set up an `overlayfs` mount on top of
|
||||||
a DwarFS image:
|
a DwarFS image:
|
||||||
|
|
||||||
* `-o allow_root` and `-o allow_other`:
|
- `-o allow_root` and `-o allow_other`:
|
||||||
These will ensure that the mounted file system can be read by
|
These will ensure that the mounted file system can be read by
|
||||||
either `root` or any other user in addition to the user that
|
either `root` or any other user in addition to the user that
|
||||||
started the fuse driver. So if you're running `dwarfs` as a
|
started the fuse driver. So if you're running `dwarfs` as a
|
||||||
non-privileged user, you want to `-o allow_root` in case `root`
|
non-privileged user, you want to `-o allow_root` in case `root`
|
||||||
needs access, for example when you're trying to use `overlayfs`
|
needs access, for example when you're trying to use `overlayfs`
|
||||||
along with `dwarfs`. If you're running `dwarfs` as `root`, you
|
along with `dwarfs`. If you're running `dwarfs` as `root`, you
|
||||||
need `allow_other`.
|
need `allow_other`.
|
||||||
|
|
||||||
## TIPS & TRICKS
|
## TIPS & TRICKS
|
||||||
|
|
||||||
@@ -193,28 +194,34 @@ set of Perl versions back.
|
|||||||
|
|
||||||
Here's what you need to do:
|
Here's what you need to do:
|
||||||
|
|
||||||
* Create a set of directories. In my case, these are all located
|
- Create a set of directories. In my case, these are all located
|
||||||
in `/tmp/perl` as this was the original install location.
|
in `/tmp/perl` as this was the original install location.
|
||||||
|
|
||||||
cd /tmp/perl
|
```
|
||||||
mkdir install-ro
|
cd /tmp/perl
|
||||||
mkdir install-rw
|
mkdir install-ro
|
||||||
mkdir install-work
|
mkdir install-rw
|
||||||
mkdir install
|
mkdir install-work
|
||||||
|
mkdir install
|
||||||
|
```
|
||||||
|
|
||||||
* Mount the DwarFS image. `-o allow_root` is needed to make sure
|
- Mount the DwarFS image. `-o allow_root` is needed to make sure
|
||||||
`overlayfs` has access to the mounted file system. In order
|
`overlayfs` has access to the mounted file system. In order
|
||||||
to use `-o allow_root`, you may have to uncomment or add
|
to use `-o allow_root`, you may have to uncomment or add
|
||||||
`user_allow_other` in `/etc/fuse.conf`.
|
`user_allow_other` in `/etc/fuse.conf`.
|
||||||
|
|
||||||
dwarfs perl-install.dwarfs install-ro -o allow_root
|
```
|
||||||
|
dwarfs perl-install.dwarfs install-ro -o allow_root
|
||||||
|
```
|
||||||
|
|
||||||
* Now set up `overlayfs`.
|
- Now set up `overlayfs`.
|
||||||
|
|
||||||
sudo mount -t overlay overlay -o lowerdir=install-ro,upperdir=install-rw,workdir=install-work install
|
```
|
||||||
|
sudo mount -t overlay overlay -o lowerdir=install-ro,upperdir=install-rw,workdir=install-work install
|
||||||
|
```
|
||||||
|
|
||||||
* That's it. You should now be able to access a writeable version
|
- That's it. You should now be able to access a writeable version
|
||||||
of your DwarFS image in `install`.
|
of your DwarFS image in `install`.
|
||||||
|
|
||||||
You can go even further than that. Say you have different sets of
|
You can go even further than that. Say you have different sets of
|
||||||
modules that you regularly want to layer on top of the base DwarFS
|
modules that you regularly want to layer on top of the base DwarFS
|
||||||
@@ -223,7 +230,9 @@ the read-write directory after unmounting the `overlayfs`, and
|
|||||||
selectively add this by passing a colon-separated list to the
|
selectively add this by passing a colon-separated list to the
|
||||||
`lowerdir` option when setting up the `overlayfs` mount:
|
`lowerdir` option when setting up the `overlayfs` mount:
|
||||||
|
|
||||||
sudo mount -t overlay overlay -o lowerdir=install-ro:install-modules install
|
```
|
||||||
|
sudo mount -t overlay overlay -o lowerdir=install-ro:install-modules install
|
||||||
|
```
|
||||||
|
|
||||||
If you want *this* merged overlay to be writable, just add in the
|
If you want *this* merged overlay to be writable, just add in the
|
||||||
`upperdir` and `workdir` options from before again.
|
`upperdir` and `workdir` options from before again.
|
||||||
|
@@ -1,5 +1,4 @@
|
|||||||
dwarfsck(1) -- check DwarFS image
|
# dwarfsck(1) -- check DwarFS image
|
||||||
=================================
|
|
||||||
|
|
||||||
## SYNOPSIS
|
## SYNOPSIS
|
||||||
|
|
||||||
@@ -15,43 +14,43 @@ with a non-zero exit code.
|
|||||||
|
|
||||||
## OPTIONS
|
## OPTIONS
|
||||||
|
|
||||||
* `-i`, `--input=`*file*:
|
- `-i`, `--input=`*file*:
|
||||||
Path to the filesystem image.
|
Path to the filesystem image.
|
||||||
|
|
||||||
* `-d`, `--detail=`*value*:
|
- `-d`, `--detail=`*value*:
|
||||||
Level of filesystem information detail. The default is 2. Higher values
|
Level of filesystem information detail. The default is 2. Higher values
|
||||||
mean more output. Values larger than 6 will currently not provide any
|
mean more output. Values larger than 6 will currently not provide any
|
||||||
further detail.
|
further detail.
|
||||||
|
|
||||||
* `-O`, `--image-offset=`*value*|`auto`:
|
- `-O`, `--image-offset=`*value*|`auto`:
|
||||||
Specify the byte offset at which the filesystem is located in the image.
|
Specify the byte offset at which the filesystem is located in the image.
|
||||||
Use `auto` to detect the offset automatically. This is also the default.
|
Use `auto` to detect the offset automatically. This is also the default.
|
||||||
This is only useful for images that have some header located before the
|
This is only useful for images that have some header located before the
|
||||||
actual filesystem data.
|
actual filesystem data.
|
||||||
|
|
||||||
* `-H`, `--print-header`:
|
- `-H`, `--print-header`:
|
||||||
Print the header located before the filesystem image to stdout. If no
|
Print the header located before the filesystem image to stdout. If no
|
||||||
header is present, the program will exit with a non-zero exit code.
|
header is present, the program will exit with a non-zero exit code.
|
||||||
|
|
||||||
* `-n`, `--num-workers=`*value*:
|
- `-n`, `--num-workers=`*value*:
|
||||||
Number of worker threads used for integrity checking.
|
Number of worker threads used for integrity checking.
|
||||||
|
|
||||||
* `--check-integrity`:
|
- `--check-integrity`:
|
||||||
In addition to performing a fast checksum check, also perform a (much
|
In addition to performing a fast checksum check, also perform a (much
|
||||||
slower) verification of the embedded SHA-512/256 hashes.
|
slower) verification of the embedded SHA-512/256 hashes.
|
||||||
|
|
||||||
* `--json`:
|
- `--json`:
|
||||||
Print a simple JSON representation of the filesystem metadata. Please
|
Print a simple JSON representation of the filesystem metadata. Please
|
||||||
note that the format is *not* stable.
|
note that the format is *not* stable.
|
||||||
|
|
||||||
* `--export-metadata=`*file*:
|
- `--export-metadata=`*file*:
|
||||||
Export all filesystem metadata in JSON format.
|
Export all filesystem metadata in JSON format.
|
||||||
|
|
||||||
* `--log-level=`*name*:
|
- `--log-level=`*name*:
|
||||||
Specify a logging level.
|
Specify a logging level.
|
||||||
|
|
||||||
* `--help`:
|
- `--help`:
|
||||||
Show program help, including option defaults.
|
Show program help, including option defaults.
|
||||||
|
|
||||||
## AUTHOR
|
## AUTHOR
|
||||||
|
|
||||||
|
@@ -1,9 +1,8 @@
|
|||||||
dwarfsextract(1) -- extract DwarFS image
|
# dwarfsextract(1) -- extract DwarFS image
|
||||||
========================================
|
|
||||||
|
|
||||||
## SYNOPSIS
|
## SYNOPSIS
|
||||||
|
|
||||||
`dwarfsextract` `-i` *image* [`-o` *dir*] [*options*...]<br>
|
`dwarfsextract` `-i` *image* [`-o` *dir*] [*options*...]
|
||||||
`dwarfsextract` `-i` *image* -f *format* [`-o` *file*] [*options*...]
|
`dwarfsextract` `-i` *image* -f *format* [`-o` *file*] [*options*...]
|
||||||
|
|
||||||
## DESCRIPTION
|
## DESCRIPTION
|
||||||
@@ -35,44 +34,44 @@ to disk:
|
|||||||
|
|
||||||
## OPTIONS
|
## OPTIONS
|
||||||
|
|
||||||
* `-i`, `--input=`*file*:
|
- `-i`, `--input=`*file*:
|
||||||
Path to the source filesystem.
|
Path to the source filesystem.
|
||||||
|
|
||||||
* `-o`, `--output=`*directory*|*file*:
|
- `-o`, `--output=`*directory*|*file*:
|
||||||
If no format is specified, this is the directory to which the contents
|
If no format is specified, this is the directory to which the contents
|
||||||
of the filesystem should be extracted. If a format is specified, this
|
of the filesystem should be extracted. If a format is specified, this
|
||||||
is the name of the output archive. This option can be omitted, in which
|
is the name of the output archive. This option can be omitted, in which
|
||||||
case the default is to extract the files to the current directory, or
|
case the default is to extract the files to the current directory, or
|
||||||
to write the archive data to stdout.
|
to write the archive data to stdout.
|
||||||
|
|
||||||
* `-O`, `--image-offset=`*value*|`auto`:
|
- `-O`, `--image-offset=`*value*|`auto`:
|
||||||
Specify the byte offset at which the filesystem is located in the image.
|
Specify the byte offset at which the filesystem is located in the image.
|
||||||
Use `auto` to detect the offset automatically. This is also the default.
|
Use `auto` to detect the offset automatically. This is also the default.
|
||||||
This is only useful for images that have some header located before the
|
This is only useful for images that have some header located before the
|
||||||
actual filesystem data.
|
actual filesystem data.
|
||||||
|
|
||||||
* `-f`, `--format=`*format*:
|
- `-f`, `--format=`*format*:
|
||||||
The archive format to produce. If this is left empty or unspecified,
|
The archive format to produce. If this is left empty or unspecified,
|
||||||
files will be extracted to the output directory (or the current directory
|
files will be extracted to the output directory (or the current directory
|
||||||
if no output directory is specified). For a full list of supported formats,
|
if no output directory is specified). For a full list of supported formats,
|
||||||
see libarchive-formats(5).
|
see libarchive-formats(5).
|
||||||
|
|
||||||
* `-n`, `--num-workers=`*value*:
|
- `-n`, `--num-workers=`*value*:
|
||||||
Number of worker threads used for extracting the filesystem.
|
Number of worker threads used for extracting the filesystem.
|
||||||
|
|
||||||
* `-s`, `--cache-size=`*value*:
|
- `-s`, `--cache-size=`*value*:
|
||||||
Size of the block cache, in bytes. You can append suffixes (`k`, `m`, `g`)
|
Size of the block cache, in bytes. You can append suffixes (`k`, `m`, `g`)
|
||||||
to specify the size in KiB, MiB and GiB, respectively. Note that this is
|
to specify the size in KiB, MiB and GiB, respectively. Note that this is
|
||||||
not the upper memory limit of the process, as there may be blocks in
|
not the upper memory limit of the process, as there may be blocks in
|
||||||
flight that are not stored in the cache. Also, each block that hasn't been
|
flight that are not stored in the cache. Also, each block that hasn't been
|
||||||
fully decompressed yet will carry decompressor state along with it, which
|
fully decompressed yet will carry decompressor state along with it, which
|
||||||
can use a significant amount of additional memory.
|
can use a significant amount of additional memory.
|
||||||
|
|
||||||
* `--log-level=`*name*:
|
- `--log-level=`*name*:
|
||||||
Specify a logging level.
|
Specify a logging level.
|
||||||
|
|
||||||
* `--help`:
|
- `--help`:
|
||||||
Show program help, including option defaults.
|
Show program help, including option defaults.
|
||||||
|
|
||||||
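The size format used by `--cache-size` (and by `-o cachesize` in dwarfs(1)) can be sketched in Python as follows. This is a hedged illustration rather than the tools' actual parser: `parse_cache_size` is a hypothetical name, and accepting only lowercase suffixes and whole numbers is an assumption, since the text lists just `k`, `m` and `g`.

```python
import re

# Multipliers for the documented suffixes: KiB, MiB and GiB.
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_cache_size(text):
    """Parse values such as '4096', '64k', '512m' or '1g' into bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", text)
    if match is None:
        raise ValueError(f"invalid size value: {text!r}")
    value, suffix = match.groups()
    return int(value) * _UNIT_BYTES[suffix]
```

As the option text notes, the resulting number bounds only the block cache, not the total memory use of the process.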
## AUTHOR
|
## AUTHOR
|
||||||
|
|
||||||
|
578
doc/mkdwarfs.md
@@ -1,9 +1,8 @@
|
|||||||
mkdwarfs(1) -- create highly compressed read-only file systems
|
# mkdwarfs(1) -- create highly compressed read-only file systems
|
||||||
==============================================================
|
|
||||||
|
|
||||||
## SYNOPSIS
|
## SYNOPSIS
|
||||||
|
|
||||||
`mkdwarfs` `-i` *path* `-o` *file* [*options*...]<br>
|
`mkdwarfs` `-i` *path* `-o` *file* [*options*...]
|
||||||
`mkdwarfs` `-i` *file* `-o` *file* `--recompress` [*options*...]
|
`mkdwarfs` `-i` *file* `-o` *file* `--recompress` [*options*...]
|
||||||
|
|
||||||
## DESCRIPTION
|
## DESCRIPTION
|
||||||
@@ -26,272 +25,272 @@ After that, you can mount it with dwarfs(1):
|
|||||||
|
|
||||||
There are two mandatory options for specifying the input and output:
|
There are two mandatory options for specifying the input and output:
|
||||||
|
|
||||||
* `-i`, `--input=`*path*|*file*:
|
- `-i`, `--input=`*path*|*file*:
|
||||||
Path to the root directory containing the files from which you want to
|
Path to the root directory containing the files from which you want to
|
||||||
build a filesystem. If the `--recompress` option is given, this argument
|
build a filesystem. If the `--recompress` option is given, this argument
|
||||||
is the source filesystem.
|
is the source filesystem.
|
||||||
|
|
||||||
* `-o`, `--output=`*file*:
|
- `-o`, `--output=`*file*:
|
||||||
File name of the output filesystem.
|
File name of the output filesystem.
|
||||||
|
|
||||||
Most other options are concerned with compression tuning:
|
Most other options are concerned with compression tuning:
|
||||||
|
|
||||||
* `-l`, `--compress-level=`*value*:
|
- `-l`, `--compress-level=`*value*:
|
||||||
Compression level to use for the filesystem. **If you are unsure, please
|
Compression level to use for the filesystem. **If you are unsure, please
|
||||||
stick to the default level of 7.** This is intended to provide some
|
stick to the default level of 7.** This is intended to provide some
|
||||||
sensible defaults and will depend on which compression libraries were
|
sensible defaults and will depend on which compression libraries were
|
||||||
available at build time. **The default level has been chosen to provide
|
available at build time. **The default level has been chosen to provide
|
||||||
you with the best possible compression while still keeping the file
|
you with the best possible compression while still keeping the file
|
||||||
system very fast to access.** Levels 8 and 9 will switch to LZMA
|
system very fast to access.** Levels 8 and 9 will switch to LZMA
|
||||||
compression (when available), which will likely reduce the file system
|
compression (when available), which will likely reduce the file system
|
||||||
image size, but will make it about an order of magnitude slower to
|
image size, but will make it about an order of magnitude slower to
|
||||||
access, so reserve these levels for cases where you only need to access
|
access, so reserve these levels for cases where you only need to access
|
||||||
the data infrequently. This `-l` option is meant to be the "easy"
|
the data infrequently. This `-l` option is meant to be the "easy"
|
||||||
interface to configure `mkdwarfs`, and it will actually pick defaults
|
interface to configure `mkdwarfs`, and it will actually pick defaults
|
||||||
for seven distinct options: `--block-size-bits`, `--compression`,
|
for seven distinct options: `--block-size-bits`, `--compression`,
|
||||||
`--schema-compression`, `--metadata-compression`, `--window-size`,
|
`--schema-compression`, `--metadata-compression`, `--window-size`,
|
||||||
`--window-step` and `--order`. See the output of `mkdwarfs --help` for
|
`--window-step` and `--order`. See the output of `mkdwarfs --help` for
|
||||||
a table listing the exact defaults used for each compression level.
|
a table listing the exact defaults used for each compression level.
|
||||||
|
|
||||||
* `-S`, `--block-size-bits=`*value*:
|
- `-S`, `--block-size-bits=`*value*:
|
||||||
The block size used for the compressed filesystem. The actual block size
|
The block size used for the compressed filesystem. The actual block size
|
||||||
is two to the power of this value. Larger block sizes will offer better
|
is two to the power of this value. Larger block sizes will offer better
|
||||||
overall compression ratios, but will be slower and consume more memory
|
overall compression ratios, but will be slower and consume more memory
|
||||||
when actually using the filesystem, as blocks will have to be fully or at
|
when actually using the filesystem, as blocks will have to be fully or at
|
||||||
least partially decompressed into memory. Values between 20 and 26, i.e.
|
least partially decompressed into memory. Values between 20 and 26, i.e.
|
||||||
between 1MiB and 64MiB, usually work quite well.
|
between 1MiB and 64MiB, usually work quite well.
|
||||||
|
|
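To make the exponent described above concrete, here is a minimal Python sketch. It is not part of `mkdwarfs`, and the helper name `block_size_bytes` is made up for this illustration. It converts a `--block-size-bits` value into the resulting block size.

```python
def block_size_bytes(bits):
    """Return the block size in bytes for a --block-size-bits value.

    Hypothetical helper: the block size is two to the power of `bits`.
    """
    return 1 << bits


if __name__ == "__main__":
    # The range suggested in the text: 20 (1 MiB) up to 26 (64 MiB).
    for bits in (20, 22, 24, 26):
        mib = block_size_bytes(bits) // (1 << 20)
        print(f"-S {bits}: {mib} MiB per block")
```

Each additional bit doubles the block size, so going from 20 to 26 means a block takes 64 times as much memory when it has to be fully decompressed.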
||||||
* `-N`, `--num-workers=`*value*:
|
- `-N`, `--num-workers=`*value*:
|
||||||
Number of worker threads used for building the filesystem. This defaults
|
Number of worker threads used for building the filesystem. This defaults
|
||||||
to the number of processors available on your system. Use this option if
|
to the number of processors available on your system. Use this option if
|
||||||
you want to limit the resources used by `mkdwarfs`.
|
you want to limit the resources used by `mkdwarfs`.
|
||||||
This option affects both the scanning phase and the compression phase.
|
This option affects both the scanning phase and the compression phase.
|
||||||
In the scanning phase, the worker threads are used to scan files in the
|
In the scanning phase, the worker threads are used to scan files in the
|
||||||
background as they are discovered. File scanning includes checksumming
|
background as they are discovered. File scanning includes checksumming
|
||||||
for de-duplication as well as (optionally) checksumming for similarity
|
for de-duplication as well as (optionally) checksumming for similarity
|
||||||
computation, depending on the `--order` option. File discovery itself
|
computation, depending on the `--order` option. File discovery itself
|
||||||
is single-threaded and runs independently from the scanning threads.
|
is single-threaded and runs independently from the scanning threads.
|
||||||
In the compression phase, the worker threads are used to compress the
|
In the compression phase, the worker threads are used to compress the
|
||||||
individual filesystem blocks in the background. Ordering, segmenting
|
individual filesystem blocks in the background. Ordering, segmenting
|
||||||
and block building are, again, single-threaded and run independently.
|
and block building are, again, single-threaded and run independently.
|
||||||
|
|
||||||
* `-B`, `--max-lookback-blocks=`*value*:
|
- `-B`, `--max-lookback-blocks=`*value*:
|
||||||
Specify how many of the most recent blocks to scan for duplicate segments.
|
Specify how many of the most recent blocks to scan for duplicate segments.
|
||||||
By default, only the current block will be scanned. The larger this number,
|
By default, only the current block will be scanned. The larger this number,
|
||||||
the more duplicate segments will likely be found, which may further improve
|
the more duplicate segments will likely be found, which may further improve
|
||||||
compression. Impact on compression speed is minimal, but this could cause
|
compression. Impact on compression speed is minimal, but this could cause
|
||||||
the resulting filesystem to be slightly less efficient to use, as single small
|
the resulting filesystem to be slightly less efficient to use, as single small
|
||||||
files can now potentially span multiple filesystem blocks. Passing `-B0`
|
files can now potentially span multiple filesystem blocks. Passing `-B0`
|
||||||
will completely disable duplicate segment search.
|
will completely disable duplicate segment search.
|
||||||
|
|
||||||
* `-W`, `--window-size=`*value*:
|
- `-W`, `--window-size=`*value*:
|
||||||
Window size of cyclic hash used for segmenting. This is again an exponent
|
Window size of cyclic hash used for segmenting. This is again an exponent
|
||||||
to a base of two. Cyclic hashes are used by `mkdwarfs` for finding
|
to a base of two. Cyclic hashes are used by `mkdwarfs` for finding
|
||||||
identical segments across multiple files. This is done on top of duplicate
|
identical segments across multiple files. This is done on top of duplicate
|
||||||
file detection. If a reasonable amount of duplicate segments is found,
|
file detection. If a reasonable amount of duplicate segments is found,
|
||||||
this means fewer blocks will be used in the filesystem and potentially
|
this means fewer blocks will be used in the filesystem and potentially
|
||||||
less memory will be used when accessing the filesystem. It doesn't
|
less memory will be used when accessing the filesystem. It doesn't
|
||||||
necessarily mean that the filesystem will be much smaller, as this removes
|
necessarily mean that the filesystem will be much smaller, as this removes
|
||||||
redundancy that cannot be exploited by the block compression any longer.
|
redundancy that cannot be exploited by the block compression any longer.
|
||||||
But it shouldn't make the resulting filesystem any bigger. This option
|
But it shouldn't make the resulting filesystem any bigger. This option
|
||||||
is used along with `--window-step` to determine how extensive this
|
is used along with `--window-step` to determine how extensive this
|
||||||
segment search will be. The smaller the window sizes, the more segments
|
segment search will be. The smaller the window sizes, the more segments
|
||||||
will obviously be found. However, this also means files will become more
|
will obviously be found. However, this also means files will become more
|
||||||
fragmented and thus the filesystem can be slower to use and metadata
|
fragmented and thus the filesystem can be slower to use and metadata
|
||||||
size will grow. Passing `-W0` will completely disable duplicate segment
|
size will grow. Passing `-W0` will completely disable duplicate segment
|
||||||
search.
|
search.
|
||||||
|
|
||||||
* `-w`, `--window-step=`*value*:
|
- `-w`, `--window-step=`*value*:
|
||||||
This option specifies how often cyclic hash values are stored for lookup.
|
This option specifies how often cyclic hash values are stored for lookup.
|
||||||
It is specified relative to the window size, as a base-2 exponent that
|
It is specified relative to the window size, as a base-2 exponent that
|
||||||
divides the window size. To give a concrete example, if `--window-size=16`
|
divides the window size. To give a concrete example, if `--window-size=16`
|
||||||
and `--window-step=1`, then a cyclic hash across 65536 bytes will be stored
|
and `--window-step=1`, then a cyclic hash across 65536 bytes will be stored
|
||||||
at every 32768 bytes of input data. If `--window-step=2`, then a hash value
|
at every 32768 bytes of input data. If `--window-step=2`, then a hash value
|
||||||
will be stored at every 16384 bytes. This means that not every possible
|
will be stored at every 16384 bytes. This means that not every possible
|
||||||
65536-byte duplicate segment will be detected, but it is guaranteed that
|
65536-byte duplicate segment will be detected, but it is guaranteed that
|
||||||
all duplicate segments of (`window_size` + `window_step`) bytes or more
|
all duplicate segments of (`window_size` + `window_step`) bytes or more
|
||||||
will be detected (unless they span across block boundaries, of course).
|
will be detected (unless they span across block boundaries, of course).
|
||||||
If you use a larger value for this option, the increments become *smaller*,
|
If you use a larger value for this option, the increments become *smaller*,
|
||||||
and `mkdwarfs` will be slightly slower and use more memory.
|
and `mkdwarfs` will be slightly slower and use more memory.
|
||||||
|
|
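The relationship between `--window-size` and `--window-step` can be sketched with a little arithmetic. The Python below is only an illustration of the description above, not code taken from `mkdwarfs`. The function names are assumptions, and the "guaranteed" length simply adds the window and step sizes in bytes, as the text states.

```python
def window_bytes(window_size):
    """Window length in bytes: two to the power of --window-size."""
    return 1 << window_size


def step_bytes(window_size, window_step):
    """Distance in bytes between stored hash values.

    The step is a base-2 exponent that divides the window size.
    """
    return window_bytes(window_size) >> window_step


def guaranteed_match_bytes(window_size, window_step):
    """Duplicate segments at least this long are always detected
    (within a single block), following the text above."""
    return window_bytes(window_size) + step_bytes(window_size, window_step)


if __name__ == "__main__":
    for step in (1, 2):
        print(
            f"-W16 -w{step}: window={window_bytes(16)} "
            f"step={step_bytes(16, step)} "
            f"guaranteed={guaranteed_match_bytes(16, step)}"
        )
```

With `--window-size=16`, a step of 1 stores a hash every 32768 bytes and a step of 2 every 16384 bytes, matching the example in the text.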
||||||
* `--bloom-filter-size`=*value*:
|
- `--bloom-filter-size`=*value*:
|
||||||
The segmenting algorithm uses a bloom filter to determine quickly if
|
The segmenting algorithm uses a bloom filter to determine quickly if
|
||||||
there is *no* match at a given position. This will filter out more than
|
there is *no* match at a given position. This will filter out more than
|
||||||
90% of bad matches quickly with the default bloom filter size. The default
|
90% of bad matches quickly with the default bloom filter size. The default
|
||||||
is pretty much where the sweet spot lies. If you have copious amounts of
|
is pretty much where the sweet spot lies. If you have copious amounts of
|
||||||
RAM and CPU power, feel free to increase this by one or two and you *might*
|
RAM and CPU power, feel free to increase this by one or two and you *might*
|
||||||
be able to see some improvement. If you're tight on memory, then decreasing
|
be able to see some improvement. If you're tight on memory, then decreasing
|
||||||
this will potentially save a few MiBs.
|
this will potentially save a few MiBs.
|
||||||
|
|
||||||
* `-L`, `--memory-limit=`*value*:
|
- `-L`, `--memory-limit=`*value*:
|
||||||
Approximately how much memory you want `mkdwarfs` to use during filesystem
|
Approximately how much memory you want `mkdwarfs` to use during filesystem
|
||||||
creation. Note that currently this will only affect the block manager
|
creation. Note that currently this will only affect the block manager
|
||||||
component, i.e. the number of filesystem blocks that are in flight but
|
component, i.e. the number of filesystem blocks that are in flight but
|
||||||
haven't been compressed and written to the output file yet. So the memory
|
haven't been compressed and written to the output file yet. So the memory
|
||||||
used by `mkdwarfs` can certainly be larger than this limit, but it's a
|
used by `mkdwarfs` can certainly be larger than this limit, but it's a
|
||||||
good option when building large filesystems with expensive compression
|
good option when building large filesystems with expensive compression
|
||||||
algorithms. Also note that most memory is likely used by the compression
|
algorithms. Also note that most memory is likely used by the compression
|
||||||
algorithms, so if you're short on memory it might be worth tweaking the
|
algorithms, so if you're short on memory it might be worth tweaking the
|
||||||
compression options.
|
compression options.
|
||||||
|
|
||||||
* `-C`, `--compression=`*algorithm*[`:`*algopt*[`=`*value*][`,`...]]:
|
- `-C`, `--compression=`*algorithm*[`:`*algopt*[`=`*value*][`,`...]]:
|
||||||
The compression algorithm and configuration used for file system data.
|
The compression algorithm and configuration used for file system data.
|
||||||
The value for this option is a colon-separated list. The first item is
|
The value for this option is a colon-separated list. The first item is
|
||||||
the compression algorithm, the remaining items are its options. Options
|
the compression algorithm, the remaining items are its options. Options
|
||||||
can be either boolean or have a value. For details on which algori`thms
|
can be either boolean or have a value. For details on which algorithms
|
||||||
and options are available, see the output of `mkdwarfs --help`. `zstd`
|
and options are available, see the output of `mkdwarfs --help`. `zstd`
|
||||||
will give you the best compression while still keeping decompression
|
will give you the best compression while still keeping decompression
|
||||||
*very* fast. `lzma` will compress even better, but decompression will
|
*very* fast. `lzma` will compress even better, but decompression will
|
||||||
be around ten times slower.
|
be around ten times slower.
|
||||||
|
|
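The shape of a compression specification can be illustrated with a small parser. This is a sketch that follows the colon-separated description above. It is not how `mkdwarfs` parses the option internally, the function name `parse_compression_spec` is invented here, and the example strings are illustrative only. Consult `mkdwarfs --help` for the algorithms and options that actually exist in your build.

```python
def parse_compression_spec(spec):
    """Split a spec into (algorithm, options).

    The first colon-separated item is the algorithm. Every further item
    is either a boolean flag ("name") or a valued option ("name=value").
    Hypothetical helper for illustration only.
    """
    algorithm, *items = spec.split(":")
    options = {}
    for item in items:
        name, sep, value = item.partition("=")
        # No "=" present means the option is a boolean flag.
        options[name] = value if sep else True
    return algorithm, options


if __name__ == "__main__":
    # Example strings are assumptions, not a list of supported options.
    for spec in ("zstd:level=22", "lzma:level=9:extreme", "null"):
        print(spec, "->", parse_compression_spec(spec))
```

Values are kept as strings here. A real implementation would validate them against the options each algorithm supports.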
||||||
* `--schema-compression=`*algorithm*[`:`*algopt*[`=`*value*][`,`...]]:
|
- `--schema-compression=`*algorithm*[`:`*algopt*[`=`*value*][`,`...]]:
|
||||||
The compression algorithm and configuration used for the metadata schema.
|
The compression algorithm and configuration used for the metadata schema.
|
||||||
Takes the same arguments as `--compression` above. The schema is *very*
|
Takes the same arguments as `--compression` above. The schema is *very*
|
||||||
small, in the hundreds of bytes, so this is only relevant for extremely
|
small, in the hundreds of bytes, so this is only relevant for extremely
|
||||||
small file systems. The default (`zstd`) has been shown to give considerably
|
small file systems. The default (`zstd`) has been shown to give considerably
|
||||||
better results than any other algorithm.
|
better results than any other algorithm.
|
||||||
|
|
||||||
* `--metadata-compression=`*algorithm*[`:`*algopt*[`=`*value*][`,`...]]:
|
- `--metadata-compression=`*algorithm*[`:`*algopt*[`=`*value*][`,`...]]:
|
||||||
The compression algorithm and configuration used for the metadata.
|
The compression algorithm and configuration used for the metadata.
|
||||||
Takes the same arguments as `--compression` above. The metadata has been
|
Takes the same arguments as `--compression` above. The metadata has been
|
||||||
optimized for very little redundancy and leaving it uncompressed, the
|
optimized for very little redundancy and leaving it uncompressed, the
|
||||||
default for all levels below 7, has the benefit that it can be mapped
|
default for all levels below 7, has the benefit that it can be mapped
|
||||||
to memory and used directly. This improves mount time for large file
|
to memory and used directly. This improves mount time for large file
|
||||||
systems compared to e.g. an lzma compressed metadata block. If you don't
|
systems compared to e.g. an lzma compressed metadata block. If you don't
|
||||||
care about mount time, you can safely choose `lzma` compression here, as
|
care about mount time, you can safely choose `lzma` compression here, as
|
||||||
the data will only have to be decompressed once when mounting the image.
|
the data will only have to be decompressed once when mounting the image.
|
||||||
|
|
||||||
* `--recompress`[`=all`|`=block`|`=metadata`|`=none`]:
|
- `--recompress`[`=all`|`=block`|`=metadata`|`=none`]:
|
||||||
Take an existing DwarFS file system and recompress it using different
|
Take an existing DwarFS file system and recompress it using different
|
||||||
compression algorithms. If no argument or `all` is given, all sections
|
compression algorithms. If no argument or `all` is given, all sections
|
||||||
in the file system image will be recompressed. Note that *only* the
|
in the file system image will be recompressed. Note that *only* the
|
||||||
compression algorithms, i.e. the `--compression`, `--schema-compression`
|
compression algorithms, i.e. the `--compression`, `--schema-compression`
|
||||||
and `--metadata-compression` options, have an impact on how the new file
|
and `--metadata-compression` options, have an impact on how the new file
|
||||||
system is written. Other options, e.g. `--block-size-bits` or `--order`,
|
system is written. Other options, e.g. `--block-size-bits` or `--order`,
|
||||||
have no impact. If `none` is given as an argument, none of the sections
|
have no impact. If `none` is given as an argument, none of the sections
|
||||||
will be recompressed, but the file system is still rewritten in the
|
will be recompressed, but the file system is still rewritten in the
|
||||||
latest file system format. This is an easy way of upgrading an old file
|
latest file system format. This is an easy way of upgrading an old file
|
||||||
system image to a new format. If `block` or `metadata` is given, only
|
system image to a new format. If `block` or `metadata` is given, only
|
||||||
the block sections (i.e. the actual file data) or the metadata sections
|
the block sections (i.e. the actual file data) or the metadata sections
|
||||||
are recompressed. This can be useful if you want to switch from compressed
|
are recompressed. This can be useful if you want to switch from compressed
|
||||||
metadata to uncompressed metadata without having to rebuild or recompress
|
metadata to uncompressed metadata without having to rebuild or recompress
|
||||||
all the other data.
|
all the other data.
|
||||||
|
|
||||||
* `-P`, `--pack-metadata=auto`|`none`|[`all`|`chunk_table`|`directories`|`shared_files`|`names`|`names_index`|`symlinks`|`symlinks_index`|`force`|`plain`[`,`...]]:
|
- `-P`, `--pack-metadata=auto`|`none`|[`all`|`chunk_table`|`directories`|`shared_files`|`names`|`names_index`|`symlinks`|`symlinks_index`|`force`|`plain`[`,`...]]:
|
||||||
Which metadata information to store in packed format. This is primarily
|
Which metadata information to store in packed format. This is primarily
|
||||||
useful when storing metadata uncompressed, as it allows for smaller
|
useful when storing metadata uncompressed, as it allows for smaller
|
||||||
metadata block size without having to turn on compression. Keep in mind,
|
metadata block size without having to turn on compression. Keep in mind,
|
||||||
though, that *most* of the packed data must be unpacked into memory when
|
though, that *most* of the packed data must be unpacked into memory when
|
||||||
reading the file system. If you want a purely memory-mappable metadata
|
reading the file system. If you want a purely memory-mappable metadata
|
||||||
block, leave this at the default (`auto`), which will turn on `names` and
|
block, leave this at the default (`auto`), which will turn on `names` and
|
||||||
`symlinks` packing if these actually help save data.
|
`symlinks` packing if these actually help save data.
|
||||||
Tweaking these options is mostly interesting when dealing with file
|
Tweaking these options is mostly interesting when dealing with file
|
||||||
systems that contain hundreds of thousands of files.
|
systems that contain hundreds of thousands of files.
|
||||||
See [Metadata Packing](#metadata-packing) for more details.
|
See [Metadata Packing](#metadata-packing) for more details.
|
||||||
|
|
||||||
* `--set-owner=`*uid*:
|
- `--set-owner=`*uid*:
|
||||||
Set the owner for all entities in the file system. This can reduce the
|
Set the owner for all entities in the file system. This can reduce the
|
||||||
size of the file system. If the input only has a single owner already,
|
size of the file system. If the input only has a single owner already,
|
||||||
setting this won't make any difference.
|
setting this won't make any difference.
|
||||||
|
|
||||||
* `--set-group=`*gid*:
|
- `--set-group=`*gid*:
|
||||||
Set the group for all entities in the file system. This can reduce the
|
Set the group for all entities in the file system. This can reduce the
|
||||||
size of the file system. If the input only has a single group already,
|
size of the file system. If the input only has a single group already,
|
||||||
setting this won't make any difference.
|
setting this won't make any difference.
|
||||||
|
|
||||||
* `--set-time=`*time*|`now`:
|
- `--set-time=`*time*|`now`:
|
||||||
Set the time stamps for all entities to this value. This can significantly
|
Set the time stamps for all entities to this value. This can significantly
|
||||||
reduce the size of the file system. You can pass either a unix time stamp
|
reduce the size of the file system. You can pass either a unix time stamp
|
||||||
or `now`.
|
or `now`.
|
||||||
|
|
||||||
* `--keep-all-times`:
|
- `--keep-all-times`:
|
||||||
As of release 0.3.0, by default, `mkdwarfs` will only save the contents of
|
As of release 0.3.0, by default, `mkdwarfs` will only save the contents of
|
||||||
the `mtime` field in order to save metadata space. If you want to save
|
the `mtime` field in order to save metadata space. If you want to save
|
||||||
`atime` and `ctime` as well, use this option.
|
`atime` and `ctime` as well, use this option.
|
||||||
|
|
||||||
* `--time-resolution=`*sec*|`sec`|`min`|`hour`|`day`:
|
- `--time-resolution=`*sec*|`sec`|`min`|`hour`|`day`:
|
||||||
Specify the resolution with which time stamps are stored. By default,
|
Specify the resolution with which time stamps are stored. By default,
|
||||||
time stamps are stored with second resolution. You can specify "odd"
|
time stamps are stored with second resolution. You can specify "odd"
|
||||||
resolutions as well, e.g. something like 15 second resolution is
|
resolutions as well, e.g. something like 15 second resolution is
|
||||||
entirely possible. Moving from second to minute resolution, for example,
|
entirely possible. Moving from second to minute resolution, for example,
|
||||||
will save roughly 6 bits per file system entry in the metadata block.
|
will save roughly 6 bits per file system entry in the metadata block.
|
||||||
|
|
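The "roughly 6 bits" figure follows from the base-2 logarithm of the resolution ratio. The Python sketch below works through that arithmetic. The helper names are made up for this example, and the quantization shown (integer division by the resolution) is an assumption about how a coarser resolution could be applied, not a statement about the on-disk format.

```python
import math


def bits_saved(old_resolution_sec, new_resolution_sec):
    """Approximate bits saved per time stamp when coarsening the resolution."""
    return math.log2(new_resolution_sec / old_resolution_sec)


def quantize(timestamp_sec, resolution_sec):
    """Reduce a time stamp to a multiple count of the resolution (assumed scheme)."""
    return timestamp_sec // resolution_sec


if __name__ == "__main__":
    print(f"sec -> min : {bits_saved(1, 60):.2f} bits")
    print(f"sec -> hour: {bits_saved(1, 3600):.2f} bits")
    print(f"sec -> 15s : {bits_saved(1, 15):.2f} bits")
```

Moving from second to minute resolution gives `log2(60)`, which is just under 6 bits, in line with the estimate above.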
||||||
* `--order=none`|`path`|`similarity`|`nilsimsa`[`:`*limit*[`:`*depth*[`:`*mindepth*]]]|`script`:
|
- `--order=none`|`path`|`similarity`|`nilsimsa`[`:`*limit*[`:`*depth*[`:`*mindepth*]]]|`script`:
|
||||||
The order in which inodes will be written to the file system. Choosing `none`,
|
The order in which inodes will be written to the file system. Choosing `none`,
|
||||||
the inodes will be stored in the order in which they are discovered. With
|
the inodes will be stored in the order in which they are discovered. With
|
||||||
`path`, they will be sorted asciibetically by path name of the first file
|
`path`, they will be sorted asciibetically by path name of the first file
|
||||||
representing this inode. With `similarity`, they will be ordered using a
|
representing this inode. With `similarity`, they will be ordered using a
|
||||||
simple, yet fast and efficient, similarity hash function. `nilsimsa` ordering
|
simple, yet fast and efficient, similarity hash function. `nilsimsa` ordering
|
||||||
uses a more sophisticated similarity function that is typically better than
|
uses a more sophisticated similarity function that is typically better than
|
||||||
`similarity`, but is significantly slower to compute. However, computation
|
`similarity`, but is significantly slower to compute. However, computation
|
||||||
can happen in the background while already building the file system.
|
can happen in the background while already building the file system.
|
||||||
`nilsimsa` ordering can be further tweaked by specifying a *limit* and
|
`nilsimsa` ordering can be further tweaked by specifying a *limit* and
|
||||||
*depth*. The *limit* determines how soon an inode is considered similar
|
*depth*. The *limit* determines how soon an inode is considered similar
|
||||||
enough for adding. A *limit* of 255 means "essentially identical", whereas
|
enough for adding. A *limit* of 255 means "essentially identical", whereas
|
||||||
a *limit* of 0 means "not similar at all". The *depth* determines up to
|
a *limit* of 0 means "not similar at all". The *depth* determines up to
|
||||||
how many inodes can be checked at most while searching for a similar one.
|
how many inodes can be checked at most while searching for a similar one.
|
||||||
To keep `nilsimsa` ordering from becoming a bottleneck when ordering lots of
|
To keep `nilsimsa` ordering from becoming a bottleneck when ordering lots of
|
||||||
small files, the *depth* is adjusted dynamically to keep the input queue
|
small files, the *depth* is adjusted dynamically to keep the input queue
|
||||||
to the segmentation/compression stages adequately filled. You can specify
|
to the segmentation/compression stages adequately filled. You can specify
|
||||||
how much the *depth* can be adjusted by also specifying *mindepth*.
|
how much the *depth* can be adjusted by also specifying *mindepth*.
|
||||||
The default if you omit these values is a *limit* of 255, a *depth*
|
The default if you omit these values is a *limit* of 255, a *depth*
|
||||||
of 20000 and a *mindepth* of 1000. Note that if you want reproducible
|
of 20000 and a *mindepth* of 1000. Note that if you want reproducible
|
||||||
results, you need to set *depth* and *mindepth* to the same value. Also
|
results, you need to set *depth* and *mindepth* to the same value. Also
|
||||||
note that when you're compressing lots (as in hundreds of thousands) of
|
note that when you're compressing lots (as in hundreds of thousands) of
|
||||||
small files, ordering them by `similarity` instead of `nilsimsa` is likely
|
small files, ordering them by `similarity` instead of `nilsimsa` is likely
|
||||||
going to speed things up significantly without impacting compression too much.
|
going to speed things up significantly without impacting compression too much.
|
||||||
Last but not least, if scripting support is built into `mkdwarfs`, you can
|
Last but not least, if scripting support is built into `mkdwarfs`, you can
|
||||||
choose `script` to let the script determine the order.
|
choose `script` to let the script determine the order.
|
||||||
|
|
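The optional `nilsimsa` parameters and their defaults can be summarized in a short sketch. The parser below is hypothetical and is not taken from `mkdwarfs`. It only mirrors the syntax and the defaults stated above (a *limit* of 255, a *depth* of 20000 and a *mindepth* of 1000), and flags whether a setting is reproducible in the sense described, i.e. *depth* equals *mindepth*.

```python
# limit, depth, mindepth, as stated in the text above
NILSIMSA_DEFAULTS = (255, 20000, 1000)


def parse_nilsimsa_order(spec):
    """Parse "nilsimsa[:limit[:depth[:mindepth]]]" into a dict.

    Hypothetical helper for illustration only.
    """
    name, *parts = spec.split(":")
    if name != "nilsimsa":
        raise ValueError(f"not a nilsimsa order spec: {spec!r}")
    if len(parts) > 3:
        raise ValueError("at most limit, depth and mindepth may be given")
    # Fill in defaults for any trailing values that were omitted.
    values = [int(p) for p in parts] + list(NILSIMSA_DEFAULTS[len(parts):])
    limit, depth, mindepth = values
    return {
        "limit": limit,
        "depth": depth,
        "mindepth": mindepth,
        # depth is adjusted dynamically unless both bounds are equal
        "reproducible": depth == mindepth,
    }


if __name__ == "__main__":
    print(parse_nilsimsa_order("nilsimsa"))
    print(parse_nilsimsa_order("nilsimsa:200:5000:5000"))
```

With the defaults, *depth* may vary between 1000 and 20000 depending on how full the input queue is, which is why the default configuration is not reproducible.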
||||||
* `--remove-empty-dirs`:
|
- `--remove-empty-dirs`:
|
||||||
Removes all empty directories from the output file system, recursively.
|
Removes all empty directories from the output file system, recursively.
|
||||||
This is particularly useful when using scripts that filter out a lot of
|
This is particularly useful when using scripts that filter out a lot of
|
||||||
file system entries.
|
file system entries.
|
||||||
|
|
||||||
* `--with-devices`:
|
- `--with-devices`:
|
||||||
Include character and block devices in the output file system. These are
|
Include character and block devices in the output file system. These are
|
||||||
not included by default, and due to security measures in FUSE, they will
|
not included by default, and due to security measures in FUSE, they will
|
||||||
never work in the mounted file system. However, they can still be copied
|
never work in the mounted file system. However, they can still be copied
|
||||||
out of the mounted file system, for example using `rsync`.
|
out of the mounted file system, for example using `rsync`.
|
||||||
|
|
||||||
* `--with-specials`:
|
- `--with-specials`:
|
||||||
Include named fifos and sockets in the output file system. These are not
|
Include named fifos and sockets in the output file system. These are not
|
||||||
included by default.
|
included by default.
|
||||||
|
|
||||||
* `--header=`*file*:
|
- `--header=`*file*:
|
||||||
Read header from file and place it before the output filesystem image.
|
Read header from file and place it before the output filesystem image.
|
||||||
Can be used with `--recompress` to add or replace a header.
|
Can be used with `--recompress` to add or replace a header.
|
||||||
|
|
||||||
* `--remove-header`:
|
- `--remove-header`:
|
||||||
Remove header from a filesystem image. Only useful with `--recompress`.
|
Remove header from a filesystem image. Only useful with `--recompress`.
|
||||||
|
|
||||||
* `--log-level=`*name*:
|
- `--log-level=`*name*:
|
||||||
Specify a logging level.
|
Specify a logging level.
|
||||||
|
|
||||||
* `--no-progress`:
|
- `--no-progress`:
|
||||||
Don't show progress output while building filesystem.
|
Don't show progress output while building filesystem.
|
||||||
|
|
||||||
* `--progress=none`|`simple`|`ascii`|`unicode`:
|
- `--progress=none`|`simple`|`ascii`|`unicode`:
|
||||||
Choosing `none` is equivalent to specifying `--no-progress`. `simple`
|
Choosing `none` is equivalent to specifying `--no-progress`. `simple`
|
||||||
will print a single line of progress information whenever the progress
|
will print a single line of progress information whenever the progress
|
||||||
has significantly changed, but at most once every 2 seconds. This is
|
has significantly changed, but at most once every 2 seconds. This is
|
||||||
also the default when the output is not a tty. `unicode` is the default
|
also the default when the output is not a tty. `unicode` is the default
|
||||||
behaviour, which shows a nice progress bar and lots of additional
|
behaviour, which shows a nice progress bar and lots of additional
|
||||||
information. If your terminal cannot deal with unicode characters,
|
information. If your terminal cannot deal with unicode characters,
|
||||||
you can switch to `ascii`, which is like `unicode`, but looks less
|
you can switch to `ascii`, which is like `unicode`, but looks less
|
||||||
fancy.
|
fancy.
|
||||||
|
|
||||||
* `--help`:
|
- `--help`:
|
||||||
Show program help, including defaults, compression level detail and
|
Show program help, including defaults, compression level detail and
|
||||||
supported compression algorithms.
|
supported compression algorithms.
|
||||||
|
|
||||||
If experimental Python support was compiled into `mkdwarfs`, you can use the
|
If experimental Python support was compiled into `mkdwarfs`, you can use the
|
||||||
following option to enable customizations via the scripting interface:
|
following option to enable customizations via the scripting interface:
|
||||||
|
|
||||||
* `--script=`*file*[`:`*class*[`(`arguments`...)`]]:
|
- `--script=`*file*[`:`*class*[`(`arguments`...)`]]:
|
||||||
Specify the Python script to load. The class name is optional if there's
|
Specify the Python script to load. The class name is optional if there's
|
||||||
a class named `mkdwarfs` in the script. It is also possible to pass
|
a class named `mkdwarfs` in the script. It is also possible to pass
|
||||||
arguments to the constructor.
|
arguments to the constructor.
|
||||||
|
|
||||||
## TIPS & TRICKS
|
## TIPS & TRICKS
|
||||||
|
|
||||||
@ -342,70 +341,70 @@ However, there are several options to choose from that allow you to
|
|||||||
further reduce metadata size without having to compress the metadata.
|
further reduce metadata size without having to compress the metadata.
|
||||||
These options are controlled by the `--pack-metadata` option.
|
These options are controlled by the `--pack-metadata` option.
|
||||||
|
|
||||||
* `auto`:
|
- `auto`:
|
||||||
This is the default. It will enable both `names` and `symlinks`.
|
This is the default. It will enable both `names` and `symlinks`.
|
||||||
|
|
||||||
* `none`:
|
- `none`:
|
||||||
Don't enable any packing. However, string tables (i.e. names and
|
Don't enable any packing. However, string tables (i.e. names and
|
||||||
symlinks) will still be stored in "compact" rather than "plain"
|
symlinks) will still be stored in "compact" rather than "plain"
|
||||||
format. In order to force storage in plain format, use `plain`.
|
format. In order to force storage in plain format, use `plain`.
|
||||||
|
|
||||||
* `all`:
|
- `all`:
|
||||||
Enable all packing options. This does *not* force packing of
|
Enable all packing options. This does *not* force packing of
|
||||||
string tables (i.e. names and symlinks) if the packing would
|
string tables (i.e. names and symlinks) if the packing would
|
||||||
actually increase the size, which can happen if the string tables
|
actually increase the size, which can happen if the string tables
|
||||||
are actually small. In order to force string table packing, use
|
are actually small. In order to force string table packing, use
|
||||||
`all,force`.
|
`all,force`.
|
||||||
|
|
||||||
* `chunk_table`:
|
- `chunk_table`:
|
||||||
Delta-compress chunk tables. This can reduce the size of the
|
Delta-compress chunk tables. This can reduce the size of the
|
||||||
chunk tables for large file systems and help compression, however,
|
chunk tables for large file systems and help compression, however,
|
||||||
it will likely require a lot of memory when unpacking the tables
|
it will likely require a lot of memory when unpacking the tables
|
||||||
again. Only use this if you know what you're doing.
|
again. Only use this if you know what you're doing.
|
||||||
|
|
||||||
* `directories`:
|
- `directories`:
|
||||||
Pack directories table by storing first entry pointers delta-
|
Pack directories table by storing first entry pointers delta-
|
||||||
compressed and completely removing parent directory pointers.
|
compressed and completely removing parent directory pointers.
|
||||||
The parent directory pointers can be rebuilt by tree traversal
|
The parent directory pointers can be rebuilt by tree traversal
|
||||||
when the filesystem is loaded. If you have a large number of
|
when the filesystem is loaded. If you have a large number of
|
||||||
directories, this can reduce the metadata size; however, it
|
directories, this can reduce the metadata size; however, it
|
||||||
will likely require a lot of memory when unpacking the tables
|
will likely require a lot of memory when unpacking the tables
|
||||||
again. Only use this if you know what you're doing.
|
again. Only use this if you know what you're doing.
|
||||||
|
|
||||||
* `shared_files`:
|
- `shared_files`:
|
||||||
Pack shared files table. This is only useful if the filesystem
|
Pack shared files table. This is only useful if the filesystem
|
||||||
contains lots of non-hardlinked duplicates. It gets more efficient
|
contains lots of non-hardlinked duplicates. It gets more efficient
|
||||||
the more copies of a file are in the filesystem.
|
the more copies of a file are in the filesystem.
|
||||||
|
|
||||||
* `names`,`symlinks`:
|
- `names`,`symlinks`:
|
||||||
Compress the names and symlink targets using the
|
Compress the names and symlink targets using the
|
||||||
[fsst](https://github.com/cwida/fsst) compression scheme. This
|
[fsst](https://github.com/cwida/fsst) compression scheme. This
|
||||||
compresses each individual entry separately using a small,
|
compresses each individual entry separately using a small,
|
||||||
custom symbol table, and it's surprisingly efficient. It is
|
custom symbol table, and it's surprisingly efficient. It is
|
||||||
not uncommon for names to make up 50-70% of the metadata,
|
not uncommon for names to make up 50-70% of the metadata,
|
||||||
and fsst compression typically reduces the size by a factor
|
and fsst compression typically reduces the size by a factor
|
||||||
of two. The entries can be decompressed individually, so no
|
of two. The entries can be decompressed individually, so no
|
||||||
extra memory is used when accessing the filesystem (except for
|
extra memory is used when accessing the filesystem (except for
|
||||||
the symbol table, which is only a few hundred bytes). This is
|
the symbol table, which is only a few hundred bytes). This is
|
||||||
turned on by default. For small filesystems, it's possible that
|
turned on by default. For small filesystems, it's possible that
|
||||||
the compressed strings plus symbol table are actually larger
|
the compressed strings plus symbol table are actually larger
|
||||||
than the uncompressed strings. If this is the case, the strings
|
than the uncompressed strings. If this is the case, the strings
|
||||||
will be stored uncompressed, unless `force` is also specified.
|
will be stored uncompressed, unless `force` is also specified.
|
||||||
|
|
||||||
* `names_index`,`symlinks_index`:
|
- `names_index`,`symlinks_index`:
|
||||||
Delta-compress the names and symlink targets indices. The same
|
Delta-compress the names and symlink targets indices. The same
|
||||||
caveats apply as for `chunk_table`.
|
caveats apply as for `chunk_table`.
|
||||||
|
|
||||||
* `force`:
|
- `force`:
|
||||||
Forces the compression of the `names` and `symlinks` tables,
|
Forces the compression of the `names` and `symlinks` tables,
|
||||||
even if that would make them use more memory than the
|
even if that would make them use more memory than the
|
||||||
uncompressed tables. This is really only useful for testing
|
uncompressed tables. This is really only useful for testing
|
||||||
and development.
|
and development.
|
||||||
|
|
||||||
* `plain`:
|
- `plain`:
|
||||||
Store string tables in "plain" format. The plain format uses
|
Store string tables in "plain" format. The plain format uses
|
||||||
Frozen thrift arrays and was used in earlier metadata versions.
|
Frozen thrift arrays and was used in earlier metadata versions.
|
||||||
It is useful for debugging, but wastes up to one byte per string.
|
It is useful for debugging, but wastes up to one byte per string.
|
||||||
|
|
||||||
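The `chunk_table`, `names_index` and `symlinks_index` options above all rely on delta compression. The following is a minimal sketch of that general technique, not the actual DwarFS on-disk encoding; the function names and the sample table are hypothetical. It shows why the packed form compresses well (small, repetitive values) and why unpacking needs memory for the whole table (every entry has to be rebuilt with a running sum).

```python
from itertools import accumulate


def delta_pack(offsets):
    """Keep the first value, then store each difference to its predecessor."""
    if not offsets:
        return []
    return [offsets[0]] + [b - a for a, b in zip(offsets, offsets[1:])]


def delta_unpack(deltas):
    """Rebuild the full table with a running sum.

    The whole table is materialized in memory, which mirrors the
    unpacking cost the option descriptions warn about.
    """
    return list(accumulate(deltas))


# Hypothetical, monotonically increasing index table.
chunk_table = [0, 3, 3, 7, 12, 12, 20]
packed = delta_pack(chunk_table)
restored = delta_unpack(packed)
```

The packed values stay small even when the absolute offsets grow large, which is what helps the subsequent general-purpose compression of the metadata block.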
To give you an idea of the metadata size using different packing options,
|
To give you an idea of the metadata size using different packing options,
|
||||||
here's the size of the metadata block for the Ubuntu 20.04.2.0 Desktop
|
here's the size of the metadata block for the Ubuntu 20.04.2.0 Desktop
|
||||||
@ -430,7 +429,6 @@ further compress the block. So if you're really desperately trying
|
|||||||
to reduce the image size, enabling `all` packing would be an option
|
to reduce the image size, enabling `all` packing would be an option
|
||||||
at the cost of using a lot more memory when using the filesystem.
|
at the cost of using a lot more memory when using the filesystem.
|
||||||
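Part of the load-time cost of `all` packing comes from `directories` packing, which removes parent directory pointers and rebuilds them by tree traversal when the filesystem is loaded. The sketch below illustrates such a rebuild under an assumed, simplified layout; `first_entry`, `entry_dir` and `rebuild_parents` are hypothetical names and do not reflect the actual DwarFS metadata schema.

```python
def rebuild_parents(first_entry, entry_dir):
    """Recompute parent directory indices by walking the tree from the root.

    first_entry[d] .. first_entry[d + 1] is the entry range of directory d
    (the list carries one trailing sentinel). entry_dir[e] is the directory
    index that entry e refers to, or None if the entry is not a directory.
    """
    num_dirs = len(first_entry) - 1
    parent = [0] * num_dirs  # assumption: the root (index 0) is its own parent
    stack = [0]
    while stack:
        d = stack.pop()
        for e in range(first_entry[d], first_entry[d + 1]):
            child = entry_dir[e]
            if child is not None:
                parent[child] = d
                stack.append(child)
    return parent


# Hypothetical tree: root holds dir 1, a file and dir 2; dir 1 holds dir 3;
# dir 2 is empty; dir 3 holds a single file.
first_entry = [0, 3, 4, 4, 5]
entry_dir = [1, None, 2, 3, None]
parents = rebuild_parents(first_entry, entry_dir)
```

The rebuilt `parent` array has one slot per directory, so the memory needed at load time grows with the number of directories in the image.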
|
|
||||||
|
|
||||||
## INTERNAL OPERATION
|
## INTERNAL OPERATION
|
||||||
|
|
||||||
Internally, `mkdwarfs` runs in two completely separate phases. The first
|
Internally, `mkdwarfs` runs in two completely separate phases. The first
|
||||||
|