mirror of https://github.com/mhx/dwarfs.git
synced 2025-09-13 06:16:55 -04:00

Update mkdwarfs man page

parent fcf872e5b5
commit 4c6580b9c7

doc/mkdwarfs.md (106 lines changed)
@@ -75,6 +75,54 @@ Most other options are concerned with compression tuning:
 individual filesystem blocks in the background. Ordering, segmenting
 and block building are, again, single-threaded and run independently.
 
+* `-B`, `--max-lookback-blocks=`*value*:
+Specify how many of the most recent blocks to scan for duplicate segments.
+By default, only the current block will be scanned. The larger this number,
+the more duplicate segments will likely be found, which may further improve
+compression. The impact on compression speed is minimal, but this could make
+the resulting filesystem slightly less efficient to use, as single small
+files can now potentially span multiple filesystem blocks. Passing `-B0`
+will completely disable the duplicate segment search.
+
+* `-W`, `--window-size=`*value*:
+Window size of the cyclic hash used for segmenting. This is again an
+exponent to a base of two. Cyclic hashes are used by `mkdwarfs` for finding
+identical segments across multiple files. This is done on top of duplicate
+file detection. If a reasonable number of duplicate segments is found,
+fewer blocks will be used in the filesystem and potentially less memory
+will be used when accessing the filesystem. It doesn't necessarily mean
+that the filesystem will be much smaller, as this removes redundancy that
+the block compression could otherwise have exploited. But it shouldn't make
+the resulting filesystem any bigger. This option is used along with
+`--window-step` to determine how extensive the segment search will be. The
+smaller the window size, the more segments will obviously be found. However,
+this also means files will become more fragmented, and thus the filesystem
+can be slower to use and the metadata size will grow. Passing `-W0` will
+completely disable the duplicate segment search.
+
+* `-w`, `--window-step=`*value*:
+This option specifies how often cyclic hash values are stored for lookup.
+It is specified relative to the window size, as a base-2 exponent that
+divides the window size. To give a concrete example, if `--window-size=16`
+and `--window-step=1`, then a cyclic hash across 65536 bytes will be stored
+at every 32768 bytes of input data. If `--window-step=2`, then a hash value
+will be stored at every 16384 bytes. This means that not every possible
+65536-byte duplicate segment will be detected, but it is guaranteed that
+all duplicate segments of (`window_size` + `window_step`) bytes or more
+will be detected (unless they span across block boundaries, of course).
+If you use a larger value for this option, the increments become *smaller*,
+and `mkdwarfs` will be slightly slower and use more memory.
+
+* `--bloom-filter-size=`*value*:
+The segmenting algorithm uses a Bloom filter to quickly determine whether
+there is *no* match at a given position. With the default filter size, this
+weeds out more than 90% of bad matches. The default is pretty much where
+the sweet spot lies. If you have copious amounts of RAM and CPU power, feel
+free to increase this by one or two and you *might* see some improvement.
+If you're tight on memory, decreasing it will potentially save a few MiBs.
+
 * `-L`, `--memory-limit=`*value*:
 Approximately how much memory you want `mkdwarfs` to use during filesystem
 creation. Note that currently this will only affect the block manager
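To make the new segmenting options concrete, here is a minimal sketch of how they might be combined, assuming the standard `mkdwarfs` `-i`/`-o` input/output flags; the values shown are illustrative assumptions, not recommendations from the man page:

```sh
# Hypothetical invocation exercising the options documented above:
#   -B2  : scan the two most recent blocks for duplicate segments
#   -W16 : cyclic hash window of 2^16 = 65536 bytes
#   -w1  : store a hash value every 2^16 / 2^1 = 32768 bytes of input
mkdwarfs -i /path/to/input -o image.dwarfs -B2 -W16 -w1

# The window/step arithmetic from the text, checked with shell arithmetic:
echo $((1 << 16))          # window size in bytes: 65536
echo $(((1 << 16) >> 1))   # hash spacing for --window-step=1: 32768
echo $(((1 << 16) >> 2))   # hash spacing for --window-step=2: 16384
```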
@@ -156,6 +204,11 @@ Most other options are concerned with compression tuning:
 reduce the size of the file system. You can pass either a unix time stamp
 or `now`.
 
+* `--keep-all-times`:
+As of release 0.3.0, by default, `mkdwarfs` will only save the contents of
+the `mtime` field in order to save metadata space. If you want to save
+`atime` and `ctime` as well, use this option.
+
 * `--time-resolution=`*sec*|`sec`|`min`|`hour`|`day`:
 Specify the resolution with which time stamps are stored. By default,
 time stamps are stored with second resolution. You can specify "odd"
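Similarly, a minimal sketch combining the timestamp options added and relocated in this commit; the input path is again an illustrative assumption:

```sh
# Hypothetical: store atime/ctime in addition to mtime, but at minute
# resolution, trading timestamp precision for smaller metadata.
mkdwarfs -i /path/to/input -o image.dwarfs --keep-all-times --time-resolution=min
```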
@@ -163,11 +216,6 @@ Most other options are concerned with compression tuning:
 entirely possible. Moving from second to minute resolution, for example,
 will save roughly 6 bits per file system entry in the metadata block.
 
-* `--keep-all-times`:
-As of release 0.3.0, by default, `mkdwarfs` will only save the contents of
-the `mtime` field in order to save metadata space. If you want to save
-`atime` and `ctime` as well, use this option.
-
 * `--order=none`|`path`|`similarity`|`nilsimsa`[`:`*limit*[`:`*depth*[`:`*mindepth*]]]|`script`:
 The order in which inodes will be written to the file system. Choosing `none`,
 the inodes will be stored in the order in which they are discovered. With
@@ -195,54 +243,6 @@ Most other options are concerned with compression tuning:
 Last but not least, if scripting support is built into `mkdwarfs`, you can
 choose `script` to let the script determine the order.
 
-* `-W`, `--window-size=`*value*:
-Window size of cyclic hash used for segmenting. This is again an exponent
-to a base of two. Cyclic hashes are used by `mkdwarfs` for finding
-identical segments across multiple files. This is done on top of duplicate
-file detection. If a reasonable amount of duplicate segments is found,
-this means less blocks will be used in the filesystem and potentially
-less memory will be used when accessing the filesystem. It doesn't
-necessarily mean that the filesystem will be much smaller, as this removes
-redundany that cannot be exploited by the block compression any longer.
-But it shouldn't make the resulting filesystem any bigger. This option
-is used along with `--window-step` to determine how extensive this
-segment search will be. The smaller the window sizes, the more segments
-will obviously be found. However, this also means files will become more
-fragmented and thus the filesystem can be slower to use and metadata
-size will grow. Passing `-W0` will completely disable duplicate segment
-search.
-
-* `--window-step=`*value*:
-This option specifies how often cyclic hash values are stored for lookup.
-It is specified relative to the window size, as a base-2 exponent that
-divides the window size. To give a concrete example, if `--window-size=16`
-and `--window-step=1`, then a cyclic hash across 65536 bytes will be stored
-at every 32768 bytes of input data. If `--window-step=2`, then a hash value
-will be stored at every 16384 bytes. This means that not every possible
-65536-byte duplicate segment will be detected, but it is guaranteed that
-all duplicate segments of (`window_size` + `window_step`) bytes or more
-will be detected (unless they span across block boundaries, of course).
-If you use a larger value for this option, the increments become *smaller*,
-and `mkdwarfs` will be slightly slower and use more memory.
-
-* `-B`, `--max-lookback-blocks=`*value*:
-Specify how many of the most recent blocks to scan for duplicate segments.
-By default, only the current block will be scanned. The larger this number,
-the more duplicate segments will likely be found, which may further improve
-compression. Impact on compression speed is minimal, but this could cause
-resulting filesystem to be slightly less efficient to use, as single small
-files can now potentially span multiple filesystem blocks. Passing `-B0`
-will completely disable duplicate segment search.
-
-* `--bloom-filter-size`=*value*:
-The segmenting algorithm uses a bloom filter to determine quickly if
-there is *no* match at a given position. This will filter out more than
-90% of bad matches quickly with the default bloom filter size. The default
-is pretty much where the sweet spot lies. If you have copious amounts of
-RAM and CPU power, feel free to increase this by one or two and you *might*
-be able to see some improvement. If you're tight on memory, then decreasing
-this will potentially save a few MiBs.
-
 * `--remove-empty-dirs`:
 Removes all empty directories from the output file system, recursively.
 This is particularly useful when using scripts that filter out a lot of