mirror of
https://github.com/mhx/dwarfs.git
synced 2025-09-11 05:23:29 -04:00
Update internal operation section of mkdwarfs manpage
parent f95844a35d
commit ee39c3eef7
@@ -558,23 +558,32 @@ input directory recursively and builds an internal representation of the
 directory structure. Traversal is breadth-first and single-threaded.
 
 When a regular file is discovered, its hardlink count is checked and
-if non-zero, its inode is looked up in a hardlink cache. If the inode
-has not been scanned yet, a scanning job will be added to a pool of
-`--num-workers` worker threads. These will perform a SHA1 checksum scan
-first, which is then used to determine duplicate files, as these will
-share the same data in the final DwarFS image. If a file is found not
-to be a duplicate, it will now potentially be scanned again (by the
-same worker threads and using the same memory mapping) to generate a
-similarity hash value. This only happens if `--order` is set to one
-of the two similarity order modes.
+if greater than one, its inode is looked up in a hardlink cache. Another
+lookup is performed to see if this is the first file/inode of a particular
+size. If it's the first file, we just keep track of the file. If it's not
+the first file, we add a job to a pool of `--num-scanner-workers` worker
+threads to compute a hash (determined by the `--file-hash` option)
+of the file. We also add a hash-computing job for the first file. These
+hashes will be used for de-duplicating files. If `--order` is set to one
+of the similarity order modes, for each unique file, a further job is
+added to the pool to compute a similarity hash. This happens immediately
+for each inode of a unique size, but it is guaranteed that duplicates
+don't trigger another similarity hash scan (the implementation for this
+is indeed a bit tricky).
 
 Once all file contents have been scanned by the worker threads, all
 unique files will be assigned an internal inode number.
 
+This behaviour can be customized. When using `--file-hash=none`,
+de-duplication is completely disabled. Using `--max-similarity-size`,
+it is possible to prevent computation of similarity hashes for huge
+files. These huge files will then be stored separately before all other
+files in the image.
+
 ### Building
 
-Building the filesystem image uses a number of separate threads. If
-`nilsimsa` ordering is selected, the ordering algorithm runs in its
+Building the filesystem image uses `--num-workers` separate threads.
+If `nilsimsa` ordering is selected, the ordering algorithm runs in its
 own thread and continuously emits file inodes. These will be picked
 up by the segmenter thread, which scans the inode contents using a
 cyclic hash and determines overlapping segments between previously
@@ -595,6 +604,10 @@ finalized and frozen into a compact representation. If metadata
 compression is enabled, the metadata is sent to the worker thread
 pool.
 
+When using different ordering schemes, the file inodes will be
+either sorted upfront, or just sent to the segmenter in the order
+in which they were discovered.
+
 ## AUTHOR
 
 Written by Marcus Holland-Moritz.
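
The scanning behaviour described in the first hunk (hardlink cache, then a per-size lookup, with hashing deferred until a second file of the same size shows up) can be illustrated with a small Python sketch. This is not the mkdwarfs implementation, which is written in C++. The names `hash_file` and `group_duplicates` are hypothetical, SHA-256 stands in for whatever `--file-hash` selects, and a `ThreadPoolExecutor` stands in for the `--num-scanner-workers` pool. Similarity hashing is left out entirely.

```python
import hashlib
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, algo="sha256"):
    """Hash a file's contents in 1 MiB chunks (stand-in for --file-hash)."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def group_duplicates(paths, num_scanner_workers=4, algo="sha256"):
    """Return a sorted list of groups; each group lists paths sharing data."""
    hardlink_cache = {}        # (st_dev, st_ino) -> first path seen for it
    links = defaultdict(list)  # first path -> further hardlinks to it
    first_of_size = {}         # size -> first path seen with that size
    hashed_sizes = set()       # sizes whose first file already has a job
    jobs = {}                  # path -> Future producing its digest

    with ThreadPoolExecutor(max_workers=num_scanner_workers) as pool:
        for path in paths:
            st = os.stat(path)
            # Only files with more than one link go through the cache.
            if st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                if key in hardlink_cache:
                    links[hardlink_cache[key]].append(path)
                    continue
                hardlink_cache[key] = path
            # First file of this size: just remember it, no hashing yet.
            if st.st_size not in first_of_size:
                first_of_size[st.st_size] = path
                continue
            # Second file of this size: hash the first one too, once.
            if st.st_size not in hashed_sizes:
                hashed_sizes.add(st.st_size)
                first = first_of_size[st.st_size]
                jobs[first] = pool.submit(hash_file, first, algo)
            jobs[path] = pool.submit(hash_file, path, algo)

    # Leaving the "with" block waits for all submitted jobs to finish.
    groups = defaultdict(list)
    for size, first in first_of_size.items():
        if size not in hashed_sizes:
            groups[("size", size)].append(first)  # unique size, never hashed
    for path, future in jobs.items():
        groups[("hash", future.result())].append(path)

    result = []
    for members in groups.values():
        expanded = []
        for p in members:
            expanded.append(p)
            expanded.extend(links.get(p, []))
        result.append(sorted(expanded))
    return sorted(result)
```

The point of the per-size lookup is that a file whose size is unique cannot be a duplicate, so the sketch never reads its contents. Only sizes that occur at least twice cost any hashing work.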