Mirror of https://github.com/mhx/dwarfs.git (synced 2025-09-10 04:50:31 -04:00)
Tweak the internal operation documentation

parent 7c1eee8129
commit 8fcb03e8b7

@@ -586,27 +586,33 @@ and excluded files without building an actual file system.

Internally, `mkdwarfs` runs in two completely separate phases. The first
phase is scanning the input data, the second phase is building the file
system. Both phases try to do as little work as possible, and try to run
as much of the remaining work as possible in parallel, while still making
sure that the file system images produced are reproducible (see the
`--order` option documentation for details on reproducible images).

### Scanning

The scanning process is driven by the main thread which traverses the
input directory recursively and builds an internal representation of the
directory structure. Traversal is breadth-first and single-threaded.
Filter rules as specified by `--filter` are handled immediately during
traversal.

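For illustration, here is a minimal sketch (not the actual `mkdwarfs`
implementation) of such a breadth-first, single-threaded walk; the
`matches_filter` predicate is a hypothetical stand-in for the `--filter`
rule engine:

```cpp
#include <deque>
#include <filesystem>
#include <functional>
#include <iostream>

namespace fs = std::filesystem;

// Breadth-first, single-threaded traversal; filter rules are applied
// immediately as each entry is discovered.
void scan_tree(fs::path const& root,
               std::function<bool(fs::path const&)> const& matches_filter) {
  std::deque<fs::path> queue{root};
  while (!queue.empty()) {
    fs::path dir = queue.front();
    queue.pop_front();
    for (auto const& entry : fs::directory_iterator(dir)) {
      if (!matches_filter(entry.path())) {
        continue; // excluded entries never make it into the internal tree
      }
      if (entry.is_directory()) {
        queue.push_back(entry.path()); // descend later (breadth-first)
      } else {
        std::cout << "scan: " << entry.path() << "\n";
      }
    }
  }
}

int main(int argc, char** argv) {
  fs::path root = argc > 1 ? argv[1] : ".";
  // Example filter: exclude everything named ".git".
  scan_tree(root, [](fs::path const& p) { return p.filename() != ".git"; });
}
```
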
When a regular file is discovered, its hardlink count is checked and,
if greater than one, its inode is looked up in a hardlink cache. Another
lookup is performed to see if this is the first file/inode of a particular
size. If it's the first file, we just keep track of the file. If it's not
the first file, we add a job to a pool of `--num-scanner-workers` worker
threads to compute a hash of the file (the hash function is determined by
the `--file-hash` option). We also add a hash-computing job for the first
file we found with this size earlier. These hashes will then be used for
de-duplicating files. If `--order` is set to one of the similarity order
modes, a further job is added to the pool for each unique file to compute
a similarity hash. This happens immediately for each inode of a unique
size, but it is guaranteed that duplicates don't trigger another similarity
hash scan (the implementation for this is actually a bit tricky).

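A minimal sketch of this size-based pre-deduplication step (my reading of
the description above, not the actual `mkdwarfs` code); `enqueue_hash_job`
is a hypothetical stand-in for submitting work to the
`--num-scanner-workers` pool:

```cpp
#include <cstdint>
#include <filesystem>
#include <functional>
#include <optional>
#include <unordered_map>

namespace fs = std::filesystem;

struct size_bucket {
  std::optional<fs::path> first_file; // set while only one file of this size is known
  bool first_hashed = false;          // true once a hash job was queued for it
};

// Called for every regular file found during traversal.
void on_regular_file(fs::path const& p, std::uintmax_t size,
                     std::unordered_map<std::uintmax_t, size_bucket>& by_size,
                     std::function<void(fs::path const&)> const& enqueue_hash_job) {
  auto& bucket = by_size[size];
  if (!bucket.first_file && !bucket.first_hashed) {
    bucket.first_file = p; // first file of this size: just remember it
    return;
  }
  if (bucket.first_file) {
    // second file of this size: the first one now needs a hash as well
    enqueue_hash_job(*bucket.first_file);
    bucket.first_file.reset();
    bucket.first_hashed = true;
  }
  enqueue_hash_job(p); // every further file of this size gets hashed
}

int main() {
  std::unordered_map<std::uintmax_t, size_bucket> by_size;
  auto enqueue = [](fs::path const&) { /* hand off to the worker pool */ };
  on_regular_file("a.txt", 100, by_size, enqueue); // remembered only
  on_regular_file("b.txt", 100, by_size, enqueue); // queues hashes for a.txt and b.txt
  on_regular_file("c.txt", 100, by_size, enqueue); // queues hash for c.txt only
}
```
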
Once all file contents have been scanned by the worker threads, all
unique files will be assigned an internal inode number.

@@ -620,11 +626,12 @@ files in the image.

### Building

Building the filesystem image uses `--num-workers` separate threads.

If `nilsimsa` ordering is selected, the ordering algorithm runs in its
own thread and continuously emits file inodes. These will be picked up
by the segmenter thread, which scans the inode contents using a cyclic
hash and determines overlapping segments between previously written
data and new incoming data. The segmenter will look at up to
`--max-lookback-block` previous filesystem blocks to find overlaps.

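A minimal sketch of the overlap-finding idea (simplified, assuming a
polynomial rolling hash over a fixed window; the real segmenter handles
collisions, window sizes, and segment extension across up to
`--max-lookback-block` previous blocks far more carefully):

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

constexpr std::size_t WINDOW = 16;
constexpr std::uint64_t BASE = 1099511628211ULL;

// Hash of data[pos, pos + WINDOW); arithmetic wraps mod 2^64.
std::uint64_t hash_window(std::string const& data, std::size_t pos) {
  std::uint64_t h = 0;
  for (std::size_t i = 0; i < WINDOW; ++i) {
    h = h * BASE + static_cast<unsigned char>(data[pos + i]);
  }
  return h;
}

// Roll the window one byte forward: drop `out`, append `in`.
std::uint64_t roll(std::uint64_t h, unsigned char out, unsigned char in,
                   std::uint64_t base_pow) {
  return (h - out * base_pow) * BASE + in;
}

int main() {
  std::string written = "previously written block ... some shared content here ...";
  std::string incoming = "new incoming data ...... some shared content here ... more";

  std::uint64_t base_pow = 1; // BASE^(WINDOW-1) mod 2^64
  for (std::size_t i = 1; i < WINDOW; ++i) base_pow *= BASE;

  // Index every window position of the already-written data.
  std::unordered_map<std::uint64_t, std::vector<std::size_t>> index;
  std::uint64_t h = hash_window(written, 0);
  for (std::size_t i = 0;; ++i) {
    index[h].push_back(i);
    if (i + WINDOW >= written.size()) break;
    h = roll(h, written[i], written[i + WINDOW], base_pow);
  }

  // Scan the incoming data with the same rolling hash; verify hits by
  // comparing bytes, since hashes can collide.
  h = hash_window(incoming, 0);
  for (std::size_t i = 0;; ++i) {
    if (auto it = index.find(h); it != index.end()) {
      for (std::size_t off : it->second) {
        if (incoming.compare(i, WINDOW, written, off, WINDOW) == 0) {
          std::cout << "overlap: incoming@" << i << " == written@" << off << "\n";
        }
      }
    }
    if (i + WINDOW >= incoming.size()) break;
    h = roll(h, incoming[i], incoming[i + WINDOW], base_pow);
  }
}
```
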
Once the segmenter has produced enough data to fill a filesystem

@@ -639,7 +646,7 @@ thread that will ultimately produce the final filesystem image.

When all data has been segmented, the filesystem metadata is
finalized and frozen into a compact representation. If metadata
compression is enabled, the metadata is sent to the worker thread
pool for compression.

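A minimal sketch of that hand-off (an assumption about the mechanics, not
the actual `mkdwarfs` code); `compress_buffer` is a hypothetical placeholder
for whatever metadata compressor was configured:

```cpp
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical placeholder: a real implementation would call the
// configured compression backend (e.g. zstd or lzma).
std::vector<std::uint8_t> compress_buffer(std::vector<std::uint8_t> const& raw) {
  return raw; // identity "compression" for the sketch
}

int main() {
  // Assume this holds the finalized, frozen metadata.
  std::vector<std::uint8_t> frozen_metadata{1, 2, 3, 4};

  // Hand the buffer off asynchronously, like any other compression job
  // submitted to the worker pool.
  auto job = std::async(std::launch::async, compress_buffer,
                        std::cref(frozen_metadata));

  std::vector<std::uint8_t> compressed = job.get(); // wait for the result
  (void)compressed;
}
```
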
When using different ordering schemes, the file inodes will be
either sorted upfront, or just sent to the segmenter in the order