Reintroduce --num-scanner-workers

Marcus Holland-Moritz 2022-11-06 10:52:02 +01:00
parent c6a6ed4f8f
commit a14fa38a0d
2 changed files with 22 additions and 8 deletions

@@ -66,16 +66,23 @@ Most other options are concerned with compression tuning:
 - `-N`, `--num-workers=`*value*:
 Number of worker threads used for building the filesystem. This defaults
 to the number of processors available on your system. Use this option if
-you want to limit the resources used by `mkdwarfs`.
-This option affects both the scanning phase and the compression phase.
+you want to limit the resources used by `mkdwarfs` or to optimize build
+speed. This option affects only the compression phase.
+In the compression phase, the worker threads are used to compress the
+individual filesystem blocks in the background. Ordering, segmenting
+and block building are, again, single-threaded and run independently.
+- `--num-scanner-workers=`*value*:
+Number of worker threads used for building the filesystem. This defaults
+to the number of processors available on your system. Use this option if
+you want to limit the resources used by `mkdwarfs` or to optimize build
+speed. This option affects only the scanning phase. By default, the same
+value is used as for `--num-workers`.
 In the scanning phase, the worker threads are used to scan files in the
 background as they are discovered. File scanning includes checksumming
 for de-duplication as well as (optionally) checksumming for similarity
 computation, depending on the `--order` option. File discovery itself
 is single-threaded and runs independently from the scanning threads.
-In the compression phase, the worker threads are used to compress the
-individual filesystem blocks in the background. Ordering, segmenting
-and block building are, again, single-threaded and run independently.
 - `-B`, `--max-lookback-blocks=`*value*:
 Specify how many of the most recent blocks to scan for duplicate segments.
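
The defaulting rule documented for `--num-scanner-workers` (use the value of `--num-workers` unless the option is given explicitly) can be sketched in isolation. This is a minimal illustration only, not the code from this commit: `worker_counts` and `resolve_worker_counts` are hypothetical names, and `std::optional` stands in for the "was the option given?" check.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical type for illustration; not part of mkdwarfs.
struct worker_counts {
  std::size_t compress; // threads for the compression phase
  std::size_t scanner;  // threads for the scanning phase
};

// Hypothetical helper: the scanner count falls back to the compression
// worker count when no explicit scanner value was supplied.
inline worker_counts
resolve_worker_counts(std::size_t num_workers,
                      std::optional<std::size_t> num_scanner_workers) {
  return worker_counts{num_workers,
                       num_scanner_workers.value_or(num_workers)};
}
```

Under these assumptions, passing only a worker count of 8 would give 8 threads to both phases, while additionally passing a scanner count of 2 would leave compression at 8 and reduce scanning to 2.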

@@ -374,7 +374,7 @@ int mkdwarfs(int argc, char** argv) {
 time_resolution, order, progress_mode, recompress_opts, pack_metadata,
 file_hash_algo, debug_filter;
 std::vector<std::string> filter;
-size_t num_workers;
+size_t num_workers, num_scanner_workers;
 bool no_progress = false, remove_header = false, no_section_index = false,
 force_overwrite = false;
 unsigned level;
@@ -421,7 +421,10 @@ int mkdwarfs(int argc, char** argv) {
 "block size bits (size = 2^arg bits)")
 ("num-workers,N",
 po::value<size_t>(&num_workers)->default_value(num_cpu),
-"number of scanner/writer worker threads")
+"number of writer (compression) worker threads")
+("num-scanner-workers",
+po::value<size_t>(&num_scanner_workers),
+"number of scanner (hashing) worker threads")
 ("max-lookback-blocks,B",
 po::value<size_t>(&cfg.max_active_blocks)->default_value(1),
 "how many blocks to scan for segments")
@@ -715,8 +718,12 @@ int mkdwarfs(int argc, char** argv) {
 size_t mem_limit = parse_size_with_unit(memory_limit);
+if (!vm.count("num-scanner-workers")) {
+num_scanner_workers = num_workers;
+}
 worker_group wg_compress("compress", num_workers);
-worker_group wg_scanner("scanner", num_workers);
+worker_group wg_scanner("scanner", num_scanner_workers);
 if (vm.count("debug-filter")) {
 if (auto it = debug_filter_modes.find(debug_filter);