From 3a658981f85f6874224d13df15f37a7753385641 Mon Sep 17 00:00:00 2001
From: Marcus Holland-Moritz
Date: Sun, 17 Dec 2023 20:28:11 +0100
Subject: [PATCH] chore: update TODO

---
 TODO | 58 ----------------------------------------------------------
 1 file changed, 58 deletions(-)

diff --git a/TODO b/TODO
index 295ea205..24cc49b8 100644
--- a/TODO
+++ b/TODO
@@ -8,46 +8,21 @@
   obviously wouldn't be undo-able)
 
 - Packaging of libs added via FetchContent
 
-- Remove [ MiB, MiB, MiB ]
-- Generic hashing / scanning / categorizing progress?
 
 - Re-assemble global bloom filter rather than merging?
 - Use smaller bloom filters for individual blocks?
 - Use bigger (non-resettable?) global bloom filter?
 
-- filesystem re-writing with categories :-)
-
-- let's try and keep forward compatibility for the 0.7 release
-  when not using new features; the only features relevant are
-  likely FLAC compression support and "features" support; in
-  theory, we don't even need to increment the minor version at
-  all, since unknown compressions will be caught and feature
-  flags will simply be ignored; maybe it makes sense to have
-  this mode of compatibility only for the 0.8 releases and in
-  0.9 do a hard increment of the minor version; in 0.8, we can
-  use the old minor version if we don't use FLAC and the new
-  minor version if we do
-
 - file discovery progress?
-- reasonable defaults when `--categorize` is given without
-  any arguments
-
 - show defaults for categorized options
 
-- scanner / compressor progress contexts?
-
-- file system rewriting with categories :-)
-
 - take a look at CPU measurements, those for nilsimsa
   ordering are probably wrong
 
 - segmenter tests with different granularities, block sizes,
   any other options
 
-- configurable number of threads for ordering/segmenting
-
-
 - Bloom filters can be wasteful if lookback gets really long.
   Maybe we can use smaller bloom filters for individual blocks
   and one or two larger "global" bloom filters? It's going to
@@ -74,10 +49,6 @@ in this case.
 
-- Forward compatibility
-
-  - Feature flags (feature strings)
-
 - Wiki with use cases
 - Perl releases
 - Videos with shared streams
 
 
@@ -86,25 +57,6 @@
 - Mounting lots of images with shared cache?
 
-- configuration ideas:
-
-    --order FILETYPE::...
-    -B FILETYPE::...
-    -W FILETYPE::...
-    -w FILETYPE::...
-    -C FILETYPE::...
-
-    -B pcmaudio::64 -W pcmaudio::16 -C pcmaudio::flac:level=8
-    -C binary/x86::zstd:filter=x86
-    -C mime:application/x-archive::null
-
-    --filetype pcmaudio::mime:audio/x-wav,mime:audio/x-w64
-
-
-  --categorize=pcmaudio,incompressible,binary,libmagic
-  --libmagic-types=application/x-archive
-
-
 - different scenarios for categorized files / chunks:
 
   - Video files
 
@@ -122,11 +74,9 @@ This is actually quite easy:
 
   - Identify PCM audio files (libmagic?)
-  - Use libsndfile for parsing
   - Nilsimsa similarity works surprisingly well
   - We can potentially switch to larger window size for
     segmentation and use larger lookback
-  - Group by format (# of channels, resolution, endian-ness, signedness, sample rate)
   - Run segmentation as usual
   - Compress each block using FLAC (hopefully we can configure
     how much header data and/or seek points etc. gets stored)
     or maybe even WAVPACK is we don't need perf
@@ -183,18 +133,12 @@ would only operate on a few instead of all bloom filters,
   which could be better from a cache locality pov)
 
-- per-file progress for large files?
-- throughput indicator
-
 - similarity size limit to avoid similarity computation
   for huge files
 - store files without similarity hash first, sorted
   descending by size
-- allow ordering by *reverse* path
 - use streaming interface for zstd decompressor
 - json metadata recovery
-- add --chmod, --chown
-- add some simple filter rules?
 - handle sparse files?
 - try to be more resilient to modifications of the input
   while creating fs
 
@@ -221,8 +165,6 @@
 
 - readahead?
 
-- remove multiple blockhash window sizes, one is enough apparently?
-
 - window-increment-shift seems silly to configure?
 
 - identify blocks that contain mostly binary data and adjust compressor?