chore: update TODO

This commit is contained in:
Marcus Holland-Moritz 2023-12-17 20:28:11 +01:00
parent fa6e7f5408
commit 3a658981f8

TODO

@@ -8,46 +8,21 @@
obviously wouldn't be undo-able)
- Packaging of libs added via FetchContent
- Remove [ MiB, MiB, MiB ]
- Generic hashing / scanning / categorizing progress?
- Re-assemble global bloom filter rather than merging?
- Use smaller bloom filters for individual blocks?
- Use bigger (non-resettable?) global bloom filter?
- filesystem re-writing with categories :-)
- let's try and keep forward compatibility for the 0.7 release
when not using new features; the only features relevant are
likely FLAC compression support and "features" support; in
theory, we don't even need to increment the minor version at
all, since unknown compressions will be caught and feature
flags will simply be ignored; maybe it makes sense to have
this mode of compatibility only for the 0.8 releases and in
0.9 do a hard increment of the minor version; in 0.8, we can
use the old minor version if we don't use FLAC and the new
minor version if we do
- file discovery progress?
- reasonable defaults when `--categorize` is given without
any arguments
- show defaults for categorized options
- scanner / compressor progress contexts?
- file system rewriting with categories :-)
- take a look at CPU measurements, those for nilsimsa
ordering are probably wrong
- segmenter tests with different granularities, block sizes,
any other options
- configurable number of threads for ordering/segmenting
- Bloom filters can be wasteful if lookback gets really long.
Maybe we can use smaller bloom filters for individual blocks
and one or two larger "global" bloom filters? It's going to
@@ -74,10 +49,6 @@
in this case.
- Forward compatibility
- Feature flags (feature strings)
- Wiki with use cases
- Perl releases
- Videos with shared streams
@@ -86,25 +57,6 @@
- Mounting lots of images with shared cache?
- configuration ideas:
--order FILETYPE::...
-B FILETYPE::...
-W FILETYPE::...
-w FILETYPE::...
-C FILETYPE::...
-B pcmaudio::64 -W pcmaudio::16 -C pcmaudio::flac:level=8
-C binary/x86::zstd:filter=x86
-C mime:application/x-archive::null
--filetype pcmaudio::mime:audio/x-wav,mime:audio/x-w64
--categorize=pcmaudio,incompressible,binary,libmagic
--libmagic-types=application/x-archive
- different scenarios for categorized files / chunks:
- Video files
@@ -122,11 +74,9 @@
This is actually quite easy:
- Identify PCM audio files (libmagic?)
- Use libsndfile for parsing
- Nilsimsa similarity works surprisingly well
- We can potentially switch to larger window size for segmentation and use
larger lookback
- Group by format (# of channels, resolution, endian-ness, signedness, sample rate)
- Run segmentation as usual
- Compress each block using FLAC (hopefully we can configure how much header data
  and/or seek points etc. gets stored) or maybe even WavPack if we don't need perf
@@ -183,18 +133,12 @@
would only operate on a few instead of all bloom filters, which
could be better from a cache locality pov)
- per-file progress for large files?
- throughput indicator
- similarity size limit to avoid similarity computation for huge files
- store files without similarity hash first, sorted descending by size
- allow ordering by *reverse* path
- use streaming interface for zstd decompressor
- json metadata recovery
- add --chmod, --chown
- add some simple filter rules?
- handle sparse files?
- try to be more resilient to modifications of the input while creating fs
@@ -221,8 +165,6 @@
- readahead?
- remove multiple blockhash window sizes, one is enough apparently?
- window-increment-shift seems silly to configure?
- identify blocks that contain mostly binary data and adjust compressor?