chore: update TODO

This commit is contained in:
Marcus Holland-Moritz 2023-12-17 20:28:11 +01:00
parent fa6e7f5408
commit 3a658981f8

TODO

@@ -8,46 +8,21 @@
obviously wouldn't be undo-able)
- Packaging of libs added via FetchContent
- Remove [ MiB, MiB, MiB ]
- Generic hashing / scanning / categorizing progress?
- Re-assemble global bloom filter rather than merging?
- Use smaller bloom filters for individual blocks?
- Use bigger (non-resettable?) global bloom filter?
- filesystem re-writing with categories :-)
- let's try and keep forward compatibility for the 0.7 release
when not using new features; the only features relevant are
likely FLAC compression support and "features" support; in
theory, we don't even need to increment the minor version at
all, since unknown compressions will be caught and feature
flags will simply be ignored; maybe it makes sense to have
this mode of compatibility only for the 0.8 releases and in
0.9 do a hard increment of the minor version; in 0.8, we can
use the old minor version if we don't use FLAC and the new
minor version if we do
- file discovery progress?
- reasonable defaults when `--categorize` is given without
any arguments
- show defaults for categorized options
- scanner / compressor progress contexts?
- file system rewriting with categories :-)
- take a look at CPU measurements, those for nilsimsa
  ordering are probably wrong
- segmenter tests with different granularities, block sizes,
  any other options
- configurable number of threads for ordering/segmenting
- Bloom filters can be wasteful if lookback gets really long.
  Maybe we can use smaller bloom filters for individual blocks
  and one or two larger "global" bloom filters? It's going to
@@ -74,10 +49,6 @@
  in this case.
- Forward compatibility
- Feature flags (feature strings)
- Wiki with use cases
- Perl releases
- Videos with shared streams
@@ -86,25 +57,6 @@
- Mounting lots of images with shared cache?
- configuration ideas:
--order FILETYPE::...
-B FILETYPE::...
-W FILETYPE::...
-w FILETYPE::...
-C FILETYPE::...
-B pcmaudio::64 -W pcmaudio::16 -C pcmaudio::flac:level=8
-C binary/x86::zstd:filter=x86
-C mime:application/x-archive::null
--filetype pcmaudio::mime:audio/x-wav,mime:audio/x-w64
--categorize=pcmaudio,incompressible,binary,libmagic
--libmagic-types=application/x-archive
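The option grammar sketched in the removed block above (`CATEGORY::COMPRESSOR[:key=value[,key=value...]]`, e.g. `-C pcmaudio::flac:level=8`) can be illustrated with a tiny parser. This is a hypothetical sketch, not actual dwarfs code; the function name and return shape are invented for illustration:

```python
def parse_category_option(arg: str):
    """Split 'CATEGORY::COMPRESSOR[:key=value,...]' into its parts.

    The category itself may contain a single colon (e.g.
    'mime:application/x-archive'), so we split on the '::' separator
    first and only then peel off compressor parameters.
    """
    category, _, spec = arg.partition("::")
    compressor, _, params = spec.partition(":")
    options = dict(kv.split("=", 1) for kv in params.split(",")) if params else {}
    return category, compressor, options

# parse_category_option("pcmaudio::flac:level=8")
#   -> ("pcmaudio", "flac", {"level": "8"})
```

Splitting on `::` first is what lets `mime:`-prefixed categories coexist with colon-separated compressor parameters.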
- different scenarios for categorized files / chunks:
  - Video files
@@ -122,11 +74,9 @@
  This is actually quite easy:
  - Identify PCM audio files (libmagic?)
- Use libsndfile for parsing
  - Nilsimsa similarity works surprisingly well
  - We can potentially switch to larger window size for segmentation and use
    larger lookback
- Group by format (# of channels, resolution, endian-ness, signedness, sample rate)
  - Run segmentation as usual
  - Compress each block using FLAC (hopefully we can configure how much header data
    and/or seek points etc. gets stored) or maybe even WAVPACK if we don't need perf
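The "group by format" step in the PCM audio plan above could look roughly like this. A hypothetical sketch, not dwarfs code; it assumes the format tuple (channels, bit depth, endianness, signedness, sample rate) has already been extracted, e.g. via libsndfile:

```python
from collections import defaultdict

def group_by_format(files):
    """Bucket PCM audio files by their format tuple so that only
    same-format files are segmented and FLAC-compressed together.

    files: iterable of (path, fmt) pairs, where fmt is a
    (channels, bits, endianness, signed, sample_rate) tuple.
    """
    groups = defaultdict(list)
    for path, fmt in files:
        groups[fmt].append(path)
    return groups
```

Grouping first matters because segmenting a 16-bit stereo stream against an 8-bit mono one would produce no useful matches and could not share a single FLAC stream configuration.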
@@ -183,18 +133,12 @@
  would only operate on a few instead of all bloom filters, which
  could be better from a cache locality pov)
- per-file progress for large files?
- throughput indicator
- similarity size limit to avoid similarity computation for huge files
- store files without similarity hash first, sorted descending by size
- allow ordering by *reverse* path
- use streaming interface for zstd decompressor
- json metadata recovery
- add --chmod, --chown
- add some simple filter rules?
- handle sparse files? - handle sparse files?
- try to be more resilient to modifications of the input while creating fs - try to be more resilient to modifications of the input while creating fs
@@ -221,8 +165,6 @@
- readahead?
- remove multiple blockhash window sizes, one is enough apparently?
- window-increment-shift seems silly to configure?
- identify blocks that contain mostly binary data and adjust compressor?