mirror of https://github.com/mhx/dwarfs.git (synced 2025-09-09 20:41:04 -04:00)
chore: update TODO
parent fa6e7f5408
commit 3a658981f8
58
TODO
@@ -8,46 +8,21 @@
  obviously wouldn't be undo-able)

  - Packaging of libs added via FetchContent
- - Remove [ MiB, MiB, MiB ]
- - Generic hashing / scanning / categorizing progress?

  - Re-assemble global bloom filter rather than merging?
  - Use smaller bloom filters for individual blocks?
  - Use bigger (non-resettable?) global bloom filter?

- - filesystem re-writing with categories :-)
-
- - let's try and keep forward compatibility for the 0.7 release
- when not using new features; the only features relevant are
- likely FLAC compression support and "features" support; in
- theory, we don't even need to increment the minor version at
- all, since unknown compressions will be caught and feature
- flags will simply be ignored; maybe it makes sense to have
- this mode of compatibility only for the 0.8 releases and in
- 0.9 do a hard increment of the minor version; in 0.8, we can
- use the old minor version if we don't use FLAC and the new
- minor version if we do
-
  - file discovery progress?

- - reasonable defaults when `--categorize` is given without
- any arguments
-
  - show defaults for categorized options

- - scanner / compressor progress contexts?
-
- - file system rewriting with categories :-)
-
  - take a look at CPU measurements, those for nilsimsa
  ordering are probably wrong

  - segmenter tests with different granularities, block sizes,
  any other options

- - configurable number of threads for ordering/segmenting
-
-
  - Bloom filters can be wasteful if lookback gets really long.
  Maybe we can use smaller bloom filters for individual blocks
  and one or two larger "global" bloom filters? It's going to
@@ -74,10 +49,6 @@
  in this case.


- - Forward compatibility
-
- - Feature flags (feature strings)
-
  - Wiki with use cases
  - Perl releases
  - Videos with shared streams
@@ -86,25 +57,6 @@

  - Mounting lots of images with shared cache?

- - configuration ideas:
-
- --order FILETYPE::...
- -B FILETYPE::...
- -W FILETYPE::...
- -w FILETYPE::...
- -C FILETYPE::...
-
- -B pcmaudio::64 -W pcmaudio::16 -C pcmaudio::flac:level=8
- -C binary/x86::zstd:filter=x86
- -C mime:application/x-archive::null
-
- --filetype pcmaudio::mime:audio/x-wav,mime:audio/x-w64
-
-
- --categorize=pcmaudio,incompressible,binary,libmagic
- --libmagic-types=application/x-archive
-
-
  - different scenarios for categorized files / chunks:

  - Video files
@@ -122,11 +74,9 @@
  This is actually quite easy:

  - Identify PCM audio files (libmagic?)
- - Use libsndfile for parsing
  - Nilsimsa similarity works surprisingly well
  - We can potentially switch to larger window size for segmentation and use
  larger lookback
- - Group by format (# of channels, resolution, endian-ness, signedness, sample rate)
  - Run segmentation as usual
  - Compress each block using FLAC (hopefully we can configure how much header data
  and/or seek points etc. gets stored) or maybe even WAVPACK is we don't need perf
@@ -183,18 +133,12 @@
  would only operate on a few instead of all bloom filters, which
  could be better from a cache locality pov)

- - per-file progress for large files?
- - throughput indicator
-
  - similarity size limit to avoid similarity computation for huge files
  - store files without similarity hash first, sorted descending by size
- - allow ordering by *reverse* path


  - use streaming interface for zstd decompressor
  - json metadata recovery
- - add --chmod, --chown
- - add some simple filter rules?
  - handle sparse files?
  - try to be more resilient to modifications of the input while creating fs

@@ -221,8 +165,6 @@

  - readahead?

- - remove multiple blockhash window sizes, one is enough apparently?
-
  - window-increment-shift seems silly to configure?

  - identify blocks that contain mostly binary data and adjust compressor?