1465 Commits

Author SHA1 Message Date
Marcus Holland-Moritz
8550c47873 Add offset to section description 2022-11-08 11:45:06 +01:00
Marcus Holland-Moritz
18a065bb88 Try to dump file system information even if errors were encountered 2022-11-08 11:44:06 +01:00
Marcus Holland-Moritz
f231ce0878 Add some filter tests 2022-11-08 10:08:02 +01:00
Marcus Holland-Moritz
87fd512df7 Prettier time formatting 2022-11-07 10:48:20 +00:00
Marcus Holland-Moritz
eb8803d6df Add --input-list option to pass in a list of files 2022-11-07 10:48:20 +00:00
Marcus Holland-Moritz
ff5f99f3d9 Add --max-similarity-size option 2022-11-06 14:32:14 +01:00
Marcus Holland-Moritz
21fc4c9524 Consolidate tool header code 2022-11-06 11:36:04 +01:00
Marcus Holland-Moritz
a14fa38a0d Reintroduce --num-scanner-workers 2022-11-06 10:52:02 +01:00
Marcus Holland-Moritz
c6a6ed4f8f Support lz4 compression levels 10..12 2022-11-06 10:20:53 +01:00
Marcus Holland-Moritz
50fa3c8374 Update .travis.yml 2022-11-06 09:57:59 +01:00
Marcus Holland-Moritz
841dcf17ac Static build tweaks 2022-11-06 07:49:40 +00:00
Marcus Holland-Moritz
12b2e35f05 Add support for Brotli compression (fixes github #76) 2022-11-06 07:49:40 +00:00
Marcus Holland-Moritz
87271666ac Autogenerate compression type code 2022-11-05 22:23:58 +00:00
Marcus Holland-Moritz
03829e6da4 First step at making compression code more modular 2022-11-05 22:23:58 +00:00
Marcus Holland-Moritz
c2e3cdfecb Factor out file_scanner 2022-11-05 22:23:58 +00:00
Marcus Holland-Moritz
b41a400e32 Update module versions 2022-11-05 22:23:58 +00:00
Marcus Holland-Moritz
5ab9cbb3c4 Simplify libfmt setup and add override to use system library 2022-10-30 10:35:09 +01:00
Marcus Holland-Moritz
356120c058 Doc fix 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
1215a30f78 Support for simple filter rules (potential fix for github #6) 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
c2f00d78c3 Log full path when skipping files in scanner 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
aeeddaecab Honour user locale when formatting numbers 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
0ed3c933bf Some new TODOs 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
6920df7334 Improved deduplication algorithm
Instead of hashing all files unconditionally, the new algorithm first
checks if there are multiple files of the same size. Files with a
unique size cannot have duplicates and so don't have to be hashed at
all.
2022-10-29 18:54:31 +02:00
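
The size-first idea described in this commit can be sketched roughly as below. This is a minimal illustration, not the project's code: `file_info`, `digest()` and `find_duplicates()` are hypothetical names, and the file content itself stands in for a real hash digest.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical file record; `content` stands in for the data on disk.
struct file_info {
  std::string path;
  std::string content;
};

// Hypothetical "hash": returns the content itself and counts how often
// it was called, so the saving from skipping unique sizes is visible.
std::string digest(file_info const& f, std::size_t& hash_calls) {
  ++hash_calls;
  return f.content;
}

// Returns one group of paths per set of identical files.
std::vector<std::vector<std::string>>
find_duplicates(std::vector<file_info> const& files, std::size_t& hash_calls) {
  // Pass 1: bucket files by size; no hashing happens here.
  std::map<std::size_t, std::vector<file_info const*>> by_size;
  for (auto const& f : files) {
    by_size[f.content.size()].push_back(&f);
  }

  std::vector<std::vector<std::string>> duplicates;
  for (auto const& size_entry : by_size) {
    auto const& same_size = size_entry.second;
    // A file with a unique size cannot have a duplicate: skip hashing.
    if (same_size.size() < 2) {
      continue;
    }
    // Pass 2: hash only the files that share their size with another file.
    std::map<std::string, std::vector<std::string>> by_digest;
    for (file_info const* f : same_size) {
      by_digest[digest(*f, hash_calls)].push_back(f->path);
    }
    for (auto const& digest_entry : by_digest) {
      if (digest_entry.second.size() > 1) {
        duplicates.push_back(digest_entry.second);
      }
    }
  }
  return duplicates;
}
```

In this sketch, a file whose size is shared by no other file never reaches `digest()`.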
Marcus Holland-Moritz
17567e009a Improved checks for openssl digest functions 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
b1db6470df Fix spelling 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
5256cf09ae Rename lookup tables in file scanner 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
6a6fe94228 Simplify original_size update in file::scan() 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
b001d9f28e Progress unit tests & fixes 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
148de5bf0d Add --file-hash option (fixes github #92)
This does not yet address the issue that uniquely sized files are
unnecessarily hashed, which is also mentioned in #92. This will be
addressed separately.
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
482a40560e Add inode ordering test (for image reproducibility, see gh #91) 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
c5ac04347c Rename cache tidying functions to be less ambiguous 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
b59a48c2b6 Replace use of boost posix_time with {fmt}
Only issue is that in order to properly format fractional seconds,
we need a bleeding edge version of {fmt}.
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
3053140e5c boost::system::system_error -> std::system_error 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
8ed2fceba3 Fix extract_blocks.py 2022-10-29 18:53:52 +02:00
Marcus Holland-Moritz
0b060a57f3 Document new command line options v0.6.2 2022-10-24 13:13:22 +02:00
Marcus Holland-Moritz
4dee67a50b Update change log 2022-10-24 13:07:27 +02:00
Marcus Holland-Moritz
842dbbadd6 Update folly/fbthrift 2022-10-24 10:06:54 +02:00
Marcus Holland-Moritz
99582205dd Remove deprecated required specifiers from thrift definitions
At least for the frozen representation, this doesn't make a difference.
2022-10-24 10:06:42 +02:00
Marcus Holland-Moritz
242d316007 Fix #99: build fails if build dir is outside of git repo 2022-10-24 09:53:31 +02:00
Marcus Holland-Moritz
f5e3108bb2 Fix #105: handle strrchr() returning NULL
This shouldn't have been a problem, as the compiler-generated source
file paths should always contain a `/`, but it's still good to play
it safe.
2022-10-22 23:43:09 +02:00
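
A defensive use of `strrchr()` along the lines of this fix might look like the sketch below. The helper name `basename_of()` is an assumption made for illustration; the commit does not show the actual code.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns the part of `path` after the last '/',
// or the whole path if it contains no '/' at all.
char const* basename_of(char const* path) {
  assert(path != nullptr);
  char const* slash = std::strrchr(path, '/');
  // strrchr() returns a null pointer when the character is not found,
  // so the result must be checked before doing pointer arithmetic on it.
  return slash != nullptr ? slash + 1 : path;
}
```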
Marcus Holland-Moritz
d583bd9a2d Update folly/fbthrift 2022-10-21 16:08:10 +02:00
Marcus Holland-Moritz
e94816fcbc Remove all traces of statvfs from dwarfsextract 2022-10-21 15:48:23 +02:00
Marcus Holland-Moritz
a1a9571b13 Update parallel-hashmap to v1.3.8 2022-10-21 13:44:43 +02:00
Marcus Holland-Moritz
01ee8916b6 clang-format 2022-10-21 13:37:43 +02:00
mhx
876fafdd55
Merge pull request #107 from MRWITEK/mincore
Fix `cached_block::is_swapped_out()`
2022-10-21 13:36:52 +02:00
Marcus Holland-Moritz
186eb763a3 Fix #104: read large files in chunks rather than fully
This changes the way data is sent to libarchive. For files larger
than `max_queued_bytes`, instead of fully reading the compressed
file and then sending the whole file to libarchive at once, the
code now reads chunks of at most `max_queued_bytes` and sends the
chunks to libarchive independently. Small files are treated as
before.

When extracting large files, this method is actually a lot faster
as it puts less strain on the memory allocator.
2022-10-21 11:12:28 +02:00
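
The chunking scheme described in this commit can be sketched as follows. This is only an illustration under stated assumptions: `extract_file()`, `read_fn` and `sink_fn` are hypothetical, with `sink_fn` standing in for whatever hands data to libarchive; none of them are the project's or libarchive's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical callbacks: `read_fn` reads `size` bytes at `offset` from
// the file being extracted, `sink_fn` passes a buffer on to the archive
// writer.
using read_fn = std::function<std::string(std::size_t offset, std::size_t size)>;
using sink_fn = std::function<void(std::string const&)>;

void extract_file(std::size_t file_size, std::size_t max_queued_bytes,
                  read_fn const& read, sink_fn const& sink) {
  assert(max_queued_bytes > 0);

  // Small files: read everything and send it in one go, as before.
  if (file_size <= max_queued_bytes) {
    sink(read(0, file_size));
    return;
  }

  // Large files: at most `max_queued_bytes` are held in memory at a time.
  for (std::size_t offset = 0; offset < file_size; offset += max_queued_bytes) {
    std::size_t const size = std::min(max_queued_bytes, file_size - offset);
    sink(read(offset, size));
  }
}
```

Each chunk goes out of scope before the next one is read, so the peak buffer size is bounded by `max_queued_bytes` rather than by the file size.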
Marcus Holland-Moritz
dc8490f583 Remove conditional statvfs code in dwarfsextract
Keep things simple, just derive `max_queued_bytes` from the cache
size for now.
2022-10-21 11:07:27 +02:00
Marcus Holland-Moritz
bf2064f650 Fix heap-use-after-free when writing section index
When writing the section index block, an additional entry was added
to the index, potentially reallocating the vector containing the
index. However, we previously took the address of the vector data
in order to write the index, so that address is now invalid.

The fix is to not add the extra entry to the index.
2022-10-21 11:06:28 +02:00
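
The hazard behind this bug is the standard rule that appending to a `std::vector` may reallocate its storage and invalidate pointers obtained earlier from `data()`. The sketch below illustrates that rule only; `append_would_invalidate()` and `write_index()` are hypothetical names, and summing the entries merely stands in for writing them out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// True if appending one more element would force a reallocation, which
// would invalidate any pointer previously obtained from v.data().
template <typename T>
bool append_would_invalidate(std::vector<T> const& v) {
  return v.size() == v.capacity();
}

// Hypothetical stand-in for writing the index. Taking the vector by
// const reference means no entry can be appended while `data` is in
// use, so the pointer stays valid for the whole function.
std::uint64_t write_index(std::vector<std::uint64_t> const& index) {
  std::uint64_t const* data = index.data();
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < index.size(); ++i) {
    sum += data[i];
  }
  return sum;
}
```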
Marcus Holland-Moritz
59b87cdd9f Fix data race in cached_block 2022-10-21 08:57:21 +02:00
Marcus Holland-Moritz
b07863bcf6 Debug output, assertion & cleanup in op_readdir 2022-10-20 17:12:20 +02:00