747 Commits

Author SHA1 Message Date
Marcus Holland-Moritz
1215a30f78 Support for simple filter rules (potential fix for github #6) 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
c2f00d78c3 Log full path when skipping files in scanner 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
aeeddaecab Honour user locale when formatting numbers 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
0ed3c933bf Some new TODOs 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
6920df7334 Improved deduplication algorithm
Instead of hashing all files unconditionally, the new algorithm first
checks if there are multiple files of the same size. Files with a
unique size cannot have duplicates and so don't have to be hashed at
all.
2022-10-29 18:54:31 +02:00
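The size-first idea described in commit 6920df7334 can be sketched as follows. This is only an illustrative sketch, not the project's implementation: the function name `find_duplicates`, the in-memory `files` dict standing in for real files, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group duplicate files, hashing only files that share a size.

    `files` is a hypothetical mapping of name -> bytes content.
    Returns (duplicate_groups, number_of_files_hashed).
    """
    # Pass 1: bucket files by size; this needs no hashing at all.
    by_size = defaultdict(list)
    for name, data in files.items():
        by_size[len(data)].append(name)

    hashed = 0
    by_digest = defaultdict(list)
    for size, names in by_size.items():
        if len(names) < 2:
            # A file with a unique size cannot have a duplicate.
            continue
        # Pass 2: hash only the files whose size is shared.
        for name in names:
            digest = hashlib.sha256(files[name]).hexdigest()
            hashed += 1
            by_digest[(size, digest)].append(name)

    groups = sorted(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups, hashed


if __name__ == "__main__":
    sample = {
        "a": b"hello",
        "b": b"hello",
        "c": b"world",
        "d": b"unique size",
    }
    print(find_duplicates(sample))
```

In the sample, only the three 5-byte files are hashed; the file with the unique size is skipped entirely.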
Marcus Holland-Moritz
17567e009a Improved checks for openssl digest functions 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
b1db6470df Fix spelling 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
5256cf09ae Rename lookup tables in file scanner 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
6a6fe94228 Simplify original_size update in file::scan() 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
b001d9f28e Progress unit tests & fixes 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
148de5bf0d Add --file-hash option (fixes github #92)
This does not yet address the issue that uniquely sized files are
unnecessarily hashed, which is also mentioned in #92. This will be
addressed separately.
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
482a40560e Add inode ordering test (for image reproducibility, see gh #91) 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
c5ac04347c Rename cache tidying functions to be less ambiguous 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
b59a48c2b6 Replace use of boost posix_time with {fmt}
The only issue is that properly formatting fractional seconds
requires a bleeding-edge version of {fmt}.
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
3053140e5c boost::system::system_error -> std::system_error 2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
8ed2fceba3 Fix extract_blocks.py 2022-10-29 18:53:52 +02:00
Marcus Holland-Moritz
0b060a57f3 Document new command line options (tag: v0.6.2) 2022-10-24 13:13:22 +02:00
Marcus Holland-Moritz
4dee67a50b Update change log 2022-10-24 13:07:27 +02:00
Marcus Holland-Moritz
842dbbadd6 Update folly/fbthrift 2022-10-24 10:06:54 +02:00
Marcus Holland-Moritz
99582205dd Remove deprecated required specifiers from thrift definitions
At least for the frozen representation, this doesn't make a difference.
2022-10-24 10:06:42 +02:00
Marcus Holland-Moritz
242d316007 Fix #99: build fails if build dir is outside of git repo 2022-10-24 09:53:31 +02:00
Marcus Holland-Moritz
f5e3108bb2 Fix #105: handle strrchr() returning NULL
This shouldn't have been a problem, as the compiler-generated source
file paths should always contain a `/`, but it's still good to play
it safe.
2022-10-22 23:43:09 +02:00
Marcus Holland-Moritz
d583bd9a2d Update folly/fbthrift 2022-10-21 16:08:10 +02:00
Marcus Holland-Moritz
e94816fcbc Remove all traces of statvfs from dwarfsextract 2022-10-21 15:48:23 +02:00
Marcus Holland-Moritz
a1a9571b13 Update parallel-hashmap to v1.3.8 2022-10-21 13:44:43 +02:00
Marcus Holland-Moritz
01ee8916b6 clang-format 2022-10-21 13:37:43 +02:00
mhx
876fafdd55 Merge pull request #107 from MRWITEK/mincore
Fix `cached_block::is_swapped_out()`
2022-10-21 13:36:52 +02:00
Marcus Holland-Moritz
186eb763a3 Fix #104: read large files in chunks rather than fully
This changes the way data is sent to libarchive. For files larger
than `max_queued_bytes`, instead of fully reading the compressed
file and then sending the whole file to libarchive at once, the
code now reads chunks of at most `max_queued_bytes` and sends the
chunks to libarchive independently. Small files are treated as
before.

When extracting large files, this method is actually a lot faster
as it puts less strain on the memory allocator.
2022-10-21 11:12:28 +02:00
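The chunking behaviour described in commit 186eb763a3 can be illustrated with a small sketch. It does not use libarchive and is not the project's code: the generator name `iter_chunks` and the use of an in-memory stream are assumptions; only the `max_queued_bytes` name comes from the commit message.

```python
import io


def iter_chunks(stream, max_queued_bytes):
    """Yield successive chunks of at most `max_queued_bytes` bytes.

    A consumer (here hypothetical; libarchive in the commit above) can
    process each chunk independently instead of receiving the whole
    file in one large buffer.
    """
    if max_queued_bytes <= 0:
        raise ValueError("max_queued_bytes must be positive")
    while True:
        chunk = stream.read(max_queued_bytes)
        if not chunk:
            break  # end of stream
        yield chunk


if __name__ == "__main__":
    data = io.BytesIO(b"abcdefghij")
    for chunk in iter_chunks(data, 4):
        print(len(chunk), chunk)
```

A file no larger than `max_queued_bytes` yields a single chunk, which matches the statement that small files are treated as before.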
Marcus Holland-Moritz
dc8490f583 Remove conditional statvfs code in dwarfsextract
Keep things simple, just derive `max_queued_bytes` from the cache
size for now.
2022-10-21 11:07:27 +02:00
Marcus Holland-Moritz
bf2064f650 Fix heap-use-after-free when writing section index
When writing the section index block, an additional entry was added
to the index, potentially reallocating the vector containing the
index. However, we previously took the address of the vector data
in order to write the index, so that address is now invalid.

The fix is to not add the extra entry to the index.
2022-10-21 11:06:28 +02:00
Marcus Holland-Moritz
59b87cdd9f Fix data race in cached_block 2022-10-21 08:57:21 +02:00
Marcus Holland-Moritz
b07863bcf6 Debug output, assertion & cleanup in op_readdir 2022-10-20 17:12:20 +02:00
mhx
e86a44e3dd Merge pull request #106 from MRWITEK/main
Fix out of bounds access
2022-10-20 17:04:46 +02:00
Victor Dmitriev
301150c908 Fix cached_block::is_swapped_out()
see `mincore()` documentation
at https://man7.org/linux/man-pages/man2/mincore.2.html
2022-09-23 02:24:29 +03:00
Victor Dmitriev
ea3ffee377 Fix out of bounds access
when `written == buf.size()`
2022-09-23 01:36:21 +03:00
Marcus Holland-Moritz
e8f489a4c1 Update parallel-hashmap 2022-08-04 15:09:57 +02:00
Marcus Holland-Moritz
314d1320cf Only overwrite output image with --force option (fixes #93) 2022-08-03 19:16:30 +02:00
Marcus Holland-Moritz
183a16d953 fsst: deterministic symbol tables (needed to fix #91)
This fixes what I believe is a bug in the fsst library that causes
symbol tables to be non-deterministic. There's an open issue/PR for
the library, so it's not yet clear if this fix is correct/optimal.
2022-08-03 18:59:47 +02:00
Marcus Holland-Moritz
422146d7a2 Produce deterministic inode numbers (needed to fix #91)
While most of the code typically ensures that elements are kept
in a deterministic order, the code that assigned inode numbers
was iterating a hash table, which by itself guaranteed FIFO
semantics, but items were inserted from multiple threads when
scanning the input file system.

This change adds a sorting step before assigning inode numbers.
(This shouldn't be much of a performance hit.)
2022-08-03 18:54:57 +02:00
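The sorting step described in commit 422146d7a2 can be sketched as below. This is a simplified illustration under stated assumptions: the function name `assign_inode_numbers` is hypothetical, and sorting by path is an assumed sort key, since the commit message does not say what the real code sorts on.

```python
def assign_inode_numbers(paths):
    """Assign inode numbers independently of insertion order.

    `paths` may arrive in any order (for example, inserted by several
    scanner threads). Sorting first makes the numbering reproducible.
    The sort key (the path string) is an assumption for this sketch.
    """
    return {path: num for num, path in enumerate(sorted(paths))}


if __name__ == "__main__":
    # Two runs that discovered the same entries in different orders.
    run_a = ["usr/lib", "bin/sh", "etc/passwd"]
    run_b = ["etc/passwd", "usr/lib", "bin/sh"]
    print(assign_inode_numbers(run_a) == assign_inode_numbers(run_b))
```

Without the `sorted()` call, numbering would follow insertion order and could differ between runs, which is what prevents bit-identical images.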
Marcus Holland-Moritz
6bbd4e3970 Add --no-create-timestamp option (needed to fix #91)
In order to produce bit-identical images, we need to be able to
drop create timestamps from the output.
2022-08-03 18:52:40 +02:00
Marcus Holland-Moritz
2a2d089cea Add Python script for block extraction 2022-07-04 12:45:14 +02:00
Marcus Holland-Moritz
525d7d6007 Fix formatting 2022-06-11 23:04:09 +02:00
Marcus Holland-Moritz
b1e4667aa3 Update change log (tag: v0.6.1) 2022-06-11 22:46:24 +02:00
Marcus Holland-Moritz
977520cfa8 Fix binary installation 2022-06-11 22:38:04 +02:00
Marcus Holland-Moritz
76b6ffd818 Add codacy badge (tag: v0.6.0) 2022-06-11 21:59:45 +02:00
Marcus Holland-Moritz
c10f7d2482 More travis tweaking 2022-06-11 21:34:03 +02:00
Marcus Holland-Moritz
2713e2166b Disable man pages on travis while their ruby environment is broken 2022-06-11 21:25:44 +02:00
Marcus Holland-Moritz
6c13150f17 See what's wrong with running ronn on travis 2022-06-11 21:06:16 +02:00
Marcus Holland-Moritz
f07e7e3e2b Update change log 2022-06-11 20:19:26 +02:00
Marcus Holland-Moritz
3bc0a3411e Run travis only on main branch 2022-06-11 20:01:50 +02:00