Marcus Holland-Moritz
1215a30f78
Support for simple filter rules (potential fix for github #6)
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
c2f00d78c3
Log full path when skipping files in scanner
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
aeeddaecab
Honour user locale when formatting numbers
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
0ed3c933bf
Some new TODOs
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
6920df7334
Improved deduplication algorithm
...
Instead of hashing all files unconditionally, the new algorithm first
checks if there are multiple files of the same size. Files with a
unique size cannot have duplicates and so don't have to be hashed at
all.
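
A minimal sketch of the size-first idea described above, not the actual scanner code. `file_entry`, `dedup_result` and `find_duplicates` are hypothetical names, file contents are held in memory for brevity, and `std::hash` is only a stand-in for a real digest function.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scanned file: a name plus its contents.
struct file_entry {
  std::string name;
  std::string data;
};

struct dedup_result {
  std::size_t hashed = 0;                            // files that had to be hashed
  std::vector<std::vector<std::string>> duplicates;  // groups of identical files
};

dedup_result find_duplicates(const std::vector<file_entry>& files) {
  // Pass 1: bucket files by size. No file content is examined here.
  std::map<std::size_t, std::vector<const file_entry*>> by_size;
  for (const auto& f : files) {
    by_size[f.data.size()].push_back(&f);
  }

  dedup_result result;
  for (const auto& [size, bucket] : by_size) {
    if (bucket.size() < 2) {
      continue;  // unique size: cannot have a duplicate, so skip hashing
    }
    // Pass 2: hash only files that share their size with another file.
    // std::hash is an assumption for illustration; it is not collision
    // resistant and a real tool would use a proper digest.
    std::map<std::size_t, std::vector<std::string>> by_hash;
    for (const auto* f : bucket) {
      by_hash[std::hash<std::string>{}(f->data)].push_back(f->name);
      ++result.hashed;
    }
    for (auto& [digest, names] : by_hash) {
      if (names.size() > 1) {
        result.duplicates.push_back(std::move(names));
      }
    }
  }
  return result;
}
```

With four files of which three share a size, only those three are hashed; the uniquely sized file is never touched.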
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
17567e009a
Improved checks for openssl digest functions
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
b1db6470df
Fix spelling
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
5256cf09ae
Rename lookup tables in file scanner
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
6a6fe94228
Simplify original_size update in file::scan()
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
b001d9f28e
Progress unit tests & fixes
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
148de5bf0d
Add --file-hash option (fixes github #92)
...
This does not yet address the issue that uniquely sized files are
unnecessarily hashed, which is also mentioned in #92. This will be
addressed separately.
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
482a40560e
Add inode ordering test (for image reproducibility, see gh #91)
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
c5ac04347c
Rename cache tidying functions to be less ambiguous
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
b59a48c2b6
Replace use of boost posix_time with {fmt}
...
The only issue is that in order to properly format fractional seconds,
we need a bleeding-edge version of {fmt}.
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
3053140e5c
boost::system::system_error -> std::system_error
2022-10-29 18:54:31 +02:00
Marcus Holland-Moritz
8ed2fceba3
Fix extract_blocks.py
2022-10-29 18:53:52 +02:00
Marcus Holland-Moritz
0b060a57f3
Document new command line options
v0.6.2
2022-10-24 13:13:22 +02:00
Marcus Holland-Moritz
4dee67a50b
Update change log
2022-10-24 13:07:27 +02:00
Marcus Holland-Moritz
842dbbadd6
Update folly/fbthrift
2022-10-24 10:06:54 +02:00
Marcus Holland-Moritz
99582205dd
Remove deprecated `required` specifiers from thrift definitions
...
At least for the frozen representation, this doesn't make a difference.
2022-10-24 10:06:42 +02:00
Marcus Holland-Moritz
242d316007
Fix #99: build fails if build dir is outside of git repo
2022-10-24 09:53:31 +02:00
Marcus Holland-Moritz
f5e3108bb2
Fix #105: handle strrchr() returning NULL
...
This shouldn't have been a problem, as compiler-generated source
file paths should always contain a `/`, but it's still good to play
it safe.
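
The kind of guard this refers to can be sketched as follows. `basename_of` is a hypothetical helper, and the assumption is that `strrchr()` is used to strip the directory part from a source file path; the actual call site is not shown in this log.

```cpp
#include <cstring>

// Returns the part of `path` after the last '/', or `path` itself when it
// contains no '/' at all (the case where strrchr() returns NULL).
const char* basename_of(const char* path) {
  const char* slash = std::strrchr(path, '/');
  return slash ? slash + 1 : path;
}
```

Without the check, `slash + 1` would be computed from a null pointer whenever the path has no directory component.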
2022-10-22 23:43:09 +02:00
Marcus Holland-Moritz
d583bd9a2d
Update folly/fbthrift
2022-10-21 16:08:10 +02:00
Marcus Holland-Moritz
e94816fcbc
Remove all traces of statvfs from dwarfsextract
2022-10-21 15:48:23 +02:00
Marcus Holland-Moritz
a1a9571b13
Update parallel-hashmap to v1.3.8
2022-10-21 13:44:43 +02:00
Marcus Holland-Moritz
01ee8916b6
clang-format
2022-10-21 13:37:43 +02:00
mhx
876fafdd55
Merge pull request #107 from MRWITEK/mincore
...
Fix `cached_block::is_swapped_out()`
2022-10-21 13:36:52 +02:00
Marcus Holland-Moritz
186eb763a3
Fix #104: read large files in chunks rather than fully
...
This changes the way data is sent to libarchive. For files larger
than `max_queued_bytes`, instead of fully reading the compressed
file and then sending the whole file to libarchive at once, the
code now reads chunks of at most `max_queued_bytes` and sends the
chunks to libarchive independently. Small files are treated as
before.
When extracting large files, this method is actually a lot faster
as it puts less strain on the memory allocator.
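
A rough sketch of the chunking logic described above, under stated assumptions: `extract_file`, `read_fn` and `sink_fn` are hypothetical names, and the sink merely stands in for whatever call hands data to libarchive. It is not the dwarfsextract implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical callbacks: `read_fn` returns `size` bytes starting at `offset`,
// `sink_fn` consumes one piece of data (standing in for the archive writer).
using read_fn = std::function<std::string(std::size_t offset, std::size_t size)>;
using sink_fn = std::function<void(const std::string& chunk)>;

// Sends a file of `file_size` bytes to `sink` and returns the number of
// sink calls. At most `max_queued_bytes` are held in memory per call for
// large files; small files are read and sent in one piece.
std::size_t extract_file(std::size_t file_size, std::size_t max_queued_bytes,
                         const read_fn& read, const sink_fn& sink) {
  // Guard against a zero chunk size, which would never make progress.
  const std::size_t chunk = std::max<std::size_t>(max_queued_bytes, 1);

  if (file_size <= chunk) {
    sink(read(0, file_size));  // small file: handled as a single read
    return 1;
  }

  std::size_t calls = 0;
  for (std::size_t offset = 0; offset < file_size; offset += chunk) {
    sink(read(offset, std::min(chunk, file_size - offset)));
    ++calls;
  }
  return calls;
}
```

For a 10-byte file and a limit of 4 bytes, the sink sees chunks of 4, 4 and 2 bytes instead of a single 10-byte buffer.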
2022-10-21 11:12:28 +02:00
Marcus Holland-Moritz
dc8490f583
Remove conditional statvfs code in dwarfsextract
...
Keep things simple: just derive `max_queued_bytes` from the cache
size for now.
2022-10-21 11:07:27 +02:00
Marcus Holland-Moritz
bf2064f650
Fix heap-use-after-free when writing section index
...
When writing the section index block, an additional entry was added
to the index, potentially reallocating the vector containing the
index. However, we previously took the address of the vector data
in order to write the index, so that address is now invalid.
The fix is to not add the extra entry to the index.
2022-10-21 11:06:28 +02:00
Marcus Holland-Moritz
59b87cdd9f
Fix data race in cached_block
2022-10-21 08:57:21 +02:00
Marcus Holland-Moritz
b07863bcf6
Debug output, assertion & cleanup in op_readdir
2022-10-20 17:12:20 +02:00
mhx
e86a44e3dd
Merge pull request #106 from MRWITEK/main
...
Fix out of bounds access
2022-10-20 17:04:46 +02:00
Victor Dmitriev
301150c908
Fix cached_block::is_swapped_out()
...
see `mincore()` documentation
at https://man7.org/linux/man-pages/man2/mincore.2.html
2022-09-23 02:24:29 +03:00
Victor Dmitriev
ea3ffee377
Fix out of bounds access
...
when `written == buf.size()`
2022-09-23 01:36:21 +03:00
Marcus Holland-Moritz
e8f489a4c1
Update parallel-hashmap
2022-08-04 15:09:57 +02:00
Marcus Holland-Moritz
314d1320cf
Only overwrite output image with --force option (fixes #93)
2022-08-03 19:16:30 +02:00
Marcus Holland-Moritz
183a16d953
fsst: deterministic symbol tables (needed to fix #91)
...
This fixes what I believe is a bug in the fsst library that causes
symbol tables to be non-deterministic. There's an open issue/PR for
the library, so it's not yet clear if this fix is correct/optimal.
2022-08-03 18:59:47 +02:00
Marcus Holland-Moritz
422146d7a2
Produce deterministic inode numbers (needed to fix #91)
...
While most of the code typically ensures that elements are kept
in a deterministic order, the code that assigned inode numbers
was iterating a hash table, which by itself guaranteed FIFO
semantics, but items were inserted from multiple threads when
scanning the input file system.
This change adds a sorting step before assigning inode numbers.
(This shouldn't be much of a performance hit.)
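
A sketch of the sort-before-assign idea, under assumptions: `assign_inode_numbers` is a hypothetical function, and sorting by path is chosen here purely for illustration, since the commit does not say which key is used.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assigns consecutive inode numbers to `paths`. The input order may depend
// on thread scheduling, so it is sorted first to make the result
// independent of insertion order.
std::unordered_map<std::string, std::uint32_t>
assign_inode_numbers(std::vector<std::string> paths) {
  std::sort(paths.begin(), paths.end());
  // Drop repeated paths so every number maps to exactly one entry.
  paths.erase(std::unique(paths.begin(), paths.end()), paths.end());

  std::unordered_map<std::string, std::uint32_t> inode_of;
  std::uint32_t next = 0;
  for (const auto& path : paths) {
    inode_of.emplace(path, next++);
  }
  return inode_of;
}
```

Two scans that discover the same files in a different order then end up with identical numbering, at the cost of one O(n log n) sort.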
2022-08-03 18:54:57 +02:00
Marcus Holland-Moritz
6bbd4e3970
Add --no-create-timestamp option (needed to fix #91)
...
In order to produce bit-identical images, we need to be able to
drop create timestamps from the output.
2022-08-03 18:52:40 +02:00
Marcus Holland-Moritz
2a2d089cea
Add Python script for block extraction
2022-07-04 12:45:14 +02:00
Marcus Holland-Moritz
525d7d6007
Fix formatting
2022-06-11 23:04:09 +02:00
Marcus Holland-Moritz
b1e4667aa3
Update change log
v0.6.1
2022-06-11 22:46:24 +02:00
Marcus Holland-Moritz
977520cfa8
Fix binary installation
2022-06-11 22:38:04 +02:00
Marcus Holland-Moritz
76b6ffd818
Add codacy badge
v0.6.0
2022-06-11 21:59:45 +02:00
Marcus Holland-Moritz
c10f7d2482
More travis tweaking
2022-06-11 21:34:03 +02:00
Marcus Holland-Moritz
2713e2166b
Disable man pages on travis while their ruby environment is broken
2022-06-11 21:25:44 +02:00
Marcus Holland-Moritz
6c13150f17
See what's wrong with running ronn on travis
2022-06-11 21:06:16 +02:00
Marcus Holland-Moritz
f07e7e3e2b
Update change log
2022-06-11 20:19:26 +02:00
Marcus Holland-Moritz
3bc0a3411e
Run travis only on main branch
2022-06-11 20:01:50 +02:00