Mirror of https://github.com/mhx/dwarfs.git
Update format docs
parent 88d684379e
commit df5de1f486

@@ -81,6 +81,8 @@ a collection of bit-packed arrays and structures. The exact layout of
each list and structure depends on the actual data and is stored
separately in `METADATA_V2_SCHEMA`.

## Metadata Format

Here is a high-level overview of how all the bits and pieces relate
to each other:

@@ -152,7 +154,10 @@ to each other:

Thanks to the bit-packing, fields that are unused or only contain a
single (zero) value, e.g. a `group_index` that's always zero because
all files belong to the same group, do not occupy any space in the
metadata block.

### Determining Inode Offsets

Before you can start traversing the metadata, you need to determine
the offsets for symlinks, regular files, devices etc. in the `inodes`

@@ -172,29 +177,31 @@ The `inodes` list is strictly in the following order:

* socket/pipe inodes (`S_IFSOCK`, `S_IFIFO`)

The offsets can thus be found by using a binary search with a
predicate on the inode mode. The shared file offset can be found
by subtracting the length of `shared_files_table` from the total
number of regular files.
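
For illustration, here is a sketch of such a binary search in Python; the
accessors (`inodes[i].mode_index`, a `modes` table) and the category ranks
are assumptions for this sketch, mirroring the inode order listed above:

    import stat

    # Category ranks in the order of the `inodes` list (schematic).
    RANK = {
        stat.S_IFDIR: 0, stat.S_IFLNK: 1, stat.S_IFREG: 2,
        stat.S_IFCHR: 3, stat.S_IFBLK: 3, stat.S_IFSOCK: 4, stat.S_IFIFO: 4,
    }

    def first_with_rank(inodes, modes, want):
        """Lower bound: first index in `inodes` whose mode rank is >= want."""
        lo, hi = 0, len(inodes)
        while lo < hi:
            mid = (lo + hi) // 2
            if RANK[stat.S_IFMT(modes[inodes[mid].mode_index])] < want:
                lo = mid + 1
            else:
                hi = mid
        return lo

    # symlink_inode_offset = first_with_rank(inodes, modes, 1)
    # file_inode_offset    = first_with_rank(inodes, modes, 2)
    # device_inode_offset  = first_with_rank(inodes, modes, 3)
    # num_regular_files    = device_inode_offset - file_inode_offset
    # num_unique_files     = num_regular_files - len(shared_files_table)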

### Unique and Shared File Inodes

The difference between *unique* and *shared* file inodes is that
there is only one *unique* file inode that references a particular
index in the `chunk_table`, whereas there are multiple *shared*
file inodes that will reference the same index. This is how DwarFS
implements file-level de-duplication beyond hardlinks. Hardlinks
share the same inode. Duplicate files that are not hardlinked each
have a unique inode, but still reference the same content through
the `chunk_table`.

The `shared_files_table` provides the necessary indirection that
maps a *shared* file inode to a `chunk_table` index.

### Traversing the Metadata

You typically start at the root directory which is at `dir_entries[0]`,
`inodes[0]` and `directories[0]`. Note that the root directory
implicitly has no name, so that `dir_entries[0].name_index`
should not be used.

To determine the contents of a directory, we determine the range
of entries from `directories[inode_num].first_entry` to

@@ -219,12 +226,12 @@ after adjusting the index:

    chunk_index = inode_num - file_inode_offset

For *shared* regular file inodes, you can index into the (unpacked)
`shared_files_table`:

    shared_index = shared_files[inode_num - file_inode_offset - num_unique_files]

Then, you can index into `chunk_table`, but you need to adjust the
index once more:

    chunk_index = shared_index + num_unique_files
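
Putting both cases together, resolving the `chunk_table` index for any
regular file inode could look like this sketch (`shared_files` being the
unpacked `shared_files_table`):

    def chunk_table_index(inode_num, file_inode_offset, num_unique_files, shared_files):
        """Map a regular file inode number to its `chunk_table` index (sketch)."""
        rel = inode_num - file_inode_offset
        if rel < num_unique_files:
            return rel                              # unique file inode
        shared_index = shared_files[rel - num_unique_files]
        return shared_index + num_unique_files      # shared file inode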

@@ -244,3 +251,80 @@ Last but not least, to read the device id for a device inode, you
can index into `devices`:

    device_id = devices[inode_num - device_inode_offset]

### Optionally Packed Structures

The overview above assumes metadata without any additional packing,
which can be produced using:

    mkdwarfs --pack-metadata=none --plain-string-tables

However, this isn't the default, and parts of the metadata are
likely stored in a packed format. These are mostly easy to unpack.

#### Shared Files Table Packing

The `shared_files_table` can be stored in a packed format that
only encodes the number of shared links to a `chunk_table` index.
As the minimum number of links is always 2 (otherwise it wouldn't
be shared), the numbers in the packed format are additionally
offset by 2. So for example, a packed table like

    [0, 3, 1, 0, 1]

would unpack to:

    [0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4]

The packed format is used when `options.packed_shared_files_table`
is true.
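
Unpacking is a simple run-length expansion. A sketch that reproduces the
example above:

    def unpack_shared_files(packed):
        """Entry i holds the number of shared links (minus 2) to chunk_table
        index i; repeat each index accordingly."""
        unpacked = []
        for index, count in enumerate(packed):
            unpacked.extend([index] * (count + 2))
        return unpacked

    assert unpack_shared_files([0, 3, 1, 0, 1]) == \
        [0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4]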

#### Directories Packing

The `directories` table, when stored in packed format, omits
all `parent_entry` fields and uses delta compression for the
`first_entry` fields.

In order to unpack all information, you first have to delta-
decompress the `first_entry` fields, then traverse the whole
directory tree once to fill in the `parent_entry` fields.
This sounds like a lot of work, but it's actually reasonably
fast. For example, for a file system with 15 million entries
in 90,000 directories, reconstructing the `directories` takes
only about 50 milliseconds.

The packed format is used when `options.packed_directories`
is true.
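
A sketch of the first step, assuming each stored `first_entry` value is the
delta to its predecessor (field access is schematic, the real data is
bit-packed):

    from itertools import accumulate

    # Delta-decompress the first_entry fields with a running sum.
    first_entry = list(accumulate(d.first_entry for d in packed_directories))

    # Second step (not shown): traverse the whole directory tree once,
    # starting at dir_entries[0], to fill in the parent_entry fields.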

#### Chunk Table Packing

The `chunk_table` can also be stored delta-compressed and
must be unpacked accordingly.

The packed format is used when `options.packed_chunk_table`
is true.
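
The same running-sum approach applies here; a minimal sketch:

    from itertools import accumulate

    # Delta-decompress the chunk_table by accumulating the stored deltas.
    chunk_table = list(accumulate(packed_chunk_table))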

#### Names and Symlinks String Table Packing

Both the `names` and `symlinks` tables can be stored in a
packed format in `compact_names` and `compact_symlinks`.

There are two separate packing schemes that can be combined.
If none of these schemes is active, the difference between
e.g. `names` and `compact_names` is that the former is stored
as a "proper" list, whereas the latter is stored as a single
string plus an index of offsets. As lists of strings store
both offset and length for each element, this already saves
the storage for the length fields, which can easily be
determined from the offsets at run-time.

If the `packed_index` scheme is used in addition, the index
is stored delta-compressed.

Last but not least, the individual strings can be compressed
as well. The [fsst library](https://github.com/cwida/fsst)
allows for compression of short strings with random access
and is typically able to reduce the overall size of the
string tables by 50%, using a dictionary that is only a few
hundred bytes long. If a `symtab` is set for the string table,
this compression is used.
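
As a sketch of a lookup into such a compact table, assuming fields `buffer`
(the concatenated string data) and `index` (the offsets, delimiting string
`i` as `buffer[index[i]:index[i + 1]]`) next to `packed_index` and `symtab`,
and a hypothetical `fsst_decompress` helper:

    from itertools import accumulate

    def lookup(table, i):
        """Fetch string i from a compact string table (field names assumed)."""
        index = list(table.index)
        if table.packed_index:
            index = list(accumulate(index))              # delta-compressed offsets
        raw = table.buffer[index[i]:index[i + 1]]
        if table.symtab is not None:
            raw = fsst_decompress(table.symtab, raw)     # hypothetical fsst decoder
        return raw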