Update DwarFS format documentation

This commit is contained in:
Marcus Holland-Moritz 2023-05-24 18:10:41 +02:00
parent 50ef8bcc37
commit e58c80736b

View File

@ -6,8 +6,9 @@ This document describes the DwarFS file system format, version 2.3.
## FILE STRUCTURE
A DwarFS file system image is just a sequence of blocks. Each block has the
following format:
A DwarFS file system image is just a sequence of blocks, optionally
prefixed by a "header", which is typically some sort of shell script.
Each block has the following format:
┌───┬───┬───┬───┬───┬───┬───┬───┐
0x00 │'D'│'W'│'A'│'R'│'F'│'S'│MAJ│MIN│ MAJ=0x02, MIN=0x03 for v2.3
@ -63,9 +64,24 @@ A couple of notes:
larger than the one it supports. However, a new program will still
read all file systems with a smaller minor version number.
### Header Detection
In order to access the file system data when it is prefixed by a header,
the size of the header must be known. It can either be given to the
tools or the FUSE driver explicitly (using e.g. the `--image-offset` or
`-o offset` options), or it can be determined automatically (by passing
`auto` as the argument to the aforementioned options).
Automatic detection works by scanning the file for the section header
magic (`DWARFS`) and validating the match by looking up the second
section header using the length of the first section and also checking
its magic. It is rather unlikely that a file is created accidentally
that would pass this check, although one could be crafted manually
without any problems.
### Section Types
There are currently 3 different section types.
There are currently 4 different section types.
- `BLOCK` (0):
A block of data. This is where all file data is stored. There can be
@ -81,6 +97,23 @@ There are currently 3 different section types.
each list and structure depends on the actual data and is stored
separately in `METADATA_V2_SCHEMA`.
- `SECTION_INDEX` (9):
The section index is, well, an index of all sections in the file
system. If present (creation of the index can be suppressed with
`--no-section-index`), this is *required* to be the last section.
Each entry in the section index is a 64-bit value with the upper
16 bits being the section type and the lower 48 bits being the
offset relative to the first section. That is, the section index
is independent of whether or not a header is present before the
first section. The whole point of the section index is to avoid
having to build an index by visiting all section headers.
In order to find the start of the section index, you only have
to read the last 64-bit value from the file, check if the upper
16 bits match the `SECTION_INDEX`, then add the image offset
(header size) to the lower 48 bits. At that position in the
file, you should find a valid section header for the section
index.
## METADATA FORMAT
Here is a high-level overview of how all the bits and pieces relate