From e58c80736bb968e56d07fdacc6dd567bc62e8254 Mon Sep 17 00:00:00 2001 From: Marcus Holland-Moritz Date: Wed, 24 May 2023 18:10:41 +0200 Subject: [PATCH] Update DwarFS format documentation --- doc/dwarfs-format.md | 39 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 36 insertions(+), 3 deletions(-) diff --git a/doc/dwarfs-format.md b/doc/dwarfs-format.md index 9f511ab0..51841f45 100644 --- a/doc/dwarfs-format.md +++ b/doc/dwarfs-format.md @@ -6,8 +6,9 @@ This document describes the DwarFS file system format, version 2.3. ## FILE STRUCTURE -A DwarFS file system image is just a sequence of blocks. Each block has the -following format: +A DwarFS file system image is just a sequence of blocks, optionally +prefixed by a "header", which is typically some sort of shell script. +Each block has the following format: ┌───┬───┬───┬───┬───┬───┬───┬───┐ 0x00 │'D'│'W'│'A'│'R'│'F'│'S'│MAJ│MIN│ MAJ=0x02, MIN=0x03 for v2.3 @@ -63,9 +64,24 @@ A couple of notes: larger than the one it supports. However, a new program will still read all file systems with a smaller minor version number. +### Header Detection + +In order to access the file system data when it is prefixed by a header, +the size of the header must be known. It can either be given to the +tools or the FUSE driver explicitly (using e.g. the `--image-offset` or +`-o offset` options), or it can be determined automatically (by passing +`auto` as the argument to the aforementioned options). + +Automatic detection works by scanning the file for the section header +magic (`DWARFS`) and validating the match by looking up the second +section header using the length of the first section and also checking +its magic. It is rather unlikely that a file is created accidentally +that would pass this check, although one could be crafted manually +without any problems. + ### Section Types -There are currently 3 different section types. +There are currently 4 different section types. - `BLOCK` (0): A block of data. This is where all file data is stored. There can be @@ -81,6 +97,23 @@ There are currently 3 different section types. each list and structure depends on the actual data and is stored separately in `METADATA_V2_SCHEMA`. +- `SECTION_INDEX` (9): + The section index is, well, an index of all sections in the file + system. If present (creation of the index can be suppressed with + `--no-section-index`), this is *required* to be the last section. + Each entry in the section index is a 64-bit value with the upper + 16 bits being the section type and the lower 48 bits being the + offset relative to the first section. That is, the section index + is independent of whether or not a header is present before the + first section. The whole point of the section index is to avoid + having to build an index by visiting all section headers. + In order to find the start of the section index, you only have + to read the last 64-bit value from the file, check if the upper + 16 bits match the `SECTION_INDEX`, then add the image offset + (header size) to the lower 48 bits. At that position in the + file, you should find a valid section header for the section + index. + ## METADATA FORMAT Here is a high-level overview of how all the bits and pieces relate