From e58c80736bb968e56d07fdacc6dd567bc62e8254 Mon Sep 17 00:00:00 2001
From: Marcus Holland-Moritz <github@mhxnet.de>
Date: Wed, 24 May 2023 18:10:41 +0200
Subject: [PATCH] Update DwarFS format documentation

---
 doc/dwarfs-format.md | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/doc/dwarfs-format.md b/doc/dwarfs-format.md
index 9f511ab0..51841f45 100644
--- a/doc/dwarfs-format.md
+++ b/doc/dwarfs-format.md
@@ -6,8 +6,9 @@ This document describes the DwarFS file system format, version 2.3.
 
 ## FILE STRUCTURE
 
-A DwarFS file system image is just a sequence of blocks. Each block has the
-following format:
+A DwarFS file system image is just a sequence of blocks, optionally
+prefixed by a "header", which is typically some sort of shell script.
+Each block has the following format:
 
          ┌───┬───┬───┬───┬───┬───┬───┬───┐
     0x00 │'D'│'W'│'A'│'R'│'F'│'S'│MAJ│MIN│  MAJ=0x02, MIN=0x03 for v2.3
@@ -63,9 +64,24 @@ A couple of notes:
   larger than the one it supports. However, a new program will still
   read all file systems with a smaller minor version number.
 
+### Header Detection
+
+In order to access the file system data when it is prefixed by a header,
+the size of the header must be known. It can either be given to the
+tools or the FUSE driver explicitly (using e.g. the `--image-offset` or
+`-o offset` options), or it can be determined automatically (by passing
+`auto` as the argument to the aforementioned options).
+
+Automatic detection works by scanning the file for the section header
+magic (`DWARFS`) and validating the match by looking up the second
+section header using the length of the first section and also checking
+its magic. It is rather unlikely that a file is created accidentally
+that would pass this check, although one could be crafted manually
+without any problems.
+
 ### Section Types
 
-There are currently 3 different section types.
+There are currently 4 different section types.
 
 - `BLOCK` (0):
   A block of data. This is where all file data is stored. There can be
@@ -81,6 +97,23 @@ There are currently 3 different section types.
   each list and structure depends on the actual data and is stored
   separately in `METADATA_V2_SCHEMA`.
 
+- `SECTION_INDEX` (9):
+  The section index is, well, an index of all sections in the file
+  system. If present (creation of the index can be suppressed with
+  `--no-section-index`), this is *required* to be the last section.
+  Each entry in the section index is a 64-bit value with the upper
+  16 bits being the section type and the lower 48 bits being the
+  offset relative to the first section. That is, the section index
+  is independent of whether or not a header is present before the
+  first section. The whole point of the section index is to avoid
+  having to build an index by visiting all section headers.
+  In order to find the start of the section index, you only have
+  to read the last 64-bit value from the file, check if the upper
+  16 bits match the `SECTION_INDEX`, then add the image offset
+  (header size) to the lower 48 bits. At that position in the
+  file, you should find a valid section header for the section
+  index.
+
 ## METADATA FORMAT
 
 Here is a high-level overview of how all the bits and pieces relate