From f3b76ad69b74160d1e5e39673701ecb470cfed28 Mon Sep 17 00:00:00 2001
From: Marcus Holland-Moritz <github@mhxnet.de>
Date: Mon, 7 Dec 2020 22:54:22 +0100
Subject: [PATCH] Update README with some nilsimsa data

---
 README.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/README.md b/README.md
index c2aa7e08..5962f41e 100644
--- a/README.md
+++ b/README.md
@@ -402,6 +402,22 @@ system with the best possible compression (`-l 9`):
 This reduces the file system size by another 18%, pushing the total
 compression ratio below 1%.
 
+You *may* be able to push things even further: the `nilsimsa` ordering
+option enables a somewhat experimental locality-sensitive hashing (LSH)
+ordering scheme that's significantly slower than the default
+`similarity` scheme, but can deliver even better clustering of similar
+data. It also has the advantage that the ordering can run while data is
+already being compressed, which offsets the slowness of the algorithm. On
+the same Perl dataset, I was able to get these file system sizes
+without a significant change in file system build time:
+
+    $ ll perl-install-nilsimsa*.dwarfs
+    -rw-r--r-- 1 mhx users 546026189 Dec  7 21:50 perl-nilsimsa.dwarfs
+    -rw-r--r-- 1 mhx users 448614396 Dec  7 22:44 perl-nilsimsa-lzma.dwarfs
+
+That's another 6-7% reduction in file system size for both the default
+ZSTD and the LZMA compression.
+
 In terms of how fast the file system is when using it, a quick test
 I've done is to freshly mount the filesystem created above and run
 each of the 1139 `perl` executables to print their version.