From 60b970f844497700105377fdf8f6e2ea0198cb88 Mon Sep 17 00:00:00 2001
From: benoit74
Date: Thu, 16 Nov 2023 13:13:31 +0100
Subject: [PATCH] Enhance README by removing Chrome and headless reference

---
 README.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index c98d739..cdf0966 100644
--- a/README.md
+++ b/README.md
@@ -13,10 +13,11 @@ Zimit is a scraper allowing to create ZIM file from any Web site.
 
 Technical background
 --------------------
 
-This version of Zimit runs a single-site headless-Chrome based crawl in a Docker container and produces a ZIM of the crawled content.
+Zimit runs a fully automated browser-based crawl of a website property and produces a ZIM of the crawled content. Zimit runs in a Docker container.
 
-The system extends the crawling system in [Browsertrix Crawler](https://github.com/webrecorder/browsertrix-crawler) and converts
-the crawled WARC files to ZIM using [warc2zim](https://github.com/openzim/warc2zim)
+The system:
+- runs a website crawl with [Browsertrix Crawler](https://github.com/webrecorder/browsertrix-crawler), which produces WARC files
+- converts the crawled WARC files to a single ZIM using [warc2zim](https://github.com/openzim/warc2zim)
 
 The `zimit.py` is the entrypoint for the system.