From 9b23de828be02cef592827151a99a36e3654fb9f Mon Sep 17 00:00:00 2001
From: Ilya Kreymer
Date: Sat, 19 Sep 2020 15:53:23 -0700
Subject: [PATCH] Update README.md

---
 README.md | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 4c47f89..0823579 100644
--- a/README.md
+++ b/README.md
@@ -17,24 +17,32 @@ After the crawl is done, warc2zim is used to write a zim to the `/output` direct

 `zimit` is intended to be run in Docker.

-The following is an example usage. The `--cap-add` and `--shm-size` flags are needed for Chrome.
+To build locally run:
+
+```
+docker build -t openzim/zimit .
+```

 The image accepts the following parameters:

-- "" - the url to be crawled (required)
+- `URL` - the url to be crawled (required)
 - `--workers N` - number of crawl workers to be run in parallel
 - `--wait-until` - Puppeteer setting for how long to wait for page load. See [page.goto waitUntil options](https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagegotourl-options). The default is `load`, but for static sites, `--wait-until domcontentloaded` may be used to speed up the crawl (to avoid waiting for ads to load for example).
 - `--name` - Name of ZIM file (defaults to the hostname of the URL)
 - `--output` - output directory (defaults to `/output`)

-
+The following is an example usage. The `--cap-add` and `--shm-size` flags are needed to run Chrome in Docker.

 Example command:

 ```
-docker run -v /output:/output --cap-add=SYS_ADMIN --cap-add=NET_ADMIN --shm-size=1gb openzim/zimit "" --name myzimfile --workers 2 --wait-until domcontentloaded
+docker run -v /output:/output --cap-add=SYS_ADMIN --cap-add=NET_ADMIN --shm-size=1gb openzim/zimit URL --name myzimfile --workers 2 --wait-until domcontentloaded
 ```

+The puppeteer-cluster provides monitoring output which is enabled by default and prints the crawl status to the Docker log.
+
+
+
## Previous version