Update README.md

Ilya Kreymer 2020-09-19 15:53:23 -07:00 committed by GitHub
parent 4e04645e6b
commit 9b23de828b

@@ -17,24 +17,32 @@ After the crawl is done, warc2zim is used to write a zim to the `/output` direct
 `zimit` is intended to be run in Docker.
-The following is an example usage. The `--cap-add` and `--shm-size` flags are needed for Chrome.
+To build locally run:
+```
+docker build -t openzim/zimit .
+```
 The image accepts the following parameters:
-- "<URL>" - the url to be crawled (required)
+- `URL` - the url to be crawled (required)
 - `--workers N` - number of crawl workers to be run in parallel
 - `--wait-until` - Puppeteer setting for how long to wait for page load. See [page.goto waitUntil options](https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagegotourl-options). The default is `load`, but for static sites, `--wait-until domcontentloaded` may be used to speed up the crawl (to avoid waiting for ads to load for example).
 - `--name` - Name of ZIM file (defaults to the hostname of the URL)
 - `--output` - output directory (defaults to `/output`)
+The following is an example usage. The `--cap-add` and `--shm-size` flags are needed to run Chrome in Docker.
 Example command:
 ```
-docker run -v /output:/output --cap-add=SYS_ADMIN --cap-add=NET_ADMIN --shm-size=1gb openzim/zimit "<URL>" --name myzimfile --workers 2 --wait-until domcontentloaded
+docker run -v /output:/output --cap-add=SYS_ADMIN --cap-add=NET_ADMIN --shm-size=1gb openzim/zimit URL --name myzimfile --workers 2 --wait-until domcontentloaded
 ```
+The puppeteer-cluster provides monitoring output which is enabled by default and prints the crawl status to the Docker log.
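As a rough illustration of the example command, the sketch below fills in the `URL` placeholder and the documented flags from shell arguments. `zimit_cmd` is a hypothetical helper, not part of zimit, and `https://example.com/` and `demo` are placeholder values. It only prints the `docker run` line, so the command can be inspected before it is actually run.

```shell
#!/bin/sh
# Hypothetical helper (not part of zimit): print the docker command for a crawl.
# Arguments: URL (required), ZIM name (required), worker count (defaults to 2).
zimit_cmd() {
    url="$1"
    name="$2"
    workers="${3:-2}"
    # The URL is single-quoted in the output so characters such as '&' survive copy-paste.
    printf "docker run -v /output:/output --cap-add=SYS_ADMIN --cap-add=NET_ADMIN --shm-size=1gb openzim/zimit '%s' --name %s --workers %s --wait-until domcontentloaded\n" \
        "$url" "$name" "$workers"
}

# Print the command for a placeholder site with 4 workers.
zimit_cmd "https://example.com/" demo 4
```

Here `-v /output:/output` mounts the host's `/output` directory at the container's default output directory, so the resulting ZIM file is available on the host once the crawl finishes.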
 <hr>
 ## Previous version