mirror of
https://github.com/openzim/zimit.git
synced 2025-09-22 11:22:23 -04:00
Merge pull request #249 from openzim/fix_readme
Enhance README by removing Chrome and headless reference
commit
5512e814c7
@@ -13,10 +13,11 @@ Zimit is a scraper allowing to create ZIM file from any Web site.
 Technical background
 --------------------
 
-This version of Zimit runs a single-site headless-Chrome based crawl in a Docker container and produces a ZIM of the crawled content.
+Zimit runs a fully automated browser-based crawl of a website property and produces a ZIM of the crawled content. Zimit runs in a Docker container.
 
-The system extends the crawling system in [Browsertrix Crawler](https://github.com/webrecorder/browsertrix-crawler) and converts
-the crawled WARC files to ZIM using [warc2zim](https://github.com/openzim/warc2zim)
+The system:
+- runs a website crawl with [Browsertrix Crawler](https://github.com/webrecorder/browsertrix-crawler), which produces WARC files
+- converts the crawled WARC files to a single ZIM using [warc2zim](https://github.com/openzim/warc2zim)
 
 The `zimit.py` is the entrypoint for the system.
 
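The two stages described in the updated README text (crawl to WARC, then WARC to a single ZIM) can be pictured with a small Python sketch. This is not the code in `zimit.py`; it only builds the two command lines and runs nothing. The helper names `crawl_command` and `convert_command` are hypothetical, and the flags shown (`--url` and `--collection` for the crawler, `--name` and `--output` for `warc2zim`) are assumptions that should be checked against each tool's `--help` output.

```python
from typing import Iterable, List


def crawl_command(url: str, collection: str) -> List[str]:
    """Stage 1 (sketch): Browsertrix Crawler invocation that produces WARC files.

    Assumption: the crawler is started as `crawl` inside its container and
    accepts --url and --collection.
    """
    return ["crawl", "--url", url, "--collection", collection]


def convert_command(warc_files: Iterable[str], name: str, output_dir: str) -> List[str]:
    """Stage 2 (sketch): warc2zim invocation that turns the WARCs into one ZIM.

    Assumption: warc2zim takes the WARC paths as positional arguments and
    accepts --name and --output.
    """
    # Sort the inputs so the generated command is deterministic.
    return ["warc2zim", *sorted(warc_files), "--name", name, "--output", output_dir]


if __name__ == "__main__":
    # Hypothetical site and paths, for illustration only.
    print(crawl_command("https://example.org/", "example"))
    print(convert_command(["b.warc.gz", "a.warc.gz"], "example", "/output"))
```

Keeping each stage as a plain argument list makes the hand-off explicit: the only thing the second stage needs from the first is the set of WARC files it wrote.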