12 Commits

Author SHA1 Message Date
benoit74
101fb71a0b
Better processing of crawler exit codes with soft/hard limits 2025-02-13 10:51:14 +00:00
benoit74
3a7f583a96
Upgrade to Browsertrix Crawler 1.5.3
Includes restoring the total number of pages, following the upstream fix.
2025-02-13 10:44:20 +00:00
benoit74
6ec53f774f
Upgrade to Browsertrix Crawler 1.5.1 2025-02-07 08:24:27 +00:00
benoit74
9396cf1ca0
Alter crawl statistics following 1.5.0 release 2025-02-06 13:39:33 +00:00
benoit74
0f136d2f2f
Upgrade to Python 3.13, Crawler 1.5.0 and others 2025-02-06 13:39:32 +00:00
benoit74
8d42a8dd93
Move integration tests to test website 2025-01-09 10:41:05 +00:00
benoit74
861751a7ed
Stop fetching and passing browsertrix crawler version as scraperSuffix to warc2zim 2024-08-07 12:06:43 +00:00
benoit74
097613de29
Add test checking that expected entries are present 2024-08-07 09:38:08 +00:00
benoit74
456219deb3
Fix tests: there are in fact only 7 items to be pushed to the ZIM
7 entries are expected:
https://isago.rskg.org/
https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css
https://isago.rskg.org/static/favicon256.png
https://isago.rskg.org/conseils
https://isago.rskg.org/faq
https://isago.rskg.org/a-propos
https://isago.rskg.org/static/tarifs-isago.pdf

1 unexpected entry is not produced anymore by Browsertrix crawler:
https://dict.brave.com/edgedl/chrome/dict/en-us-10-1.bdic

This was a technical artifact.
2024-03-07 10:16:51 +00:00
benoit74
49da57c5b6
fixup! Set zimit and browsertrix crawler versions in final ZIM 'Scraper' metadata 2024-02-05 14:33:38 +01:00
benoit74
9244f2e69c
Set zimit and browsertrix crawler versions in final ZIM 'Scraper' metadata 2024-01-31 15:10:08 +01:00
benoit74
c0ffb74d8c
Adopt Python bootstrap conventions 2024-01-18 13:31:00 +01:00