benoit74
|
101fb71a0b
|
Better processing of crawler exit codes with soft/hard limits
|
2025-02-13 10:51:14 +00:00 |
|
benoit74
|
3a7f583a96
|
Upgrade to Browsertrix Crawler 1.5.3
Include restore of total number of pages, following upstream fix.
|
2025-02-13 10:44:20 +00:00 |
|
benoit74
|
6ec53f774f
|
Upgrade to Browsertrix Crawler 1.5.1
|
2025-02-07 08:24:27 +00:00 |
|
benoit74
|
9396cf1ca0
|
Alter crawl statistics following 1.5.0 release
|
2025-02-06 13:39:33 +00:00 |
|
benoit74
|
0f136d2f2f
|
Upgrade Python 3.13, Crawler 1.5.0 and others
|
2025-02-06 13:39:32 +00:00 |
|
benoit74
|
8d42a8dd93
|
Move integration tests to test website
|
2025-01-09 10:41:05 +00:00 |
|
benoit74
|
861751a7ed
|
Stop fetching and passing browsertrix crawler version as scraperSuffix to warc2zim
|
2024-08-07 12:06:43 +00:00 |
|
benoit74
|
097613de29
|
Add test checking that expected entries are present
|
2024-08-07 09:38:08 +00:00 |
|
benoit74
|
456219deb3
|
Fix tests: there are in fact only 7 items to be pushed to the ZIM
7 entries are expected:
https://isago.rskg.org/
https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css
https://isago.rskg.org/static/favicon256.png
https://isago.rskg.org/conseils
https://isago.rskg.org/faq
https://isago.rskg.org/a-propos
https://isago.rskg.org/static/tarifs-isago.pdf
1 unexpected entry is no longer produced by Browsertrix Crawler:
https://dict.brave.com/edgedl/chrome/dict/en-us-10-1.bdic
This was a technical artifact.
|
2024-03-07 10:16:51 +00:00 |
|
benoit74
|
49da57c5b6
|
fixup! Set zimit and browsertrix crawler versions in final ZIM 'Scraper' metadata
|
2024-02-05 14:33:38 +01:00 |
|
benoit74
|
9244f2e69c
|
Set zimit and browsertrix crawler versions in final ZIM 'Scraper' metadata
|
2024-01-31 15:10:08 +01:00 |
|
benoit74
|
c0ffb74d8c
|
Adopt Python bootstrap conventions
|
2024-01-18 13:31:00 +01:00 |
|