benoit74
7305f70300
Prepare for 2.0.6
2024-07-24 06:39:21 +00:00
benoit74
021654e6b3
Release 2.0.5
v2.0.5
2024-07-24 06:37:27 +00:00
benoit74
7357b1f2ce
Merge pull request #358 from openzim/prepare_release
Upgrade to Browsertrix Crawler 1.2.5 and warc2zim 2.0.3
2024-07-24 07:41:17 +02:00
benoit74
8a64216ac0
Upgrade to warc2zim 2.0.3
2024-07-24 05:35:55 +00:00
benoit74
9d43636559
Upgrade to Browsertrix Crawler 1.2.5
2024-07-24 05:34:25 +00:00
benoit74
52e225619e
Merge pull request #350 from openzim/faq
Add link to the FAQ in README
2024-07-22 09:21:30 +02:00
Emmanuel Engelhart
3dc7327fb0
Add link to the FAQ in README
2024-07-20 12:12:50 +02:00
benoit74
dcd6427b8a
Prepare for 2.0.5
2024-07-15 08:58:03 +00:00
benoit74
fbd01a77ce
Release 2.0.4
v2.0.4
2024-07-15 08:52:48 +00:00
benoit74
24fdbe19d9
Merge pull request #341 from openzim/crawler_1_2_4
Upgrade to Browsertrix Crawler 1.2.4
2024-07-15 09:51:07 +02:00
benoit74
636a6a6d28
Upgrade to Browsertrix Crawler 1.2.4
2024-07-15 05:42:28 +00:00
benoit74
91a53f70ec
Prepare for 2.0.4
2024-06-24 07:56:35 +00:00
benoit74
e8995a9f59
Release 2.0.3
v2.0.3
2024-06-24 07:50:13 +00:00
benoit74
4265effe91
Merge pull request #326 from openzim/fix_youtube
Upgrade to crawler 1.2.0
2024-06-24 09:04:36 +02:00
benoit74
2be5650a8c
Upgrade to crawler 1.2.0
2024-06-24 06:48:38 +00:00
benoit74
de0720e301
Prepare for 2.0.3
2024-06-18 14:05:47 +00:00
benoit74
b73a3e04d0
Release 2.0.2
v2.0.2
2024-06-18 13:44:13 +00:00
benoit74
2f50db874d
Upgrade dependencies
2024-06-18 13:42:05 +00:00
benoit74
68f2ed14d6
Upgrade to warc2zim 2.0.2
2024-06-18 13:40:23 +00:00
benoit74
baa0d9ecc7
Prepare for next release
2024-06-13 11:42:17 +00:00
benoit74
2835c7b078
Release 2.0.1
v2.0.1
2024-06-13 11:32:13 +00:00
benoit74
e6a6560b85
Merge pull request #318 from openzim/upgrade_deps
Upgrade dependencies
2024-06-13 12:28:45 +02:00
benoit74
77747ec1d3
Upgrade dependencies
2024-06-13 10:26:04 +00:00
benoit74
c67ccb9528
Allow running dev image update manually + use main warc2zim branch for zimit dev versions
2024-06-04 15:17:33 +00:00
benoit74
83690f410d
Prepare for 2.1.0
2024-06-04 15:14:43 +00:00
benoit74
d8e6d55f87
Release 2.0.0
v2.0.0
2024-06-03 19:59:04 +00:00
benoit74
ae6e5ffaf6
Merge pull request #309 from openzim/wait_until_choices
Fix `--waitUntil` crawler options
2024-06-03 17:17:34 +02:00
benoit74
59057bdbb1
Fix documentation about --waitUntil allowed values and drop choices checks
- add networkidle0, networkidle2 and drop networkidle to reflect crawler
changes
- drop choices check since values are checked anyway at crawler startup,
right when the scraper starts (this is more permissive should one want
to use a different crawler version than the one supported in the Docker
image)
2024-06-03 15:11:48 +00:00
benoit74
7806aeba63
Merge pull request #310 from openzim/invalid_user_agent
Strip user-agent whitespaces and ignore empty user agents
2024-06-03 17:11:16 +02:00
benoit74
936666917c
Strip user-agent leading whitespaces and ignore empty user agents
2024-06-03 15:06:39 +00:00
benoit74
957e52c57f
Rebuild with warc2zim 2.0.0-dev9
2024-05-30 09:29:48 +00:00
benoit74
36f2fa5f2b
Rebuild with warc2zim 2.0.0-dev8
2024-05-27 08:56:32 +00:00
benoit74
8fdad5954e
Bump Github CI Actions versions
2024-05-24 14:16:53 +00:00
benoit74
9e6c998816
Bump zimit to 2.0.0-dev5 + use warc2zim2 branch + remove zimit2 image workflow
2024-05-24 14:10:19 +00:00
benoit74
4cf6e01669
Bump browsertrix crawler to 1.1.3
2024-05-24 14:07:40 +00:00
benoit74
ce49a5d4e9
Merge branch 'zimit2'
2024-05-24 14:07:05 +00:00
benoit74
1d54b20873
Upgrade to warc2zim 2.0.0-dev6
2024-05-06 09:55:38 +00:00
benoit74
9a7415a402
Upgrade to Browsertrix Crawler 1.1.1
Continue to use warc2zim 2.0.0-dev5 for now, Docker build issue with new
stuff in warc2zim 2.0.0-dev6, will be fixed later on
2024-05-06 06:00:14 +00:00
benoit74
d54aa22bb2
Upgrade to Browsertrix Crawler 1.1.0
2024-04-19 12:30:53 +00:00
rgaudin
f637c3fccc
Merge pull request #292 from openzim/ua_not_mandatory
Change crawler default settings around userAgent and mobileDevice
2024-03-27 15:51:14 +00:00
benoit74
728784d6bf
Upgrade Browsertrix Crawler to 1.0.3
2024-03-27 15:08:59 +00:00
benoit74
e24479945f
Remove trailing characters when retrieving Browsertrix Crawler version
2024-03-27 15:08:58 +00:00
benoit74
3070fe9724
Rollback previous changes around the presence of a default user-agent
- Remove default userAgent value
- Set a default mobileDevice
- Add back comments explaining that userAgent overrides other settings
- Add back logic around the computation of the userAgentSuffix instead
of the userAgent
- Add new noMobileDevice argument to not set the default mobileDevice
2024-03-27 15:08:58 +00:00
benoit74
54732692ac
Bump dev version
2024-03-07 12:47:38 +00:00
benoit74
867d14fd00
Merge pull request #285 from openzim/crawler_beta5
Upgrade browsertrix crawler and remove redirect handling
2024-03-07 11:25:02 +01:00
benoit74
5c716747b4
Add CHANGELOG
2024-03-07 10:16:57 +00:00
benoit74
456219deb3
Fix tests: there are in fact only 7 items to be pushed to the ZIM
7 entries are expected:
https://isago.rskg.org/
https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css
https://isago.rskg.org/static/favicon256.png
https://isago.rskg.org/conseils
https://isago.rskg.org/faq
https://isago.rskg.org/a-propos
https://isago.rskg.org/static/tarifs-isago.pdf
1 unexpected entry is not produced anymore by Browsertrix crawler:
https://dict.brave.com/edgedl/chrome/dict/en-us-10-1.bdic
This was a technical artifact
2024-03-07 10:16:51 +00:00
benoit74
a9769b2871
Upgrade to crawler 1.0.0-beta6
2024-03-07 08:00:31 +00:00
benoit74
a4cb27a793
Fix clean_url method name
2024-03-07 07:59:41 +00:00
benoit74
4d31f8eabb
Remove handling of redirects which are now done by browsertrix crawler
2024-03-07 07:59:40 +00:00