benoit74
097613de29
Add test checking that expected entries are present
2024-08-07 09:38:08 +00:00
benoit74
4c35836395
Merge pull request #347 from openzim/fix_readme
...
Fix README and Dockerfile for imprecisions
2024-08-07 11:35:42 +02:00
benoit74
6e3951dfa7
Fix README and Dockerfile for imprecisions ( #314 )
2024-08-07 09:32:37 +00:00
benoit74
ea7653ef37
Merge pull request #346 from openzim/custom_behaviors
...
Add support for custom behaviors configuration
2024-08-07 11:31:57 +02:00
benoit74
80b6b26782
Add support for custom behaviors configuration
2024-08-07 09:28:07 +00:00
benoit74
6ab3401fa2
Merge pull request #345 from openzim/profile_is_url_doc
...
Make it clear that --profile argument can be an HTTP(S) URL
2024-08-07 11:26:55 +02:00
benoit74
a1efe8dccf
Make it clear that --profile argument can be an HTTP(S) URL (and not only a path)
2024-08-07 09:16:19 +00:00
benoit74
526019e095
Prepare for 2.0.7
2024-08-02 08:46:59 +00:00
benoit74
2452e60d9d
Release 2.0.6
v2.0.6
2024-08-02 08:17:58 +00:00
benoit74
dee57a8dd8
Merge pull request #363 from openzim/browsertrix_1_2_6
...
Upgrade to Browsertrix Crawler 1.2.6
2024-08-02 10:15:47 +02:00
benoit74
c92782bea0
Upgrade to Browsertrix Crawler 1.2.6
2024-08-02 08:07:46 +00:00
benoit74
7305f70300
Prepare for 2.0.6
2024-07-24 06:39:21 +00:00
benoit74
021654e6b3
Release 2.0.5
v2.0.5
2024-07-24 06:37:27 +00:00
benoit74
7357b1f2ce
Merge pull request #358 from openzim/prepare_release
...
Upgrade to Browsertrix Crawler 1.2.5 and warc2zim 2.0.3
2024-07-24 07:41:17 +02:00
benoit74
8a64216ac0
Upgrade to warc2zim 2.0.3
2024-07-24 05:35:55 +00:00
benoit74
9d43636559
Upgrade to Browsertrix Crawler 1.2.5
2024-07-24 05:34:25 +00:00
benoit74
52e225619e
Merge pull request #350 from openzim/faq
...
Add link to the FAQ in README
2024-07-22 09:21:30 +02:00
Emmanuel Engelhart
3dc7327fb0
Add link to the FAQ in README
2024-07-20 12:12:50 +02:00
benoit74
dcd6427b8a
Prepare for 2.0.5
2024-07-15 08:58:03 +00:00
benoit74
fbd01a77ce
Release 2.0.4
v2.0.4
2024-07-15 08:52:48 +00:00
benoit74
24fdbe19d9
Merge pull request #341 from openzim/crawler_1_2_4
...
Upgrade to Browsertrix Crawler 1.2.4
2024-07-15 09:51:07 +02:00
benoit74
636a6a6d28
Upgrade to Browsertrix Crawler 1.2.4
2024-07-15 05:42:28 +00:00
benoit74
91a53f70ec
Prepare for 2.0.4
2024-06-24 07:56:35 +00:00
benoit74
e8995a9f59
Release 2.0.3
v2.0.3
2024-06-24 07:50:13 +00:00
benoit74
4265effe91
Merge pull request #326 from openzim/fix_youtube
...
Upgrade to crawler 1.2.0
2024-06-24 09:04:36 +02:00
benoit74
2be5650a8c
Upgrade to crawler 1.2.0
2024-06-24 06:48:38 +00:00
benoit74
de0720e301
Prepare for 2.0.3
2024-06-18 14:05:47 +00:00
benoit74
b73a3e04d0
Release 2.0.2
v2.0.2
2024-06-18 13:44:13 +00:00
benoit74
2f50db874d
Upgrade dependencies
2024-06-18 13:42:05 +00:00
benoit74
68f2ed14d6
Upgrade to warc2zim 2.0.2
2024-06-18 13:40:23 +00:00
benoit74
baa0d9ecc7
Prepare for next release
2024-06-13 11:42:17 +00:00
benoit74
2835c7b078
Release 2.0.1
v2.0.1
2024-06-13 11:32:13 +00:00
benoit74
e6a6560b85
Merge pull request #318 from openzim/upgrade_deps
...
Upgrade dependencies
2024-06-13 12:28:45 +02:00
benoit74
77747ec1d3
Upgrade dependencies
2024-06-13 10:26:04 +00:00
benoit74
c67ccb9528
Allow running the dev image update manually + use main warc2zim branch for zimit dev versions
2024-06-04 15:17:33 +00:00
benoit74
83690f410d
Prepare for 2.1.0
2024-06-04 15:14:43 +00:00
benoit74
d8e6d55f87
Release 2.0.0
v2.0.0
2024-06-03 19:59:04 +00:00
benoit74
ae6e5ffaf6
Merge pull request #309 from openzim/wait_until_choices
...
Fix `--waitUntil` crawler options
2024-06-03 17:17:34 +02:00
benoit74
59057bdbb1
Fix documentation about --waitUntil allowed values and drop choices checks
...
- add networkidle0 and networkidle2, and drop networkidle, to reflect crawler
changes
- drop the choices check, since the value is checked anyway at crawler
startup, right when the scraper starts (this is more permissive should one
want to use a different crawler version than the one supported in the
Docker image)
2024-06-03 15:11:48 +00:00
benoit74
7806aeba63
Merge pull request #310 from openzim/invalid_user_agent
...
Strip user-agent whitespaces and ignore empty user agents
2024-06-03 17:11:16 +02:00
benoit74
936666917c
Strip user-agent leading whitespaces and ignore empty user agents
2024-06-03 15:06:39 +00:00
benoit74
957e52c57f
Rebuild with warc2zim 2.0.0-dev9
2024-05-30 09:29:48 +00:00
benoit74
36f2fa5f2b
Rebuild with warc2zim 2.0.0-dev8
2024-05-27 08:56:32 +00:00
benoit74
8fdad5954e
Bump Github CI Actions versions
2024-05-24 14:16:53 +00:00
benoit74
9e6c998816
Bump zimit to 2.0.0-dev5 + use warc2zim2 branch + remove zimit2 image workflow
2024-05-24 14:10:19 +00:00
benoit74
4cf6e01669
Bump browsertrix crawler to 1.1.3
2024-05-24 14:07:40 +00:00
benoit74
ce49a5d4e9
Merge branch 'zimit2'
2024-05-24 14:07:05 +00:00
benoit74
1d54b20873
Upgrade to warc2zim 2.0.0-dev6
2024-05-06 09:55:38 +00:00
benoit74
9a7415a402
Upgrade to Browsertrix Crawler 1.1.1
...
Continue to use warc2zim 2.0.0-dev5 for now, Docker build issue with new
stuff in warc2zim 2.0.0-dev6, will be fixed later on
2024-05-06 06:00:14 +00:00
benoit74
d54aa22bb2
Upgrade to Browsertrix Crawler 1.1.0
2024-04-19 12:30:53 +00:00