benoit74
d24775d70c
Fix logic passing args to crawler
...
- skip an arg only if its value is None or False
- remove the default value 0 from args (it was not passed before, but would be
with the corrected code, which would change the crawler's behavior)
2023-11-15 15:26:18 +01:00
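The argument-passing rule described in the commit above can be illustrated with a minimal sketch. This is not the project's actual code: the function name `build_crawler_args`, the option names, and the dict-based input are all hypothetical, chosen only to show why testing for `None`/`False` differs from a plain truthiness test.

```python
def build_crawler_args(options: dict) -> list[str]:
    """Turn an options mapping into a flat list of CLI arguments.

    Hypothetical helper for illustration only; names are assumptions.
    """
    args: list[str] = []
    for name, value in options.items():
        # Skip only unset options (None) and disabled flags (False).
        # A plain truthiness test (`if not value`) would also drop 0 and "".
        if value is None or value is False:
            continue
        args.append(f"--{name}")
        # Enabled boolean flags take no value; anything else is passed as text.
        if value is not True:
            args.append(str(value))
    return args


example = build_crawler_args(
    {"depth": 0, "limit": None, "mobile": False, "verbose": True}
)
print(example)
```

Under this rule an explicit `0` is forwarded, which is why a default of `0` would start reaching the crawler once the check is corrected; leaving the default unset (`None`) keeps such an option off the command line.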
benoit74
a73114d140
Release Browsertrix 0.12.1
v1.6.1
2023-11-06 10:00:03 +01:00
benoit74
c98e4505a8
Prepare next release
2023-11-02 21:10:28 +01:00
benoit74
9e9140690d
Revert temporary fix in tests now that UA has been fixed
v1.6.0
2023-11-02 21:05:23 +01:00
benoit74
31a652c8dd
Fix again number of items
2023-11-02 20:59:05 +01:00
benoit74
36ba61b0a5
Release v1.6.0
2023-11-02 20:54:07 +01:00
benoit74
24a250f0ee
Fix number of items for warc2zim since move to Brave changed this
2023-10-30 11:56:06 +01:00
benoit74
56fb86e531
Update to browsertrix crawler 0.12.0-beta2
2023-10-30 11:25:58 +01:00
rgaudin
e0a4d3ffef
Merge pull request #229 from openzim/user_agent
...
Revisit check-url behavior and provide User-Agent a custom default value
2023-10-26 09:02:09 +00:00
benoit74
b89f57b843
Fix line length
2023-10-26 08:33:06 +02:00
benoit74
18ed6ca540
Always pass UserAgent even when mobileDevice is set
2023-10-24 20:44:06 +02:00
benoit74
d487d658a4
Use GET instead of HEAD for greater compatibility + close the connection automatically
2023-10-23 14:13:07 +02:00
benoit74
2a317c91e4
User-Agent has a default and is used for check_url
2023-10-23 13:45:26 +02:00
benoit74
f22bb9218c
Merge pull request #226 from openzim/check_url_fail
...
Fail on all HTTP error codes in check_url
2023-10-23 11:14:41 +02:00
benoit74
d8f6cef7f3
Fail on all HTTP error codes in check_url
2023-10-23 11:09:16 +02:00
renaud gaudin
00051453e1
releasing 1.5.3 with crawler 0.11.2
v1.5.3
2023-10-02 10:51:06 +00:00
renaud gaudin
3769c77cd4
releasing with crawler 0.11.1
v1.5.2
2023-09-19 09:04:23 +00:00
benoit74
df2403c6dd
Update CHANGELOG.md
2023-09-18 16:16:12 +02:00
renaud gaudin
2be5562553
releasing 1.5.1 with updated crawler and warc2zim
v1.5.1
2023-09-18 08:28:09 +00:00
renaud gaudin
ea210bcd10
Using main warc2zim
2023-09-11 10:43:28 +00:00
rgaudin
da055a828d
Merge pull request #212 from openzim/no_empty_stats_file
...
Do not create empty stats file
2023-08-30 10:13:42 +00:00
benoit74
7e24388820
Do not create empty stats file
2023-08-28 13:10:07 +02:00
renaud gaudin
12dab25e61
v1.5.0 with --long-description
v1.5.0
2023-08-23 16:33:46 +00:00
renaud gaudin
951241d8bf
releasing 1.4.1 with crawler 0.10.4 and warc2zim 1.5.3
v1.4.1
2023-08-23 12:16:19 +00:00
renaud gaudin
df0fa9bbaf
releasing 1.4.1 with crawler 0.10.4
2023-08-23 12:15:01 +00:00
renaud gaudin
c9c7e7a26f
Fixed #178 : publish images for arm64
2023-08-23 12:14:12 +00:00
renaud gaudin
1224476b41
crawler 0.10.3 and main warc2zim
2023-08-10 18:51:19 +00:00
renaud gaudin
e590e851be
fixed device list source
2023-08-07 10:12:39 +00:00
renaud gaudin
906161ea51
fixed changelog (for 1.4.0)
2023-08-02 14:47:23 +00:00
renaud gaudin
cbaaa77a1f
releasing 1.4.0
v1.4.0
2023-08-02 14:42:10 +00:00
rgaudin
7cb118eaeb
Merge pull request #201 from openzim/lang
...
crawler 0.10.2
2023-08-02 14:35:52 +00:00
renaud gaudin
722306d3bf
Using a dedicated venv for zimit in image
...
zimit dependencies conflicts with crawler's python ones
2023-08-02 14:31:42 +00:00
renaud gaudin
61dc792653
Fixed #191 : --lang to crawler, --zim-lang to warc2zim
2023-08-02 11:26:47 +00:00
renaud gaudin
941db5fdfc
using crawler 0.10.2
2023-08-02 11:26:42 +00:00
rgaudin
47ede96f91
Merge pull request #198 from f0sh/add-config
...
added crawler --config option to arguments
2023-07-17 09:34:41 +00:00
f0sh
95c27bad08
added crawler config option to arguments
...
According to https://github.com/webrecorder/browsertrix-crawler#yaml-crawl-config the crawler can be configured with a YAML config file,
which gives more options to configure the crawler to your needs without implementing all the options in zimit.py.
2023-07-17 09:29:45 +00:00
Popo le Chien
57e2f41439
Merge pull request #199 from yukiqt/patch-1
...
minor spelling mistake
2023-07-13 15:18:22 +02:00
yuki
b568848a98
minor spelling mistake
...
i win
2023-07-13 12:49:34 +00:00
renaud gaudin
af8196095d
using 0.10.0-beta.4
2023-05-23 08:10:03 +00:00
renaud gaudin
70a80681a6
use beta3 and --failOnFailedSeed
2023-05-22 11:23:46 +00:00
renaud gaudin
8d287466bd
Make sure to rebuild warc2zim from main to use unminified
2023-05-22 09:57:51 +00:00
renaud gaudin
fc9ad3759e
account for new failed field in crawl.json
2023-04-27 11:56:14 +00:00
renaud gaudin
c31e80608e
Using browsertrix-crawler 0.10.0-beta.0
2023-04-27 11:35:14 +00:00
renaud gaudin
8b4ea950a8
Using browsertrix-crawler 0.9.1
2023-04-25 08:53:52 +00:00
renaud gaudin
8ecd0a3210
upgraded to browsertrix-crawler 0.9.0
2023-04-10 13:08:12 +00:00
renaud gaudin
4f676e37c7
Using browsertrix-crawler 0.9.0-beta.2
2023-04-04 08:49:25 +00:00
renaud gaudin
b7265b49b6
updated to crawler 0.9 (b1)
2023-03-24 07:26:33 +00:00
renaud gaudin
b8714d1260
removed references to docker.io
2023-03-22 13:55:07 +00:00
renaud gaudin
6324b7c7c5
Fixed #172 : Disabled Chrome updates to prevent incidental inclusion of update data in WARC/ZIM
2023-03-10 12:10:06 +00:00
renaud gaudin
238d1a6016
using crawler 0.8.1 and warc2zim's main
2023-02-27 09:57:36 +00:00