233 Commits

Author SHA1 Message Date
Matthieu Gautier
af0c93f1df Update to new organization of warc2zim.
Older `warc2zim` method is now named `main`.
2024-01-12 12:17:35 +01:00
benoit74
cd6a55b179
Merge pull request #263 from openzim/cleanup 2024-01-08 17:13:26 +01:00
Matthieu Gautier
f80dbd11d9 Remove unwanted file.
Sounds like a vim mis-manipulation.
2024-01-08 16:42:28 +01:00
rgaudin
a62f31ed0d
Merge pull request #254 from openzim/collections_param
Collections and temporary directory parameters
2023-11-23 14:50:35 +00:00
benoit74
d6c0c6ce63
Fixes following review + we need to create one subdir per run to not mix data / cleanup correctly after run 2023-11-23 13:08:45 +01:00
benoit74
a2b4c71ec9
Display warc2zim call args 2023-11-23 09:02:33 +01:00
benoit74
b98e8f7027
Fix handling of '--collection' parameter + add '--tmp' + enhance logging 2023-11-23 09:02:08 +01:00
benoit74
79d5f8bc7b
Tidy code automatically 2023-11-23 08:50:59 +01:00
benoit74
216ac09d8c
Enhance .gitignore with toptal generated one 2023-11-23 08:48:00 +01:00
benoit74
51ef841836
Prepare next release 2023-11-17 11:30:37 +01:00
benoit74
6e6c0e8b39
Release 1.6.2 v1.6.2 2023-11-17 11:25:09 +01:00
benoit74
7ca08791e7
Upgrade to browsertrix crawler 0.12.3 2023-11-17 11:17:41 +01:00
rgaudin
5512e814c7
Merge pull request #249 from openzim/fix_readme
Enhance README by removing Chrome and headless reference
2023-11-16 12:56:46 +00:00
benoit74
60b970f844
Enhance README by removing Chrome and headless reference 2023-11-16 13:14:11 +01:00
rgaudin
99ca5ca901
Merge pull request #246 from openzim/fix_zero_arg
Fix zero arg + crawler 0.12.2
2023-11-16 09:15:25 +00:00
benoit74
51d0409128
Add venv to gitignore 2023-11-16 08:22:23 +01:00
benoit74
4ad41a7d54
Upgrade to browsertrix crawler 0.12.2 2023-11-15 15:26:49 +01:00
benoit74
d24775d70c
Fix logic passing args to crawler
- skip an arg only when its value is None or False
- remove default value 0 from args (it was not passed before, but would be
  with the corrected code, which would change crawler behavior)
2023-11-15 15:26:18 +01:00
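The argument-passing rule described in this commit message can be illustrated with a minimal sketch. This is not the project's actual code: the helper name `build_crawler_args`, the option names, and the `--name value` layout are all assumptions made for illustration. It only shows why a default of 0 matters once the filter is "None or False" rather than "any falsy value".

```python
def build_crawler_args(options: dict) -> list[str]:
    """Turn parsed options into a flat CLI argument list (hypothetical helper)."""
    args: list[str] = []
    for name, value in options.items():
        # Skip only unset (None) or disabled (False) options.
        # Identity checks are used so that 0 and "" are NOT skipped.
        if value is None or value is False:
            continue
        args.append(f"--{name}")
        # A True value is treated as a bare flag; anything else is passed as a string.
        if value is not True:
            args.append(str(value))
    return args


# Hypothetical option names, for illustration only.
example = build_crawler_args(
    {"workers": 0, "limit": None, "verbose": True, "headless": False}
)
```

With this rule an explicit 0 is forwarded, so an option that merely defaults to 0 would start reaching the crawler; that is consistent with the commit also removing the 0 defaults.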
benoit74
a73114d140
Release Browsertrix 0.12.1 v1.6.1 2023-11-06 10:00:03 +01:00
benoit74
c98e4505a8
Prepare next release 2023-11-02 21:10:28 +01:00
benoit74
9e9140690d
Revert temporary fix in tests now that UA has been fixed v1.6.0 2023-11-02 21:05:23 +01:00
benoit74
31a652c8dd
Fix again number of items 2023-11-02 20:59:05 +01:00
benoit74
36ba61b0a5
Release v1.6.0 2023-11-02 20:54:07 +01:00
benoit74
24a250f0ee
Fix number of items for warc2zim since move to Brave changed this 2023-10-30 11:56:06 +01:00
benoit74
56fb86e531
Update to browsertrix crawler 0.12.0-beta2 2023-10-30 11:25:58 +01:00
rgaudin
e0a4d3ffef
Merge pull request #229 from openzim/user_agent
Revisit check-url behavior and provide User-Agent a custom default value
2023-10-26 09:02:09 +00:00
benoit74
b89f57b843
Fix line length 2023-10-26 08:33:06 +02:00
benoit74
18ed6ca540
Always pass UserAgent even when mobileDevice is set 2023-10-24 20:44:06 +02:00
benoit74
d487d658a4
Use GET instead of HEAD for greater compatibility + close the connection automatically 2023-10-23 14:13:07 +02:00
benoit74
2a317c91e4
User-Agent has a default and is used for check_url 2023-10-23 13:45:26 +02:00
benoit74
f22bb9218c
Merge pull request #226 from openzim/check_url_fail
Fail on all HTTP error codes in check_url
2023-10-23 11:14:41 +02:00
benoit74
d8f6cef7f3
Fail on all HTTP error codes in check_url 2023-10-23 11:09:16 +02:00
renaud gaudin
00051453e1
releasing 1.5.3 with crawler 0.11.2 v1.5.3 2023-10-02 10:51:06 +00:00
renaud gaudin
3769c77cd4
releasing with crawler 0.11.1 v1.5.2 2023-09-19 09:04:23 +00:00
benoit74
df2403c6dd
Update CHANGELOG.md 2023-09-18 16:16:12 +02:00
renaud gaudin
2be5562553
releasing 1.5.1 with updated crawler and warc2zim v1.5.1 2023-09-18 08:28:09 +00:00
renaud gaudin
ea210bcd10
Using main warc2zim 2023-09-11 10:43:28 +00:00
rgaudin
da055a828d
Merge pull request #212 from openzim/no_empty_stats_file
Do not create empty stats file
2023-08-30 10:13:42 +00:00
benoit74
7e24388820
Do not create empty stats file 2023-08-28 13:10:07 +02:00
renaud gaudin
12dab25e61
v1.5.0 with --long-description v1.5.0 2023-08-23 16:33:46 +00:00
renaud gaudin
951241d8bf
releasing 1.4.1 with crawler 0.10.4 and warc2zim 1.5.3 v1.4.1 2023-08-23 12:16:19 +00:00
renaud gaudin
df0fa9bbaf
releasing 1.4.1 with crawler 0.10.4 2023-08-23 12:15:01 +00:00
renaud gaudin
c9c7e7a26f
Fixed #178: publish images for arm64 2023-08-23 12:14:12 +00:00
renaud gaudin
1224476b41
crawler 0.10.3 and main warc2zim 2023-08-10 18:51:19 +00:00
renaud gaudin
e590e851be
fixed device list source 2023-08-07 10:12:39 +00:00
renaud gaudin
906161ea51
fixed changelog (for 1.4.0) 2023-08-02 14:47:23 +00:00
renaud gaudin
cbaaa77a1f
releasing 1.4.0 v1.4.0 2023-08-02 14:42:10 +00:00
rgaudin
7cb118eaeb
Merge pull request #201 from openzim/lang
crawler 0.10.2
2023-08-02 14:35:52 +00:00
renaud gaudin
722306d3bf
Using a dedicated venv for zimit in image
zimit dependencies conflicts with crawler's python ones
2023-08-02 14:31:42 +00:00
renaud gaudin
61dc792653
Fixed #191: --lang to crawler, --zim-lang to warc2zim 2023-08-02 11:26:47 +00:00