benoit74
c0ffb74d8c
Adopt Python bootstrap conventions
2024-01-18 13:31:00 +01:00
benoit74
343fb7e770
Replace warning about service workers with a nota bene about their removal since 2.x
2024-01-18 13:28:11 +01:00
benoit74
909b6e3da8
Merge branch 'main' into zimit2
2024-01-18 09:27:00 +01:00
benoit74
f46f2568ff
Prepare for next release
2024-01-18 09:16:18 +01:00
benoit74
19b4898326
Release 1.6.3
v1.6.3
2024-01-18 09:12:36 +01:00
benoit74
10471c1ea9
Merge pull request #269 from openzim/prepare_1_6_3
2024-01-18 09:10:04 +01:00
benoit74
eebf26f7cb
Upgrade to browsertrix crawler 0.12.4 and warc2zim 1.5.5
2024-01-18 09:05:06 +01:00
benoit74
27f9dcc53f
Empty commit to release warc2zim2 commit aca2db3
2024-01-15 17:45:56 +01:00
benoit74
22551388e0
Merge pull request #264 from openzim/use_warc2zim2
2024-01-15 08:30:32 +01:00
benoit74
a352c0c402
Add temporary GitHub Actions workflow to build zimit2 image
2024-01-15 08:06:50 +01:00
benoit74
e034b08852
Update CHANGELOG
2024-01-15 08:06:50 +01:00
Matthieu Gautier
1c58bbe303
Adapt to warc2zim2 branch of warc2zim.
...
`warc2zim2` branch creates ZIM files without a service worker.
2024-01-15 08:00:05 +01:00
benoit74
eab3d1f189
Merge pull request #262 from openzim/warc2zim_update
2024-01-15 07:59:05 +01:00
benoit74
bbc8a48bc9
Update CHANGELOG
2024-01-15 07:55:53 +01:00
Matthieu Gautier
7bc0ed9c02
Use main branch of warc2zim in dockerfile instead of released version.
...
This PR adapts to API changes made in the main branch of warc2zim, so we must
use it instead of the released version.
2024-01-14 10:32:52 +01:00
Matthieu Gautier
af0c93f1df
Update to new organization of warc2zim.
...
Older `warc2zim` method is now named `main`.
2024-01-12 12:17:35 +01:00
benoit74
cd6a55b179
Merge pull request #263 from openzim/cleanup
2024-01-08 17:13:26 +01:00
Matthieu Gautier
f80dbd11d9
Remove unwanted file.
...
Looks like a vim mishap.
2024-01-08 16:42:28 +01:00
rgaudin
a62f31ed0d
Merge pull request #254 from openzim/collections_param
...
Collections and temporary directory parameters
2023-11-23 14:50:35 +00:00
benoit74
d6c0c6ce63
Fixes following review + we need to create one subdir per run to not mix data / clean up correctly after run
2023-11-23 13:08:45 +01:00
benoit74
a2b4c71ec9
Display warc2zim call args
2023-11-23 09:02:33 +01:00
benoit74
b98e8f7027
Fix handling of '--collection' parameter + add '--tmp' + enhance logging
2023-11-23 09:02:08 +01:00
benoit74
79d5f8bc7b
Tidy code automatically
2023-11-23 08:50:59 +01:00
benoit74
216ac09d8c
Enhance .gitignore with toptal generated one
2023-11-23 08:48:00 +01:00
benoit74
51ef841836
Prepare next release
2023-11-17 11:30:37 +01:00
benoit74
6e6c0e8b39
Release 1.6.2
v1.6.2
2023-11-17 11:25:09 +01:00
benoit74
7ca08791e7
Upgrade to browsertrix crawler 0.12.3
2023-11-17 11:17:41 +01:00
rgaudin
5512e814c7
Merge pull request #249 from openzim/fix_readme
...
Enhance README by removing Chrome and headless reference
2023-11-16 12:56:46 +00:00
benoit74
60b970f844
Enhance README by removing Chrome and headless reference
2023-11-16 13:14:11 +01:00
rgaudin
99ca5ca901
Merge pull request #246 from openzim/fix_zero_arg
...
Fix zero arg + crawler 0.12.2
2023-11-16 09:15:25 +00:00
benoit74
51d0409128
Add venv to gitignore
2023-11-16 08:22:23 +01:00
benoit74
4ad41a7d54
Upgrade to browsertrix crawler 0.12.2
2023-11-15 15:26:49 +01:00
benoit74
d24775d70c
Fix logic passing args to crawler
...
- omit an arg only if its value is None or False
- remove default value 0 from args (it was not passed before, but would be
with the corrected code, which would change the crawler's behavior)
2023-11-15 15:26:18 +01:00
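The argument-passing rule described in commit d24775d70c can be illustrated with a minimal sketch. This is not the project's actual code: the helper name `build_crawler_args` and the option names used below are hypothetical, chosen only to show why testing for `None` or `False` explicitly differs from a plain truthiness test, and why a default of `0` would start being forwarded once the check is corrected.

```python
def build_crawler_args(options: dict) -> list[str]:
    """Turn an options mapping into a flat CLI argument list.

    Hypothetical helper for illustration; names are assumptions.
    """
    args: list[str] = []
    for name, value in options.items():
        # Omit only unset (None) or disabled (False) options.
        # A plain `if not value` test would also drop 0 and "".
        if value is None or value is False:
            continue
        args.append(f"--{name}")
        # Boolean flags carry no value; everything else is passed as text.
        if value is not True:
            args.append(str(value))
    return args
```

With this rule, an option left at a default of `0` is forwarded as `--name 0` instead of being silently dropped, which is consistent with the commit also removing the `0` defaults so the crawler keeps its own default behavior.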
benoit74
a73114d140
Release Browsertrix 0.12.1
v1.6.1
2023-11-06 10:00:03 +01:00
benoit74
c98e4505a8
Prepare next release
2023-11-02 21:10:28 +01:00
benoit74
9e9140690d
Revert temporary fix in tests now that UA has been fixed
v1.6.0
2023-11-02 21:05:23 +01:00
benoit74
31a652c8dd
Fix again number of items
2023-11-02 20:59:05 +01:00
benoit74
36ba61b0a5
Release v1.6.0
2023-11-02 20:54:07 +01:00
benoit74
24a250f0ee
Fix number of items for warc2zim since move to Brave changed this
2023-10-30 11:56:06 +01:00
benoit74
56fb86e531
Update to browsertrix crawler 0.12.0-beta2
2023-10-30 11:25:58 +01:00
rgaudin
e0a4d3ffef
Merge pull request #229 from openzim/user_agent
...
Revisit check-url behavior and give User-Agent a custom default value
2023-10-26 09:02:09 +00:00
benoit74
b89f57b843
Fix line length
2023-10-26 08:33:06 +02:00
benoit74
18ed6ca540
Always pass UserAgent even when mobileDevice is set
2023-10-24 20:44:06 +02:00
benoit74
d487d658a4
Use GET instead of HEAD for greater compatibility + close the connection automatically
2023-10-23 14:13:07 +02:00
benoit74
2a317c91e4
User-Agent has a default and is used for check_url
2023-10-23 13:45:26 +02:00
benoit74
f22bb9218c
Merge pull request #226 from openzim/check_url_fail
...
Fail on all HTTP error codes in check_url
2023-10-23 11:14:41 +02:00
benoit74
d8f6cef7f3
Fail on all HTTP error codes in check_url
2023-10-23 11:09:16 +02:00
renaud gaudin
00051453e1
releasing 1.5.3 with crawler 0.11.2
v1.5.3
2023-10-02 10:51:06 +00:00
renaud gaudin
3769c77cd4
releasing with crawler 0.11.1
v1.5.2
2023-09-19 09:04:23 +00:00
benoit74
df2403c6dd
Update CHANGELOG.md
2023-09-18 16:16:12 +02:00