benoit74
a4cb27a793
Fix clean_url method name
2024-03-07 07:59:41 +00:00
benoit74
4d31f8eabb
Remove handling of redirects which are now done by browsertrix crawler
2024-03-07 07:59:40 +00:00
benoit74
b69f3d610f
Upgrade to crawler 1.0.0-beta5
2024-03-07 07:59:40 +00:00
benoit74
c2dc8c5ccc
Merge pull request #286 from openzim/upgrade_deps
...
Upgrade to Python 3.12, upgrade Python dependencies and add hatch-openzim plugin
2024-03-04 11:23:42 +01:00
benoit74
857ae5674d
Upgrade to Python 3.12
2024-03-01 14:03:25 +00:00
benoit74
89aea6b41e
Adopt hatch-openzim plugin
2024-03-01 14:03:24 +00:00
benoit74
a44c1a7c7f
Upgrade dependencies
2024-03-01 14:03:24 +00:00
benoit74
6ca9be48c7
Empty commit to release warc2zim2 commit 3c00da0
2024-02-16 10:03:04 +01:00
benoit74
01c5833c29
Empty commit to release warc2zim2 commit f837179
2024-02-09 11:10:57 +01:00
rgaudin
7caa355c31
Merge pull request #277 from openzim/scraper_suffix
...
Pass scraper suffix to warc2zim
2024-02-05 13:45:13 +00:00
benoit74
49da57c5b6
fixup! Set zimit and browsertrix crawler versions in final ZIM 'Scraper' metadata
2024-02-05 14:33:38 +01:00
benoit74
9244f2e69c
Set zimit and browsertrix crawler versions in final ZIM 'Scraper' metadata
2024-01-31 15:10:08 +01:00
benoit74
ef462b5024
Empty commit to release warc2zim2 commit ae18aed
2024-01-26 16:34:26 +01:00
benoit74
f4359022b2
Merge pull request #274 from openzim/add_logging
2024-01-25 08:38:35 +01:00
benoit74
a505df9fe0
Add support for --logging parameter of browsertrix crawler
2024-01-23 17:28:56 +01:00
benoit74
343d0040cf
Merge pull request #272 from openzim/adopt_bootstrap
2024-01-22 10:41:29 +01:00
benoit74
c7fdc1d11e
Simplify logger name code
2024-01-22 10:38:25 +01:00
benoit74
c0ffb74d8c
Adopt Python bootstrap conventions
2024-01-18 13:31:00 +01:00
benoit74
343fb7e770
Replace warning about service workers by a nota bene about their removal since 2.x
2024-01-18 13:28:11 +01:00
benoit74
909b6e3da8
Merge branch 'main' into zimit2
2024-01-18 09:27:00 +01:00
benoit74
f46f2568ff
Prepare for next release
2024-01-18 09:16:18 +01:00
benoit74
19b4898326
Release 1.6.3
v1.6.3
2024-01-18 09:12:36 +01:00
benoit74
10471c1ea9
Merge pull request #269 from openzim/prepare_1_6_3
2024-01-18 09:10:04 +01:00
benoit74
eebf26f7cb
Upgrade to browsertrix crawler 0.12.4 and warc2zim 1.5.5
2024-01-18 09:05:06 +01:00
benoit74
27f9dcc53f
Empty commit to release warc2zim2 commit aca2db3
2024-01-15 17:45:56 +01:00
benoit74
22551388e0
Merge pull request #264 from openzim/use_warc2zim2
2024-01-15 08:30:32 +01:00
benoit74
a352c0c402
Add temporary GitHub Actions workflow to build zimit2 image
2024-01-15 08:06:50 +01:00
benoit74
e034b08852
Update CHANGELOG
2024-01-15 08:06:50 +01:00
Matthieu Gautier
1c58bbe303
Adapt to warc2zim2 branch of warc2zim.
...
The `warc2zim2` branch creates ZIM files without service workers.
2024-01-15 08:00:05 +01:00
benoit74
eab3d1f189
Merge pull request #262 from openzim/warc2zim_update
2024-01-15 07:59:05 +01:00
benoit74
bbc8a48bc9
Update CHANGELOG
2024-01-15 07:55:53 +01:00
Matthieu Gautier
7bc0ed9c02
Use main branch of warc2zim in dockerfile instead of released version.
...
This PR adapts to API changes made in the main branch of warc2zim, so we must
use it instead of the released version.
2024-01-14 10:32:52 +01:00
Matthieu Gautier
af0c93f1df
Update to new organization of warc2zim.
...
Older `warc2zim` method is now named `main`.
2024-01-12 12:17:35 +01:00
benoit74
cd6a55b179
Merge pull request #263 from openzim/cleanup
2024-01-08 17:13:26 +01:00
Matthieu Gautier
f80dbd11d9
Remove unwanted file.
...
Sounds like a vim mis-manipulation.
2024-01-08 16:42:28 +01:00
rgaudin
a62f31ed0d
Merge pull request #254 from openzim/collections_param
...
Collections and temporary directory parameters
2023-11-23 14:50:35 +00:00
benoit74
d6c0c6ce63
Fixes following review + we need to create one subdir per run to not mix data / clean up correctly after run
2023-11-23 13:08:45 +01:00
benoit74
a2b4c71ec9
Display warc2zim call args
2023-11-23 09:02:33 +01:00
benoit74
b98e8f7027
Fix handling of '--collection' parameter + add '--tmp' + enhance logging
2023-11-23 09:02:08 +01:00
benoit74
79d5f8bc7b
Tidy code automatically
2023-11-23 08:50:59 +01:00
benoit74
216ac09d8c
Enhance .gitignore with toptal generated one
2023-11-23 08:48:00 +01:00
benoit74
51ef841836
Prepare next release
2023-11-17 11:30:37 +01:00
benoit74
6e6c0e8b39
Release 1.6.2
v1.6.2
2023-11-17 11:25:09 +01:00
benoit74
7ca08791e7
Upgrade to browsertrix crawler 0.12.3
2023-11-17 11:17:41 +01:00
rgaudin
5512e814c7
Merge pull request #249 from openzim/fix_readme
...
Enhance README by removing Chrome and headless reference
2023-11-16 12:56:46 +00:00
benoit74
60b970f844
Enhance README by removing Chrome and headless reference
2023-11-16 13:14:11 +01:00
rgaudin
99ca5ca901
Merge pull request #246 from openzim/fix_zero_arg
...
Fix zero arg + crawler 0.12.2
2023-11-16 09:15:25 +00:00
benoit74
51d0409128
Add venv to gitignore
2023-11-16 08:22:23 +01:00
benoit74
4ad41a7d54
Upgrade to browsertrix crawler 0.12.2
2023-11-15 15:26:49 +01:00
benoit74
d24775d70c
Fix logic passing args to crawler
...
- skip an arg only if its value is None or False
- remove default value 0 from args (it was not passed before, but would be
with the corrected code, which would change the crawler's behavior)
2023-11-15 15:26:18 +01:00
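The rule described in the commit above (skip an argument only when its value is None or False, so that an explicit 0 is still forwarded) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `build_crawler_args` and the option names in the usage line are hypothetical.

```python
def build_crawler_args(options: dict) -> list[str]:
    """Turn an options mapping into a flat CLI argument list.

    Hypothetical helper for illustration only; not taken from the repository.
    """
    args: list[str] = []
    for name, value in options.items():
        # Skip only None and False. Identity checks are used on purpose:
        # `0 == False` is True in Python, but `0 is False` is not, so an
        # explicit 0 is kept and forwarded.
        if value is None or value is False:
            continue
        args.append(f"--{name}")
        # True means a bare flag; any other value follows the flag as text.
        if value is not True:
            args.append(str(value))
    return args


# Illustrative option names (assumptions, not the crawler's real flags).
example = build_crawler_args(
    {"max-items": 0, "timeout": None, "verbose": True, "skip": False}
)
```

A plain truthiness test (`if not value`) would also drop 0, which is why a default of 0 only starts to matter once the check is made this strict.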