benoit74
|
867d14fd00
|
Merge pull request #285 from openzim/crawler_beta5
Upgrade browsertrix crawler and remove redirect handling
|
2024-03-07 11:25:02 +01:00 |
|
benoit74
|
5c716747b4
|
Add CHANGELOG
|
2024-03-07 10:16:57 +00:00 |
|
benoit74
|
456219deb3
|
Fix tests: there are in fact only 7 items to be pushed to the ZIM
7 entries are expected:
https://isago.rskg.org/
https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css
https://isago.rskg.org/static/favicon256.png
https://isago.rskg.org/conseils
https://isago.rskg.org/faq
https://isago.rskg.org/a-propos
https://isago.rskg.org/static/tarifs-isago.pdf
1 unexpected entry is not produced anymore by Browsertrix crawler:
https://dict.brave.com/edgedl/chrome/dict/en-us-10-1.bdic
This was a technical artifact
|
2024-03-07 10:16:51 +00:00 |
|
benoit74
|
a9769b2871
|
Upgrade to crawler 1.0.0-beta6
|
2024-03-07 08:00:31 +00:00 |
|
benoit74
|
a4cb27a793
|
Fix clean_url method name
|
2024-03-07 07:59:41 +00:00 |
|
benoit74
|
4d31f8eabb
|
Remove handling of redirects which are now done by browsertrix crawler
|
2024-03-07 07:59:40 +00:00 |
|
benoit74
|
b69f3d610f
|
Upgrade to crawler 1.0.0-beta5
|
2024-03-07 07:59:40 +00:00 |
|
benoit74
|
c2dc8c5ccc
|
Merge pull request #286 from openzim/upgrade_deps
Upgrade to Python 3.12, upgrade Python dependencies and add hatch-openzim plugin
|
2024-03-04 11:23:42 +01:00 |
|
benoit74
|
857ae5674d
|
Upgrade to Python 3.12
|
2024-03-01 14:03:25 +00:00 |
|
benoit74
|
89aea6b41e
|
Adopt hatch-openzim plugin
|
2024-03-01 14:03:24 +00:00 |
|
benoit74
|
a44c1a7c7f
|
Upgrade dependencies
|
2024-03-01 14:03:24 +00:00 |
|
benoit74
|
6ca9be48c7
|
Empty commit to release warc2zim2 commit 3c00da0
|
2024-02-16 10:03:04 +01:00 |
|
benoit74
|
01c5833c29
|
Empty commit to release warc2zim2 commit f837179
|
2024-02-09 11:10:57 +01:00 |
|
rgaudin
|
7caa355c31
|
Merge pull request #277 from openzim/scraper_suffix
Pass scraper suffix to warc2zim
|
2024-02-05 13:45:13 +00:00 |
|
benoit74
|
49da57c5b6
|
fixup! Set zimit and browsertrix crawler versions in final ZIM 'Scraper' metadata
|
2024-02-05 14:33:38 +01:00 |
|
benoit74
|
9244f2e69c
|
Set zimit and browsertrix crawler versions in final ZIM 'Scraper' metadata
|
2024-01-31 15:10:08 +01:00 |
|
benoit74
|
ef462b5024
|
Empty commit to release warc2zim2 commit ae18aed
|
2024-01-26 16:34:26 +01:00 |
|
benoit74
|
f4359022b2
|
Merge pull request #274 from openzim/add_logging
|
2024-01-25 08:38:35 +01:00 |
|
benoit74
|
a505df9fe0
|
Add support for --logging parameter of browsertrix crawler
|
2024-01-23 17:28:56 +01:00 |
|
benoit74
|
343d0040cf
|
Merge pull request #272 from openzim/adopt_bootstrap
|
2024-01-22 10:41:29 +01:00 |
|
benoit74
|
c7fdc1d11e
|
Simplify logger name code
|
2024-01-22 10:38:25 +01:00 |
|
benoit74
|
c0ffb74d8c
|
Adopt Python bootstrap conventions
|
2024-01-18 13:31:00 +01:00 |
|
benoit74
|
343fb7e770
|
Replace warning about service workers with a nota bene about their removal since 2.x
|
2024-01-18 13:28:11 +01:00 |
|
benoit74
|
909b6e3da8
|
Merge branch 'main' into zimit2
|
2024-01-18 09:27:00 +01:00 |
|
benoit74
|
f46f2568ff
|
Prepare for next release
|
2024-01-18 09:16:18 +01:00 |
|
benoit74
|
19b4898326
|
Release 1.6.3
v1.6.3
|
2024-01-18 09:12:36 +01:00 |
|
benoit74
|
10471c1ea9
|
Merge pull request #269 from openzim/prepare_1_6_3
|
2024-01-18 09:10:04 +01:00 |
|
benoit74
|
eebf26f7cb
|
Upgrade to browsertrix crawler 0.12.4 and warc2zim 1.5.5
|
2024-01-18 09:05:06 +01:00 |
|
benoit74
|
27f9dcc53f
|
Empty commit to release warc2zim2 commit aca2db3
|
2024-01-15 17:45:56 +01:00 |
|
benoit74
|
22551388e0
|
Merge pull request #264 from openzim/use_warc2zim2
|
2024-01-15 08:30:32 +01:00 |
|
benoit74
|
a352c0c402
|
Add temporary Github Actions workflow to build zimit2 image
|
2024-01-15 08:06:50 +01:00 |
|
benoit74
|
e034b08852
|
Update CHANGELOG
|
2024-01-15 08:06:50 +01:00 |
|
Matthieu Gautier
|
1c58bbe303
|
Adapt to warc2zim2 branch of warc2zim.
The `warc2zim2` branch creates ZIM files without a service worker.
|
2024-01-15 08:00:05 +01:00 |
|
benoit74
|
eab3d1f189
|
Merge pull request #262 from openzim/warc2zim_update
|
2024-01-15 07:59:05 +01:00 |
|
benoit74
|
bbc8a48bc9
|
Update CHANGELOG
|
2024-01-15 07:55:53 +01:00 |
|
Matthieu Gautier
|
7bc0ed9c02
|
Use main branch of warc2zim in Dockerfile instead of released version.
This PR adapts to API changes made in the main branch of warc2zim, so we must
use it instead of the released version.
|
2024-01-14 10:32:52 +01:00 |
|
Matthieu Gautier
|
af0c93f1df
|
Update to new organization of warc2zim.
Older `warc2zim` method is now named `main`.
|
2024-01-12 12:17:35 +01:00 |
|
benoit74
|
cd6a55b179
|
Merge pull request #263 from openzim/cleanup
|
2024-01-08 17:13:26 +01:00 |
|
Matthieu Gautier
|
f80dbd11d9
|
Remove unwanted file.
Sounds like a vim mis-manipulation.
|
2024-01-08 16:42:28 +01:00 |
|
rgaudin
|
a62f31ed0d
|
Merge pull request #254 from openzim/collections_param
Collections and temporary directory parameters
|
2023-11-23 14:50:35 +00:00 |
|
benoit74
|
d6c0c6ce63
|
Fixes following review + we need to create one subdir per run to not mix data / clean up correctly after run
|
2023-11-23 13:08:45 +01:00 |
|
benoit74
|
a2b4c71ec9
|
Display warc2zim call args
|
2023-11-23 09:02:33 +01:00 |
|
benoit74
|
b98e8f7027
|
Fix handling of '--collection' parameter + add '--tmp' + enhance logging
|
2023-11-23 09:02:08 +01:00 |
|
benoit74
|
79d5f8bc7b
|
Tidy code automatically
|
2023-11-23 08:50:59 +01:00 |
|
benoit74
|
216ac09d8c
|
Enhance .gitignore with toptal generated one
|
2023-11-23 08:48:00 +01:00 |
|
benoit74
|
51ef841836
|
Prepare next release
|
2023-11-17 11:30:37 +01:00 |
|
benoit74
|
6e6c0e8b39
|
Release 1.6.2
v1.6.2
|
2023-11-17 11:25:09 +01:00 |
|
benoit74
|
7ca08791e7
|
Upgrade to browsertrix crawler 0.12.3
|
2023-11-17 11:17:41 +01:00 |
|
rgaudin
|
5512e814c7
|
Merge pull request #249 from openzim/fix_readme
Enhance README by removing Chrome and headless reference
|
2023-11-16 12:56:46 +00:00 |
|
benoit74
|
60b970f844
|
Enhance README by removing Chrome and headless reference
|
2023-11-16 13:14:11 +01:00 |
|