388 Commits

Author SHA1 Message Date
renaud gaudin
906161ea51
fixed changelog (for 1.4.0) 2023-08-02 14:47:23 +00:00
renaud gaudin
cbaaa77a1f
releasing 1.4.0 v1.4.0 2023-08-02 14:42:10 +00:00
rgaudin
7cb118eaeb
Merge pull request #201 from openzim/lang
crawler 0.10.2
2023-08-02 14:35:52 +00:00
renaud gaudin
722306d3bf
Using a dedicated venv for zimit in image
zimit dependencies conflict with crawler's python ones
2023-08-02 14:31:42 +00:00
renaud gaudin
61dc792653
Fixed #191: --lang to crawler, --zim-lang to warc2zim 2023-08-02 11:26:47 +00:00
renaud gaudin
941db5fdfc
using crawler 0.10.2 2023-08-02 11:26:42 +00:00
rgaudin
47ede96f91
Merge pull request #198 from f0sh/add-config
added crawler --config option to arguments
2023-07-17 09:34:41 +00:00
f0sh
95c27bad08 added crawler config option to arguments
According to https://github.com/webrecorder/browsertrix-crawler#yaml-crawl-config, the crawler can be configured with a YAML config file,
which gives more options to configure the crawler to your needs without implementing all of them in zimit.py.
2023-07-17 09:29:45 +00:00
Popo le Chien
57e2f41439
Merge pull request #199 from yukiqt/patch-1
minor spelling mistake
2023-07-13 15:18:22 +02:00
yuki
b568848a98
minor spelling mistake
i win
2023-07-13 12:49:34 +00:00
renaud gaudin
af8196095d
using 0.10.0-beta.4 2023-05-23 08:10:03 +00:00
renaud gaudin
70a80681a6
use bet3 and --failOnFailedSeed 2023-05-22 11:23:46 +00:00
renaud gaudin
8d287466bd
Make sure to rebuild warc2zim from main to use unminified 2023-05-22 09:57:51 +00:00
renaud gaudin
fc9ad3759e account for new failed field in crawl.json 2023-04-27 11:56:14 +00:00
renaud gaudin
c31e80608e Using browsertrix-crawler 0.10.0-beta.0 2023-04-27 11:35:14 +00:00
renaud gaudin
8b4ea950a8 Using browsertrix-crawler 0.9.1 2023-04-25 08:53:52 +00:00
renaud gaudin
8ecd0a3210 upgraded to browsertrix-crawler 0.9.0 2023-04-10 13:08:12 +00:00
renaud gaudin
4f676e37c7 Using browsertrix-crawler 0.9.0-beta.2 2023-04-04 08:49:25 +00:00
renaud gaudin
b7265b49b6 updated to crawler 0.9 (b1) 2023-03-24 07:26:33 +00:00
renaud gaudin
b8714d1260 removed references to docker.io 2023-03-22 13:55:07 +00:00
renaud gaudin
6324b7c7c5 Fixed #172: Disabled Chrome updates to prevent incidental inclusion of update data in WARC/ZIM 2023-03-10 12:10:06 +00:00
renaud gaudin
238d1a6016 using crawler 0.8.1 and warc2zim's main 2023-02-27 09:57:36 +00:00
Emmanuel Engelhart
79d444e7ea
Update GitHub workflow actions 2023-02-07 14:24:29 +01:00
renaud gaudin
64bc8bf09f releasing 1.3.1 v1.3.1 2023-02-06 11:48:44 +01:00
renaud gaudin
459778d472 released v1.3.0 v1.3.0 2023-02-02 16:31:45 +00:00
renaud gaudin
af9a3d24d9 removed obsolete ref to cap-add in README 2023-02-02 16:30:15 +00:00
renaud gaudin
4b7e504d99 Updated test and stats to new crawl.json format 2023-01-31 11:12:36 +00:00
renaud gaudin
554fff5c87 Using browsertrix-crawler 0.8.0-beta.1 2023-01-31 10:34:32 +00:00
renaud gaudin
8fd9462e25 triggering a rebuild with updated (still main) warc2zim 2023-01-16 11:39:05 +00:00
renaud gaudin
0172c53c50 warc2zim is now at main branch, not master 2023-01-13 10:02:29 +00:00
renaud gaudin
3756c6612f Using browsertrix-crawler 0.8.0-beta.0 2023-01-13 09:59:07 +00:00
Kelson
511fccdc56
"main" is the new default branch 2022-12-21 11:07:37 +01:00
Kelson
859e79c165
"main" is the new default branch 2022-12-21 11:06:50 +01:00
renaud gaudin
cf26f8c33a Using browsertrix-crawler 0.7.1 2022-11-16 11:20:39 +00:00
renaud gaudin
0624c50121 Using browsertrix-crawler 0.7.0 (release) 2022-10-12 14:57:01 +00:00
renaud gaudin
fab4ff6bf5 using crawler 0.7.0-beta.5 2022-09-21 08:29:59 +00:00
renaud gaudin
a9cf1cd9c3 using crawler 0.7.0-beta.4 2022-09-09 07:26:03 +00:00
renaud gaudin
2d4375fd0a use crawler 0.7.0-beta.3 2022-09-03 18:44:48 +00:00
renaud gaudin
472c4cf41a trigger build for warc2zim update 2022-08-30 10:53:03 +00:00
renaud gaudin
ce68493087 increased check_url timeouts 2022-07-25 08:41:08 +00:00
renaud gaudin
857e044c84 Fixed --allowHashUrls incorrectly requiring a value 2022-07-18 10:23:16 +00:00
renaud gaudin
8c6d2bfb45 using browsertrix-crawler 0.7 beta 2022-07-04 15:08:49 +00:00
renaud gaudin
b79ad1b138 use master warc2zim in-between releases 2022-06-30 09:42:50 +00:00
renaud gaudin
142970bc0a Fixed #137: normalizes homepage redirects to standard ports 2022-06-22 09:57:01 +00:00
renaud gaudin
b29aeb08e6 back to dev 2022-06-21 17:20:30 +00:00
renaud gaudin
0eeb2ad9e3 Releasing 1.2.0 v1.2.0 2022-06-21 17:08:38 +00:00
renaud gaudin
dffc81860e updated docker publish action 2022-06-21 17:06:40 +00:00
renaud gaudin
e32aac3ec0 code styling 2022-06-21 17:05:08 +00:00
rgaudin
b2bb77cd65
Merge pull request #108 from openzim/crawler-with-video
update to latest browsertrix-crawler + warc2zim
2022-06-21 16:59:15 +00:00
renaud gaudin
932f97c999 updated tests for crawler and warc2zim 2022-06-21 16:55:32 +00:00