144 Commits

Author SHA1 Message Date
renaud gaudin
b29aeb08e6 back to dev 2022-06-21 17:20:30 +00:00
renaud gaudin
0eeb2ad9e3 Releasing 1.2.0 v1.2.0 2022-06-21 17:08:38 +00:00
renaud gaudin
dffc81860e updated docker publish action 2022-06-21 17:06:40 +00:00
renaud gaudin
e32aac3ec0 code styling 2022-06-21 17:05:08 +00:00
rgaudin
b2bb77cd65
Merge pull request #108 from openzim/crawler-with-video
update to latest browsertrix-crawler + warc2zim
2022-06-21 16:59:15 +00:00
renaud gaudin
932f97c999 updated tests for crawler and warc2zim 2022-06-21 16:55:32 +00:00
renaud gaudin
1f490ace8f Updated to browsertrix-crawler 0.6 and warc2zim 1.4 2022-06-21 12:04:56 +00:00
renaud gaudin
8b5eeb31c7 using crawler 0.6beta1 2022-06-14 14:58:33 +00:00
Ilya Kreymer
acf0aaf552 update to latest browsertrix-crawler
test with dev build of warc2zim 1.4.0 release
2022-06-14 14:58:33 +00:00
rgaudin
823e6bbb01
Merge pull request #132 from openzim/ci
updated CI test website URL
2022-06-13 10:05:25 +00:00
renaud gaudin
e29b6f3ad6 CI on push is sufficient 2022-06-13 10:02:35 +00:00
renaud gaudin
885e1763a1 updated CI test website URL 2022-06-13 09:57:37 +00:00
Kelson
80f3d3293f
Merge pull request #129 from openzim/release-badge
Release badge
2022-06-11 20:06:20 +02:00
Emmanuel Engelhart
0025901959
Replace Docker Hub build badge with CI badge 2022-06-11 11:56:18 +02:00
Emmanuel Engelhart
99f8fbafe1
Movebot does not exist anymore 2022-06-11 11:53:35 +02:00
Emmanuel Engelhart
3d3f4fb121
Add release tag 2022-06-11 11:52:48 +02:00
rgaudin
8bcd692462
Merge pull request #125 from JensKorte/patch-1
Update README.md
2022-05-30 22:07:10 +02:00
JensKorte
1f31d6c1a5
Update README.md
relative link didn't work and replaced by https://github.com/openzim/warc2zim
2022-05-30 21:45:18 +02:00
renaud gaudin
98587045b4 Updated readme: warc2zim params can be passed 2022-05-03 10:31:34 +00:00
renaud gaudin
efd8ca53b4 updating crawler and warc2zim v1.1.5 2021-06-10 14:14:11 +00:00
renaud gaudin
14ced5c481 fixed tests for new folder structure 2021-05-12 17:15:19 +00:00
renaud gaudin
2e9c129523 new crawler folder structure v1.1.4 v.1.14 2021-05-12 17:03:48 +00:00
renaud gaudin
03abf6050a updated warc2zim and browsertrix-crawler 2021-05-12 16:28:34 +00:00
renaud gaudin
f746f7b020 use same waitUntil defaults as current crawler 2021-03-04 10:40:12 +00:00
renaud gaudin
14fc8ffe0f released v1.1.3 v1.1.3 2021-03-01 09:59:34 +00:00
rgaudin
ae820472de
Merge pull request #85 from openzim/limit-hit
capture and incorporate limit info from crawl
2021-02-15 17:23:42 +00:00
renaud gaudin
cfa4b0e7f8 capture and incorporate limit info from crawl 2021-02-15 17:20:43 +00:00
renaud gaudin
964746481f using crawler 0.2.0 2021-02-15 17:15:54 +00:00
rgaudin
69892a215f
Merge pull request #84 from myt00seven/master
Update README.md with a --exclude example
2021-01-26 08:12:09 +00:00
lakesidethinks
6da4714cff Update README.md 2021-01-25 12:31:09 -06:00
renaud gaudin
d0d51539fe updated CHANGELOG 2021-01-15 12:59:00 +00:00
rgaudin
c3a7a02121
Merge pull request #80 from openzim/issue76
more flexible url redirects acceptance
2021-01-15 12:55:14 +00:00
renaud gaudin
76c92bdb4c Fixed #76: more flexible url redirects acceptance
- accepts redirects to same first-level domain
- accepts redirects matching scope
2021-01-15 12:50:53 +00:00
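
The two acceptance rules listed in commit 76c92bdb4c can be illustrated with a short sketch. This is not zimit's code: the function names `base_domain` and `redirect_accepted` are hypothetical, "first-level domain" is assumed to mean the last two labels of the hostname (a naive reading that ignores multi-part suffixes such as `co.uk`), and the scope is assumed to be a regular expression.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def base_domain(url: str) -> str:
    """Naive base domain: the last two labels of the hostname.

    Assumption for illustration only; it does not consult a public
    suffix list, so hosts under e.g. co.uk are not handled correctly.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def redirect_accepted(requested: str, target: str, scope: Optional[str] = None) -> bool:
    """Accept a redirect if it stays on the same base domain or matches the scope."""
    requested_base = base_domain(requested)
    if requested_base and requested_base == base_domain(target):
        return True
    # Hypothetical scope handling: a regex searched against the target URL.
    return scope is not None and re.search(scope, target) is not None
```

Under these assumptions, a redirect from `https://example.com/` to `https://www.example.com/home` is accepted, while a redirect to `https://other.org/` is accepted only when a scope pattern matches it.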
renaud gaudin
610ecc7e5c using docker publish v5 v1.1.2 2021-01-14 18:27:07 +00:00
rgaudin
a60f7a392f
Merge pull request #79 from openzim/custom-css
Add custom-css option support (warc2zim)
2021-01-14 18:24:26 +00:00
renaud gaudin
871f7ab58d Add custom-css option support (warc2zim) 2021-01-14 18:11:22 +00:00
rgaudin
e91cd7921e
Added domains blocklist (#77)
All domains from the 3 [anudeepND](https://github.com/anudeepND/blacklist) lists
are now blocked at local resolver level by updating /etc/hosts in entrypoint.

- this saves network and CPU resources by failing early.
- this is wanted in almost all cases
- can be bypassed by setting a blank entrypoint
2021-01-12 07:31:16 +01:00
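
The hosts-file blocking described in commit e91cd7921e might look roughly like the sketch below. It is an illustration under stated assumptions, not the entrypoint's actual code: the helper names `hosts_lines` and `append_blocklist` are hypothetical, the sinkhole address `0.0.0.0` is assumed, and the input is assumed to be a plain list of domain names.

```python
from pathlib import Path
from typing import Iterable, List

SINKHOLE = "0.0.0.0"  # assumption: address that makes lookups fail early


def hosts_lines(domains: Iterable[str]) -> List[str]:
    """Turn domain names into hosts-file entries.

    Blank lines, comment lines and duplicate domains are skipped.
    """
    seen = set()
    lines = []
    for raw in domains:
        domain = raw.strip().lower()
        if not domain or domain.startswith("#") or domain in seen:
            continue
        seen.add(domain)
        lines.append(f"{SINKHOLE} {domain}")
    return lines


def append_blocklist(hosts_path: Path, domains: Iterable[str]) -> int:
    """Append blocklist entries to hosts_path and return how many were added."""
    lines = hosts_lines(domains)
    with hosts_path.open("a", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
    return len(lines)
```

Pointing `append_blocklist` at a scratch file rather than `/etc/hosts` is a safe way to inspect the generated entries.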
renaud gaudin
f4c11dc948 using published version of action 2020-12-22 15:48:12 +00:00
renaud gaudin
01302d3885 added package assignment 2020-12-22 11:15:51 +00:00
renaud gaudin
f72caad35c added Docker publish GA 2020-12-22 11:10:53 +00:00
renaud gaudin
71603f8a15 fixed version number in changelog 2020-12-22 11:09:41 +00:00
rgaudin
ff5c6b3dc9
Merge pull request #68 from openzim/github-bots
GitHub bots
2020-12-15 11:23:28 +00:00
Emmanuel Engelhart
0cb3db6f16 Add move/stale bots configuration 2020-12-15 12:19:21 +01:00
Ilya Kreymer
508286ef78
Update to latest version of browsertrix-crawler (0.1.4) (#66)
to add autofetch support for srcset (and also stylesheets)
should fix (#63)
v1.1.1
2020-12-14 09:36:41 +01:00
renaud gaudin
56d319ce3f added changelog v1.1 2020-12-14 08:13:54 +00:00
rgaudin
f6d44314cd
Fixed #58: updated README with limitations 2020-12-12 13:58:32 +00:00
rgaudin
eb5ca99bfb
Merge pull request #62 from openzim/progres
Enhanced --statsFilename support
v1.0
2020-12-10 10:50:18 +00:00
renaud gaudin
85fad62b61 Updated test to new stats files
- verify output of crawl, warc2zim and zimit file
- using a simpler tag for CI test image so as not to confuse it with public image
2020-12-10 10:44:49 +00:00
renaud gaudin
3ffa34d46e Enhanced --statsFilename support
- `--statsFilename` now represents overall zimit progress and not just crawling
- Exposing a simpler (`done`, `total`) json format for progress
- Live converting each individual step's progress into this file
- using warc2zim 1.3.3 for its `--progress-file` support
- Currently arbitrarily assigning 90% to crawl and 10% to warc2zim
2020-12-10 10:44:39 +00:00
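
The progress weighting described in commit 3ffa34d46e can be sketched as follows. Only the (`done`, `total`) shape and the 90%/10% split come from the commit message; the function names `fraction` and `overall_progress`, the constant names, and the choice to scale the result to a total of 100 are assumptions made for illustration.

```python
import json

# Weights taken from the commit message: 90% crawl, 10% warc2zim.
CRAWL_WEIGHT = 0.9
WARC2ZIM_WEIGHT = 0.1


def fraction(done: int, total: int) -> float:
    """Return done/total clamped to [0, 1]; 0.0 when total is not positive."""
    if total <= 0:
        return 0.0
    return max(0.0, min(1.0, done / total))


def overall_progress(crawl: dict, warc2zim: dict) -> dict:
    """Merge two per-step progress dicts into one (done, total) dict.

    The output scale (total fixed at 100) is an assumption of this sketch.
    """
    combined = (
        CRAWL_WEIGHT * fraction(crawl.get("done", 0), crawl.get("total", 0))
        + WARC2ZIM_WEIGHT * fraction(warc2zim.get("done", 0), warc2zim.get("total", 0))
    )
    return {"done": round(combined * 100), "total": 100}


if __name__ == "__main__":
    # Crawl half finished, conversion not started yet.
    print(json.dumps(overall_progress({"done": 50, "total": 100}, {"done": 0, "total": 10})))
```

With a fixed split like this, the reported value reaches 90 only once crawling is complete, regardless of how long the conversion step takes in practice.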
rgaudin
b9ed1d00a2
Merge pull request #60 from openzim/stats
stats: add support for stats output after every page crawled, fixes #39
2020-12-04 11:21:44 +00:00