renaud gaudin
b29aeb08e6
back to dev
2022-06-21 17:20:30 +00:00
renaud gaudin
0eeb2ad9e3
Releasing 1.2.0
v1.2.0
2022-06-21 17:08:38 +00:00
renaud gaudin
dffc81860e
updated docker publish action
2022-06-21 17:06:40 +00:00
renaud gaudin
e32aac3ec0
code styling
2022-06-21 17:05:08 +00:00
rgaudin
b2bb77cd65
Merge pull request #108 from openzim/crawler-with-video
...
update to latest browsertrix-crawler + warc2zim
2022-06-21 16:59:15 +00:00
renaud gaudin
932f97c999
updated tests for crawler and warc2zim
2022-06-21 16:55:32 +00:00
renaud gaudin
1f490ace8f
Updated to browsertrix-crawler 0.6 and warc2zim 1.4
2022-06-21 12:04:56 +00:00
renaud gaudin
8b5eeb31c7
using crawler 0.6beta1
2022-06-14 14:58:33 +00:00
Ilya Kreymer
acf0aaf552
update to latest browsertrix-crawler
...
test with dev build of warc2zim 1.4.0 release
2022-06-14 14:58:33 +00:00
rgaudin
823e6bbb01
Merge pull request #132 from openzim/ci
...
updated CI test website URL
2022-06-13 10:05:25 +00:00
renaud gaudin
e29b6f3ad6
CI on push is sufficient
2022-06-13 10:02:35 +00:00
renaud gaudin
885e1763a1
updated CI test website URL
2022-06-13 09:57:37 +00:00
Kelson
80f3d3293f
Merge pull request #129 from openzim/release-badge
...
Release badge
2022-06-11 20:06:20 +02:00
Emmanuel Engelhart
0025901959
Replace Docker Hub build badge with CI badge
2022-06-11 11:56:18 +02:00
Emmanuel Engelhart
99f8fbafe1
Movebot does not exist anymore
2022-06-11 11:53:35 +02:00
Emmanuel Engelhart
3d3f4fb121
Add release tag
2022-06-11 11:52:48 +02:00
rgaudin
8bcd692462
Merge pull request #125 from JensKorte/patch-1
...
Update README.md
2022-05-30 22:07:10 +02:00
JensKorte
1f31d6c1a5
Update README.md
...
relative link didn't work and replaced by https://github.com/openzim/warc2zim
2022-05-30 21:45:18 +02:00
renaud gaudin
98587045b4
Updated readme: warc2zim params can be passed
2022-05-03 10:31:34 +00:00
renaud gaudin
efd8ca53b4
updating crawler and warc2zim
v1.1.5
2021-06-10 14:14:11 +00:00
renaud gaudin
14ced5c481
fixed tests for new folder structure
2021-05-12 17:15:19 +00:00
renaud gaudin
2e9c129523
new crawler folder structure
v1.1.4
v.1.14
2021-05-12 17:03:48 +00:00
renaud gaudin
03abf6050a
updated warc2zim and browsertrix-crawler
2021-05-12 16:28:34 +00:00
renaud gaudin
f746f7b020
use same waitUntil defaults as current crawler
2021-03-04 10:40:12 +00:00
renaud gaudin
14fc8ffe0f
released v1.1.3
v1.1.3
2021-03-01 09:59:34 +00:00
rgaudin
ae820472de
Merge pull request #85 from openzim/limit-hit
...
captures and incorporates limit info from crawl
2021-02-15 17:23:42 +00:00
renaud gaudin
cfa4b0e7f8
captures and incorporates limit info from crawl
2021-02-15 17:20:43 +00:00
renaud gaudin
964746481f
using crawler 0.2.0
2021-02-15 17:15:54 +00:00
rgaudin
69892a215f
Merge pull request #84 from myt00seven/master
...
Update README.md with a --exclude example
2021-01-26 08:12:09 +00:00
lakesidethinks
6da4714cff
Update README.md
2021-01-25 12:31:09 -06:00
renaud gaudin
d0d51539fe
updated CHANGELOG
2021-01-15 12:59:00 +00:00
rgaudin
c3a7a02121
Merge pull request #80 from openzim/issue76
...
more flexible url redirects acceptance
2021-01-15 12:55:14 +00:00
renaud gaudin
76c92bdb4c
Fixed #76 : more flexible url redirects acceptance
...
- accepts redirects to same first-level domain
- accepts redirects matching scope
2021-01-15 12:50:53 +00:00
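The two acceptance rules listed in commit 76c92bdb4c above can be illustrated with a minimal sketch. This is not the project's actual code: the function names `first_level_domain` and `is_acceptable_redirect` are hypothetical, "first-level domain" is assumed here to mean the last two labels of the hostname, and the scope is assumed to be a regular expression matched against the target URL.

```python
import re
from urllib.parse import urlsplit


def first_level_domain(host):
    """Return the last two labels of a hostname (assumption, see lead-in).

    This naive split ignores multi-part public suffixes such as "co.uk".
    """
    return ".".join(host.lower().split(".")[-2:])


def is_acceptable_redirect(source_url, target_url, scope=None):
    """Hypothetical check mirroring the two rules in the commit message."""
    src = urlsplit(source_url).hostname or ""
    dst = urlsplit(target_url).hostname or ""
    # Rule 1: redirect stays on the same first-level domain.
    if src and dst and first_level_domain(src) == first_level_domain(dst):
        return True
    # Rule 2: redirect target matches the crawl scope (assumed to be a regex).
    if scope and re.search(scope, target_url):
        return True
    return False
```

Under these assumptions, a redirect from `http://example.com/` to `https://www.example.com/` is accepted by the first rule, while a redirect to another domain is accepted only if the scope pattern matches it.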
renaud gaudin
610ecc7e5c
using docker publish v5
v1.1.2
2021-01-14 18:27:07 +00:00
rgaudin
a60f7a392f
Merge pull request #79 from openzim/custom-css
...
Add custom-css option support (warc2zim)
2021-01-14 18:24:26 +00:00
renaud gaudin
871f7ab58d
Add custom-css option support (warc2zim)
2021-01-14 18:11:22 +00:00
rgaudin
e91cd7921e
Added domains blocklist (#77)
...
All domains from the 3 [anudeepND](https://github.com/anudeepND/blacklist) lists
are now blocked at local resolver level by updating /etc/hosts in entrypoint.
- this saves network and CPU resources by failing early.
- this is wanted in almost all cases
- can be bypassed by setting a blank entrypoint
2021-01-12 07:31:16 +01:00
renaud gaudin
f4c11dc948
using published version of action
2020-12-22 15:48:12 +00:00
renaud gaudin
01302d3885
added package assignment
2020-12-22 11:15:51 +00:00
renaud gaudin
f72caad35c
added Docker publish GA
2020-12-22 11:10:53 +00:00
renaud gaudin
71603f8a15
fixed version number in changelog
2020-12-22 11:09:41 +00:00
rgaudin
ff5c6b3dc9
Merge pull request #68 from openzim/github-bots
...
GitHub bots
2020-12-15 11:23:28 +00:00
Emmanuel Engelhart
0cb3db6f16
Add move/stale bots configuration
2020-12-15 12:19:21 +01:00
Ilya Kreymer
508286ef78
Update to latest version of browsertrix-crawler (0.1.4) (#66)
...
to add autofetch support for srcset (and also stylesheets)
should fix (#63)
v1.1.1
2020-12-14 09:36:41 +01:00
renaud gaudin
56d319ce3f
added changelog
v1.1
2020-12-14 08:13:54 +00:00
rgaudin
f6d44314cd
Fixed #58 : updated README with limitations
2020-12-12 13:58:32 +00:00
rgaudin
eb5ca99bfb
Merge pull request #62 from openzim/progres
...
Enhanced --statsFilename support
v1.0
2020-12-10 10:50:18 +00:00
renaud gaudin
85fad62b61
Updated test to new stats files
...
- verify output of crawl, warc2zim and zimit file
- using a simpler tag for CI test image as to not confuse it with public image
2020-12-10 10:44:49 +00:00
renaud gaudin
3ffa34d46e
Enhanced --statsFilename support
...
- `--statsFilename` to now represent overall zimit progress and not just crawling
- Exposing a simpler (`done`, `total`) json format for progress
- Live converting individual step's progress into this file
- using warc2zim 1.3.3 for its `--progress-file` support
- Currently arbitrarily assigning 90% to crawl and 10% to warc2zim
2020-12-10 10:44:39 +00:00
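The weighting described in commit 3ffa34d46e above can be sketched as follows. This is a hedged illustration, not zimit's implementation: the names `fraction` and `overall_progress` are hypothetical, each step is assumed to report its own `done`/`total` counts, and the output scale of 100 is an arbitrary choice. Only the 90%/10% split and the (`done`, `total`) output shape come from the commit message.

```python
import json

# Weights taken from the commit message: 90% crawl, 10% warc2zim.
CRAWL_WEIGHT = 0.9
WARC2ZIM_WEIGHT = 0.1


def fraction(done, total):
    """Return done/total clamped to [0, 1]; 0.0 when total is missing or zero."""
    if not total or total <= 0:
        return 0.0
    return max(0.0, min(1.0, done / total))


def overall_progress(crawl, warc2zim, scale=100):
    """Combine two per-step progress dicts into one {"done", "total"} dict."""
    combined = (
        CRAWL_WEIGHT * fraction(crawl["done"], crawl["total"])
        + WARC2ZIM_WEIGHT * fraction(warc2zim["done"], warc2zim["total"])
    )
    return {"done": round(combined * scale), "total": scale}


if __name__ == "__main__":
    # Crawl half done, warc2zim not started yet.
    progress = overall_progress({"done": 5, "total": 10}, {"done": 0, "total": 0})
    print(json.dumps(progress))
```

With a fixed split like this, the reported value never moves backwards when the crawl finishes and the conversion starts, at the cost of the weights being arbitrary, as the commit message itself notes.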
rgaudin
b9ed1d00a2
Merge pull request #60 from openzim/stats
...
stats: add support for stats output after every page crawled, fixes #39
2020-12-04 11:21:44 +00:00