Ilya Kreymer
3519d32ba6
disable https checking for fetch() head check (pywb already ignores https certs for capture), should fix #10
2020-10-06 15:49:45 +00:00
Ilya Kreymer
24c843c4af
update to latest warc2zim (1.1.0)
2020-10-06 15:36:45 +00:00
Ilya Kreymer
daa2492655
config work: pass remaining config opts to warc2zim, fixes #13
...
warc2zim check: add runWarc2Zim() to test warc2zim opts before running for validity
run script: create temp dir in output dir to ensure all data is on the volume
run script: add --keep option to keep temp dir, otherwise delete
2020-10-06 06:25:40 +00:00
Ilya Kreymer
e4128c8183
add help text/validation for all config options, url now must be passed in with --url
...
add --scroll boolean option, which activates simple autoscroll behavior
use chrome user-agent for manual fetch
reenable pywb option
cleanup Dockerfile: update to warc2zim 1.0.1, install fonts-stix for math science sites
update README
2020-09-29 05:22:33 +00:00
Kelson
bb5b7e48c1
Additional README.md changes (#16)
2020-09-25 12:02:43 +02:00
rgaudin
252516e38c
Merge pull request #14 from openzim/kelson42-patch-1
...
Update README.md
2020-09-25 09:47:29 +00:00
Kelson
ac650bff05
Update README.md
2020-09-25 11:36:30 +02:00
rgaudin
01f2471ab8
Merge pull request #11 from openzim/develop
...
Initial prototype
2020-09-23 08:44:34 +00:00
renaud gaudin
71e94914aa
Added gevent update to prevent segfault in uwsgi
2020-09-23 08:42:08 +00:00
Ilya Kreymer
6a925748d5
excludes: fix no exclude default
2020-09-22 18:12:15 +00:00
Ilya Kreymer
f25b390f15
add regex exclusions
2020-09-22 17:48:09 +00:00
Ilya Kreymer
f252245983
try using regular puppeteer, only copy deps from chrome image
...
pywb: increase uwsgi processes, disable autoindex/autofetch for better perf
2020-09-22 06:09:33 +00:00
Ilya Kreymer
b00c4262a7
add --limit param for max URLs to be captured
...
add 'html check', only load HTML in browsers, load other content-types directly via pywb, esp for PDFs (work on #8)
improved error handling
2020-09-21 07:16:26 +00:00
Ilya Kreymer
ff2773677c
crawling: move checking logic to shouldCrawl, remove hashtag before checking seen list
2020-09-19 23:19:21 +00:00
Ilya Kreymer
9b23de828b
Update README.md
2020-09-19 15:53:23 -07:00
Ilya Kreymer
4e04645e6b
move warc2zim to be launched by node process
2020-09-19 22:47:19 +00:00
Ilya Kreymer
1de577bd78
use puppeteer-cluster for parallel crawling
...
use yargs to parse command-line args
2020-09-19 22:19:20 +00:00
Ilya Kreymer
7346527a81
initial setup - single url capture with existing browser image, pywb, puppeteer and warc2zim
2020-09-19 17:38:52 +00:00
rgaudin
bdfd9be399
Added LICENSE document
2020-09-01 10:22:32 +02:00
renaud gaudin
15cf636ff3
reset master branch for 2020 codebase
2020-08-19 09:36:48 +02:00
Kelson
d178431e20
Github Kiwix Sponsoring page link
2020-02-01 18:14:09 +01:00
Kelson
77efa285e0
Create FUNDING.yml
proof-of-concept
2019-06-22 08:08:19 +02:00
Alexis Métaireau
6b10be5557
Add a few options to HTTrack
2016-06-21 15:47:15 +02:00
Alexis Métaireau
eee3447fd1
Rename logging by log_file
2016-06-20 19:17:21 +02:00
Alexis Métaireau
6c1f22ae96
Always append to log files.
2016-06-20 18:55:59 +02:00
Alexis Métaireau
df2d0ccada
Rename the "website" endpoint to "website-zim".
...
Fix #17
2016-06-20 18:54:45 +02:00
Alexis Métaireau
ddb0eb69e3
Add a status API. Fix #5
2016-06-20 18:46:30 +02:00
Alexis Métaireau
728a90a7dd
Fix markup
2016-06-20 15:36:40 +02:00
Alexis Métaireau
13d04caf5c
Define the exposed API in the README.
...
Fix #13
2016-06-20 15:22:15 +02:00
Alexis Métaireau
c84e6cc5d3
Refactor the ZimCreator class.
...
Fixes #12 #14
2016-06-20 14:48:16 +02:00
Alexis Métaireau
8ce39f00f9
Replace material design by bootstrap.
...
It's visually more pleasant :)
2016-06-20 09:59:31 +02:00
Alexis Métaireau
6d7affc01b
Add a frontend to start jobs.
2016-06-19 18:57:05 +02:00
Alexis Métaireau
7066a17edd
Serve static assets and redirect to the index.
2016-06-18 12:09:15 +02:00
Alexis Métaireau
a45f56f23a
Update the readme with better installation instructions
2016-06-17 18:32:39 +02:00
Alexis Métaireau
6756b2e55d
Do not include the index.html from httrack.
...
Fix #3
2016-06-17 18:26:25 +02:00
Alexis Métaireau
6442dd1eb2
Add a home
2016-01-15 18:20:47 +01:00
Alexis Métaireau
4771959014
Add a section about supervisord configuration
2016-01-15 18:11:20 +01:00
Alexis Métaireau
d2a26db898
Add configuration instructions
2016-01-15 18:01:58 +01:00
Alexis Métaireau
e87f4a1e4e
Add a static webpage
2016-01-11 11:15:35 +01:00
Alexis Métaireau
f0a25abedc
Change the location of the zims
2016-01-11 11:15:22 +01:00
Alexis Métaireau
d8cf8dffc2
Add a way to specify the location of the output folder
2016-01-10 23:07:58 +01:00
Alexis Métaireau
8277e3f883
Send an email when the download is ready
2016-01-10 22:52:42 +01:00
Alexis Metaireau
146bd32f63
Merge pull request #2 from Natim/read_from_settings
...
Read settings from config.
2016-01-10 12:16:43 +01:00
Alexis Metaireau
3548e32f60
Merge pull request #1 from Natim/improve-readme
...
Improve the ZIMIT README.
2016-01-10 12:13:50 +01:00
Rémy HUBSCHER
b67bd8c031
Read settings from config.
2016-01-10 12:09:44 +01:00
Rémy HUBSCHER
58f63b75a3
Improve the ZIMIT README.
2016-01-10 11:44:03 +01:00
Alexis Métaireau
f56c16360b
update the readme
2016-01-10 00:36:49 +01:00
Alexis Métaireau
1e92a0f469
First working version
2016-01-10 00:34:47 +01:00