Added domains blocklist (#77)

All domains from the 3 [anudeepND](https://github.com/anudeepND/blacklist) lists
are now blocked at local resolver level by updating /etc/hosts in entrypoint.

- This saves network and CPU resources, since requests to blocked domains fail early.
- This is wanted in almost all cases.
- It can be bypassed by setting a blank entrypoint.
rgaudin 2021-01-12 06:31:16 +00:00 committed by GitHub
parent f4c11dc948
commit e91cd7921e
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 14 additions and 1 deletion


@@ -11,5 +11,16 @@ ADD zimit.py /app/
 RUN ln -s /app/zimit.py /usr/bin/zimit
-CMD ["zimit"]
+# download list of bad domains to filter out. intentionally run post-install
+# so it's not cached in earlier layers (url stays same but content updated)
+RUN mkdir -p /tmp/ads && cd /tmp/ads && \
+curl -L -O https://hosts.anudeep.me/mirror/adservers.txt && \
+curl -L -O https://hosts.anudeep.me/mirror/CoinMiner.txt && \
+curl -L -O https://hosts.anudeep.me/mirror/facebook.txt && \
+cat ./*.txt > /etc/blocklist.txt \
+&& rm ./*.txt
+RUN printf '#!/bin/sh\ncat /etc/blocklist.txt >> /etc/hosts\nexec "$@"' > /usr/local/bin/entrypoint.sh && \
+chmod +x /usr/local/bin/entrypoint.sh
+ENTRYPOINT ["entrypoint.sh"]
+CMD ["zimit"]
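The entrypoint generated by the `printf` line is compact, so a more readable sketch of what it does may help. This is not the script shipped in the image. `BLOCKLIST` and `HOSTS_FILE` are hypothetical variables and `apply_blocklist` / `entrypoint` are hypothetical function names, introduced only so the logic can be tried against temporary files instead of the real `/etc/hosts`.

```shell
#!/bin/sh
# Sketch only: mirrors the generated entrypoint.sh, with overridable paths.
# BLOCKLIST and HOSTS_FILE are illustrative assumptions, not part of the commit.
BLOCKLIST="${BLOCKLIST:-/etc/blocklist.txt}"
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

apply_blocklist() {
    # Append every blocklist entry to the hosts file, so that lookups for
    # those domains are answered by the local resolver.
    cat "$BLOCKLIST" >> "$HOSTS_FILE"
}

entrypoint() {
    apply_blocklist
    # Replace the shell with the requested command (the image's CMD, or
    # whatever was passed to `docker run`).
    exec "$@"
}
```

In the image, the script body would correspond to calling `entrypoint "$@"`. Because the script ends in `exec "$@"`, the crawler runs as the container's main process, not as a child of the shell.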


@@ -60,6 +60,8 @@ docker run -v /output:/output --cap-add=SYS_ADMIN --cap-add=NET_ADMIN \
 The puppeteer-cluster provides monitoring output which is enabled by
 default and prints the crawl status to the Docker log.
+**Note**: The image automatically filters out a large number of ads by using the 3 blocklists from [anudeepND](https://github.com/anudeepND/blacklist). If you don't want this filtering, disable the image's entrypoint in your container (`docker run --entrypoint="" openzim/zimit ...`).
 Nota bene
 ---------
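
One way to check that the bypass from the note behaves as described is to compare the line count of `/etc/hosts` with and without the entrypoint. This is a rough sketch. It assumes Docker is installed, the `openzim/zimit` image is available locally or can be pulled, and `wc` exists inside the image. `hosts_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: count the lines of /etc/hosts inside a throwaway container.
IMAGE="openzim/zimit"

hosts_lines() {
    # "$@" holds extra `docker run` options, e.g. --entrypoint=""
    docker run --rm "$@" "$IMAGE" wc -l /etc/hosts
}

# Usage (not executed here):
#   hosts_lines                  # entrypoint runs, blocklist is appended first
#   hosts_lines --entrypoint=""  # entrypoint disabled, hosts file left as-is
```

If the filtering is active, the first call should report far more lines than the second.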