Merge pull request #264 from openzim/use_warc2zim2

benoit74 2024-01-15 08:30:32 +01:00 committed by GitHub
commit 22551388e0
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 40 additions and 8 deletions

.github/workflows/docker_zimit2.yml

@@ -0,0 +1,31 @@
+name: Docker Zimit2
+on:
+  push:
+    branches:
+      - zimit2
+jobs:
+  build-and-push:
+    name: Deploy Docker Image
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Retrieve source code
+        uses: actions/checkout@v3
+      - name: Build and push
+        uses: openzim/docker-publish-action@v10
+        with:
+          image-name: openzim/zimit
+          manual-tag: zimit2
+          restrict-to: openzim/zimit
+          registries: ghcr.io
+          credentials:
+            GHCRIO_USERNAME=${{ secrets.GHCR_USERNAME }}
+            GHCRIO_TOKEN=${{ secrets.GHCR_TOKEN }}
+          repo_description: auto
+          repo_overview: auto
+          platforms: |
+            linux/amd64
+            linux/arm64


@@ -10,8 +10,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Changed
- Adapt to new `warc2zim` code structure
- Using `main` warc2zim ⚠️ change before releasing!
- Use `warc2zim` version 2, which works without Service Worker anymore
- Using `warc2zim2` warc2zim ⚠️ change before releasing!
- Build temporary `zimit2` Docker image for testing ⚠️ remove before releasing!
### Added


@@ -9,7 +9,7 @@ RUN apt-get update \
     # python setup (in venv not to conflict with browsertrix)
     && python3 -m venv /app/zimit \
     && /app/zimit/bin/python -m pip install --no-cache-dir 'requests==2.31.0' 'inotify==0.2.10' 'tld==0.13' \
-    'git+https://github.com/openzim/warc2zim@main#egg_name=warc2zim' \
+    'git+https://github.com/openzim/warc2zim@warc2zim2#egg_name=warc2zim' \
     # placeholder (default output location)
     && mkdir -p /output \
     # disable chrome upgrade


@@ -6,9 +6,9 @@ import libzim.reader
 from warcio import ArchiveIterator
-def get_zim_article(zimfile, path):
+def get_zim_main_entry(zimfile):
     zim_fh = libzim.reader.Archive(zimfile)
-    return zim_fh.get_entry_by_path(path).get_item().content.tobytes()
+    return zim_fh.main_entry
 def test_is_file():
@@ -20,9 +20,9 @@ def test_zim_main_page():
     """Main page specified, http://isago.rskg.org/, was a redirect to https
     Ensure main page is the redirected page"""
-    assert b'"https://isago.rskg.org/"' in get_zim_article(
-        "/output/isago.zim", "A/index.html"
-    )
+    main_entry = get_zim_main_entry("/output/isago.zim")
+    assert main_entry.is_redirect
+    assert main_entry.get_redirect_entry().path == "isago.rskg.org/"
 def test_user_agent():
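
Note on the updated main-page test: it checks a single redirect hop from the ZIM main entry to the crawled home page. The sketch below illustrates that check, generalised to follow a chain of redirects. It is not part of this commit. `FakeEntry` and `resolve_entry` are hypothetical names introduced for illustration; `FakeEntry` only mimics the three entry attributes the test relies on (`path`, `is_redirect`, `get_redirect_entry()`), and the `"mainPage"` path is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeEntry:
    """Hypothetical stand-in exposing the entry attributes used in the test."""

    path: str
    target: Optional["FakeEntry"] = None

    @property
    def is_redirect(self) -> bool:
        # An entry is a redirect when it points at another entry.
        return self.target is not None

    def get_redirect_entry(self) -> "FakeEntry":
        if self.target is None:
            raise RuntimeError("entry is not a redirect")
        return self.target


def resolve_entry(entry, max_hops: int = 10):
    """Follow redirects until a non-redirect entry is reached.

    Raises RuntimeError after more than max_hops hops, which guards
    against redirect loops.
    """
    hops = 0
    while entry.is_redirect:
        if hops >= max_hops:
            raise RuntimeError("too many redirects")
        entry = entry.get_redirect_entry()
        hops += 1
    return entry


# Mirror the situation asserted in test_zim_main_page.
home = FakeEntry("isago.rskg.org/")
main = FakeEntry("mainPage", target=home)  # placeholder path, an assumption
resolved = resolve_entry(main)
```

Because `resolve_entry` is duck-typed, it should in principle accept any object with the same three attributes, but that has only been exercised here against the stand-in class.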