mirror of
https://github.com/Stichting-MINIX-Research-Foundation/pkgsrc-ng.git
synced 2025-08-03 17:59:07 -04:00
html5lib is a pure-Python library for parsing HTML. The parser is
designed to handle all flavours of HTML and parses invalid documents
using well-defined error handling rules compatible with the behaviour of
major desktop web browsers.

Output is a tree structure; the current release supports output to
DOM, ElementTree, lxml and BeautifulSoup tree formats as well as a
simple custom format.