Given xml.etree.ElementTree as etree (as it is commonly imported):
What's returned is not an etree.ElementTree, but rather an etree.Element (this is the same as what etree.fromstring returns; only etree.parse returns an etree.ElementTree). It is genuinely part of the etree module; it's not something with a similar API. The problem you've run into applies to etree.fromstring as much as it does to html5lib.
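To make the distinction concrete, here is a small sketch using only the standard library (the sample markup and the variable names root and doc are made up for illustration):

```python
import io
import xml.etree.ElementTree as etree

markup = "<html><body/></html>"

# fromstring hands back the root element directly.
root = etree.fromstring(markup)

# parse reads from a file (or file-like object) and wraps the root
# element in an ElementTree.
doc = etree.parse(io.StringIO(markup))

print(isinstance(root, etree.ElementTree))  # False
print(isinstance(doc, etree.ElementTree))   # True
print(doc.getroot().tag)                    # html
```

Calling getroot() on the ElementTree gives you the same kind of object that fromstring returns.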
The Python documentation for xml.etree.ElementTree doesn't mention the namespaces argument; it seems to be an undocumented feature of ElementTree objects (but not Element objects). As such, it's probably not something you should rely on. Your best bet is likely to use a wrapper function.
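One possible shape for such a wrapper is sketched below. It is not something html5lib or ElementTree provides: expand_path and find_ns are hypothetical helper names, and the approach (rewriting prefix:tag steps into the {uri}tag form that find already understands) assumes simple paths with no colons inside predicates.

```python
import xml.etree.ElementTree as etree

XHTML_NS = "http://www.w3.org/1999/xhtml"


def expand_path(path, namespaces):
    """Rewrite each 'prefix:tag' step of path as '{uri}tag'."""
    steps = []
    for step in path.split("/"):
        prefix, sep, local = step.partition(":")
        # Only rewrite steps whose prefix is a known namespace prefix.
        if sep and prefix in namespaces:
            step = "{%s}%s" % (namespaces[prefix], local)
        steps.append(step)
    return "/".join(steps)


def find_ns(element, path, namespaces):
    """Like element.find, but without passing a namespaces argument."""
    return element.find(expand_path(path, namespaces))


root = etree.fromstring(
    '<html xmlns="%s"><body><p>hi</p></body></html>' % XHTML_NS
)
print(find_ns(root, ".//h:p", {"h": XHTML_NS}).text)  # hi
```

Because the wrapper only ever calls find with a plain path, it works the same on whatever Element the parser hands back.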
Eclipse cannot go through the trees because html5lib defaults to xml.etree.cElementTree when it exists. Per the module's documentation it is meant to be identical, but it is implemented in C using CPython's API, which stops Eclipse's debugger from functioning. You can get a treebuilder that uses the non-accelerated version with the code below (note that from Python 3.3 both are the C implementation, and cElementTree merely survives as a deprecated alias):
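import xml.etree.ElementTree as etree
import html5lib

# Build a tree builder that is explicitly backed by xml.etree.ElementTree
tb = html5lib.getTreeBuilder("etree", implementation=etree)
p = html5lib.HTMLParser(tb)
# As discussed above, the result is an etree.Element, not an etree.ElementTree
tree = p.parse("<html>")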