justhtml through 1.9.1 allows denial of service via deeply nested HTML. During parsing, JustHTML.__init__() always reaches TreeBuilder.finish(), which unconditionally calls _populate_selectedcontent(). That function recursively traverses the DOM via _find_elements() / _find_element() without a depth bound, allowing attacker-controlled deeply nested input to trigger an unhandled RecursionError on CPython. Depending on the host application's exception handling, this can abort parsing, fail requests, or terminate a worker/process.
TreeBuilder.finish() (treebuilder.py#L476) unconditionally calls _populate_selectedcontent(self.document) at line 494. _populate_selectedcontent() (treebuilder.py#L1243) calls _find_elements() (treebuilder.py#L1280) to recursively search the DOM tree for <select> elements:
    def _find_elements(self, node: Any, name: str, result: list[Any]) -> None:
        """Recursively find all elements with given name."""
        if node.name == name:
            result.append(node)
        if node.has_child_nodes():
            for child in node.children:
                self._find_elements(child, name, result)  # recursive call
When the DOM tree depth, added to the frames already on the call stack, reaches CPython's default recursion limit (1000), this raises an unhandled RecursionError. The full call path is:
JustHTML(html) → tokenizer.run() → tree_builder.finish() → _populate_selectedcontent(document) → _find_elements(root, "select", selects) (recursive)
Deeply nested DOM trees can be produced by nesting <div> tags ~1000 levels deep. On CPython with the default recursion limit, approximately 11 KB of <div> nesting is sufficient to trigger the error. The exact depth threshold is environment-dependent (CPython version, recursion limit setting, call stack depth at invocation).
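The mechanism can be reproduced without justhtml. The following is a minimal sketch, assuming a stand-in node type (`SketchNode`, `build_chain`, `find_recursive`, and `overflows` are hypothetical names, not part of the library) whose traversal mirrors the shape of `_find_elements()`:

```python
import sys


class SketchNode:
    """Hypothetical stand-in for a DOM node; not justhtml's node class."""

    def __init__(self, name):
        self.name = name
        self.children = []


def build_chain(depth):
    """Build a single-child chain of `depth` nested nodes without recursing."""
    root = SketchNode("div")
    current = root
    for _ in range(depth - 1):
        child = SketchNode("div")
        current.children.append(child)
        current = child
    return root


def find_recursive(node, name, result):
    """Same shape as the recursive search: one Python frame per tree level."""
    if node.name == name:
        result.append(node)
    for child in node.children:
        find_recursive(child, name, result)


def overflows(depth):
    """Return True if searching a chain of this depth raises RecursionError."""
    try:
        find_recursive(build_chain(depth), "select", [])
    except RecursionError:
        return True
    return False


# Shallow trees are fine; a chain deeper than the limit is not.
print(overflows(50))
print(overflows(sys.getrecursionlimit() + 100))
```

Because each tree level costs one frame, the failing depth shifts with whatever frames the caller has already consumed, which is consistent with the threshold being environment-dependent.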
Additional recursive functions are affected on already-parsed deep trees:
- Node.clone_node(deep=True) (node.py#L523) — called during sanitization
- _node_to_html() (serialize.py#L580) — used by to_html(pretty=True)
- _to_markdown_walk() (node.py#L817) — used by to_markdown()
Note: the library already uses iterative traversal in several comparable functions (e.g., _node_to_html_compact at serialize.py#L197, _to_text_collect at node.py#L161, _is_blocky_element at serialize.py#L405, apply_to_children at transforms.py#L1642), demonstrating the correct pattern.
    from justhtml import JustHTML

    html = "<div>" * 1000 + "x" + "</div>" * 1000
    doc = JustHTML(html)  # raises RecursionError
Test environment: CPython 3.14.3, macOS ARM64 (Apple Silicon), justhtml 1.9.1, default recursion limit (1000)
| Input | Size | Result |
|-------|------|--------|
| <div> × 500 | 5,501 bytes | OK |
| <div> × 800 | 8,801 bytes | OK |
| <div> × 1000 | 11,001 bytes | RecursionError |
The error occurs with both sanitize=True (default) and sanitize=False.
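The sizes in the table follow directly from the payload shape used in the reproduction. A short sketch of the arithmetic (the helper name `nested_div_payload` is introduced here for illustration only):

```python
def nested_div_payload(depth):
    """Build the nested <div> payload used in the reproduction."""
    return "<div>" * depth + "x" + "</div>" * depth


def payload_size(depth):
    """Size in bytes: 5 per opening tag, 6 per closing tag, 1 for the text."""
    return len(nested_div_payload(depth).encode("utf-8"))


for depth in (500, 800, 1000):
    print(f"<div> x {depth}: {payload_size(depth):,} bytes")
# <div> x 500: 5,501 bytes
# <div> x 800: 8,801 bytes
# <div> x 1000: 11,001 bytes
```

Each level adds 11 bytes, so a request-size cap would have to sit below roughly 11 KB to block this input, which is unlikely to be practical for real HTML.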
An attacker who can supply HTML for parsing can trigger an unhandled RecursionError during JustHTML() construction. No justhtml configuration option avoids it; mitigation requires host-application exception handling or input constraints. Depending on how the host application handles exceptions, the error can abort parsing, fail requests, or terminate a worker or process.
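Until a fixed release is available, a host application could combine both mitigations. The following is a rough sketch, not a vetted defense: `estimate_depth`, `parse_untrusted`, `UntrustedHtmlError`, and the `max_depth=200` default are all assumptions made for illustration, and the parser is passed in as a callable (for example `JustHTML`) rather than imported. The depth estimate uses the standard library's `html.parser` and simply counts unclosed start tags, so it can overestimate the depth a real HTML tree builder would produce.

```python
from html.parser import HTMLParser

# Void elements never have children, so they do not add nesting depth.
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class UntrustedHtmlError(ValueError):
    """Hypothetical error type for input rejected by the guard."""


class _DepthEstimator(HTMLParser):
    """Tracks a conservative nesting estimate from start and end tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def estimate_depth(html):
    """Return the maximum count of simultaneously open tags seen in `html`."""
    estimator = _DepthEstimator()
    estimator.feed(html)
    estimator.close()
    return estimator.max_depth


def parse_untrusted(html, parse, max_depth=200):
    """Reject suspiciously deep input, then contain any RecursionError.

    `parse` is any callable that takes the HTML string; `max_depth` is an
    arbitrary illustrative threshold, not a value recommended by justhtml.
    """
    if estimate_depth(html) > max_depth:
        raise UntrustedHtmlError("input nesting exceeds the configured limit")
    try:
        return parse(html)
    except RecursionError as exc:
        raise UntrustedHtmlError("parser exceeded the recursion limit") from exc
```

A caller would then handle `UntrustedHtmlError` as an ordinary bad-input condition, for example by returning a 4xx response, instead of letting a RecursionError propagate out of the request handler.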
Convert the recursive tree traversal functions to iterative implementations using an explicit stack. Example for _find_elements:
    def _find_elements(self, node: Any, name: str, result: list[Any]) -> None:
        stack = [node]
        while stack:
            current = stack.pop()
            if current.name == name:
                result.append(current)
            if current.has_child_nodes():
                stack.extend(reversed(current.children))
The same conversion should be applied to _find_element, clone_node(deep=True), _node_to_html(), and _to_markdown_walk().
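Functions that build a result tree, such as a deep clone, need slightly more bookkeeping than a search because each copied node must be attached to its copied parent. One possible shape, sketched on a stand-in node type (`SketchNode` and `clone_deep` are hypothetical and do not reflect justhtml's actual `clone_node` signature or node attributes):

```python
from dataclasses import dataclass, field


@dataclass(eq=False, repr=False)
class SketchNode:
    """Hypothetical stand-in for a DOM node with a name and child list."""

    name: str
    children: list = field(default_factory=list)


def clone_deep(root):
    """Deep-copy a tree using an explicit stack of (source, copy) pairs."""
    root_copy = SketchNode(root.name)
    stack = [(root, root_copy)]
    while stack:
        source, copy = stack.pop()
        for child in source.children:
            child_copy = SketchNode(child.name)
            # Appending in iteration order keeps sibling order intact.
            copy.children.append(child_copy)
            stack.append((child, child_copy))
    return root_copy
```

The stack lives on the heap, so the depth this handles is bounded by available memory rather than by the interpreter's recursion limit.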
{
"cwe_ids": [
"CWE-674"
],
"nvd_published_at": null,
"severity": "HIGH",
"github_reviewed": true,
"github_reviewed_at": "2026-03-17T14:07:38Z"
}