The HTML parser in lxml does not properly handle context switching for special HTML tags such as `<svg>`, `<math>` and `<noscript>`. This behavior deviates from how web browsers parse and interpret such tags. Specifically, content in CSS comments is ignored by `lxml_html_clean` but may be interpreted differently by web browsers, which lets malicious scripts bypass the cleaning process. This vulnerability could lead to Cross-Site Scripting (XSS) attacks against users who rely on `lxml_html_clean` in its default configuration to sanitize untrusted HTML content.
Users employing the HTML cleaner in a security-sensitive context should upgrade to `lxml_html_clean` 0.4.0, which addresses this issue.
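To check whether an environment already has the fix, the installed version can be compared against 0.4.0. The following is a minimal sketch, not an official check: the helper names (`parse_release`, `is_patched`, `installed_is_patched`) are hypothetical, the distribution name `lxml_html_clean` is assumed, and the version parsing is deliberately simplified (it looks only at the leading numeric components, so a pre-release such as `0.4.0rc1` would count as patched).

```python
import re
from importlib import metadata

# First release that addresses this issue, per the advisory text.
PATCHED = (0, 4, 0)


def parse_release(version):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_patched(version):
    """True if the version's numeric release is at least PATCHED."""
    release = parse_release(version)
    # Pad short versions so that "0.4" compares like "0.4.0".
    release += (0,) * (len(PATCHED) - len(release))
    return release >= PATCHED


def installed_is_patched():
    """Check the installed distribution; None if it is not installed."""
    try:
        # Assumption: the distribution is registered as "lxml_html_clean".
        return is_patched(metadata.version("lxml_html_clean"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("patched:", installed_is_patched())
```

A result of `False` means the environment should be upgraded; `None` means the package was not found under the assumed name.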
As a temporary mitigation, users can configure `lxml_html_clean` with the following settings to prevent exploitation of this vulnerability:
* `remove_tags`: Specify tags to remove; their content is moved to their parents' tags.
* `kill_tags`: Specify tags to be removed completely, including their content.
* `allow_tags`: Restrict the set of permissible tags, excluding context-switching tags like `<svg>`, `<math>` and `<noscript>`.
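The settings above might be applied as follows. This is a sketch under stated assumptions rather than a vetted configuration: the `ALLOWED_TAGS` allow-list and the `make_cleaners` helper are hypothetical and chosen only for illustration, and the two cleaners are shown as alternatives because, to the best of my understanding, the cleaner rejects combining `allow_tags` with `remove_tags`.

```python
# Context-switching tags named in this advisory.
CONTEXT_SWITCHING_TAGS = {"svg", "math", "noscript"}

# Hypothetical allow-list for illustration only; adapt it to the markup
# your application actually needs. It must not include the tags above.
ALLOWED_TAGS = {
    "a", "b", "blockquote", "br", "code", "em",
    "i", "li", "ol", "p", "strong", "ul",
}


def make_cleaners():
    """Build two alternative cleaners (requires lxml_html_clean installed)."""
    from lxml_html_clean import Cleaner

    # Option 1: remove the context-switching tags together with their content.
    kill_cleaner = Cleaner(kill_tags=CONTEXT_SWITCHING_TAGS)

    # Option 2: keep only an explicit allow-list of tags.
    allow_cleaner = Cleaner(allow_tags=ALLOWED_TAGS)

    return kill_cleaner, allow_cleaner


if __name__ == "__main__":
    kill_cleaner, allow_cleaner = make_cleaners()
    untrusted = "<p>hello</p><svg><style>/* comment */</style></svg>"
    print(kill_cleaner.clean_html(untrusted))
    print(allow_cleaner.clean_html(untrusted))
```

`kill_tags` is the more conservative of the two options for these tags, since nothing inside `<svg>`, `<math>` or `<noscript>` survives to be reinterpreted by a browser. Neither option replaces upgrading to the fixed release.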
{ "nvd_published_at": "2024-11-19T22:15:21Z", "cwe_ids": [ "CWE-184", "CWE-79", "CWE-83" ], "severity": "HIGH", "github_reviewed": true, "github_reviewed_at": "2024-11-19T21:07:59Z" }