PYSEC-2024-235

Import Source
https://github.com/pypa/advisory-database/blob/main/vulns/langchain-exa/PYSEC-2024-235.yaml
JSON Data
https://api.osv.dev/v1/vulns/PYSEC-2024-235
Aliases
Published
2024-02-26T16:27:49Z
Modified
2025-02-26T03:26:56.421288Z
Severity
  • 8.1 (High) CVSS_V3 - CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H
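As a worked check of the 8.1 score, the sketch below recomputes the base score from the vector using the CVSS v3.1 base equations, with metric weights taken from the FIRST CVSS v3.1 specification. It only handles Scope:Unchanged vectors, which is all this advisory needs. The function names are illustrative and not part of any library.

```python
import math

# CVSS v3.1 metric weights (FIRST specification), limited to Scope:Unchanged.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    # "CVSS:3.1/AV:N/..." -> {"AV": "N", ...}; skip the "CVSS:3.1" prefix.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope:Unchanged vectors")
    iss = 1 - (
        (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    )
    impact = 6.42 * iss
    exploitability = (
        8.22
        * AV[metrics["AV"]]
        * AC[metrics["AC"]]
        * PR_UNCHANGED[metrics["PR"]]
        * UI[metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 8.1
```

Here the impact sub-score is about 5.87 and the exploitability sub-score about 2.22. The high attack complexity (AC:H) is what keeps the total just under 8.1 before rounding up.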
Summary
[none]
Details

With the following crawler configuration:

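from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders import RecursiveUrlLoader

url = "https://example.com"
loader = RecursiveUrlLoader(
    url=url,
    max_depth=2,
    extractor=lambda x: Soup(x, "html.parser").text,
    prevent_outside=True,  # the default; meant to keep the crawl on the base URL
)
docs = loader.load()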

An attacker who controls the contents of https://example.com could place a malicious HTML file there containing links such as "https://example.completely.different/my_file.html", and the crawler would download that file as well, even though prevent_outside=True.

https://github.com/langchain-ai/langchain/blob/bf0b3cc0b5ade1fb95a5b1b6fa260e99064c2e22/libs/community/langchain_community/document_loaders/recursive_url_loader.py#L51-L51
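The example link appears to be chosen so that the string "https://example.com" is a literal prefix of "https://example.completely.different/my_file.html". The sketch below assumes, for illustration only, that the outside-link check compared raw string prefixes. `naive_is_inside` is a hypothetical helper, not LangChain code. It shows that such a check accepts the attacker's link even though the host is different.

```python
from urllib.parse import urlparse

BASE_URL = "https://example.com"
ATTACKER_LINK = "https://example.completely.different/my_file.html"


def naive_is_inside(link: str, base_url: str) -> bool:
    # Hypothetical prefix-only check: any URL whose text starts with base_url passes.
    return link.startswith(base_url)


# The raw prefix matches because "completely" begins with "com"...
print(naive_is_inside(ATTACKER_LINK, BASE_URL))  # True
# ...but the parsed hosts are different.
print(urlparse(ATTACKER_LINK).hostname)  # example.completely.different
print(urlparse(BASE_URL).hostname)  # example.com
```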

Resolved in https://github.com/langchain-ai/langchain/pull/15559
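The linked pull request is the authoritative fix. As a hedged illustration of the general mitigation, the hypothetical `is_same_origin` helper below compares the parsed scheme, host, and port of a link against the base URL rather than comparing raw strings, so the attacker's link is rejected. This is a sketch under that assumption and is not the code from the PR.

```python
from urllib.parse import urlparse


def is_same_origin(link: str, base_url: str) -> bool:
    """Hypothetical check: compare parsed (scheme, host, port) instead of string prefixes."""
    link_parts = urlparse(link)
    base_parts = urlparse(base_url)
    # .hostname is lowercased by urllib; default ports are NOT normalized here.
    return (link_parts.scheme, link_parts.hostname, link_parts.port) == (
        base_parts.scheme,
        base_parts.hostname,
        base_parts.port,
    )


base = "https://example.com"
print(is_same_origin("https://example.completely.different/my_file.html", base))  # False
print(is_same_origin("https://example.com/docs/page.html", base))  # True
```

Because parsed components are compared, a lookalike such as https://example.com.evil.test is also rejected. Default ports are left unnormalized, so https://example.com:443 counts as a different origin. Whether to treat it as the same origin is a policy choice.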

References

Affected packages

PyPI / langchain-exa

Package

Affected ranges

Type
GIT
Repo
https://github.com/langchain-ai/langchain
Events
Introduced
0 Unknown introduced commit / All previous commits are affected
Fixed
Type
ECOSYSTEM
Events
Introduced
0 Unknown introduced version / All previous versions are affected
Fixed
0.1.0

Affected versions

0.*

0.0.1
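The ECOSYSTEM range above (introduced 0, fixed 0.1.0) means every release before 0.1.0 is affected. A minimal sketch of that comparison follows. It assumes plain numeric versions such as 0.0.1 and does not handle pre-release or local version tags, which would need a full PEP 440 parser. `parse_version` and `is_affected` are hypothetical helpers.

```python
FIXED_VERSION = (0, 1, 0)  # "Fixed" event of the ECOSYSTEM range


def parse_version(version: str, width: int = 3) -> tuple:
    """Parse a plain numeric version such as "0.0.1" into a zero-padded tuple."""
    parts = [int(piece) for piece in version.split(".")]
    parts += [0] * (width - len(parts))  # so "0.1" compares equal to "0.1.0"
    return tuple(parts)


def is_affected(version: str) -> bool:
    # Introduced "0" and fixed "0.1.0": affected iff 0 <= version < 0.1.0.
    return parse_version(version) < FIXED_VERSION


for candidate in ["0.0.1", "0.1.0", "0.1", "0.2.3"]:
    print(candidate, is_affected(candidate))
```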