Ray Data registers custom Arrow extension types (ray.data.arrow_tensor, ray.data.arrow_tensor_v2, ray.data.arrow_variable_shaped_tensor) globally in PyArrow. When PyArrow reads a Parquet file containing one of these extension types, it calls __arrow_ext_deserialize__ on the field's metadata bytes. Ray's implementation passes these bytes directly to cloudpickle.loads(), so a crafted file achieves arbitrary code execution during schema parsing, before any row data is read.
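The reason unpickling attacker-controlled bytes amounts to code execution can be shown with the standard library alone. The sketch below is illustrative and is not Ray's code: it uses pickle.loads() as a stand-in for cloudpickle.loads() (which relies on the same unpickling machinery), the Payload class and the malicious_metadata name are hypothetical, and a harmless eval("6 * 7") stands in for whatever callable an attacker would choose.

```python
import pickle


class Payload:
    """Hypothetical attacker-crafted object; not part of Ray or PyArrow."""

    def __reduce__(self):
        # The (callable, args) pair returned here is written into the pickle
        # stream, and the unpickler calls callable(*args) at load time.
        # A harmless expression stands in for an attacker's command.
        return (eval, ("6 * 7",))


# Stand-in for the extension type metadata bytes stored in a crafted file.
malicious_metadata = pickle.dumps(Payload())

# Loading the bytes runs eval("6 * 7"); no Payload instance is ever rebuilt.
result = pickle.loads(malicious_metadata)
print(result)
```

The loader never needs the Payload class to exist on the reading side: the byte stream only names the callable and its arguments, which is why simply parsing the metadata is enough to trigger the call.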
In May 2024, Ray fixed a related vulnerability in PyExtensionType-based extension types (issue #41314, PR #45084). In July 2025, PR #54831 introduced cloudpickle.loads() into the replacement extension types' deserialization path, reintroducing the same class of vulnerability.
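For contrast, a deserializer that never falls back to pickle could look like the following minimal sketch. It is not Ray's actual patch: parse_tensor_metadata is a hypothetical helper, and it assumes, for illustration only, that the metadata is a JSON-encoded list of tensor dimensions.

```python
import json


def parse_tensor_metadata(serialized: bytes) -> tuple:
    """Hypothetical helper: decode shape metadata as JSON only, never pickle."""
    try:
        shape = json.loads(serialized.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Anything that is not valid JSON is rejected instead of being
        # handed to an unpickler.
        raise ValueError("extension metadata is not valid JSON") from exc

    # Assumed format: a list of non-negative integers (bool is excluded
    # because it is a subclass of int in Python).
    is_valid = isinstance(shape, list) and all(
        isinstance(dim, int) and not isinstance(dim, bool) and dim >= 0
        for dim in shape
    )
    if not is_valid:
        raise ValueError("extension metadata must be a list of dimensions")
    return tuple(shape)


print(parse_tensor_metadata(b"[2, 3]"))
```

Under this design, bytes that are not well-formed JSON raise ValueError rather than reaching any code path that can invoke arbitrary callables.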
_deserialize_with_fallback function with cloudpickle.loads() was introduced in commit f6d21db1a4 (PR #54831, July 2025), first released in Ray 2.49.0.
ray.data.read_parquet(), pyarrow.parquet.read_table(), pandas.read_parquet(), etc.
ray.data.arrow_tensor (or v2, or variable-shaped) extension type name, which makes this a targeted attack against Ray Data users.
{
"github_reviewed": true,
"severity": "HIGH",
"github_reviewed_at": "2026-04-24T16:15:00Z",
"nvd_published_at": "2026-05-08T22:16:29Z",
"cwe_ids": [
"CWE-502",
"CWE-94"
]
}