CVE-2025-40007

Commit 20d72b00ca81 ("netfs: Fix the request's work item to not require a ref") modified netfs_alloc_request() to initialize the reference counter to 2 instead of 1. The rationale was that the request's "work" would release the second reference after completion (via netfs_{read,write}_collection_worker()). That works most of the time if all goes well.

However, it leaks this additional reference if the request is released before the I/O operation has been submitted: the error code path only decrements the reference counter once and the work item will never be queued because there will never be a completion.
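The leak can be illustrated with a small userspace model. This is only a sketch and not kernel code: the struct, every `model_*` name, and the plain `int` counter are hypothetical stand-ins for netfs_io_request, the netfs functions, and the kernel's reference counter. It also assumes that the per-inode count is raised at allocation and lowered only when the request is finally freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct netfs_io_request. */
struct model_request {
    int ref;
};

/* Hypothetical stand-in for netfs_inode.io_count. */
static int model_io_count;

/* Allocation as described above: one reference for the caller,
 * one for the work item. */
static struct model_request *model_alloc_request(void)
{
    struct model_request *rreq = calloc(1, sizeof(*rreq));

    if (!rreq)
        return NULL;
    rreq->ref = 2;
    model_io_count++;       /* assumption: counted from allocation on */
    return rreq;
}

/* Drops one reference; frees the request when the last one is gone.
 * Returns the number of references left. */
static int model_put_request(struct model_request *rreq)
{
    int left;

    assert(rreq->ref > 0);
    left = --rreq->ref;
    if (left == 0) {
        model_io_count--;
        free(rreq);
    }
    return left;
}

/* Completion path: runs only if I/O was submitted, and drops the
 * work item's reference. */
static void model_collection_worker(struct model_request *rreq)
{
    model_put_request(rreq);
}

/* Runs one request and returns the references left after the caller's put. */
static int model_run(bool fail_before_submit)
{
    struct model_request *rreq = model_alloc_request();
    int left;

    if (!rreq)
        return -1;
    if (!fail_before_submit)
        model_collection_worker(rreq);  /* I/O submitted and completed */
    left = model_put_request(rreq);     /* the caller's single put */
    if (left > 0)
        free(rreq);  /* demo only: reclaim memory, io_count stays elevated */
    return left;
}
```

In this model, `model_run(false)` ends with no references and `model_io_count` back at zero, while `model_run(true)` ends with one reference still held and `model_io_count` stuck at one, which corresponds to the state a waiter would block on.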

This has caused outages of our whole server cluster today because tasks were blocked in netfs_wait_for_outstanding_io(), leading to deadlocks in Ceph (another bug that I will address soon in another patch). This was caused by a netfs_pgpriv2_begin_copy_to_cache() call which failed in fscache_begin_write_operation(). The leaked netfs_io_request was never completed, leaving netfs_inode.io_count with a positive value forever.

All of this is super-fragile code. Finding out which code paths will lead to an eventual completion and which do not is hard to see:

- Some functions like netfs_create_write_req() allocate a request, but will never submit any I/O.

- netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read() and then netfs_put_request(); however, netfs_unbuffered_read() can also fail early before submitting the I/O request, therefore another netfs_put_request() call must be added there.

A rule of thumb is that functions that return a netfs_io_request do not submit I/O, and all of their callers must be checked.

For my taste, the whole netfs code needs an overhaul to make reference counting easier to understand and less fragile & obscure. But to fix this bug here and now and produce a patch that is adequate for a stable backport, I tried a minimal approach that quickly frees the request object upon early failure.

I decided against adding a second netfs_put_request() each time because that would cause code duplication which obscures the code further. Instead, I added the function netfs_put_failed_request() which frees such a failed request synchronously under the assumption that the reference count is exactly 2 (as initially set by netfs_alloc_request() and never touched), verified by a WARN_ON_ONCE(). It then deinitializes the request object (without going through the "cleanup_work" indirection) and frees the allocation (with RCU protection to protect against concurrent access by netfs_requests_seq_start()).
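The idea behind this helper can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel implementation: all `model_*` names are hypothetical, the reference counter is a plain `int`, the WARN_ON_ONCE() check is represented by a boolean return value, and the RCU-protected free is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct netfs_io_request. */
struct model_request {
    int ref;
};

/* Hypothetical stand-in for netfs_inode.io_count. */
static int model_io_count;

static struct model_request *model_alloc_request(void)
{
    struct model_request *rreq = calloc(1, sizeof(*rreq));

    if (!rreq)
        return NULL;
    rreq->ref = 2;          /* caller's reference plus the work item's */
    model_io_count++;       /* assumption: counted from allocation on */
    return rreq;
}

/* Deinitialize and free in one synchronous step. */
static void model_free_request(struct model_request *rreq)
{
    model_io_count--;
    free(rreq);
}

/* Regular put: frees only when the last reference is dropped. */
static void model_put_request(struct model_request *rreq)
{
    assert(rreq->ref > 0);
    if (--rreq->ref == 0)
        model_free_request(rreq);
}

/* Early-failure put: only valid for a request whose counter was never
 * touched after allocation. Returns whether that assumption held; the
 * request is freed either way in this sketch. */
static bool model_put_failed_request(struct model_request *rreq)
{
    bool untouched = (rreq->ref == 2);

    model_free_request(rreq);
    return untouched;
}

/* Hypothetical constructor-style function: allocates a request, then a
 * setup step may fail before any I/O has been submitted. */
static struct model_request *model_create_request(bool setup_fails)
{
    struct model_request *rreq = model_alloc_request();

    if (!rreq)
        return NULL;
    if (setup_fails) {
        model_put_failed_request(rreq);  /* replaces the single regular put */
        return NULL;
    }
    return rreq;
}
```

With the early-failure put, a failed `model_create_request(true)` leaves `model_io_count` at zero, whereas a successful request still needs both regular puts (the caller's and the work item's) before it is freed.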

All code paths that fail early have been changed to call netfs_put_failed_request() instead of netfs_put_request(). Additionally, I have added a netfs_put_request() call to netfs_unbuffered_read() as explained above because the netfs_put_failed_request() approach does not work there.

Database specific

{
    "osv_generated_from": "https://github.com/CVEProject/cvelistV5/tree/main/cves/2025/40xxx/CVE-2025-40007.json",
    "cna_assigner": "Linux"
}

Affected packages

Git / git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

Affected ranges

Type: GIT
Repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Events:
Introduced: 20d72b00ca814d748f5663484e5c53bb2bf37a3a
Fixed: 8df142e93098b4531fadb5dfcf93087649f570b3
Fixed: 4d428dca252c858bfac691c31fa95d26cd008706

Type: GIT
Repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Events:
Introduced: 0 (unknown introduced commit; all previous commits are affected)
Last affected: 1a8360c2eed3b292ed654c2ac61b09de4a80e298

Database specific

source: "https://storage.googleapis.com/cve-osv-conversion/osv-output/CVE-2025-40007.json"

Linux / Kernel

Package

Name: Kernel

Affected ranges

Type: ECOSYSTEM
Events:
Introduced: 6.16.0
Fixed: 6.16.10

Database specific

source: "https://storage.googleapis.com/cve-osv-conversion/osv-output/CVE-2025-40007.json"