In the Linux kernel, the following vulnerability has been resolved:
netfs: fix reference leak
Commit 20d72b00ca81 ("netfs: Fix the request's work item to not require a ref") modified netfs_alloc_request() to initialize the reference counter to 2 instead of 1. The rationale was that the request's "work" would release the second reference after completion (via netfs_{read,write}_collection_worker()). That works as long as the request actually reaches completion.
However, it leaks this additional reference if the request is released before the I/O operation has been submitted: the error code path only decrements the reference counter once and the work item will never be queued because there will never be a completion.
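The leak described above can be reproduced in a small user-space model. This is only a sketch under stated assumptions: the `demo_*` names, the non-atomic counter and the live-object counter are hypothetical stand-ins, not the kernel's netfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a netfs request; not the kernel structure. */
struct demo_request {
	int ref;          /* simplified, non-atomic reference counter */
	bool work_queued; /* set only once I/O has been submitted */
};

/* Number of allocated requests that have not been freed yet. */
static int demo_live_requests;

static struct demo_request *demo_alloc_request(void)
{
	struct demo_request *rreq = calloc(1, sizeof(*rreq));

	if (!rreq)
		return NULL;
	rreq->ref = 2; /* one for the caller, one for the work item */
	demo_live_requests++;
	return rreq;
}

static void demo_put_request(struct demo_request *rreq)
{
	if (--rreq->ref == 0) {
		demo_live_requests--;
		free(rreq);
	}
}

/* Success path: I/O is submitted, the completion work runs and
 * drops the reference that was held on behalf of the work item. */
static void demo_submit_and_complete(struct demo_request *rreq)
{
	rreq->work_queued = true;
	demo_put_request(rreq);
}

/* Returns how many requests were leaked by one allocate/put cycle
 * (or -1 if the allocation itself failed). */
static int demo_run(bool fail_before_submit)
{
	int before = demo_live_requests;
	struct demo_request *rreq = demo_alloc_request();

	if (!rreq)
		return -1;
	if (!fail_before_submit)
		demo_submit_and_complete(rreq);
	demo_put_request(rreq); /* the caller's single put */
	return demo_live_requests - before;
}
```

In this model, `demo_run(false)` leaves no request behind, while `demo_run(true)` leaves one request with a reference count of 1 that nothing will ever drop.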
This has caused outages of our whole server cluster today because
tasks were blocked in netfs_wait_for_outstanding_io(), leading to
deadlocks in Ceph (another bug that I will address soon in another
patch). This was caused by a netfs_pgpriv2_begin_copy_to_cache() call
which failed in fscache_begin_write_operation(). The leaked
netfs_io_request was never completed, leaving netfs_inode.io_count
with a positive value forever.
All of this is super-fragile code. It is hard to see which code paths will lead to an eventual completion and which will not:
- Some functions like netfs_create_write_req() allocate a request, but will never submit any I/O.
- netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read() and then netfs_put_request(); however, netfs_unbuffered_read() can also fail early before submitting the I/O request, therefore another netfs_put_request() call must be added there.
A rule of thumb is that functions that return a netfs_io_request do
not submit I/O, and all of their callers must be checked.
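The case of a helper that may fail before submitting I/O, while its caller always drops exactly one reference, can be sketched as follows. This is an illustrative user-space model only; the `mock_*` names and the plain integer counter are assumptions and do not mirror the real netfs functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a netfs request. */
struct mock_request {
	int ref; /* simplified, non-atomic reference counter */
};

static int mock_live_requests; /* allocated but not yet freed */

static struct mock_request *mock_alloc_request(void)
{
	struct mock_request *rreq = calloc(1, sizeof(*rreq));

	if (!rreq)
		return NULL;
	rreq->ref = 2; /* caller's reference plus the work item's */
	mock_live_requests++;
	return rreq;
}

static void mock_put_request(struct mock_request *rreq)
{
	if (--rreq->ref == 0) {
		mock_live_requests--;
		free(rreq);
	}
}

/* Helper that may fail before any I/O has been submitted. */
static int mock_unbuffered_read(struct mock_request *rreq, bool fail_early)
{
	if (fail_early) {
		/* No I/O was submitted, so no completion work will run:
		 * the work item's reference must be dropped here. */
		mock_put_request(rreq);
		return -1;
	}
	/* I/O submitted; model the completion work dropping its ref. */
	mock_put_request(rreq);
	return 0;
}

/* Caller that always drops exactly one reference afterwards.
 * Returns the number of leaked requests (-1 on allocation failure). */
static int mock_read_iter(bool fail_early)
{
	int before = mock_live_requests;
	struct mock_request *rreq = mock_alloc_request();

	if (!rreq)
		return -1;
	(void)mock_unbuffered_read(rreq, fail_early);
	mock_put_request(rreq);
	return mock_live_requests - before;
}
```

With the extra put on the early-failure branch, both `mock_read_iter(true)` and `mock_read_iter(false)` end with no request left allocated.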
For my taste, the whole netfs code needs an overhaul to make reference counting easier to understand and less fragile & obscure. But to fix this bug here and now and produce a patch that is adequate for a stable backport, I tried a minimal approach that quickly frees the request object upon early failure.
I decided against adding a second netfs_put_request() each time because that would cause code duplication which obscures the code further. Instead, I added the function netfs_put_failed_request() which frees such a failed request synchronously under the assumption that the reference count is exactly 2 (as initially set by netfs_alloc_request() and never touched), verified by a WARN_ON_ONCE(). It then deinitializes the request object (without going through the "cleanup_work" indirection) and frees the allocation (with RCU protection to protect against concurrent access by netfs_requests_seq_start()).
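The idea behind netfs_put_failed_request() can be illustrated with a user-space sketch. The `toy_*` names are hypothetical, the warning is a plain stderr message standing in for WARN_ON_ONCE(), and a direct free() replaces the RCU-deferred free used in the kernel; none of this is the actual patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a netfs request. */
struct toy_request {
	int ref; /* simplified, non-atomic reference counter */
};

static int toy_live_requests; /* allocated but not yet freed */

static struct toy_request *toy_alloc_request(void)
{
	struct toy_request *rreq = calloc(1, sizeof(*rreq));

	if (!rreq)
		return NULL;
	rreq->ref = 2; /* initial count, as described in the text */
	toy_live_requests++;
	return rreq;
}

/* Synchronously frees a request that failed before I/O submission.
 * Assumes nobody else has taken or dropped a reference yet. */
static void toy_put_failed_request(struct toy_request *rreq)
{
	/* Stand-in for the sanity check: the count should be untouched. */
	if (rreq->ref != 2)
		fprintf(stderr, "unexpected reference count %d\n", rreq->ref);

	toy_live_requests--;
	free(rreq); /* the kernel defers this via RCU; simplified here */
}

/* Early-failure path: allocate, fail, release in one step.
 * Returns the number of leaked requests (-1 on allocation failure). */
static int toy_fail_early(void)
{
	int before = toy_live_requests;
	struct toy_request *rreq = toy_alloc_request();

	if (!rreq)
		return -1;
	toy_put_failed_request(rreq);
	return toy_live_requests - before;
}
```

One call on the failure path releases the object regardless of the initial count of 2, so the error paths do not each need a second put.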
All code paths that fail early have been changed to call netfs_put_failed_request() instead of netfs_put_request(). Additionally, I have added a netfs_put_request() call to netfs_unbuffered_read() as explained above because the netfs_put_failed_request() approach does not work there.
{
"osv_generated_from": "https://github.com/CVEProject/cvelistV5/tree/main/cves/2025/40xxx/CVE-2025-40007.json",
"cna_assigner": "Linux"
}