In the Linux kernel, the following vulnerability has been resolved:
btrfs: zoned: fix extent range end unlock in cow_file_range()
Running generic/751 on the for-next branch often results in a hang like the one below. Both tasks are stuck trying to lock an extent, which suggests that someone forgot to unlock an extent.
INFO: task kworker/u128:1:12 blocked for more than 323 seconds.
      Not tainted 6.13.0-BTRFS-ZNS+ #503
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u128:1  state:D stack:0  pid:12  tgid:12  ppid:2  flags:0x00004000
Workqueue: btrfs-fixup btrfs_work_helper [btrfs]
Call Trace:
 <TASK>
 __schedule+0x534/0xdd0
 schedule+0x39/0x140
 __lock_extent+0x31b/0x380 [btrfs]
 ? __pfx_autoremove_wake_function+0x10/0x10
 btrfs_writepage_fixup_worker+0xf1/0x3a0 [btrfs]
 btrfs_work_helper+0xff/0x480 [btrfs]
 ? lock_release+0x178/0x2c0
 process_one_work+0x1ee/0x570
 ? srso_return_thunk+0x5/0x5f
 worker_thread+0x1d1/0x3b0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x10b/0x230
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x30/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
INFO: task kworker/u134:0:184 blocked for more than 323 seconds.
      Not tainted 6.13.0-BTRFS-ZNS+ #503
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u134:0  state:D stack:0  pid:184  tgid:184  ppid:2  flags:0x00004000
Workqueue: writeback wb_workfn (flush-btrfs-4)
Call Trace:
 <TASK>
 __schedule+0x534/0xdd0
 schedule+0x39/0x140
 __lock_extent+0x31b/0x380 [btrfs]
 ? __pfx_autoremove_wake_function+0x10/0x10
 find_lock_delalloc_range+0xdb/0x260 [btrfs]
 writepage_delalloc+0x12f/0x500 [btrfs]
 ? srso_return_thunk+0x5/0x5f
 extent_write_cache_pages+0x232/0x840 [btrfs]
 btrfs_writepages+0x72/0x130 [btrfs]
 do_writepages+0xe7/0x260
 ? srso_return_thunk+0x5/0x5f
 ? lock_acquire+0xd2/0x300
 ? srso_return_thunk+0x5/0x5f
 ? find_held_lock+0x2b/0x80
 ? wbc_attach_and_unlock_inode.part.0+0x102/0x250
 ? wbc_attach_and_unlock_inode.part.0+0x102/0x250
 __writeback_single_inode+0x5c/0x4b0
 writeback_sb_inodes+0x22d/0x550
 __writeback_inodes_wb+0x4c/0xe0
 wb_writeback+0x2f6/0x3f0
 wb_workfn+0x32a/0x510
 process_one_work+0x1ee/0x570
 ? srso_return_thunk+0x5/0x5f
 worker_thread+0x1d1/0x3b0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x10b/0x230
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x30/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
This happens because zoned mode has another success path. When no active zone is available, btrfs_reserve_extent() returns -EAGAIN. In this case, there are two possible reactions:
(1) If nothing in the given range has been allocated yet, we can only wait for someone to finish a zone, so we wait on the BTRFS_FS_NEED_ZONE_FINISH bit and retry afterward.
(2) If some allocations have already been made, we must bail out and let the caller send IOs for the allocated part, because those IOs may be necessary to finish a zone.
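The choice between the two reactions depends only on whether the allocation cursor has moved past the start of the range. The following C sketch models that check in isolation. It is not the kernel's code: `on_no_active_zone()`, its `start` and `orig_start` parameters, and the enum names are hypothetical labels for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the two reactions to -EAGAIN from the allocator. */
enum zone_action {
	ZONE_WAIT_AND_RETRY, /* (1) nothing allocated: wait for a zone finish, then retry */
	ZONE_BAIL_OUT,       /* (2) partial allocation: return so the caller submits IOs */
};

/*
 * Hypothetical helper: @start is the allocation cursor and @orig_start is
 * the beginning of the range being allocated.
 */
static enum zone_action on_no_active_zone(uint64_t start, uint64_t orig_start)
{
	/* The cursor only moves forward as extents are allocated. */
	assert(start >= orig_start);

	/* An unmoved cursor means no extent in the range was allocated yet. */
	if (start == orig_start)
		return ZONE_WAIT_AND_RETRY;
	return ZONE_BAIL_OUT;
}
```

Keying the decision on the cursor position keeps the check cheap: no separate counter of allocated extents is needed in this model.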
Commit 06f364284794 ("btrfs: do proper folio cleanup when cow_file_range() failed") moved the unlock code from inside the loop to after it. Previously, each allocated extent was unlocked right after its allocation, and therefore before the function returned. Now, in case (2) above, the function returns before reaching the unlock code, so the allocated extents are never unlocked. That caused the hang.
Fix the issue by setting 'end' to the end of the allocated range. Then we can exit the loop, and the common unlock code after it handles the range properly.
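The effect of the fix can be reproduced with a small userspace model, sketched here in C under several assumptions. A boolean array stands in for the extent lock state, `sim_reserve_extent()` returns -EAGAIN once a hypothetical zone budget is used up and otherwise allocates exactly one block (the real allocator does not work in fixed units), and `sim_cow_range()` is a simplified, hypothetical version of the allocation loop with the unlock code placed after the loop. With `fixed` false it returns early in case (2); with `fixed` true it shrinks `end` to the allocated part and breaks out of the loop. None of these names are kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_BLOCKS 16

/* Hypothetical per-block lock state standing in for the extent lock tree. */
static bool sim_locked[SIM_BLOCKS];
/* Hypothetical number of blocks that can be allocated before no active zone is left. */
static int sim_zone_budget;

static void sim_lock_range(uint64_t start, uint64_t end)
{
	for (uint64_t i = start; i <= end; i++) {
		/* A second locker in the kernel would sleep here: the reported hang. */
		assert(!sim_locked[i]);
		sim_locked[i] = true;
	}
}

static void sim_unlock_range(uint64_t start, uint64_t end)
{
	for (uint64_t i = start; i <= end; i++)
		sim_locked[i] = false;
}

/* Allocates one block, or fails with -EAGAIN once the budget is used up. */
static int sim_reserve_extent(void)
{
	if (sim_zone_budget == 0)
		return -EAGAIN;
	sim_zone_budget--;
	return 0;
}

/*
 * Simplified model of the allocation loop. The caller has locked [start, end].
 * @fixed selects the patched bail-out path for case (2).
 */
static int sim_cow_range(uint64_t start, uint64_t end, bool fixed,
			 uint64_t *done_offset)
{
	const uint64_t orig_start = start;

	while (start <= end) {
		if (sim_reserve_extent() == -EAGAIN) {
			/* Case (1): the kernel waits and retries; this model cannot sleep. */
			if (start == orig_start)
				return -EAGAIN;
			if (!fixed) {
				/* Old path: return early and skip the unlock code below. */
				*done_offset = start - 1;
				return 0;
			}
			/* Fixed path: shrink @end to the allocated part and leave the loop. */
			end = start - 1;
			break;
		}
		start++; /* one block allocated */
	}
	*done_offset = end;
	/* Common unlock code placed after the loop. */
	sim_unlock_range(orig_start, end);
	return 0;
}
```

For example, locking blocks 0 to 3 with a budget of 2 leaves blocks 0 and 1 locked on the old path, so any later attempt to lock them would block. On the fixed path they are unlocked, and in this model only the unallocated tail (blocks 2 and 3) stays locked for the caller's retry.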