In the Linux kernel, the following vulnerability has been resolved:
block: Prevent potential deadlocks in zone write plug error recovery
Zone write plugging for handling writes to zones of a zoned block device always execute a zone report whenever a write BIO to a zone fails. The intent of this is to ensure that the tracking of a zone write pointer is always correct to ensure that the alignment to a zone write pointer of write BIOs can be checked on submission and that we can always correctly emulate zone append operations using regular write BIOs.
However, this error recovery scheme introduces a potential deadlock if a device queue freeze is initiated while BIOs are still plugged in a zone write plug and one of these write operation fails. In such case, the disk zone write plug error recovery work is scheduled and executes a report zone. This in turn can result in a request allocation in the underlying driver to issue the report zones command to the device. But with the device queue freeze already started, this allocation will block, preventing the report zone execution and the continuation of the processing of the plugged BIOs. As plugged BIOs hold a queue usage reference, the queue freeze itself will never complete, resulting in a deadlock.
Avoid this problem by completely removing from the zone write plugging code the use of report zones operations after a failed write operation, instead relying on the device user to either execute a report zones, reset the zone, finish the zone, or give up writing to the device (which is a fairly common pattern for file systems which degrade to read-only after write failures). This is not an unreasonnable requirement as all well-behaved applications, FSes and device mapper already use report zones to recover from write errors whenever possible by comparing the current position of a zone write pointer with what their assumption about the position is.
The changes to remove the automatic error recovery are as follows: - Completely remove the error recovery work and its associated resources (zone write plug list head, disk error list, and disk zonewplugswork work struct). This also removes the functions diskzonewplugseterror() and diskzonewplugclearerror().
Change the BLKZONEWPLUGERROR zone write plug flag into BLKZONEWPLUGNEEDWPUPDATE. This new flag is set for a zone write plug whenever a write opration targetting the zone of the zone write plug fails. This flag indicates that the zone write pointer offset is not reliable and that it must be updated when the next report zone, reset zone, finish zone or disk revalidation is executed.
Modify blkzonewriteplugbioendio() to set the BLKZONEWPLUGNEEDWPUPDATE flag for the target zone of a failed write BIO.
Modify the function diskzonewplugsetwp_offset() to clear this new flag, thus implementing recovery of a correct write pointer offset with the reset (all) zone and finish zone operations.
Modify blkdevreportzones() to always use the diskreportzonescb() callback so that diskzonewplugsyncwpoffset() can be called for any zone marked with the BLKZONEWPLUGNEEDWPUPDATE flag. This implements recovery of a correct write pointer offset for zone write plugs marked with BLKZONEWPLUGNEEDWPUPDATE and within the range of the report zones operation executed by the user.
Modify blkrevalidateseqzone() to call diskzonewplugsyncwpoffset() for all sequential write required zones when a zoned block device is revalidated, thus always resolving any inconsistency between the write pointer offset of zone write plugs and the actual write pointer position of sequential zones.