In the Linux kernel, the following vulnerability has been resolved:
md: fix deadlock between mddev_suspend and flush bio
Deadlock occurs when mddev is being suspended while some flush bio is in progress. It is a complex issue.
T1. the first flush is at the ending stage, it clears 'mddev->flushbio' and tries to submit data, but is blocked because mddev is suspended by T4. T2. the second flush sets 'mddev->flushbio', and attempts to queue mdsubmitflushdata(), which is already running (T1) and won't execute again if on the same CPU as T1. T3. the third flush inc activeio and tries to flush, but is blocked because 'mddev->flushbio' is not NULL (set by T2). T4. mddevsuspend() is called and waits for active_io dec to 0 which is inc by T3.
T1 T2 T3 T4 (flush 1) (flush 2) (third 3) (suspend) mdsubmitflushdata mddev->flushbio = NULL; . . mdflushrequest . mddev->flushbio = bio . queue submitflushes . . . . mdhandlerequest . . activeio + 1 . . mdflushrequest . . wait !mddev->flushbio . . . . mddevsuspend . . wait !activeio . . . submitflushes . queuework mdsubmitflushdata . //mdsubmitflushdata is already running (T1) . mdhandlerequest wait resume
The root issue is non-atomic inc/dec of activeio during flush process. activeio is dec before mdsubmitflushdata is queued, and inc soon after mdsubmitflushdata() run. mdflushrequest activeio + 1 submitflushes activeio - 1 mdsubmitflushdata mdhandlerequest activeio + 1 makerequest active_io - 1
If activeio is dec after mdhandlerequest() instead of within submitflushes(), makerequest() can be called directly intead of mdhandlerequest() in mdsubmitflushdata(), and active_io will only inc and dec once in the whole flush process. Deadlock will be fixed.
Additionally, the only difference between fixing the issue and before is that there is no return error handling of makerequest(). But after previous patch cleaned mdwritestart(), makerequst() only return error in raid5makerequest() by dm-raid, see commit 41425f96d7aa ("dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape)". Since dm always splits data and flush operation into two separate io, io size of flush submitted by dm always is 0, makerequest() will not be called in mdsubmitflushdata(). To prevent future modifications from introducing issues, add WARNON to ensure makerequest() no error is returned in this context.