In the Linux kernel, the following vulnerability has been resolved:
md: Don't ignore suspended array in md_check_recovery()
mddev_suspend() never stops sync_thread, so it makes no sense to ignore a suspended array in md_check_recovery(). Doing so can leave sync_thread unable to be unregistered.
After commit f52f5c71f3d4 ("md: fix stopping sync thread"), the test shell/integrity-caching.sh can trigger the following hang:
1) suspend the array:
   raid_postsuspend
    mddev_suspend
2) stop the array:
   raid_dtr
    md_stop
     __md_stop_writes
      stop_sync_thread
       set_bit(MD_RECOVERY_INTR, &mddev->recovery);
       md_wakeup_thread_directly(mddev->sync_thread);
       wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
3) sync thread done:
   md_do_sync
    set_bit(MD_RECOVERY_DONE, &mddev->recovery);
    md_wakeup_thread(mddev->thread);
4) daemon thread can't unregister sync thread:
   md_check_recovery
    if (mddev->suspended)
      return; -> returns directly
    md_reap_sync_thread
    clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
   -> MD_RECOVERY_RUNNING is never cleared, so the wait in step 2 hangs.
This problem is not specific to dm-raid. Fix it by no longer ignoring suspended arrays in md_check_recovery(). Follow-up patches will improve dm-raid so that it freezes the sync thread during suspend.
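The user-space model below is a minimal sketch, not kernel code. It replays the flag sequence from steps 1 to 4 and shows why the early return for suspended arrays leaves the RUNNING bit set. Every name in it is a hypothetical simplification: struct model_mddev, the RECOVERY_* bits (which are not the real MD_RECOVERY_* values), and the model_* functions. Its "reap" step only clears flags, as a stand-in for md_reap_sync_thread() followed by clearing MD_RECOVERY_RUNNING.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real MD_RECOVERY_* bits are defined in
 * drivers/md/md.h and are not reproduced here. */
enum {
    RECOVERY_RUNNING = 1u << 0,
    RECOVERY_INTR    = 1u << 1,
    RECOVERY_DONE    = 1u << 2,
};

/* Simplified stand-in for struct mddev with only the fields the model needs. */
struct model_mddev {
    unsigned int recovery;  /* RECOVERY_* flags */
    int suspended;          /* > 0 while the array is suspended */
    bool sync_thread;       /* true while a sync thread is registered */
};

/* Step 2 (first half): ask the sync thread to stop. */
static void model_stop_sync_thread(struct model_mddev *m)
{
    m->recovery |= RECOVERY_INTR;
}

/* Step 3: the sync thread sees INTR, finishes, and sets DONE. */
static void model_sync_thread_done(struct model_mddev *m)
{
    if (m->recovery & RECOVERY_INTR)
        m->recovery |= RECOVERY_DONE;
}

/* Step 4: one daemon-thread pass. skip_if_suspended selects the
 * pre-fix behaviour (early return) or the fixed behaviour. */
static void model_check_recovery(struct model_mddev *m, bool skip_if_suspended)
{
    if (skip_if_suspended && m->suspended)
        return; /* pre-fix: the finished sync thread is never reaped */

    if (m->sync_thread && (m->recovery & RECOVERY_DONE)) {
        /* Model of reaping: unregister the thread, then clear RUNNING. */
        m->sync_thread = false;
        m->recovery &= ~(RECOVERY_RUNNING | RECOVERY_INTR | RECOVERY_DONE);
    }
}

/* Returns true if the waiter from step 2 would be released, i.e. if
 * RUNNING is clear after one daemon pass. */
static bool model_stop_completes(bool skip_if_suspended)
{
    struct model_mddev m = {
        .recovery = RECOVERY_RUNNING,
        .suspended = 1,      /* step 1: suspended, sync thread still alive */
        .sync_thread = true,
    };

    model_stop_sync_thread(&m);
    model_sync_thread_done(&m);
    model_check_recovery(&m, skip_if_suspended);
    return !(m.recovery & RECOVERY_RUNNING);
}
```

In this model, model_stop_completes(true) returns false. The daemon pass returns before reaping, so a wait_event() on RUNNING would block forever. model_stop_completes(false) returns true, which corresponds to dropping the suspended check as the fix does.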