In the Linux kernel, the following vulnerability has been resolved:
md: Don't ignore read-only array in md_check_recovery()
Usually, if the array is not read-write, md_check_recovery() won't register a new sync_thread in the first place. And if the array is read-write and a sync_thread is registered, md_set_readonly() will unregister the sync_thread before setting the array read-only. md/raid follows this behavior, hence there is no problem.
After commit f52f5c71f3d4 ("md: fix stopping sync thread"), the following hang can be triggered by the test shell/integrity-caching.sh:
1) The array is read-only. dm-raid updates the superblock:
   rs_update_sbs
    ro = mddev->ro
    mddev->ro = 0
     -> set array read-write
    md_update_sb
2) A new sync thread is registered concurrently.
3) dm-raid sets the array back to read-only:
   rs_update_sbs
    mddev->ro = ro
4) The array is stopped:
   raid_dtr
    md_stop
     stop_sync_thread
      set_bit(MD_RECOVERY_INTR, &mddev->recovery);
      md_wakeup_thread_directly(mddev->sync_thread);
      wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
5) The sync thread finishes:
   md_do_sync
    set_bit(MD_RECOVERY_DONE, &mddev->recovery);
    md_wakeup_thread(mddev->thread);
6) The daemon thread can't unregister the sync thread:
   md_check_recovery
    if (!md_is_rdwr(mddev) &&
        !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
     return;
    -> MD_RECOVERY_RUNNING can't be cleared, hence step 4 hangs.
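The early return in step 6 can be illustrated with a small user-space model. This is only a sketch: struct toy_mddev, the RECOVERY_* bit values and every function below are hypothetical stand-ins, not kernel code. In particular, the extra MD_RECOVERY_DONE test in the "patched" variant is an assumption about what a minimal fix could look like; the text above does not spell out the actual change.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the recovery flag bits; the values are illustrative only. */
#define RECOVERY_RUNNING (1u << 0)
#define RECOVERY_NEEDED  (1u << 1)
#define RECOVERY_DONE    (1u << 2)

/* Hypothetical, stripped-down model of struct mddev. */
struct toy_mddev {
    int ro;                /* 0 = read-write, non-zero = read-only */
    unsigned int recovery; /* bitmask of the flags above */
};

/* Models the check quoted in step 6: a read-only array is skipped
 * unless RECOVERY_NEEDED is set. Returns true if the daemon goes on. */
static bool daemon_proceeds_old(const struct toy_mddev *m)
{
    if (m->ro != 0 && !(m->recovery & RECOVERY_NEEDED))
        return false;
    return true;
}

/* Assumed shape of a fix: also go on when the sync thread has finished. */
static bool daemon_proceeds_patched(const struct toy_mddev *m)
{
    if (m->ro != 0 &&
        !(m->recovery & RECOVERY_NEEDED) &&
        !(m->recovery & RECOVERY_DONE))
        return false;
    return true;
}

/* Models one daemon pass: reap a finished sync thread if the check allows. */
static void toy_check_recovery(struct toy_mddev *m, bool patched)
{
    bool go = patched ? daemon_proceeds_patched(m) : daemon_proceeds_old(m);

    if (!go)
        return;
    if (m->recovery & RECOVERY_DONE)
        m->recovery &= ~(RECOVERY_RUNNING | RECOVERY_DONE);
}

/* State after steps 1-5: array read-only again, sync thread done but not reaped. */
static struct toy_mddev state_after_step5(void)
{
    struct toy_mddev m = { .ro = 1, .recovery = RECOVERY_RUNNING | RECOVERY_DONE };
    return m;
}

/* Runs both variants on the same starting state. */
static void demo(void)
{
    struct toy_mddev a = state_after_step5();
    struct toy_mddev b = state_after_step5();

    toy_check_recovery(&a, false);
    assert(a.recovery & RECOVERY_RUNNING);    /* still set: the waiter in step 4 would hang */

    toy_check_recovery(&b, true);
    assert(!(b.recovery & RECOVERY_RUNNING)); /* cleared: the waiter could make progress */
}
```

Calling demo() runs both variants from the state reached after step 5. In the model, the unpatched check leaves RECOVERY_RUNNING set forever, which corresponds to the wait_event() in step 4 never returning.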
The root cause is that dm-raid manipulates 'mddev->ro' by itself; however, dm-raid really should stop the sync thread before setting the array read-only. Unfortunately, I need to read more code before I can refactor the handling of 'mddev->ro' in dm-raid, hence let's fix the problem the easy way for now to prevent a dm-raid regression.