In the Linux kernel, the following vulnerability has been resolved:
md: Don't register sync_thread for reshape directly
Currently, if a reshape is interrupted, reassembling the array registers sync_thread directly from pers->run(). In this case 'MD_RECOVERY_RUNNING' is set directly; however, there is no guarantee that md_do_sync() will be executed, so stop_sync_thread() will hang because 'MD_RECOVERY_RUNNING' can't be cleared.
The previous patch makes sure that md_do_sync() will set MD_RECOVERY_DONE; however, the following hang can still be triggered occasionally by the dm-raid test shell/lvconvert-raid-reshape.sh:
[root@fedora ~]# cat /proc/1982/stack
[<0>] stop_sync_thread+0x1ab/0x270 [md_mod]
[<0>] md_frozen_sync_thread+0x5c/0xa0 [md_mod]
[<0>] raid_presuspend+0x1e/0x70 [dm_raid]
[<0>] dm_table_presuspend_targets+0x40/0xb0 [dm_mod]
[<0>] __dm_destroy+0x2a5/0x310 [dm_mod]
[<0>] dm_destroy+0x16/0x30 [dm_mod]
[<0>] dev_remove+0x165/0x290 [dm_mod]
[<0>] ctl_ioctl+0x4bb/0x7b0 [dm_mod]
[<0>] dm_ctl_ioctl+0x11/0x20 [dm_mod]
[<0>] vfs_ioctl+0x21/0x60
[<0>] __x64_sys_ioctl+0xb9/0xe0
[<0>] do_syscall_64+0xc6/0x230
[<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74
Meanwhile, mddev->recovery is: MD_RECOVERY_RUNNING | MD_RECOVERY_INTR | MD_RECOVERY_RESHAPE | MD_RECOVERY_FROZEN
Fix this problem by removing the code that registers sync_thread directly from raid10 and raid5, and let md_check_recovery() register sync_thread instead.