In the Linux kernel, the following vulnerability has been resolved:
clk: Get runtime PM before walking tree during disable_unused
Doug reported [1] the following hung task:
 INFO: task swapper/0:1 blocked for more than 122 seconds.
       Not tainted 5.15.149-21875-gf795ebc40eb8 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00000008
 Call trace:
  __switch_to+0xf4/0x1f4
  __schedule+0x418/0xb80
  schedule+0x5c/0x10c
  rpm_resume+0xe0/0x52c
  rpm_resume+0x178/0x52c
  __pm_runtime_resume+0x58/0x98
  clk_pm_runtime_get+0x30/0xb0
  clk_disable_unused_subtree+0x58/0x208
  clk_disable_unused_subtree+0x38/0x208
  clk_disable_unused_subtree+0x38/0x208
  clk_disable_unused_subtree+0x38/0x208
  clk_disable_unused_subtree+0x38/0x208
  clk_disable_unused+0x4c/0xe4
  do_one_initcall+0xcc/0x2d8
  do_initcall_level+0xa4/0x148
  do_initcalls+0x5c/0x9c
  do_basic_setup+0x24/0x30
  kernel_init_freeable+0xec/0x164
  kernel_init+0x28/0x120
  ret_from_fork+0x10/0x20
 INFO: task kworker/u16:0:9 blocked for more than 122 seconds.
       Not tainted 5.15.149-21875-gf795ebc40eb8 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/u16:0 state:D stack: 0 pid: 9 ppid: 2 flags:0x00000008
 Workqueue: events_unbound deferred_probe_work_func
 Call trace:
  __switch_to+0xf4/0x1f4
  __schedule+0x418/0xb80
  schedule+0x5c/0x10c
  schedule_preempt_disabled+0x2c/0x48
  __mutex_lock+0x238/0x488
  __mutex_lock_slowpath+0x1c/0x28
  mutex_lock+0x50/0x74
  clk_prepare_lock+0x7c/0x9c
  clk_core_prepare_lock+0x20/0x44
  clk_prepare+0x24/0x30
  clk_bulk_prepare+0x40/0xb0
  mdss_runtime_resume+0x54/0x1c8
  pm_generic_runtime_resume+0x30/0x44
  __genpd_runtime_resume+0x68/0x7c
  genpd_runtime_resume+0x108/0x1f4
  __rpm_callback+0x84/0x144
  rpm_callback+0x30/0x88
  rpm_resume+0x1f4/0x52c
  rpm_resume+0x178/0x52c
  __pm_runtime_resume+0x58/0x98
  __device_attach+0xe0/0x170
  device_initial_probe+0x1c/0x28
  bus_probe_device+0x3c/0x9c
  device_add+0x644/0x814
  mipi_dsi_device_register_full+0xe4/0x170
  devm_mipi_dsi_device_register_full+0x28/0x70
  ti_sn_bridge_probe+0x1dc/0x2c0
  auxiliary_bus_probe+0x4c/0x94
  really_probe+0xcc/0x2c8
  __driver_probe_device+0xa8/0x130
  driver_probe_device+0x48/0x110
  __device_attach_driver+0xa4/0xcc
  bus_for_each_drv+0x8c/0xd8
  __device_attach+0xf8/0x170
  device_initial_probe+0x1c/0x28
  bus_probe_device+0x3c/0x9c
  deferred_probe_work_func+0x9c/0xd8
  process_one_work+0x148/0x518
  worker_thread+0x138/0x350
  kthread+0x138/0x1e0
  ret_from_fork+0x10/0x20
The first thread is walking the clk tree and calling clk_pm_runtime_get() to power on devices required to read the clk hardware via struct clk_ops::is_enabled(). This thread holds the clk prepare_lock and is trying to runtime PM resume a device. It finds that the device is already in the process of resuming, so the thread schedule()s away, waiting for the device to finish resuming before continuing. The second thread is runtime PM resuming the same device, but its runtime resume callback calls clk_prepare(), which blocks trying to grab the prepare_lock held by the first thread.
This is a classic ABBA deadlock. To properly fix the deadlock, we must never runtime PM resume or suspend a device with the clk prepare_lock held. Actually doing that is near impossible today because the global prepare_lock would have to be dropped in the middle of the tree, the device runtime PM resumed/suspended, and then the prepare_lock grabbed again to ensure consistency of the clk tree topology. If anything changes with the clk tree in the meantime, we've lost and will need to start the operation all over again.
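The lock ordering described above can be sketched as a small userspace model. This is not kernel code: struct model, try_acquire(), abba_deadlocks() and pm_first_deadlocks() are hypothetical names invented for illustration, and "resume in progress" is modeled as a second lock. The second function assumes, going only by the patch title, that the tree walker takes its runtime PM reference before grabbing the prepare_lock.

```c
#include <stdbool.h>

/* Hypothetical toy model, not kernel code: each resource records its owner. */
enum { NOBODY = 0, WALKER = 1, RESUMER = 2 };

struct model {
	int prepare_lock_owner;	/* stands in for the clk prepare_lock */
	int resume_owner;	/* stands in for "runtime resume in progress" */
	bool device_active;	/* device already runtime PM active */
};

/* Take a resource if it is free; return false if the caller would block. */
static bool try_acquire(int *owner, int who)
{
	if (*owner != NOBODY && *owner != who)
		return false;
	*owner = who;
	return true;
}

/* Ordering from the hung task report: returns true if both sides block. */
bool abba_deadlocks(void)
{
	struct model m = { NOBODY, NOBODY, false };

	/* Walker: the disable_unused path takes the prepare_lock first. */
	try_acquire(&m.prepare_lock_owner, WALKER);
	/* Resumer: the probe path starts runtime resuming the device. */
	try_acquire(&m.resume_owner, RESUMER);

	/* Walker needs the device powered: waits on the resume in progress. */
	bool walker_blocked = !try_acquire(&m.resume_owner, WALKER);
	/* Resumer's callback calls clk_prepare(): needs the prepare_lock. */
	bool resumer_blocked = !try_acquire(&m.prepare_lock_owner, RESUMER);

	return walker_blocked && resumer_blocked;
}

/* Assumed reordering: runtime PM reference first, prepare_lock second. */
bool pm_first_deadlocks(void)
{
	struct model m = { NOBODY, NOBODY, false };

	/* Walker resumes the device while holding no clk lock. */
	try_acquire(&m.resume_owner, WALKER);
	m.device_active = true;
	m.resume_owner = NOBODY;	/* resume finished */

	/* Only now does the walker take the prepare_lock. */
	try_acquire(&m.prepare_lock_owner, WALKER);

	/* On an active device a get only bumps the usage count: no callback. */
	bool walker_blocked = !m.device_active &&
			      !try_acquire(&m.resume_owner, WALKER);
	bool resumer_blocked = !m.device_active &&
			       !try_acquire(&m.prepare_lock_owner, RESUMER);

	return walker_blocked && resumer_blocked;
}
```

In this model abba_deadlocks() reports that each side waits on a resource the other owns, while pm_first_deadlocks() reports no mutual wait, because the device is already active by the time the prepare_lock is taken. The real kernel paths involve far more state than two owners and a flag, so treat this only as a picture of the ordering problem.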
Luckily, most of the time we're simply incrementing or decrementing the runtime PM count on an active device, so we don't have the chance to schedule away with the prepare_lock held. Let's fix this immediate problem that can be ---truncated---