In the Linux kernel, the following vulnerability has been resolved:
soundwire: revisit driver bind/unbind and callbacks
In the SoundWire probe, we store a pointer from the driver ops into the 'slave' structure. This can lead to kernel oopses when unbinding codec drivers, e.g. with the following sequence to remove the machine driver and the codec driver:
/sbin/modprobe -r snd_soc_sof_sdw
/sbin/modprobe -r snd_soc_rt711
The full details can be found in the BugLink below. For reference, the following two examples show different cases of driver ops/callbacks being invoked after the driver .remove().
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000150
kernel: Workqueue: events cdns_update_slave_status_work [soundwire_cadence]
kernel: RIP: 0010:mutex_lock+0x19/0x30
kernel: Call Trace:
kernel: ? sdw_handle_slave_status+0x426/0xe00 [soundwire_bus 94ff184bf398570c3f8ff7efe9e32529f532e4ae]
kernel: ? newidle_balance+0x26a/0x400
kernel: ? cdns_update_slave_status_work+0x1e9/0x200 [soundwire_cadence 1bcf98eebe5ba9833cd433323769ac923c9c6f82]

kernel: BUG: unable to handle page fault for address: ffffffffc07654c8
kernel: Workqueue: pm pm_runtime_work
kernel: RIP: 0010:sdw_bus_prep_clk_stop+0x6f/0x160 [soundwire_bus]
kernel: Call Trace:
kernel: <TASK>
kernel: sdw_cdns_clock_stop+0xb5/0x1b0 [soundwire_cadence 1bcf98eebe5ba9833cd433323769ac923c9c6f82]
kernel: intel_suspend_runtime+0x5f/0x120 [soundwire_intel aca858f7c87048d3152a4a41bb68abb9b663a1dd]
kernel: ? dpm_sysfs_remove+0x60/0x60
This was not detected earlier in Intel tests since the tests first remove the parent PCI device and shut down the bus. The sequence above is a corner case which keeps the bus operational but without a driver bound.
While trying to solve these kernel oopses, it became clear that the existing SoundWire bus does not deal well with the unbind case.
Commit 528be501b7d4a ("soundwire: sdw_slave: add probe_complete structure and new fields") added a 'probed' status variable and a 'probe_complete' struct completion. However, this status is not reset on remove, and likewise the 'probe_complete' completion is not re-initialized, so the bind/unbind/bind test cases would fail. The timeout used before the 'update_status' callback was also a bad idea in hindsight: there should really be no timing assumption as to if and when a driver is bound to a device.
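The stale-state problem described above can be illustrated with a minimal userspace sketch. This is not kernel code: `model_slave` and all `model_*` functions are hypothetical names, and a plain boolean stands in for the kernel 'struct completion'. It only shows why a second bind misbehaves when remove leaves the per-bind state untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a bool stands in for the kernel 'struct completion'. */
struct model_slave {
	bool probed;          /* set once a driver has bound */
	bool probe_complete;  /* "completed" flag, analogous to complete() */
};

static void model_probe(struct model_slave *s)
{
	assert(s != NULL);
	s->probed = true;
	s->probe_complete = true;
}

/* Remove that clears nothing: the situation the text describes. */
static void model_remove_stale(struct model_slave *s)
{
	(void)s;
}

/* Remove that resets per-bind state, analogous to reinit_completion(). */
static void model_remove_reset(struct model_slave *s)
{
	s->probed = false;
	s->probe_complete = false;
}

/* Would a waiter block until the *next* probe finishes? */
static bool model_waiter_would_block(const struct model_slave *s)
{
	return !s->probe_complete;
}
```

With the stale variant, a waiter after unbind sees the completion from the previous bind and proceeds as if a driver were still present. Note that the patch described here does not take the reset route; it drops the completion entirely.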
An initial draft based on device_lock() and device_unlock() was tested. This proved too complicated, with deadlocks created during the suspend-resume sequences, which also use the same device_lock/unlock() as the bind/unbind sequences. On a CometLake device, a bad DSDT/BIOS caused spurious resumes and the use of device_lock() caused hangs during suspend. After multiple weeks of testing and painful reverse-engineering of deadlocks on different devices, we looked for alternatives that did not interfere with the device core.
A bus notifier was used successfully to keep track of DRIVER_BOUND and DRIVER_UNBIND events. This solved the bind-unbind-bind case in tests, but it can still be defeated with a theoretical corner case where the memory is freed by a .remove while the callback is in use. The notifier only helps make sure the driver callbacks are valid, but not that the memory allocated in probe remains valid while the callbacks are invoked.
This patch suggests the introduction of a new 'sdw_dev_lock' mutex protecting probe/remove and all driver callbacks. Since this mutex is 'local' to SoundWire only, it does not interfere with existing locks and does not create deadlocks. In addition, this patch removes the 'probe_complete' completion; instead we directly invoke the 'update_status' callback from the probe routine. That removes any sort of timing dependency and provides much better support for the device/driver model: the driver could be bound before the bus started, or eons after the bus started, and the hardware would be properly initialized in all cases.
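The locking scheme described above can be sketched in userspace as follows. This is a model under stated assumptions, not the kernel implementation: a pthread mutex stands in for 'sdw_dev_lock', and `model_dev`, `model_ops`, `example_update_status` and the other `model_*` names are hypothetical. The bus side calls into the driver only while it is bound, and probe replays the current bus status directly instead of waiting on a completion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the scheme; not kernel code. */
struct model_ops {
	int (*update_status)(void *drvdata, int status);
};

struct model_dev {
	pthread_mutex_t dev_lock;     /* stands in for 'sdw_dev_lock' */
	bool probed;                  /* true only between probe and remove */
	const struct model_ops *ops;  /* valid only while probed */
	void *drvdata;                /* driver memory, valid only while probed */
	int bus_status;               /* last status reported by the bus */
};

static void model_dev_init(struct model_dev *d)
{
	pthread_mutex_init(&d->dev_lock, NULL);
	d->probed = false;
	d->ops = NULL;
	d->drvdata = NULL;
	d->bus_status = 0;
}

/* Bus side: record the status, call the driver only if one is bound. */
static int model_bus_update_status(struct model_dev *d, int status)
{
	int ret = 0;

	pthread_mutex_lock(&d->dev_lock);
	d->bus_status = status;
	if (d->probed && d->ops && d->ops->update_status)
		ret = d->ops->update_status(d->drvdata, status);
	pthread_mutex_unlock(&d->dev_lock);
	return ret;
}

/* Probe: bind, then replay the current status. No completion, no timeout. */
static int model_probe(struct model_dev *d, const struct model_ops *ops,
		       void *drvdata)
{
	int ret = 0;

	assert(ops != NULL);
	pthread_mutex_lock(&d->dev_lock);
	d->ops = ops;
	d->drvdata = drvdata;
	d->probed = true;
	if (ops->update_status)
		ret = ops->update_status(drvdata, d->bus_status);
	pthread_mutex_unlock(&d->dev_lock);
	return ret;
}

/* Remove: cannot complete while a callback holds the lock. */
static void model_remove(struct model_dev *d)
{
	pthread_mutex_lock(&d->dev_lock);
	d->probed = false;
	d->ops = NULL;
	d->drvdata = NULL;
	pthread_mutex_unlock(&d->dev_lock);
}

/* Example driver callback: stores the last status it was given. */
static int example_update_status(void *drvdata, int status)
{
	*(int *)drvdata = status;
	return 0;
}

static const struct model_ops example_ops = {
	.update_status = example_update_status,
};
```

Because remove takes the same lock as the callback path, driver memory cannot be released in the middle of a callback, which is the corner case the notifier approach left open. A driver bound after the bus has already reported a status still receives that status from probe.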
BugLink: https://github.com/thesofproject/linux/is ---truncated---