In the Linux kernel, the following vulnerability has been resolved:
KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock on x86 due to a chain of locks and SRCU synchronizations. Translating the below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the fairness of r/w semaphores).
            CPU0                     CPU1                     CPU2
    1   lock(&kvm->slots_lock);
    2                                                    lock(&vcpu->mutex);
    3                                                    lock(&kvm->srcu);
    4                            lock(cpu_hotplug_lock);
    5                            lock(kvm_lock);
    6                            lock(&kvm->slots_lock);
    7                                                    lock(cpu_hotplug_lock);
    8   sync(&kvm->srcu);
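For context, a minimal sketch of the shape of the fix follows. It only illustrates the idea of guarding kvm_usage_count with its own mutex so that this path no longer nests kvm_lock inside cpu_hotplug_lock; the mutex name, the helper, and the structure are assumptions for illustration, not the verbatim patch.

    /*
     * Illustrative sketch only: kvm_usage_count gets a dedicated mutex, so
     * the virtualization enable path no longer takes kvm_lock while holding
     * cpu_hotplug_lock.  The names kvm_usage_lock and
     * enable_virtualization_on_all_cpus() are assumed for the example.
     */
    static DEFINE_MUTEX(kvm_usage_lock);    /* assumed name */
    static int kvm_usage_count;

    static int hardware_enable_all(void)
    {
            int r = 0;

            /* cpu_hotplug_lock is still needed to keep the online CPU set stable. */
            cpus_read_lock();
            mutex_lock(&kvm_usage_lock);    /* previously: mutex_lock(&kvm_lock) */

            if (++kvm_usage_count == 1) {
                    r = enable_virtualization_on_all_cpus();    /* hypothetical helper */
                    if (r)
                            --kvm_usage_count;
            }

            mutex_unlock(&kvm_usage_lock);
            cpus_read_unlock();

            return r;
    }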
Note, there are likely more potential deadlocks in KVM x86, e.g. the same pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with __kvmclock_cpufreq_notifier():

      cpuhp_cpufreq_online()
      |
      -> cpufreq_online()
         |
         -> cpufreq_gov_performance_limits()
            |
            -> __cpufreq_driver_target()
               |
               -> __target_index()
                  |
                  -> cpufreq_freq_transition_begin()
                     |
                     -> cpufreq_notify_transition()
                        |
                        -> ... __kvmclock_cpufreq_notifier()
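To make the inversion concrete, here is a simplified sketch of a notifier with the same locking pattern. The function name and body are hypothetical stand-ins for the kvmclock notifier path, reduced to just the ordering that matters; this is not the actual x86 implementation.

    /*
     * Simplified sketch: the notifier runs while the CPU-online path already
     * holds cpu_hotplug_lock, and then acquires kvm_lock to walk vm_list --
     * the reverse order of paths that take kvm_lock (or kvm->slots_lock)
     * first and only later reach cpus_read_lock().
     */
    static int example_cpufreq_notifier(struct notifier_block *nb,
                                        unsigned long val, void *data)
    {
            struct kvm *kvm;

            mutex_lock(&kvm_lock);    /* cpu_hotplug_lock -> kvm_lock ordering */
            list_for_each_entry(kvm, &vm_list, vm_list) {
                    /* update per-VM clock/TSC state for the new frequency */
            }
            mutex_unlock(&kvm_lock);

            return 0;
    }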
But, actually triggering such deadlocks is beyond rare due to the combination of dependencies and timings involved. E.g. the cpufreq notifier is only used on older CPUs without a constant TSC, mucking with the NX hugepage mitigation while VMs are running is very uncommon, and doing so while also onlining/offlining a CPU (necessary to generate contention on cpu_hotplug_lock) would be even more unusual.
The most robust solution to the general cpu_hotplug_lock issue is likely to switch vm_list to be an RCU-protected list, e.g. so that x86's cpufreq notifier doesn't need to take kvm_lock. For now, settle for fixing the most blatant deadlock, as switching to an RCU-protected list is a much more involved change, but add a comment in locking.rst to call out that care needs to be taken when holding kvm_lock and walking vm_list.
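For reference, one hedged sketch of that future direction is below, assuming writers keep kvm_lock for mutual exclusion among themselves while readers switch to a lockless RCU walk. None of this is part of the actual fix; all names and details are assumptions.

    /* Writers still serialize against each other with kvm_lock. */
    static void example_add_vm(struct kvm *kvm)
    {
            mutex_lock(&kvm_lock);
            list_add_rcu(&kvm->vm_list, &vm_list);
            mutex_unlock(&kvm_lock);
    }

    static void example_del_vm(struct kvm *kvm)
    {
            mutex_lock(&kvm_lock);
            list_del_rcu(&kvm->vm_list);
            mutex_unlock(&kvm_lock);
            synchronize_rcu();    /* wait for lockless walkers to finish */
    }

    /* Lockless reader, e.g. a notifier that must not acquire kvm_lock. */
    static void example_walk_vms(void (*fn)(struct kvm *kvm))
    {
            struct kvm *kvm;

            rcu_read_lock();
            list_for_each_entry_rcu(kvm, &vm_list, vm_list)
                    fn(kvm);    /* must not sleep under rcu_read_lock() */
            rcu_read_unlock();
    }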
======================================================
WARNING: possible circular locking dependency detected
6.10.0-smp--c257535a0c9d-pip #330 Tainted: G S O
------------------------------------------------------
tee/35048 is trying to acquire lock:
ff6a80eced71e0a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm]

but task is already holding lock:
ffffffffc07abb08 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (kvm_lock){+.+.}-{3:3}:
       __mutex_lock+0x6a/0xb40
       mutex_lock_nested+0x1f/0x30
       kvm_dev_ioctl+0x4fb/0xe50 [kvm]
       __se_sys_ioctl+0x7b/0xd0
       __x64_sys_ioctl+0x21/0x30
       x64_sys_call+0x15d0/0x2e60
       do_syscall_64+0x83/0x160
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #2 (cpu_hotplug_lock){++++}-{0:0}:
       cpus_read_lock+0x2e/0xb0
       static_key_slow_inc+0x16/0x30
       kvm_lapic_set_base+0x6a/0x1c0 [kvm]
       kvm_set_apic_base+0x8f/0xe0 [kvm]
       kvm_set_msr_common+0x9ae/0xf80 [kvm]
       vmx_set_msr+0xa54/0xbe0 [kvm_intel]
       __kvm_set_msr+0xb6/0x1a0 [kvm]
       kvm_arch_vcpu_ioctl+0xeca/0x10c0 [kvm]
       kvm_vcpu_ioctl+0x485/0x5b0 [kvm]
       __se_sys_ioctl+0x7b/0xd0
       __x64_sys_ioctl+0x21/0x30
       x64_sys_call+0x15d0/0x2e60
       do_syscall_64+0x83/0x160
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #1 (&kvm->srcu){.+.+}-{0:0}:
       __synchronize_srcu+0x44/0x1a0
---truncated---