In the Linux kernel, the following vulnerability has been resolved:
net/smc: fix kernel panic caused by race of smc_sock
A crash occurs when smc_cdc_tx_handler() tries to access smc_sock but smc_release() has already freed it.
[ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88
[ 4570.696048] #PF: supervisor write access in kernel mode
[ 4570.696728] #PF: error_code(0x0002) - not-present page
[ 4570.697401] PGD 0 P4D 0
[ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111
[ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0
[ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30
<...>
[ 4570.711446] Call Trace:
[ 4570.711746]  <IRQ>
[ 4570.711992]  smc_cdc_tx_handler+0x41/0xc0
[ 4570.712470]  smc_wr_tx_tasklet_fn+0x213/0x560
[ 4570.712981]  ? smc_cdc_tx_dismisser+0x10/0x10
[ 4570.713489]  tasklet_action_common.isra.17+0x66/0x140
[ 4570.714083]  __do_softirq+0x123/0x2f4
[ 4570.714521]  irq_exit_rcu+0xc4/0xf0
[ 4570.714934]  common_interrupt+0xba/0xe0
Although smc_cdc_tx_handler() checks that the smc connection exists, smc_release() may already have dismissed the pending CDC slots and released the smc socket before smc_cdc_tx_handler() goes on to access it.
smc_cdc_tx_handler()           |smc_release()
if (!conn)                     |
                               |
                               |smc_cdc_tx_dismiss_slots()
                               |      smc_cdc_tx_dismisser()
                               |
                               |sock_put(&smc->sk) <- last sock_put,
                               |                      smc_sock freed
bh_lock_sock(&smc->sk) (panic) |
To make sure we won't receive any CDC messages after we free the smc_sock, add a refcount on the smc_connection for inflight CDC messages (posted to the QP but whose CQE has not yet been received), and don't release the smc_connection until all inflight CDC messages have completed, whether successfully or not.
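A minimal userspace sketch of the refcounting idea follows, in C11. The type `struct conn_sketch`, its `inflight_cdc` counter, and the helpers `cdc_ref_get()`, `cdc_ref_put()` and `conn_try_release()` are hypothetical stand-ins, not the kernel's actual fields or functions. It only shows the rule described above: a reference is taken when a CDC message is posted, dropped for every CQE whether the send succeeded or failed, and the connection is freed only when the count is zero.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct smc_connection; only the counter matters. */
struct conn_sketch {
    atomic_int inflight_cdc;  /* posted to the QP, CQE not yet handled */
};

/* Take a reference just before a CDC message is posted to the QP. */
static void cdc_ref_get(struct conn_sketch *conn)
{
    atomic_fetch_add(&conn->inflight_cdc, 1);
}

/* Drop it from the CQE handler, for successful and failed completions alike. */
static void cdc_ref_put(struct conn_sketch *conn)
{
    atomic_fetch_sub(&conn->inflight_cdc, 1);
}

/*
 * Release path: free the connection only when no CDC message can still
 * complete against it. Returns true if the memory was freed.
 * Assumption: no new CDC message is posted once release has started.
 */
static bool conn_try_release(struct conn_sketch *conn)
{
    if (atomic_load(&conn->inflight_cdc) != 0)
        return false;  /* a completion handler may still touch conn */
    free(conn);
    return true;
}
```

A kernel implementation would sleep until the counter reaches zero rather than returning `false`; the sketch returns early only so the behaviour is easy to observe.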
Using a refcount on CDC messages introduces another problem: when the link is going to be destroyed, smcr_link_clear() resets the QP, which then removes all pending CQEs related to that QP from the CQ. To make sure every CQE always comes back, so that the refcount on the smc_connection can always reach 0, smc_ib_modify_qp_reset() is replaced by smc_ib_modify_qp_error(). The timeout in smc_wr_tx_wait_no_pending_sends() is also removed, since we need to wait for all pending WQEs to complete; otherwise we may encounter a use-after-free when handling their CQEs.
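The difference between resetting a QP and moving it to the error state can be modelled in a few lines. In the verbs API, a QP moved to the error state flushes its outstanding work requests, each completing with a flush-error status (`IB_WC_WR_FLUSH_ERR` in the kernel), whereas, as described above, a reset removes them without delivering completions. The `qp_sketch` model, `SQ_DEPTH`, and every function below are hypothetical simplifications for illustration: after a reset the connection's refcount stays stuck above zero, while after an error transition every CQE is delivered and the count drains to zero.

```c
#include <stdbool.h>
#include <stddef.h>

#define SQ_DEPTH 16  /* hypothetical queue depth */

enum wc_status_sketch { WC_SUCCESS, WC_FLUSH_ERR };

/* Hypothetical, heavily simplified QP: outstanding WRs plus its CQ. */
struct qp_sketch {
    int pending_wr;                      /* posted, no CQE produced yet */
    enum wc_status_sketch cq[SQ_DEPTH];  /* CQEs waiting to be polled */
    size_t cq_len;
};

/* Hypothetical connection holding one reference per inflight CDC message. */
struct conn_ref_sketch {
    int inflight_cdc;
};

/* Post one CDC message; invariant: pending_wr + cq_len <= SQ_DEPTH. */
static bool post_cdc(struct qp_sketch *qp, struct conn_ref_sketch *conn)
{
    if (qp->pending_wr + (int)qp->cq_len >= SQ_DEPTH)
        return false;       /* queue full, nothing posted */
    conn->inflight_cdc++;   /* reference taken at post time */
    qp->pending_wr++;
    return true;
}

/* Hardware finishes one WR normally and produces a successful CQE. */
static void hw_complete_one(struct qp_sketch *qp)
{
    if (qp->pending_wr > 0) {
        qp->pending_wr--;
        qp->cq[qp->cq_len++] = WC_SUCCESS;
    }
}

/* Reset: outstanding WRs and queued CQEs vanish, no completions appear. */
static void qp_to_reset(struct qp_sketch *qp)
{
    qp->pending_wr = 0;
    qp->cq_len = 0;
}

/* Error: every outstanding WR is flushed into the CQ with an error status. */
static void qp_to_error(struct qp_sketch *qp)
{
    while (qp->pending_wr > 0) {
        qp->cq[qp->cq_len++] = WC_FLUSH_ERR;
        qp->pending_wr--;
    }
}

/* CQE handler: drops one reference per CQE; returns how many were flushed. */
static int poll_cq(struct qp_sketch *qp, struct conn_ref_sketch *conn)
{
    int flushed = 0;
    while (qp->cq_len > 0) {
        if (qp->cq[--qp->cq_len] == WC_FLUSH_ERR)
            flushed++;
        conn->inflight_cdc--;  /* success or failure, the reference goes */
    }
    return flushed;
}
```

The same model shows why a bounded wait is unsafe: if the waiter gives up while `pending_wr` is still non-zero, the connection can be freed before the remaining CQEs are polled against it.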
In the IB device removal routine, we need to wait for all QPs on that device to be destroyed before we can destroy the CQs on the device; otherwise the refcount on the smc_connection won't reach 0 and the smc_sock cannot be released.
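The teardown ordering required for IB device removal can be sketched as a simple guard: CQs are destroyed only once the device's QP count reaches zero. `struct ibdev_sketch` and its helpers are hypothetical; a real removal path would wait for the remaining QPs to be destroyed instead of returning `false`.

```c
#include <stdbool.h>

/* Hypothetical model of an IB device that owns QPs and their CQs. */
struct ibdev_sketch {
    int live_qps;        /* QPs still existing on this device */
    bool cqs_destroyed;
};

static void qp_create(struct ibdev_sketch *dev)
{
    dev->live_qps++;
}

/* Assumes the QP's WRs were already flushed and their CQEs handled. */
static void qp_destroy(struct ibdev_sketch *dev)
{
    dev->live_qps--;
}

/*
 * CQs may go only after the last QP on the device is gone; otherwise
 * flushed CQEs would have nowhere to land and connection refcounts
 * could never reach zero.
 */
static bool cqs_destroy(struct ibdev_sketch *dev)
{
    if (dev->live_qps > 0)
        return false;  /* a real implementation would wait here */
    dev->cqs_destroyed = true;
    return true;
}
```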