In the Linux kernel, the following vulnerability has been resolved:

ibmvnic: fix race between xmit and reset

There is a race between reset and the transmit paths that can lead to ibmvnic_xmit() accessing an scrq after it has been freed in the reset path. It can result in a crash like:

	Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
	BUG: Kernel NULL pointer dereference on read at 0x00000000
	Faulting instruction address: 0xc0080000016189f8
	Oops: Kernel access of bad area, sig: 11 [#1]
	...
	NIP [c0080000016189f8] ibmvnic_xmit+0x60/0xb60 [ibmvnic]
	LR [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
	Call Trace:
	[c008000001618f08] ibmvnic_xmit+0x570/0xb60 [ibmvnic] (unreliable)
	[c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
	[c000000000c9cfcc] sch_direct_xmit+0xec/0x330
	[c000000000bfe640] __dev_xmit_skb+0x3a0/0x9d0
	[c000000000c00ad4] __dev_queue_xmit+0x394/0x730
	[c008000002db813c] __bond_start_xmit+0x254/0x450 [bonding]
	[c008000002db8378] bond_start_xmit+0x40/0xc0 [bonding]
	[c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
	[c000000000c00ca4] __dev_queue_xmit+0x564/0x730
	[c000000000cf97e0] neigh_hh_output+0xd0/0x180
	[c000000000cfa69c] ip_finish_output2+0x31c/0x5c0
	[c000000000cfd244] __ip_queue_xmit+0x194/0x4f0
	[c000000000d2a3c4] __tcp_transmit_skb+0x434/0x9b0
	[c000000000d2d1e0] __tcp_retransmit_skb+0x1d0/0x6a0
	[c000000000d2d984] tcp_retransmit_skb+0x34/0x130
	[c000000000d310e8] tcp_retransmit_timer+0x388/0x6d0
	[c000000000d315ec] tcp_write_timer_handler+0x1bc/0x330
	[c000000000d317bc] tcp_write_timer+0x5c/0x200
	[c000000000243270] call_timer_fn+0x50/0x1c0
	[c000000000243704] __run_timers.part.0+0x324/0x460
	[c000000000243894] run_timer_softirq+0x54/0xa0
	[c000000000ea713c] __do_softirq+0x15c/0x3e0
	[c000000000166258] __irq_exit_rcu+0x158/0x190
	[c000000000166420] irq_exit+0x20/0x40
	[c00000000002853c] timer_interrupt+0x14c/0x2b0
	[c000000000009a00] decrementer_common_virt+0x210/0x220
	--- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c

The immediate cause of the crash is the access of tx_scrq in the following snippet
during a reset, where tx_scrq can be either NULL or an address that will soon be invalid:

	ibmvnic_xmit()
	{
		...
		tx_scrq = adapter->tx_scrq[queue_num];
		txq = netdev_get_tx_queue(netdev, queue_num);
		ind_bufp = &tx_scrq->ind_buf;

		if (test_bit(0, &adapter->resetting)) {
		...
	}

But beyond that, the call to ibmvnic_xmit() itself is not safe during a reset, and the reset path attempts to avoid this by stopping the queue in ibmvnic_cleanup(). However, just after the queue was stopped, an in-flight ibmvnic_complete_tx() could have restarted the queue even as the reset is progressing. Since the queue was restarted, we could get a call to ibmvnic_xmit(), which can then access the bad tx_scrq (or other fields).

We cannot, however, simply have ibmvnic_complete_tx() check the ->resetting bit and skip starting the queue. This can race at the "back-end" of a good reset which just restarted the queue but has not cleared the ->resetting bit yet. If we skip restarting the queue due to ->resetting being true, the queue would remain stopped indefinitely, potentially leading to transmit timeouts.

IOW ->resetting is too broad for this purpose. Instead, use a new flag that indicates whether or not the queues are active. Only the open/reset paths control when the queues are active. ibmvnic_complete_tx() and others wake up the queue only if the queue is marked active.

So we will have:

	A. reset/open thread in ibmvnic_cleanup() and __ibmvnic_open()

		->resetting = true
		->tx_queues_active = false
		disable tx queues
		...
		->tx_queues_active = true
		start tx queues

	B. Tx interrupt in ibmvnic_complete_tx():

		if (->tx_queues_active)
			netif_wake_subqueue();

To ensure that ->tx_queues_active and the state of the queues are consistent, we need a lock which:

	- must also be taken in the interrupt path (ibmvnic_complete_tx())
	- shared across the multiple ---truncated---
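
The ordering in A and B above can be illustrated with a small user-space model. This is only a sketch, not driver code: struct model_adapter, the model_*() helpers and the queue_stopped field are hypothetical names invented for the illustration, and the model is single-threaded, so it leaves out the lock that the text says is required. It shows only why gating the wake-up on tx_queues_active, rather than on resetting, behaves correctly both during a reset and at its back-end.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the adapter state; not the driver's struct. */
struct model_adapter {
	bool resetting;        /* set for the whole duration of a reset */
	bool tx_queues_active; /* true only while transmit may safely run */
	bool queue_stopped;    /* models the stopped/started state of a tx queue */
};

/* Path A, first half: the steps attributed to ibmvnic_cleanup(). */
static void model_reset_begin(struct model_adapter *a)
{
	a->resetting = true;
	a->tx_queues_active = false;
	a->queue_stopped = true;      /* "disable tx queues" */
}

/* Path A, second half: the steps attributed to __ibmvnic_open(). */
static void model_reset_reopen(struct model_adapter *a)
{
	a->tx_queues_active = true;
	a->queue_stopped = false;     /* "start tx queues" */
}

/* End of the reset: ->resetting is cleared only after the queues restart. */
static void model_reset_finish(struct model_adapter *a)
{
	a->resetting = false;
}

/*
 * Path B: the completion path wakes the queue only if it is marked active.
 * Returns true if the queue was woken.
 */
static bool model_complete_tx(struct model_adapter *a)
{
	if (!a->tx_queues_active)
		return false;             /* reset in progress: leave it stopped */
	a->queue_stopped = false;     /* stands in for netif_wake_subqueue() */
	return true;
}

/* Walks through both windows discussed in the text; returns 0 on success. */
static int model_demo(void)
{
	struct model_adapter a = { false, true, false };

	/* An in-flight completion during the reset must not restart the queue. */
	model_reset_begin(&a);
	assert(!model_complete_tx(&a));
	assert(a.queue_stopped);

	/* Back-end of the reset: queues restarted, ->resetting still set. */
	model_reset_reopen(&a);
	a.queue_stopped = true;       /* queue fills up and is stopped again */
	assert(a.resetting);
	assert(model_complete_tx(&a)); /* still woken, so no indefinite stall */
	assert(!a.queue_stopped);

	model_reset_finish(&a);
	return 0;
}
```

Calling model_demo() exercises both windows. Had model_complete_tx() tested resetting instead of tx_queues_active, the second wake-up would have been skipped and the modelled queue would have stayed stopped, which is the transmit-timeout case described above.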