In the Linux kernel, the following vulnerability has been resolved:
ice: xsk: disable txq irq before flushing hw
ice_qp_dis() intends to stop a given queue pair that is a target of xsk pool attach/detach. One of the steps is to disable interrupts on these queues. This is currently broken: the txq irq is turned off after the HW flush, so the disable takes no effect.
ice_qp_dis():
-> ice_qvec_dis_irq()
--> disable rxq irq
--> flush hw
-> ice_vsi_stop_tx_ring()
--> disable txq irq
The splat below can be triggered by the following steps:
- start xdpsock WITHOUT loading xdp prog
- run xdp_rxq_info with XDP_TX action on this interface
- start traffic
- terminate xdpsock
[ 256.312485] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 256.319560] #PF: supervisor read access in kernel mode
[ 256.324775] #PF: error_code(0x0000) - not-present page
[ 256.329994] PGD 0 P4D 0
[ 256.332574] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 256.337006] CPU: 3 PID: 32 Comm: ksoftirqd/3 Tainted: G OE 6.2.0-rc5+ #51
[ 256.345218] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 256.355807] RIP: 0010:ice_clean_rx_irq_zc+0x9c/0x7d0 [ice]
[ 256.361423] Code: b7 8f 8a 00 00 00 66 39 ca 0f 84 f1 04 00 00 49 8b 47 40 4c 8b 24 d0 41 0f b7 45 04 66 25 ff 3f 66 89 04 24 0f 84 85 02 00 00 <49> 8b 44 24 18 0f b7 14 24 48 05 00 01 00 00 49 89 04 24 49 89 44
[ 256.380463] RSP: 0018:ffffc900088bfd20 EFLAGS: 00010206
[ 256.385765] RAX: 000000000000003c RBX: 0000000000000035 RCX: 000000000000067f
[ 256.393012] RDX: 0000000000000775 RSI: 0000000000000000 RDI: ffff8881deb3ac80
[ 256.400256] RBP: 000000000000003c R08: ffff889847982710 R09: 0000000000010000
[ 256.407500] R10: ffffffff82c060c0 R11: 0000000000000004 R12: 0000000000000000
[ 256.414746] R13: ffff88811165eea0 R14: ffffc9000d255000 R15: ffff888119b37600
[ 256.421990] FS: 0000000000000000(0000) GS:ffff8897e0cc0000(0000) knlGS:0000000000000000
[ 256.430207] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 256.436036] CR2: 0000000000000018 CR3: 0000000005c0a006 CR4: 00000000007706e0
[ 256.443283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 256.450527] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 256.457770] PKRU: 55555554
[ 256.460529] Call Trace:
[ 256.463015]  <TASK>
[ 256.465157]  ? ice_xmit_zc+0x6e/0x150 [ice]
[ 256.469437]  ice_napi_poll+0x46d/0x680 [ice]
[ 256.473815]  ? _raw_spin_unlock_irqrestore+0x1b/0x40
[ 256.478863]  __napi_poll+0x29/0x160
[ 256.482409]  net_rx_action+0x136/0x260
[ 256.486222]  __do_softirq+0xe8/0x2e5
[ 256.489853]  ? smpboot_thread_fn+0x2c/0x270
[ 256.494108]  run_ksoftirqd+0x2a/0x50
[ 256.497747]  smpboot_thread_fn+0x1c1/0x270
[ 256.501907]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 256.506594]  kthread+0xea/0x120
[ 256.509785]  ? __pfx_kthread+0x10/0x10
[ 256.513597]  ret_from_fork+0x29/0x50
[ 256.517238]  </TASK>
In fact, irqs were not disabled, so napi managed to be scheduled and run while the xsk_pool pointer was still valid but the SW ring of xdp_buff pointers had already been freed.
To fix this, call ice_qvec_dis_irq() after ice_vsi_stop_tx_ring(). Also, while at it, remove the redundant ice_clean_rx_ring() call - this is handled in ice_qp_clean_rings().
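
The ordering problem can be illustrated with a small userspace toy model. This is only a sketch, not driver code: every toy_* name below is hypothetical. It also rests on a simplifying assumption taken from the description above, namely that an irq disable request takes effect only once a later flush commits it. The two stand-in functions mirror the roles of ice_qvec_dis_irq() and ice_vsi_stop_tx_ring(), and the model compares the order described as broken with the order described as the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy state for one queue pair. Hypothetical; not the ice driver's structures. */
struct toy_qp {
    bool rxq_irq_on;      /* interrupt state the model's "hardware" honours */
    bool txq_irq_on;
    bool rxq_dis_pending; /* disable requested but not yet flushed */
    bool txq_dis_pending;
};

static void toy_qp_init(struct toy_qp *qp)
{
    assert(qp != NULL);
    qp->rxq_irq_on = true;
    qp->txq_irq_on = true;
    qp->rxq_dis_pending = false;
    qp->txq_dis_pending = false;
}

/* Model assumption: a flush commits every disable requested so far. */
static void toy_flush_hw(struct toy_qp *qp)
{
    if (qp->rxq_dis_pending) {
        qp->rxq_irq_on = false;
        qp->rxq_dis_pending = false;
    }
    if (qp->txq_dis_pending) {
        qp->txq_irq_on = false;
        qp->txq_dis_pending = false;
    }
}

/* Stands in for the ice_qvec_dis_irq() step: request rxq irq off, then flush. */
static void toy_qvec_dis_irq(struct toy_qp *qp)
{
    qp->rxq_dis_pending = true;
    toy_flush_hw(qp);
}

/* Stands in for the ice_vsi_stop_tx_ring() step: request txq irq off. */
static void toy_stop_tx_ring(struct toy_qp *qp)
{
    qp->txq_dis_pending = true;
}

/* Order described as broken: the flush runs before the txq request exists. */
static bool toy_irq_still_live_broken_order(void)
{
    struct toy_qp qp;

    toy_qp_init(&qp);
    toy_qvec_dis_irq(&qp);
    toy_stop_tx_ring(&qp);
    return qp.rxq_irq_on || qp.txq_irq_on;
}

/* Order described as the fix: the txq request is made before the flush. */
static bool toy_irq_still_live_fixed_order(void)
{
    struct toy_qp qp;

    toy_qp_init(&qp);
    toy_stop_tx_ring(&qp);
    toy_qvec_dis_irq(&qp);
    return qp.rxq_irq_on || qp.txq_irq_on;
}
```

In this model, toy_irq_still_live_broken_order() reports that an interrupt source is still live after the teardown sequence, which is the window in which napi could be scheduled. toy_irq_still_live_fixed_order() reports that both sources are off. The real driver's register semantics are more involved than this model.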