In the Linux kernel, the following vulnerability has been resolved:
net/mlx5e: Wrap the tx reporter dump callback to extract the sq
The function mlx5e_tx_reporter_dump_sq() casts its void * argument to struct mlx5e_txqsq *, but in the TX-timeout-recovery flow the argument is actually of type struct mlx5e_tx_timeout_ctx *. Treating the timeout context as an SQ results in the following crash:
 mlx5_core 0000:08:00.1 enp8s0f1: TX timeout detected
 mlx5_core 0000:08:00.1 enp8s0f1: TX timeout on queue: 1, SQ: 0x11ec, CQ: 0x146d, SQ Cons: 0x0 SQ Prod: 0x1, usecs since last trans: 21565000
 BUG: stack guard page was hit at 0000000093f1a2de (stack is 00000000b66ea0dc..000000004d932dae)
 kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
 CPU: 5 PID: 95 Comm: kworker/u20:1 Tainted: G W OE 5.13.0_mlnx #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
 RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180 [mlx5_core]
 Call Trace:
 mlx5e_tx_reporter_dump+0x43/0x1c0 [mlx5_core]
 devlink_health_do_dump.part.91+0x71/0xd0
 devlink_health_report+0x157/0x1b0
 mlx5e_reporter_tx_timeout+0xb9/0xf0 [mlx5_core]
 ? mlx5e_tx_reporter_err_cqe_recover+0x1d0/0x1d0 [mlx5_core]
 ? mlx5e_health_queue_dump+0xd0/0xd0 [mlx5_core]
 ? update_load_avg+0x19b/0x550
 ? set_next_entity+0x72/0x80
 ? pick_next_task_fair+0x227/0x340
 ? finish_task_switch+0xa2/0x280
 mlx5e_tx_timeout_work+0x83/0xb0 [mlx5_core]
 process_one_work+0x1de/0x3a0
 worker_thread+0x2d/0x3c0
 ? process_one_work+0x3a0/0x3a0
 kthread+0x115/0x130
 ? kthread_park+0x90/0x90
 ret_from_fork+0x1f/0x30
 ---[ end trace 51ccabea504edaff ]---
 RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
 PKRU: 55555554
 Kernel panic - not syncing: Fatal exception
 Kernel Offset: disabled
 end Kernel panic - not syncing: Fatal exception
To fix this bug, add a wrapper for mlx5e_tx_reporter_dump_sq() that extracts the SQ from struct mlx5e_tx_timeout_ctx, and set the wrapper as the dump callback for the TX-timeout-recovery flow.
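
The wrapper pattern can be illustrated with a small user-space sketch. This is not the driver code: the types sq_model and timeout_ctx_model, their fields, the dump_cb_t signature, and both function names are hypothetical stand-ins chosen for illustration. It only shows why a callback that receives an opaque void * must unwrap the context type its caller actually passes before handing the SQ to the existing dump routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mlx5e_txqsq; the field is illustrative. */
struct sq_model {
	unsigned int sqn;
};

/* Hypothetical stand-in for struct mlx5e_tx_timeout_ctx. */
struct timeout_ctx_model {
	struct sq_model *sq;
	int status;
};

/* Assumed callback shape: the context arrives as an opaque pointer. */
typedef int (*dump_cb_t)(void *ctx, unsigned int *out_sqn);

/* Dump routine that is only correct when ctx really points at an SQ. */
static int dump_sq(void *ctx, unsigned int *out_sqn)
{
	struct sq_model *sq = ctx;

	assert(sq != NULL && out_sqn != NULL);
	*out_sqn = sq->sqn;
	return 0;
}

/*
 * Wrapper for the timeout flow: ctx is a timeout context, so extract
 * the SQ first and then reuse the existing dump routine unchanged.
 */
static int dump_sq_from_timeout_ctx(void *ctx, unsigned int *out_sqn)
{
	struct timeout_ctx_model *to_ctx = ctx;

	assert(to_ctx != NULL);
	return dump_sq(to_ctx->sq, out_sqn);
}
```

In this sketch, a caller in the timeout flow would register dump_sq_from_timeout_ctx as its dump_cb_t and pass a pointer to its timeout_ctx_model, while flows that already hold an SQ would keep using dump_sq directly. Passing the timeout context straight to dump_sq would reinterpret its memory as an SQ, which is the class of error described above.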