In the Linux kernel, the following vulnerability has been resolved:
RDMA/cma: Fix workqueue crash in cma_netevent_work_handler
struct rdma_cm_id has a member "struct work_struct net_work" that is reused for enqueuing cma_netevent_work_handler()s onto cma_wq.
The crash below [1] can occur if cma_netevent_callback() is called more than once in quick succession. Each call enqueues cma_netevent_work_handler() for the same rdma_cm_id, and its INIT_WORK() overwrites any previously queued work item that was just scheduled to run. There is no guarantee that the queued work item runs between two successive calls to cma_netevent_callback(). The second INIT_WORK() can therefore overwrite the first work item (for the same rdma_cm_id), even though id_table_lock is held during the enqueue.
The drgn analysis [2] also indicates that the work item was likely overwritten.
Fix this by moving the INIT_WORK() to __rdma_create_id(), so that it does not race with any existing queue_work() or its worker thread.
[1] Crash stack (trimmed):
BUG: kernel NULL pointer dereference, address: 0000000000000008
kworker/u256:6 ... 6.12.0-0...
Workqueue: cma_netevent_work_handler [rdma_cm] (rdma_cm)
RIP: 0010:process_one_work+0xba/0x31a
Call Trace:
 worker_thread+0x266/0x3a0
 kthread+0xcf/0x100
 ret_from_fork+0x31/0x50
[2] drgn crash analysis:
trace = prog.crashed_thread().stack_trace()
trace
(0) crash_setup_regs (./arch/x86/include/asm/kexec.h:111:15)
(1) __crash_kexec (kernel/crash_core.c:122:4)
(2) panic (kernel/panic.c:399:3)
(3) oops_end (arch/x86/kernel/dumpstack.c:382:3)
...
(8) process_one_work (kernel/workqueue.c:3168:2)
(9) process_scheduled_works (kernel/workqueue.c:3310:3)
(10) worker_thread (kernel/workqueue.c:3391:4)
(11) kthread (kernel/kthread.c:389:9)
Line workqueue.c:3168 for this kernel version is in process_one_work():

3168    strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);
trace[8]["work"]
*(struct work_struct *)0xffff92577d0a21d8 = {
    .data = (atomic_long_t){
        .counter = (s64)536870912,    <=== Note
    },
    .entry = (struct list_head){
        .next = (struct list_head *)0xffff924d075924c0,
        .prev = (struct list_head *)0xffff924d075924c0,
    },
    .func = (work_func_t)cma_netevent_work_handler+0x0 = 0xffffffffc2cec280,
}
The suspicion is that pwq is NULL:

trace[8]["pwq"]
(struct pool_workqueue *)<absent>
In process_one_work(), pwq is assigned from:

struct pool_workqueue *pwq = get_work_pwq(work);

and get_work_pwq() is:

static struct pool_workqueue *get_work_pwq(struct work_struct *work)
{
    unsigned long data = atomic_long_read(&work->data);

    if (data & WORK_STRUCT_PWQ)
        return work_struct_pwq(data);
    else
        return NULL;
}
WORK_STRUCT_PWQ is 0x4:

print(repr(prog['WORK_STRUCT_PWQ']))
Object(prog, 'enum work_flags', value=4)
But work->data is 536870912, which is 0x20000000. So get_work_pwq() returns NULL, and we crash in process_one_work():