In the Linux kernel, the following vulnerability has been resolved:
ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry
do_mq_timedreceive calls wq_sleep with a stack local address. The sender (do_mq_timedsend) uses this address to later call pipelined_send.
This leads to a very hard-to-trigger race where a do_mq_timedreceive call might return and leave do_mq_timedsend to rely on an invalid address, causing the following crash:
  RIP: 0010:wake_q_add_safe+0x13/0x60
  Call Trace:
   __x64_sys_mq_timedsend+0x2a9/0x490
   do_syscall_64+0x80/0x680
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f5928e40343
The race occurs as:
1. do_mq_timedreceive calls wq_sleep with the address of a struct ext_wait_queue on its function stack (aliased as ewq_addr here); it holds a valid struct ext_wait_queue * as long as the stack has not been overwritten.
2. ewq_addr gets added to info->e_wait_q[RECV].list in wq_add, and do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call __pipelined_op.
3. Sender calls __pipelined_op::smp_store_release(&this->state, STATE_READY). Here is where the race window begins. (The this pointer is ewq_addr.)
4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it will see state == STATE_READY and break.
5. do_mq_timedreceive returns, and ewq_addr is no longer guaranteed to be a struct ext_wait_queue * since it was on do_mq_timedreceive's stack. (Although the address may not get overwritten until another function happens to touch it, which means it can persist for an indefinite time.)
6. do_mq_timedsend::__pipelined_op() still believes ewq_addr is a struct ext_wait_queue *, and uses it to find a task_struct to pass to the wake_q_add_safe call. In the lucky case where nothing has overwritten ewq_addr yet, ewq_addr->task is the right task_struct. In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a bogus address as the receiver's task_struct, causing the crash.
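The interleaving described above can be replayed deterministically in user space. The following is a minimal single-threaded sketch, not kernel code: struct waiter, struct fake_task, receiver_returns, sender_buggy, and replay_buggy are hypothetical names, and the reuse of the receiver's stack slot is modelled by overwriting a field of an ordinary object, so the sketch itself never touches invalid memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { STATE_NONE = 0, STATE_READY = 1 };

/* Hypothetical stand-in for task_struct. */
struct fake_task { int pid; };

/* Hypothetical stand-in for struct ext_wait_queue. */
struct waiter {
    struct fake_task *task;
    atomic_int state;
};

/* Whatever happens to occupy the old stack slot later on. */
static struct fake_task stack_garbage = { -1 };

/* Models the receiver returning and its stack slot being reused. */
static void receiver_returns(struct waiter *slot)
{
    slot->task = &stack_garbage;
}

/*
 * Buggy order: publish STATE_READY first, read this->task afterwards.
 * The window callback stands in for the receiver running between the
 * two steps (assumption: it models the wake-up and return).
 */
static struct fake_task *sender_buggy(struct waiter *this,
                                      void (*window)(struct waiter *))
{
    atomic_store_explicit(&this->state, STATE_READY, memory_order_release);
    if (window)
        window(this);      /* receiver sees READY and returns */
    return this->task;     /* stale read: the slot may have been reused */
}

/* Returns the pid of the task the sender would try to wake. */
static int replay_buggy(int receiver_wins)
{
    struct fake_task receiver = { 42 };
    struct waiter slot;

    slot.task = &receiver;
    atomic_init(&slot.state, STATE_NONE);

    struct fake_task *woken =
        sender_buggy(&slot, receiver_wins ? receiver_returns : NULL);

    assert(atomic_load(&slot.state) == STATE_READY);
    return woken->pid;
}
```

In this model, replay_buggy(0) is the lucky case where nothing runs inside the window and the sender finds the real receiver, while replay_buggy(1) is the unlucky case where the sender picks up whatever replaced the slot's contents.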
do_mq_timedsend::__pipelined_op() should not dereference this after setting STATE_READY, as the receiver counterpart is now free to return. Change __pipelined_op to call wake_q_add_safe on the receiver's task_struct returned by get_task_struct, instead of dereferencing this, which sits on the receiver's stack.
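The corrected ordering can be illustrated with the same kind of user-space model. This is a sketch under stated assumptions rather than the kernel patch itself: struct waiter, struct fake_task, task_get (standing in for get_task_struct), sender_fixed, and replay_fixed are hypothetical names, and the reference count is a plain integer used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { STATE_NONE = 0, STATE_READY = 1 };

/* Hypothetical stand-in for task_struct, with a toy reference count. */
struct fake_task { int pid; int refs; };

/* Hypothetical stand-in for struct ext_wait_queue. */
struct waiter {
    struct fake_task *task;
    atomic_int state;
};

static struct fake_task stack_garbage = { -1, 0 };

/* Models the receiver returning and its stack slot being reused. */
static void receiver_returns(struct waiter *slot)
{
    slot->task = &stack_garbage;
}

/* Models get_task_struct(): pin the task and hand the pointer back. */
static struct fake_task *task_get(struct fake_task *t)
{
    t->refs++;
    return t;
}

/*
 * Fixed order: copy and pin the task pointer while the receiver is
 * still blocked, then publish STATE_READY. After the store the sender
 * works only with its local copy and never reads through 'this' again.
 */
static struct fake_task *sender_fixed(struct waiter *this,
                                      void (*window)(struct waiter *))
{
    struct fake_task *task = task_get(this->task);

    atomic_store_explicit(&this->state, STATE_READY, memory_order_release);
    if (window)
        window(this);      /* model only: the receiver runs and returns */
    return task;           /* local copy, unaffected by slot reuse */
}

/* Returns the pid of the task the sender would try to wake. */
static int replay_fixed(int receiver_wins)
{
    struct fake_task receiver = { 42, 0 };
    struct waiter slot;

    slot.task = &receiver;
    atomic_init(&slot.state, STATE_NONE);

    struct fake_task *woken =
        sender_fixed(&slot, receiver_wins ? receiver_returns : NULL);

    assert(woken->refs == 1);
    return woken->pid;
}
```

With this ordering the model returns the real receiver whether or not the slot is overwritten inside the window, because the only read through the waiter pointer happens before the state is published.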
As Manfred pointed out, the race potentially also exists in ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare. Fix those in the same way.