In the Linux kernel, the following vulnerability has been resolved:
IB/hfi1: Fix sdma.h tx->num_descs off-by-one error
Unfortunately, commit fd8958efe877 introduced another error that causes the descs array to overflow. This results in further crashes that are easily reproducible with the sendmsg system call.
[ 1080.836473] general protection fault, probably for non-canonical address 0x400300015528b00a: 0000 [#1] PREEMPT SMP PTI
[ 1080.974535] Call Trace:
[ 1080.976990]  <TASK>
[ 1081.021929]  hfi1_ipoib_send_dma_common+0x7a/0x2e0 [hfi1]
[ 1081.027364]  hfi1_ipoib_send_dma_list+0x62/0x270 [hfi1]
[ 1081.032633]  hfi1_ipoib_send+0x112/0x300 [hfi1]
[ 1081.042001]  ipoib_start_xmit+0x2a9/0x2d0 [ib_ipoib]
[ 1081.148347]  __sys_sendmsg+0x59/0xa0
crash> ipoib_txreq 0xffff9cfeba229f00
struct ipoib_txreq {
  txreq = {
    list = {
      next = 0xffff9cfeba229f00,
      prev = 0xffff9cfeba229f00
    },
    descp = 0xffff9cfeba229f40,
    coalesce_buf = 0x0,
    wait = 0xffff9cfea4e69a48,
    complete = 0xffffffffc0fe0760 <hfi1_ipoib_sdma_complete>,
    packet_len = 0x46d,
    tlen = 0x0,
    num_desc = 0x0,
    desc_limit = 0x6,
    next_descq_idx = 0x45c,
    coalesce_idx = 0x0,
    flags = 0x0,
    descs = {{
        qw = {0x8024000120dffb00, 0x4}  # SDMA_DESC0_FIRST_DESC_FLAG (bit 63)
      }, {
        qw = {0x3800014231b108, 0x4}
      }, {
        qw = {0x310000e4ee0fcf0, 0x8}
      }, {
        qw = {0x3000012e9f8000, 0x8}
      }, {
        qw = {0x59000dfb9d0000, 0x8}
      }, {
        qw = {0x78000e02e40000, 0x8}
      }}
  },
  sdma_hdr = 0x400300015528b000,  <<< invalid pointer in the tx request structure
  sdma_status = 0x0,                  SDMA_DESC0_LAST_DESC_FLAG (bit 62)
  complete = 0x0,
  priv = 0x0,
  txq = 0xffff9cfea4e69880,
  skb = 0xffff9d099809f400
}
If an SDMA send consists of exactly 6 descriptors and requires dword padding (which needs a 7th descriptor), the sdma_txreq descriptor array is not expanded, and the padding descriptor is written past the end of the array into the containing structure. This corruption causes a panic when the send completion runs; the exact panic depends on which members of the containing structure are overwritten. The fix is to use the correct expression in _pad_sdma_tx_descs() when testing whether the descriptor array must be expanded.
With this patch the crashes are no longer reproducible and the machine is stable.