In the Linux kernel, the following vulnerability has been resolved:
ice: fix Rx page leak on multi-buffer frames
The ice_put_rx_mbuf() function handles calling ice_put_rx_buf() for each buffer in the current frame. This function was introduced as part of handling multi-buffer XDP support in the ice driver.
It works by iterating over the buffers from first_desc up to 1 plus the total number of fragments in the frame, cached from before the XDP program was executed.
If the hardware posts a descriptor with a size of 0, the logic used in ice_put_rx_mbuf() breaks. Such descriptors get skipped and don't get added as fragments in ice_add_xdp_frag(). Since the buffer isn't counted as a fragment, we do not iterate over it in ice_put_rx_mbuf(), and thus we don't call ice_put_rx_buf().
Because we don't call ice_put_rx_buf(), we don't attempt to re-use the page or free it. This leaves a stale page in the ring, as we don't increment next_to_alloc.
The ice_reuse_rx_page() function assumes that next_to_alloc has been incremented properly, and that it always points to a buffer with a NULL page. Since this function doesn't check, it will happily recycle a page over the top of the next_to_alloc buffer, losing track of the old page.
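The shortfall described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code: struct model_rx_buf, MODEL_RING_SIZE, model_put_by_frag_count() and model_count_stale() are hypothetical names, and a boolean stands in for the page pointer. The count-based loop visits 1 plus the cached fragment count, so when a zero-size descriptor is not counted, the loop stops one buffer short of the end of the frame.

```c
#include <stdbool.h>

#define MODEL_RING_SIZE 8u /* hypothetical ring size for the model */

/* Hypothetical stand-in for an Rx buffer: only tracks page ownership. */
struct model_rx_buf {
	bool has_page;
};

/*
 * Count-based release, modelled on the pre-fix behaviour: visit
 * 1 + nr_frags consecutive buffers starting at first_desc.
 * Returns how many buffers had their page released.
 */
static unsigned int model_put_by_frag_count(struct model_rx_buf *ring,
					    unsigned int first_desc,
					    unsigned int nr_frags)
{
	unsigned int idx = first_desc;
	unsigned int released = 0;

	for (unsigned int i = 0; i <= nr_frags; i++) {
		if (ring[idx].has_page) {
			ring[idx].has_page = false;
			released++;
		}
		if (++idx == MODEL_RING_SIZE)
			idx = 0; /* wrap around the ring */
	}
	return released;
}

/* Count buffers in [first_desc, ntc) that still hold a page. */
static unsigned int model_count_stale(const struct model_rx_buf *ring,
				      unsigned int first_desc,
				      unsigned int ntc)
{
	unsigned int idx = first_desc;
	unsigned int stale = 0;

	while (idx != ntc) {
		if (ring[idx].has_page)
			stale++;
		if (++idx == MODEL_RING_SIZE)
			idx = 0;
	}
	return stale;
}
```

For a frame that spans four descriptors of which one had a size of 0, the cached fragment count is 2, so only three buffers are visited and one page remains in the ring.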
Note that this leak only occurs for multi-buffer frames. The ice_put_rx_mbuf() function always handles at least one buffer, so a single-buffer frame will always get handled correctly. It is not clear precisely why the hardware hands us descriptors with a size of 0 sometimes, but it happens somewhat regularly with "jumbo frames" used by 9K MTU.
To fix ice_put_rx_mbuf(), we need to make sure to call ice_put_rx_buf() on all buffers between first_desc and next_to_clean. Borrow the logic of a similar function in i40e used for this same purpose. Use the same logic also in ice_get_pgcnts().
Instead of iterating over just the number of fragments, use a loop which iterates until the current index reaches the next_to_clean element just past the current frame. Unlike i40e, the ice_put_rx_mbuf() function does call ice_put_rx_buf() on the last buffer of the frame indicating the end of packet.
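The index-based iteration can be sketched in the same simplified, user-space style. The names here (struct sketch_rx_buf, SKETCH_RING_SIZE, sketch_put_until_ntc()) are hypothetical and a boolean again stands in for the page pointer; the sketch only shows the loop shape, where the walk ends when the index reaches ntc rather than after a fixed count.

```c
#include <stdbool.h>

#define SKETCH_RING_SIZE 8u /* hypothetical ring size for the model */

/* Hypothetical stand-in for an Rx buffer: only tracks page ownership. */
struct sketch_rx_buf {
	bool has_page;
};

/*
 * Index-based release: walk from first_desc until the index reaches
 * ntc, the element just past the current frame, wrapping at the end
 * of the ring. Every buffer of the frame is visited, whether or not
 * it was counted as a fragment.
 * Returns how many buffers had their page released.
 */
static unsigned int sketch_put_until_ntc(struct sketch_rx_buf *ring,
					 unsigned int first_desc,
					 unsigned int ntc)
{
	unsigned int idx = first_desc;
	unsigned int released = 0;

	while (idx != ntc) {
		if (ring[idx].has_page) {
			ring[idx].has_page = false;
			released++;
		}
		if (++idx == SKETCH_RING_SIZE)
			idx = 0; /* wrap around the ring */
	}
	return released;
}
```

Because the loop terminates on the index alone, it depends on ntc already pointing just past the frame when it is called, and it handles a frame that wraps past the end of the ring without special cases.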
For non-linear (multi-buffer) frames, we need to take care when adjusting the pagecnt_bias. An XDP program might release fragments from the tail of the frame, in which case that fragment page is already released. Only update the pagecnt_bias for the first descriptor and fragments still remaining post-XDP program. Take care to only access the shared info for fragmented buffers, as this avoids a significant cache miss.
The xdp_xmit value only needs to be updated if an XDP program is run, and only once per packet. Drop the xdp_xmit pointer argument from ice_put_rx_mbuf(). Instead, set xdp_xmit in the ice_clean_rx_irq() function directly. This avoids needing to pass the argument and avoids an extra bit-wise OR for each buffer in the frame.
Move the increment of the ntc local variable to ensure it's updated before all calls to ice_get_pgcnts() or ice_put_rx_mbuf(), as the loop logic requires the index of the element just after the current frame.
Now that we use an index pointer in the ring to identify the packet, we no longer need to track or cache the number of fragments in the rx_ring.