In the Linux kernel, the following vulnerability has been resolved:
net: ntb_netdev: Move ntb_netdev_rx_handler() to call netif_rx() from __netif_rx()
The following is emitted when using the idxd (DSA) dmaengine as the data mover for ntb_transport, which ntb_netdev uses.
[74412.546922] BUG: using smp_processor_id() in preemptible [00000000] code: irq/52-idxd-por/14526
[74412.556784] caller is netif_rx_internal+0x42/0x130
[74412.562282] CPU: 6 PID: 14526 Comm: irq/52-idxd-por Not tainted 6.9.5 #5
[74412.569870] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.E9I.1752.P05.2402080856 02/08/2024
[74412.581699] Call Trace:
[74412.584514] <TASK>
[74412.586933] dump_stack_lvl+0x55/0x70
[74412.591129] check_preemption_disabled+0xc8/0xf0
[74412.596374] netif_rx_internal+0x42/0x130
[74412.600957] __netif_rx+0x20/0xd0
[74412.604743] ntb_netdev_rx_handler+0x66/0x150 [ntb_netdev]
[74412.610985] ntb_complete_rxc+0xed/0x140 [ntb_transport]
[74412.617010] ntb_rx_copy_callback+0x53/0x80 [ntb_transport]
[74412.623332] idxd_dma_complete_txd+0xe3/0x160 [idxd]
[74412.628963] idxd_wq_thread+0x1a6/0x2b0 [idxd]
[74412.634046] irq_thread_fn+0x21/0x60
[74412.638134] ? irq_thread+0xa8/0x290
[74412.642218] irq_thread+0x1a0/0x290
[74412.646212] ? __pfx_irq_thread_fn+0x10/0x10
[74412.651071] ? __pfx_irq_thread_dtor+0x10/0x10
[74412.656117] ? __pfx_irq_thread+0x10/0x10
[74412.660686] kthread+0x100/0x130
[74412.664384] ? __pfx_kthread+0x10/0x10
[74412.668639] ret_from_fork+0x31/0x50
[74412.672716] ? __pfx_kthread+0x10/0x10
[74412.676978] ret_from_fork_asm+0x1a/0x30
[74412.681457] </TASK>
The cause is that the idxd driver's interrupt completion handler uses a threaded interrupt, and the threaded handler runs in neither hard nor soft interrupt context. However, __netif_rx() can only be called from interrupt context. Change the call to netif_rx() to allow completion from normal context for dmaengine drivers that use threaded IRQ handling.
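The difference between the two entry points can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct model_ctx, model_rx_internal(), model_netif_rx_nocheck() and model_netif_rx() are hypothetical stand-ins for the kernel's context tracking, netif_rx_internal(), __netif_rx() and netif_rx(), and the model assumes that the only relevant state is whether the caller is in hard interrupt context or has bottom halves disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the caller's execution context (not a kernel type). */
struct model_ctx {
	bool in_hardirq;	/* caller is in hard interrupt context */
	int bh_depth;		/* > 0 while bottom halves are disabled */
};

enum rx_result { RX_OK = 0, RX_BUG = -1 };

/*
 * Stand-in for netif_rx_internal(): it touches per-CPU state, so it is
 * only safe when the caller cannot be preempted. A preemptible caller
 * corresponds to the "BUG: using smp_processor_id() in preemptible" splat.
 */
static enum rx_result model_rx_internal(const struct model_ctx *c)
{
	if (!c->in_hardirq && c->bh_depth == 0)
		return RX_BUG;
	return RX_OK;
}

/* Stand-in for __netif_rx(): trusts the caller to be in interrupt context. */
static enum rx_result model_netif_rx_nocheck(struct model_ctx *c)
{
	return model_rx_internal(c);
}

/*
 * Stand-in for netif_rx(): if the caller is in neither hard nor soft
 * interrupt context, disable bottom halves around the internal call.
 */
static enum rx_result model_netif_rx(struct model_ctx *c)
{
	bool need_bh_off = !c->in_hardirq && c->bh_depth == 0;
	enum rx_result ret;

	if (need_bh_off)
		c->bh_depth++;		/* models local_bh_disable() */
	ret = model_rx_internal(c);
	if (need_bh_off)
		c->bh_depth--;		/* models local_bh_enable() */
	return ret;
}
```

In this model, a threaded IRQ handler corresponds to a context with in_hardirq == false and bh_depth == 0: model_netif_rx_nocheck() reports the bug, while model_netif_rx() succeeds and leaves bh_depth unchanged on return.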
Commit baebdf48c360 ("net: dev: Makes sure netif_rx() can be invoked in any context.") changed the call from netif_rx() to __netif_rx(), although the change should have been a no-op instead. However, the code that preceded that change should have been using netif_rx_ni() or netif_rx_any_context().