In the Linux kernel, the following vulnerability has been resolved:
net/mlx5: Reload only IB representors upon lag disable/enable
On lag disable, the bond IB device, along with all of its representors, is destroyed, and then the slaves' representors are reloaded.
If loading a slave's IB representor fails, the eswitch error flow unloads all representors, including the ethernet representors, whose netdevs are then detached and removed from the lag bond. This flow is incorrect, because the lag driver is not responsible for loading or unloading ethernet representors. Furthermore, the flow begins by taking the lag lock to prevent bond changes during the disable flow. However, detaching the ethernet representors from the lag tries to acquire the lag lock again, triggering the following deadlock:
Call trace:
  __switch_to+0xf4/0x148
  __schedule+0x2c8/0x7d0
  schedule+0x50/0xe0
  schedule_preempt_disabled+0x18/0x28
  __mutex_lock.isra.13+0x2b8/0x570
  __mutex_lock_slowpath+0x1c/0x28
  mutex_lock+0x4c/0x68
  mlx5_lag_remove_netdev+0x3c/0x1a0 [mlx5_core]
  mlx5e_uplink_rep_disable+0x70/0xa0 [mlx5_core]
  mlx5e_detach_netdev+0x6c/0xb0 [mlx5_core]
  mlx5e_netdev_change_profile+0x44/0x138 [mlx5_core]
  mlx5e_netdev_attach_nic_profile+0x28/0x38 [mlx5_core]
  mlx5e_vport_rep_unload+0x184/0x1b8 [mlx5_core]
  mlx5_esw_offloads_rep_load+0xd8/0xe0 [mlx5_core]
  mlx5_eswitch_reload_reps+0x74/0xd0 [mlx5_core]
  mlx5_disable_lag+0x130/0x138 [mlx5_core]
  mlx5_lag_disable_change+0x6c/0x70 [mlx5_core] // hold ldev->lock
  mlx5_devlink_eswitch_mode_set+0xc0/0x410 [mlx5_core]
  devlink_nl_cmd_eswitch_set_doit+0xdc/0x180
  genl_family_rcv_msg_doit.isra.17+0xe8/0x138
  genl_rcv_msg+0xe4/0x220
  netlink_rcv_skb+0x44/0x108
  genl_rcv+0x40/0x58
  netlink_unicast+0x198/0x268
  netlink_sendmsg+0x1d4/0x418
  sock_sendmsg+0x54/0x60
  __sys_sendto+0xf4/0x120
  __arm64_sys_sendto+0x30/0x40
  el0_svc_common+0x8c/0x120
  do_el0_svc+0x30/0xa0
  el0_svc+0x20/0x30
  el0_sync_handler+0x90/0xb8
  el0_sync+0x160/0x180
Thus, upon lag enable/disable, load and unload only the slaves' IB representors, which prevents the deadlock described above.
While at it, refactor the mlx5_esw_offloads_rep_load() function to use a static helper for its internal logic, mirroring the design of the representor unload path.