In the Linux kernel, the following vulnerability has been resolved:
net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
When using the felix driver (the only one which supports UC filtering and MC filtering) as a DSA master for a random other DSA switch, one can see the following stack trace when the downstream switch ports join a VLAN-aware bridge:
=============================
net/8021q/vlan_core.c:238 suspicious rcu_dereference_protected() usage!
stack backtrace:
Workqueue: dsa_ordered dsa_slave_switchdev_event_work
Call trace:
 lockdep_rcu_suspicious+0x170/0x210
 vlan_for_each+0x8c/0x188
 dsa_slave_sync_uc+0x128/0x178
 __hw_addr_sync_dev+0x138/0x158
 dsa_slave_set_rx_mode+0x58/0x70
 __dev_set_rx_mode+0x88/0xa8
 dev_uc_add+0x74/0xa0
 dsa_port_bridge_host_fdb_add+0xec/0x180
 dsa_slave_switchdev_event_work+0x7c/0x1c8
 process_one_work+0x290/0x568
What it's saying is that vlan_for_each() expects rtnl_lock() context and it's not getting it, when it's called from the DSA master's ndo_set_rx_mode().
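To make the warning concrete, here is a minimal user-space sketch of the kind of check that fires. In the kernel, rtnl_dereference(p) expands to rcu_dereference_protected(p, lockdep_rtnl_is_held()), and vlan_for_each() presumably reaches the check that way. Every name prefixed with model_ below is hypothetical and only imitates that behaviour; none of it is kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for lockdep_rtnl_is_held(). */
static bool model_rtnl_held;
/* Counts how often the "suspicious usage" condition was hit. */
static int model_suspicious_count;

/* Imitates rcu_dereference_protected(p, cond): the pointer is returned
 * either way, but a false condition is recorded as suspicious. */
static void *model_deref_protected(void *p, bool lock_held)
{
	if (!lock_held)
		model_suspicious_count++;
	return p;
}

/* Hypothetical device carrying a pointer to its VLAN state. */
struct model_dev {
	int *vlan_info;
};

/* Imitates the entry of vlan_for_each(): the VLAN state is fetched
 * under the assumption that the caller holds the rtnl lock. */
static int *model_vlan_for_each(struct model_dev *dev)
{
	assert(dev != NULL);
	return model_deref_protected(dev->vlan_info, model_rtnl_held);
}
```

Calling model_vlan_for_each() with model_rtnl_held set to false still returns the pointer, but the counter goes up. That mirrors the situation in the trace: the access proceeds, yet nothing guarantees the data stays stable while it is used.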
That function - dsa_slave_set_rx_mode() - is reached through dev_uc_add() from the slave DSA interface's dsa_port_bridge_host_fdb_add(), which runs from the deferred dsa_slave_switchdev_event_work().
We went to great lengths to avoid the rtnl_lock() context in that call path in commit 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work"), and calling rtnl_lock() is simply not an option due to the possibility of deadlocking when calling dsa_flush_workqueue() from the call paths that do hold rtnl_lock() - basically all of them.
So, when the DSA master calls vlan_for_each() from its ndo_set_rx_mode(), the state of the 8021q driver on this device is really not protected from concurrent access by anything.
Looking at net/8021q/, I don't think that vlan_info->vid_list was particularly designed with RCU traversal in mind, so introducing an RCU read-side form of vlan_for_each() - vlan_for_each_rcu() - won't be so easy, and it also wouldn't be exactly what we need anyway.
In general I believe that the solution isn't in net/8021q/ anyway; vlan_for_each() is not cut out for this task. DSA doesn't need rtnl_lock() to be held per se - since it's not a netdev state change that we're blocking, but rather, just concurrent additions/removals to a VLAN list. We don't even need sleepable context - the callback of vlan_for_each() just schedules deferred work.
The proposed escape is to remove the dependency on vlan_for_each() and to open-code a non-sleepable, rtnl-free alternative to that, based on copies of the VLAN list modified from .ndo_vlan_rx_add_vid() and .ndo_vlan_rx_kill_vid().
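The idea of a privately maintained VLAN list can be sketched in user space as follows. This is not the kernel patch: struct port_vlans, the port_vlan_* helpers, and the atomic_flag spinlock are all hypothetical stand-ins, chosen only to show a list that is updated from add/kill hooks and walked without any global lock and without sleeping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical private copy of one VLAN ID known on a port. */
struct vlan_entry {
	uint16_t vid;
	struct vlan_entry *next;
};

/* Hypothetical per-port VLAN list with its own non-sleeping lock. */
struct port_vlans {
	atomic_flag lock;
	struct vlan_entry *head;
};

static void port_vlans_init(struct port_vlans *pv)
{
	atomic_flag_clear(&pv->lock);
	pv->head = NULL;
}

static void pv_lock(struct port_vlans *pv)
{
	while (atomic_flag_test_and_set_explicit(&pv->lock, memory_order_acquire))
		; /* busy-wait: models a spinlock, never sleeps */
}

static void pv_unlock(struct port_vlans *pv)
{
	atomic_flag_clear_explicit(&pv->lock, memory_order_release);
}

/* Models the .ndo_vlan_rx_add_vid() hook: record the VID once. */
static int port_vlan_add(struct port_vlans *pv, uint16_t vid)
{
	struct vlan_entry *e, *it;

	assert(vid < 4096); /* VLAN IDs are 12 bits wide */
	e = malloc(sizeof(*e)); /* allocate before taking the lock */
	if (!e)
		return -1;
	e->vid = vid;

	pv_lock(pv);
	for (it = pv->head; it; it = it->next) {
		if (it->vid == vid) { /* already present, keep the list unique */
			pv_unlock(pv);
			free(e);
			return 0;
		}
	}
	e->next = pv->head;
	pv->head = e;
	pv_unlock(pv);
	return 0;
}

/* Models the .ndo_vlan_rx_kill_vid() hook: drop the VID if present. */
static int port_vlan_kill(struct port_vlans *pv, uint16_t vid)
{
	struct vlan_entry **pp, *victim = NULL;

	pv_lock(pv);
	for (pp = &pv->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->vid == vid) {
			victim = *pp;
			*pp = victim->next;
			break;
		}
	}
	pv_unlock(pv);
	if (!victim)
		return -1;
	free(victim);
	return 0;
}

typedef void (*vlan_cb)(uint16_t vid, void *arg);

/* rtnl-free replacement for the iteration: walks the private list under
 * the private lock and returns the number of entries visited. The
 * callback must not block, since it runs with the lock held. */
static int port_vlans_for_each(struct port_vlans *pv, vlan_cb cb, void *arg)
{
	struct vlan_entry *it;
	int n = 0;

	pv_lock(pv);
	for (it = pv->head; it; it = it->next) {
		cb(it->vid, arg);
		n++;
	}
	pv_unlock(pv);
	return n;
}

/* Example callback; a real one would only schedule deferred work. */
static void sum_vid_cb(uint16_t vid, void *arg)
{
	*(unsigned int *)arg += vid;
}
```

Because the list is guarded by its own lock rather than rtnl_lock(), the iteration can run from the deferred work path without reintroducing the deadlock risk described above. Which lock the kernel change actually uses is not stated here, so the spinlock in this sketch is an assumption.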
{
"cna_assigner": "Linux",
"osv_generated_from": "https://github.com/CVEProject/cvelistV5/tree/main/cves/2023/54xxx/CVE-2023-54149.json"
}