In the Linux kernel, the following vulnerability has been resolved:
netfilter: bridge: confirm multicast packets before passing them up the stack
conntrack nfconfirm logic cannot handle cloned skbs referencing the same nfconn entry, which will happen for multicast (broadcast) frames on bridges.
Example: macvlan0 | br0 / \ ethX ethY
ethX (or Y) receives a L2 multicast or broadcast packet containing an IP packet, flow is not yet in conntrack table.
In macvlan case, macvlan driver retains clone(s) of the mcast skb and schedules a work queue to send them out on the lower devices.
The clone skb->nfct is not a copy, it is the same entry as the original skb. The macvlan rx handler then returns RXHANDLER_PASS.
The Macvlan broadcast worker and normal confirm path will race.
This race will not happen if step 2 already confirmed a clone. In that case later steps perform skbclone() with skb->nfct already confirmed (in hash table). This works fine.
But such confirmation won't happen when eb/ip/nftables rules dropped the packets before they reached the nf_confirm step in postrouting.
Pablo points out that nfconntrackbridge doesn't allow use of stateful nat, so we can safely discard the nf_conn entry and let inet call conntrack again.
This doesn't work for bridge netfilter: skb could have a nat transformation. Also bridge nf prevents re-invocation of inet prerouting via 'sabotage_in' hook.
Work around this problem by explicit confirmation of the entry at LOCAL_IN time, before upper layer has a chance to clone the unconfirmed entry.
The downside is that this disables NAT and conntrack helpers.
Alternative fix would be to add locking to all code parts that deal with unconfirmed packets, but even if that could be done in a sane way this opens up other problems, for example:
-m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4 -m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5
For multicast case, only one of such conflicting mappings will be created, conntrack only handles 1:1 NAT mappings.
Users should set create a setup that explicitly marks such traffic NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass them, ruleset might have accept rules for untracked traffic already, so user-visible behaviour would change.