In the Linux kernel, the following vulnerability has been resolved:
netfilter: conntrack: serialize hash resizes and cleanups
Syzbot was able to trigger the following warning [1].
No repro found by syzbot yet, but I was able to trigger a similar issue by having 2 scripts running in parallel, changing conntrack hash sizes, and:
for j in `seq 1 1000` ; do unshare -n /bin/true >/dev/null ; done
It would take more than 5 minutes for net_namespace structures to be cleaned up.
This is because nf_ct_iterate_cleanup() has to restart every time a resize happens.
By adding a mutex, we can serialize hash resizes and cleanups and also make get_next_corpse() faster by skipping over empty buckets.
Even without resizes in the picture, this patch considerably speeds up network namespace dismantles.
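The idea of serializing resizes and cleanups behind one mutex, and skipping empty buckets during the scan, can be illustrated with a small userspace sketch. This is not the kernel patch: every name below (struct table, table_insert, table_resize, table_cleanup) is hypothetical, a pthread mutex stands in for the kernel mutex, and the kernel's own locking is more involved and is omitted here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names come from the kernel. */
struct entry {
    struct entry *next;
    unsigned int key;
    int owner;                  /* stands in for the owning namespace */
};

struct table {
    pthread_mutex_t lock;       /* serializes resize and cleanup */
    struct entry **buckets;
    size_t nbuckets;
};

int table_init(struct table *t, size_t nbuckets)
{
    assert(nbuckets > 0);
    t->buckets = calloc(nbuckets, sizeof(*t->buckets));
    if (!t->buckets)
        return -1;
    t->nbuckets = nbuckets;
    pthread_mutex_init(&t->lock, NULL);
    return 0;
}

int table_insert(struct table *t, unsigned int key, int owner)
{
    struct entry *e = malloc(sizeof(*e));
    if (!e)
        return -1;
    e->key = key;
    e->owner = owner;
    pthread_mutex_lock(&t->lock);
    size_t b = key % t->nbuckets;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    pthread_mutex_unlock(&t->lock);
    return 0;
}

/* Rehash under the lock, so a cleanup never observes a half-moved table. */
int table_resize(struct table *t, size_t new_n)
{
    assert(new_n > 0);
    struct entry **nb = calloc(new_n, sizeof(*nb));
    if (!nb)
        return -1;
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < t->nbuckets; i++) {
        struct entry *e = t->buckets[i];
        while (e) {
            struct entry *next = e->next;
            size_t b = e->key % new_n;
            e->next = nb[b];
            nb[b] = e;
            e = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nbuckets = new_n;
    pthread_mutex_unlock(&t->lock);
    return 0;
}

/* Remove every entry of one owner in a single pass; returns the count.
 * Because resizes are excluded by the lock, the scan never has to restart. */
size_t table_cleanup(struct table *t, int owner)
{
    size_t removed = 0;
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < t->nbuckets; i++) {
        if (!t->buckets[i])
            continue;           /* empty bucket: nothing to walk */
        struct entry **pp = &t->buckets[i];
        while (*pp) {
            struct entry *e = *pp;
            if (e->owner == owner) {
                *pp = e->next;  /* unlink and free the matching entry */
                free(e);
                removed++;
            } else {
                pp = &e->next;
            }
        }
    }
    pthread_mutex_unlock(&t->lock);
    return removed;
}
```

In this model a cleanup that runs concurrently with a resize simply waits for the mutex and then completes in one pass, rather than restarting its scan whenever the table changes underneath it.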
[1]
INFO: task syz-executor.0:8312 can't die for more than 144 seconds.
task:syz-executor.0 state:R running task stack:25672 pid: 8312 ppid: 6573 flags:0x00004006
Call Trace:
 context_switch kernel/sched/core.c:4955 [inline]
 __schedule+0x940/0x26f0 kernel/sched/core.c:6236
 preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6408
 preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35
 __local_bh_enable_ip+0x109/0x120 kernel/softirq.c:390
 local_bh_enable include/linux/bottom_half.h:32 [inline]
 get_next_corpse net/netfilter/nf_conntrack_core.c:2252 [inline]
 nf_ct_iterate_cleanup+0x15a/0x450 net/netfilter/nf_conntrack_core.c:2275
 nf_conntrack_cleanup_net_list+0x14c/0x4f0 net/netfilter/nf_conntrack_core.c:2469
 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:171
 setup_net+0x639/0xa30 net/core/net_namespace.c:349
 copy_net_ns+0x319/0x760 net/core/net_namespace.c:470
 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
 unshare_nsproxy_namespaces+0xc1/0x1f0 kernel/nsproxy.c:226
 ksys_unshare+0x445/0x920 kernel/fork.c:3128
 __do_sys_unshare kernel/fork.c:3202 [inline]
 __se_sys_unshare kernel/fork.c:3200 [inline]
 __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3200
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f63da68e739
RSP: 002b:00007f63d7c05188 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00007f63da792f80 RCX: 00007f63da68e739
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000000
RBP: 00007f63da6e8cc4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f63da792f80
R13: 00007fff50b75d3f R14: 00007f63d7c05300 R15: 0000000000022000
Showing all locks held in the system:
1 lock held by khungtaskd/27:
 #0: ffffffff8b980020 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
2 locks held by kworker/u4:2/153:
 #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
 #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline]
 #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:634 [inline]
 #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
 #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x896/0x1690 kernel/workqueue.c:2268
 #1: ffffc9000140fdb0 ((kfence_timer).work){+.+.}-{0:0}, at: process_one_work+0x8ca/0x1690 kernel/workqueue.c:2272
1 lock held by systemd-udevd/2970:
1 lock held by in:imklog/6258:
 #0: ffff88807f970ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:990
3 locks held by kworker/1:6/8158:
1 lock held by syz-executor.0/8312:
2 locks held by kworker/u4:13/9320:
1 lock held by ---truncated---