CVE-2024-39500

Source
https://cve.org/CVERecord?id=CVE-2024-39500
Import Source
https://storage.googleapis.com/cve-osv-conversion/osv-output/CVE-2024-39500.json
JSON Data
https://api.osv.dev/v1/vulns/CVE-2024-39500
Published
2024-07-12T12:20:34.317Z
Modified
2026-03-23T05:01:48.860607205Z
Summary
sock_map: avoid race between sock_map_close and sk_psock_put
Details

In the Linux kernel, the following vulnerability has been resolved:

sock_map: avoid race between sock_map_close and sk_psock_put

sk_psock_get will return NULL if the refcount of psock has gone to 0, which will happen when the last call of sk_psock_put is done. However, sk_psock_drop may not have finished yet, so the close callback will still point to sock_map_close despite psock being NULL.

This can be reproduced with a thread deleting an element from the sock map, while the second one creates a socket, adds it to the map and closes it.

That will trigger the WARN_ON_ONCE:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 7220 at net/core/sock_map.c:1701 sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701
Modules linked in:
CPU: 1 PID: 7220 Comm: syz-executor380 Not tainted 6.9.0-syzkaller-07726-g3c999d1ae3c7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
RIP: 0010:sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701
Code: df e8 92 29 88 f8 48 8b 1b 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 79 29 88 f8 4c 8b 23 eb 89 e8 4f 15 23 f8 90 <0f> 0b 90 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d e9 13 26 3d 02
RSP: 0018:ffffc9000441fda8 EFLAGS: 00010293
RAX: ffffffff89731ae1 RBX: ffffffff94b87540 RCX: ffff888029470000
RDX: 0000000000000000 RSI: ffffffff8bcab5c0 RDI: ffffffff8c1faba0
RBP: 0000000000000000 R08: ffffffff92f9b61f R09: 1ffffffff25f36c3
R10: dffffc0000000000 R11: fffffbfff25f36c4 R12: ffffffff89731840
R13: ffff88804b587000 R14: ffff88804b587000 R15: ffffffff89731870
FS:  000055555e080380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000207d4000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 unix_release+0x87/0xc0 net/unix/af_unix.c:1048
 __sock_release net/socket.c:659 [inline]
 sock_close+0xbe/0x240 net/socket.c:1421
 __fput+0x42b/0x8a0 fs/file_table.c:422
 __do_sys_close fs/open.c:1556 [inline]
 __se_sys_close fs/open.c:1541 [inline]
 __x64_sys_close+0x7f/0x110 fs/open.c:1541
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb37d618070
Code: 00 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d4 e8 10 2c 00 00 80 3d 31 f0 07 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
RSP: 002b:00007ffcd4a525d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fb37d618070
RDX: 0000000000000010 RSI: 00000000200001c0 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000100000000 R09: 0000000100000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 </TASK>

Use sk_psock, which will only check that the pointer has not been set to NULL yet, which should only happen after the callbacks are restored. If, then, a reference can still be gotten, we may call sk_psock_stop and cancel psock->work.

As suggested by Paolo Abeni, reorder the condition so the control flow is less convoluted.

After that change, the reproducer does not trigger the WARN_ON_ONCE anymore.

Database specific
{
    "cna_assigner": "Linux",
    "osv_generated_from": "https://github.com/CVEProject/cvelistV5/tree/main/cves/2024/39xxx/CVE-2024-39500.json"
}
Affected packages

Git / git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

Affected ranges

Type
GIT
Repo
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Events
Introduced
aadb2bb83ff789de63b48b4edeab7329423a50d3
Fixed
4959ffc65a0e94f8acaac20deac49f89e6ded52d
Fixed
5eabdf17fed2ad41b836bb4055ec36d95e512c50
Fixed
e946428439a0d2079959f5603256ac51b6047017
Fixed
3627605de498639a3c586c8684d12c89cba11073
Fixed
4b4647add7d3c8530493f7247d11e257ee425bf0
Type
GIT
Repo
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Events
Introduced
0 Unknown introduced commit / All previous commits are affected
Last affected
c4896f5fd83664a50ac4fef4131a265d15734e5a

Database specific

source
"https://storage.googleapis.com/cve-osv-conversion/osv-output/CVE-2024-39500.json"