In the Linux kernel, the following vulnerability has been resolved:
mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
When I did a hard offline test with hugetlb pages, the deadlock below occurred:
======================================================
WARNING: possible circular locking dependency detected
bash/46904 is trying to acquire lock:
ffffffffabe68910 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec+0x16/0x60

but task is already holding lock:
ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (pcp_batch_high_lock){+.+.}-{3:3}:
       __mutex_lock+0x6c/0x770
       page_alloc_cpu_online+0x3c/0x70
       cpuhp_invoke_callback+0x397/0x5f0
       __cpuhp_invoke_callback_range+0x71/0xe0
       _cpu_up+0xeb/0x210
       cpu_up+0x91/0xe0
       cpuhp_bringup_mask+0x49/0xb0
       bringup_nonboot_cpus+0xb7/0xe0
       smp_init+0x25/0xa0
       kernel_init_freeable+0x15f/0x3e0
       kernel_init+0x15/0x1b0
       ret_from_fork+0x2f/0x50
       ret_from_fork_asm+0x1a/0x30

-> #0 (cpu_hotplug_lock){++++}-{0:0}:
       __lock_acquire+0x1298/0x1cd0
       lock_acquire+0xc0/0x2b0
       cpus_read_lock+0x2a/0xc0
       static_key_slow_dec+0x16/0x60
       __hugetlb_vmemmap_restore_folio+0x1b9/0x200
       dissolve_free_huge_page+0x211/0x260
       __page_handle_poison+0x45/0xc0
       memory_failure+0x65e/0xc70
       hard_offline_page_store+0x55/0xa0
       kernfs_fop_write_iter+0x12c/0x1d0
       vfs_write+0x387/0x550
       ksys_write+0x64/0xe0
       do_syscall_64+0xca/0x1e0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(pcp_batch_high_lock);
                               lock(cpu_hotplug_lock);
                               lock(pcp_batch_high_lock);
  rlock(cpu_hotplug_lock);

 *** DEADLOCK ***
5 locks held by bash/46904:
 #0: ffff98f6c3bb23f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
 #1: ffff98f6c328e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
 #2: ffff98ef83b31890 (kn->active#113){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
 #3: ffffffffabf9db48 (mf_mutex){+.+.}-{3:3}, at: memory_failure+0x44/0xc70
 #4: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40
stack backtrace:
CPU: 10 PID: 46904 Comm: bash Kdump: loaded Not tainted 6.8.0-11409-gf6cef5f8c37f #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0xa0
 check_noncircular+0x129/0x140
 __lock_acquire+0x1298/0x1cd0
 lock_acquire+0xc0/0x2b0
 cpus_read_lock+0x2a/0xc0
 static_key_slow_dec+0x16/0x60
 __hugetlb_vmemmap_restore_folio+0x1b9/0x200
 dissolve_free_huge_page+0x211/0x260
 __page_handle_poison+0x45/0xc0
 memory_failure+0x65e/0xc70
 hard_offline_page_store+0x55/0xa0
 kernfs_fop_write_iter+0x12c/0x1d0
 vfs_write+0x387/0x550
 ksys_write+0x64/0xe0
 do_syscall_64+0xca/0x1e0
 entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7fc862314887
Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007fff19311268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc862314887
RDX: 000000000000000c RSI: 000056405645fe10 RDI: 0000000000000001
RBP: 000056405645fe10 R08: 00007fc8623d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007fc86241b780 R14: 00007fc862417600 R15: 00007fc862416a00
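
The report above comes down to two code paths taking the same pair of locks in opposite order. As a rough illustration only, the following user-space Python sketch models that check with a tiny lock-order graph. It is not the kernel's lockdep implementation and not part of the fix; the class `LockOrderGraph` and its methods are hypothetical names introduced here, and only the two lock names are taken from the report.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy model of a lock-order validator (hypothetical, not kernel lockdep)."""

    def __init__(self):
        # edges[a] contains b when some path acquired b while holding a.
        self.edges = defaultdict(set)

    def _reachable(self, src, dst):
        """Return True if dst can be reached from src along recorded edges."""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, held, new):
        """Record 'new taken while holding held'; return False on inversion."""
        # If held is already reachable from new, the edge held -> new
        # would close a cycle, i.e. a possible circular dependency.
        if self._reachable(new, held):
            return False
        self.edges[held].add(new)
        return True


graph = LockOrderGraph()

# Chain #1 (CPU bring-up): pcp_batch_high_lock taken under cpu_hotplug_lock.
boot_ok = graph.acquire(held="cpu_hotplug_lock", new="pcp_batch_high_lock")

# Chain #0 (hard offline): cpu_hotplug_lock taken under pcp_batch_high_lock.
offline_ok = graph.acquire(held="pcp_batch_high_lock", new="cpu_hotplug_lock")

print(boot_ok, offline_ok)  # prints: True False
```

The first acquisition order is accepted because no dependency exists yet. The second is rejected because it reverses an order already recorded, which is the same shape as the "Possible unsafe locking scenario" in the report.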
In short, below scene breaks the ---truncated---