In the Linux kernel, the following vulnerability has been resolved:
mm/kmemleak: avoid scanning potential huge holes
When devm_request_free_mem_region() and devm_memremap_pages() are used to add ZONE_DEVICE memory and the requested free memory region's end pfn is huge (e.g., 0x400000000), node_end_pfn() becomes huge as well (see move_pfn_range_to_zone()). This creates a huge hole between node_start_pfn() and node_end_pfn().
We found that on some AMD APUs, amdkfd requests such a free memory region and thereby creates a huge hole. In that case, the following code snippet in kmemleak_scan() just busy-loops on test_bit() across the entire hole:
    for (pfn = start_pfn; pfn < end_pfn; pfn++) {
            struct page *page = pfn_to_online_page(pfn);

            if (!page)
                    continue;
            ...
    }
So we got a soft lockup:
    watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [bash:1221]
    CPU: 6 PID: 1221 Comm: bash Not tainted 5.15.0-custom #1
    RIP: 0010:pfn_to_online_page+0x5/0xd0
    Call Trace:
     ? kmemleak_scan+0x16a/0x440
     kmemleak_write+0x306/0x3a0
     ? common_file_perm+0x72/0x170
     full_proxy_write+0x5c/0x90
     vfs_write+0xb9/0x260
     ksys_write+0x67/0xe0
     __x64_sys_write+0x1a/0x20
     do_syscall_64+0x3b/0xc0
     entry_SYSCALL_64_after_hwframe+0x44/0xae
I ran some tests with the patch:
(1) amdgpu module unloaded

    before the patch:  real 0m0.976s   user 0m0.000s  sys 0m0.968s
    after the patch:   real 0m0.981s   user 0m0.000s  sys 0m0.973s

(2) amdgpu module loaded

    before the patch:  real 0m35.365s  user 0m0.000s  sys 0m35.354s
    after the patch:   real 0m1.049s   user 0m0.000s  sys 0m1.042s