CVE-2024-26762

The PCI AER model is an awkward fit for CXL error handling. While the expectation is that a PCI device can escalate to link reset to recover from an AER event, the same reset on CXL amounts to a surprise memory hotplug of massive amounts of memory.

At present, the CXL error handler attempts some optimistic error handling to unbind the device from the cxl_mem driver after reaping some RAS register values. This results in a "hopeful" attempt to unplug the memory, but there is no guarantee that it will succeed.

A subsequent AER notification after the memdev unbind event can no longer assume the registers are mapped. Check for memdev bind before reaping status register values to avoid crashes of the form:

    BUG: unable to handle page fault for address: ffa00000195e9100
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    [...]
    RIP: 0010:__cxl_handle_ras+0x30/0x110 [cxl_core]
    [...]
    Call Trace:
     <TASK>
     ? __die+0x24/0x70
     ? page_fault_oops+0x82/0x160
     ? kernelmode_fixup_or_oops+0x84/0x110
     ? exc_page_fault+0x113/0x170
     ? asm_exc_page_fault+0x26/0x30
     ? __pfx_dpc_reset_link+0x10/0x10
     ? __cxl_handle_ras+0x30/0x110 [cxl_core]
     ? find_cxl_port+0x59/0x80 [cxl_core]
     cxl_handle_rp_ras+0xbc/0xd0 [cxl_core]
     cxl_error_detected+0x6c/0xf0 [cxl_core]
     report_error_detected+0xc7/0x1c0
     pci_walk_bus+0x73/0x90
     pcie_do_recovery+0x23f/0x330
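The check described above (skip the RAS register read once the memdev is no longer bound) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel patch: `struct memdev_model`, `enum ers_result`, and `model_error_detected()` are hypothetical names invented for illustration, and a plain pointer stands in for the mapped register block.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's PCI error recovery results. */
enum ers_result {
    ERS_RESULT_NONE,
    ERS_RESULT_NEED_RESET,
    ERS_RESULT_DISCONNECT,
};

/* Hypothetical model of a CXL memdev; not the kernel's own structure. */
struct memdev_model {
    bool driver_bound;           /* false once the driver has been unbound */
    const uint32_t *ras_status;  /* register mapping; invalid after unbind */
};

/*
 * Model of an error-detected callback: the RAS status is read only while
 * the driver is bound, because the mapping goes away at unbind time.
 */
static enum ers_result model_error_detected(const struct memdev_model *md)
{
    if (!md->driver_bound)
        return ERS_RESULT_DISCONNECT;  /* skip register access entirely */

    if (md->ras_status && *md->ras_status != 0)
        return ERS_RESULT_NEED_RESET;  /* an error status was latched */

    return ERS_RESULT_NONE;
}
```

In this model, a second notification arriving after unbind returns early instead of dereferencing a stale mapping, which is the access that faults in the trace above.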

Longer term, the unbind and PCI_ERS_RESULT_DISCONNECT behavior might need to be replaced with a new PCI_ERS_RESULT_PANIC.

Database specific

{
    "cna_assigner": "Linux",
    "osv_generated_from": "https://github.com/CVEProject/cvelistV5/tree/main/cves/2024/26xxx/CVE-2024-26762.json"
}

Affected packages

Git / git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

Affected ranges

Type: GIT
Repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Events:
Introduced: 6ac07883dbb5f60f7bc56a13b7a84a382aa9c1ab
Fixed: 21e5e84f3f63fdf44e49642a6e45cd895e921a84
Fixed: eef5c7b28dbecd6b141987a96db6c54e49828102

Database specific

{
    "source": "https://storage.googleapis.com/cve-osv-conversion/osv-output/CVE-2024-26762.json"
}