In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine. That function is expected to be called only after its init counterpart, drm_sched_init(), has executed successfully.
We recently hit a driver probe failure on the Steam Deck, and drm_sched_fini() was called even though its counterpart had never been called, causing the following oops:
  amdgpu: probe of 0000:04:00.0 failed with error -110
  BUG: kernel NULL pointer dereference, address: 0000000000000090
  PGD 0 P4D 0
  Oops: 0002 [#1] PREEMPT SMP NOPTI
  CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
  Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
  RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
  [...]
  Call Trace:
   <TASK>
   amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
   amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
   amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
   devm_drm_dev_init_release+0x49/0x70
   [...]
To prevent that, check whether the DRM scheduler was properly initialized for a given ring before calling its fini counterpart.
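The guard described above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the actual amdgpu patch: `mock_sched`, `mock_ring`, `mock_sched_init()`, `mock_sched_fini()` and `mock_fence_driver_sw_fini()` are hypothetical stand-ins for the kernel structures and functions. The model assumes that init assigns `ops` only on success, and it uses an `assert` in the fini routine to stand in for the NULL pointer dereference seen in the oops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the scheduler's ops table. */
struct mock_sched_ops { int unused; };

/* Hypothetical, heavily simplified scheduler: ops is non-NULL only
 * after a successful init in this model. */
struct mock_sched {
    const struct mock_sched_ops *ops;
};

struct mock_ring {
    struct mock_sched sched;
};

static const struct mock_sched_ops mock_ops = { 0 };
static int fini_calls; /* counts how many schedulers were torn down */

/* Models an init that can fail (-110 is ETIMEDOUT on Linux, the probe
 * error in the log above); ops is assigned only on success. */
static int mock_sched_init(struct mock_sched *s, bool fail)
{
    if (fail)
        return -110;
    s->ops = &mock_ops;
    return 0;
}

/* Models fini: only valid after a successful init. The assert plays
 * the role of the oops when that contract is violated. */
static void mock_sched_fini(struct mock_sched *s)
{
    assert(s->ops != NULL && "fini called without a successful init");
    fini_calls++;
    s->ops = NULL;
}

/* Models the sw fini loop with the guard: skip rings whose scheduler
 * was never successfully initialized. */
static void mock_fence_driver_sw_fini(struct mock_ring *rings, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!rings[i].sched.ops)
            continue;
        mock_sched_fini(&rings[i].sched);
    }
}
```

In this model, a ring whose init failed (or never ran because probe aborted early) is simply skipped, so teardown is only ever paired with a successful init.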
Notice that ideally we'd use sched.ready for that, since this field is set as the last step of drm_sched_init(). But amdgpu seems to "override" the meaning of this field: in the oops above, for example, a GFX ring caused the crash, and its sched.ready field had been set to true in the ring init routine regardless of the state of the DRM scheduler. Hence, we ended up using sched.ops as per Christian's suggestion [0], and also removed the no_scheduler check [1].
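Why a `ready`-based guard is insufficient here can be shown with another hypothetical model. All names (`sched_state`, `ring_init_marks_ready()`, `sched_init_fails()`, `guard_by_ready()`, `guard_by_ops()`, `sched_teardown()`) are illustrative assumptions, not kernel APIs. The model assumes, as described above for the GFX ring, that ring init sets `ready` on its own while a failed scheduler init never assigns `ops`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified scheduler state (not the real kernel struct). */
struct sched_state {
    const void *ops; /* non-NULL only after a successful scheduler init */
    bool ready;      /* in this model, also set by the ring init routine */
};

/* Models a ring init that marks the ring ready independently of the
 * scheduler's state. */
static void ring_init_marks_ready(struct sched_state *s)
{
    s->ready = true;
}

/* Models a scheduler init that fails before assigning ops. */
static int sched_init_fails(struct sched_state *s)
{
    (void)s;
    return -110;
}

/* The two candidate guards for calling teardown. */
static bool guard_by_ready(const struct sched_state *s) { return s->ready; }
static bool guard_by_ops(const struct sched_state *s)   { return s->ops != NULL; }

/* Teardown is only valid after a successful init; the assert stands in
 * for the crash when it is not. */
static void sched_teardown(struct sched_state *s)
{
    assert(s->ops != NULL && "teardown without successful init");
    s->ops = NULL;
    s->ready = false;
}

/* Builds the failing scenario: ring init ran, scheduler init failed. */
static struct sched_state gfx_ring_after_failed_init(void)
{
    struct sched_state s = { NULL, false };
    ring_init_marks_ready(&s);
    (void)sched_init_fails(&s);
    return s;
}
```

In this scenario `guard_by_ready()` still reports true, so a `ready`-based check would proceed to teardown and hit the assert, while `guard_by_ops()` correctly reports that there is nothing to tear down.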
[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/