In the Linux kernel, the following vulnerability has been resolved:
powerpc/pseries/iommu: IOMMU table is not initialized for kdump over SR-IOV
When the kdump kernel tries to copy dump data over SR-IOV, the LPAR panics due to a NULL pointer exception:
  Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
  BUG: Kernel NULL pointer dereference on read at 0x00000000
  Faulting instruction address: 0xc000000020847ad4
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: mlx5_core(+) vmx_crypto pseries_wdt papr_scm libnvdimm mlxfw tls psample sunrpc fuse overlay squashfs loop
  CPU: 12 PID: 315 Comm: systemd-udevd Not tainted 6.4.0-Test102+ #12
  Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_008) hv:phyp pSeries
  NIP: c000000020847ad4 LR: c00000002083b2dc CTR: 00000000006cd18c
  REGS: c000000029162ca0 TRAP: 0300 Not tainted (6.4.0-Test102+)
  MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48288244 XER: 00000008
  CFAR: c00000002083b2d8 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1
  ...
  NIP _find_next_zero_bit+0x24/0x110
  LR bitmap_find_next_zero_area_off+0x5c/0xe0
  Call Trace:
    dev_printk_emit+0x38/0x48 (unreliable)
    iommu_area_alloc+0xc4/0x180
    iommu_range_alloc+0x1e8/0x580
    iommu_alloc+0x60/0x130
    iommu_alloc_coherent+0x158/0x2b0
    dma_iommu_alloc_coherent+0x3c/0x50
    dma_alloc_attrs+0x170/0x1f0
    mlx5_cmd_init+0xc0/0x760 [mlx5_core]
    mlx5_function_setup+0xf0/0x510 [mlx5_core]
    mlx5_init_one+0x84/0x210 [mlx5_core]
    probe_one+0x118/0x2c0 [mlx5_core]
    local_pci_probe+0x68/0x110
    pci_call_probe+0x68/0x200
    pci_device_probe+0xbc/0x1a0
    really_probe+0x104/0x540
    __driver_probe_device+0xb4/0x230
    driver_probe_device+0x54/0x130
    __driver_attach+0x158/0x2b0
    bus_for_each_dev+0xa8/0x130
    driver_attach+0x34/0x50
    bus_add_driver+0x16c/0x300
    driver_register+0xa4/0x1b0
    __pci_register_driver+0x68/0x80
    mlx5_init+0xb8/0x100 [mlx5_core]
    do_one_initcall+0x60/0x300
    do_init_module+0x7c/0x2b0
At the time of LPAR dump, before kexec hands over control to the kdump kernel, DDWs (Dynamic DMA Windows) are scanned and added to the FDT. For the SR-IOV case, the default DMA window "ibm,dma-window" is removed from the FDT and the DDW is added for the device.
Now, kexec hands over control to the kdump kernel.
When the kdump kernel initializes, PCI buses are scanned and IOMMU groups/tables are created in pci_dma_bus_setup_pSeriesLP(). For the SR-IOV case, there is no "ibm,dma-window". The original commit, b1fc44eaa9ba, fixes the path where memory is pre-mapped (direct mapped) to the DDW. When TCEs are direct mapped, there is no need to initialize IOMMU tables.
iommu_table_setparms_lpar() only considers the "ibm,dma-window" property when initializing the IOMMU table. In the scenario where TCEs are dynamically allocated for SR-IOV, the newly created IOMMU table is not initialized. Later, when the device driver tries to enter TCEs for the SR-IOV device, a NULL pointer exception is thrown from iommu_area_alloc().
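The failure mode can be illustrated with a small user-space model. This is only a sketch: struct model_table, model_table_init() and model_alloc_slot() are hypothetical names, not kernel APIs, and the model assumes the fault comes from searching an allocation map whose pointer was never set, which is what the read at address 0 in the trace suggests.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an IOMMU table; not the kernel's struct iommu_table. */
struct model_table {
    unsigned long size;   /* number of TCE slots in the window */
    unsigned char *map;   /* one byte per slot: 0 = free, 1 = used; NULL until initialized */
};

/* Set up the allocation map for a window of nslots entries. Returns 0 on success. */
static int model_table_init(struct model_table *t, unsigned long nslots)
{
    t->map = calloc(nslots, 1);
    if (!t->map)
        return -1;
    t->size = nslots;
    return 0;
}

/* Claim the first free slot. Returns its index, or -1 if the table is uninitialized or full. */
static long model_alloc_slot(struct model_table *t)
{
    /* Assumption: without a guard like this, scanning a NULL map reads address 0. */
    if (!t->map)
        return -1;

    for (unsigned long i = 0; i < t->size; i++) {
        if (!t->map[i]) {
            t->map[i] = 1;
            return (long)i;
        }
    }
    return -1;
}
```

In this model, a table that skipped model_table_init() can never hand out a slot, which mirrors the SR-IOV case where no "ibm,dma-window" property was available to drive initialization.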
The fix is to initialize the IOMMU table with DDW property stored in the FDT. There are 2 points to remember:
1. For a dedicated adapter, the kdump kernel would encounter both
the default window and a DDW in the FDT. In this case, the DDW
property is used to initialize the IOMMU table.
2. A DDW could be direct or dynamic mapped. kdump kernel would
initialize IOMMU table and mark the existing DDW as
"dynamic". This works fine since, at the time of table
initialization, iommu_table_clear() makes some space in the
DDW, for some predefined number of TCEs which are needed for
kdump to succeed.
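The two points above can be summarized as a small decision function. This is a simplified user-space sketch, not the actual patch: struct model_fdt_node, struct model_choice and model_pick_window() are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>

/* Hypothetical summary of what the kdump kernel finds in the FDT for one device. */
struct model_fdt_node {
    bool has_default_window;  /* "ibm,dma-window" is present */
    bool has_ddw;             /* a DDW property was added before kexec */
};

enum model_window { MODEL_WIN_NONE, MODEL_WIN_DEFAULT, MODEL_WIN_DDW };

struct model_choice {
    enum model_window source;  /* property used to initialize the IOMMU table */
    bool treat_as_dynamic;     /* only meaningful when source is MODEL_WIN_DDW */
};

/* Pick the property that drives table initialization in the kdump kernel. */
static struct model_choice model_pick_window(const struct model_fdt_node *n)
{
    struct model_choice c = { MODEL_WIN_NONE, false };

    if (n->has_ddw) {
        /* Point 1: the DDW is used even when the default window is also present. */
        c.source = MODEL_WIN_DDW;
        /* Point 2: an inherited DDW is marked dynamic, whether it was direct or dynamic before. */
        c.treat_as_dynamic = true;
    } else if (n->has_default_window) {
        c.source = MODEL_WIN_DEFAULT;
    }
    return c;
}
```

With this model, an SR-IOV device (DDW only) and a dedicated adapter (default window plus DDW) both end up initializing the table from the DDW, while a device with only the default window keeps the pre-existing behavior.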