In the Linux kernel, the following vulnerability has been resolved:
x86/mm: Eliminate window where TLB flushes may be inadvertently skipped
tl;dr: There is a window in the mm switching code where the new CR3 is set and the CPU should be getting TLB flushes for the new mm. But should_flush_tlb() has a bug and suppresses the flush. Fix it by widening the window where should_flush_tlb() sends an IPI.
Long Version:
=== History ===
There were a few things leading up to this.
First, updating mm_cpumask() was observed to be too expensive, so it was made lazier. But being lazy caused too many unnecessary IPIs to CPUs due to the now-lazy mm_cpumask(). So code was added to cull mm_cpumask() periodically[2]. But that culling was a bit too aggressive and skipped sending TLB flushes to CPUs that need them. So here we are again.
=== Problem ===
The too-aggressive code in should_flush_tlb() strikes in this window:
// Turn on IPIs for this CPU/mm combination, but only
// if should_flush_tlb() agrees:
cpumask_set_cpu(cpu, mm_cpumask(next));
next_tlb_gen = atomic64_read(&next->context.tlb_gen);
choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);
load_new_mm_cr3(need_flush);
// ^ After 'need_flush' is set to false, IPIs *MUST*
// be sent to this CPU and not be ignored.
this_cpu_write(cpu_tlbstate.loaded_mm, next);
// ^ Not until this point does should_flush_tlb()
// become true!
should_flush_tlb() will suppress TLB flushes between load_new_mm_cr3() and writing to 'loaded_mm', which is a window where they should not be suppressed. Whoops.
=== Solution ===
Thankfully, the fuzzy "just about to write CR3" window is already marked with loaded_mm==LOADED_MM_SWITCHING. Simply checking for that state in should_flush_tlb() is sufficient to ensure that the CPU is targeted with an IPI.
This will cause more TLB flush IPIs. But the window is relatively small and I do not expect this to cause any kind of measurable performance impact.
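The effect of the extra check can be illustrated with a small userspace model. This is only a sketch, not the kernel code: 'struct mm', 'struct cpu_model', MODEL_MM_SWITCHING, should_flush_old() and should_flush_new() are hypothetical names invented for the illustration, and the model omits everything else the real should_flush_tlb() considers, such as the 'is_lazy' state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mm structure. */
struct mm { int id; };

/*
 * Hypothetical sentinel playing the role of LOADED_MM_SWITCHING:
 * "this CPU is in the middle of switching mms".
 */
static struct mm switching_marker;
#define MODEL_MM_SWITCHING (&switching_marker)

/* Hypothetical per-CPU state; loaded_mm models cpu_tlbstate.loaded_mm. */
struct cpu_model {
    struct mm *loaded_mm;
};

/* Modeled old behavior: flush only once loaded_mm already equals the target. */
static bool should_flush_old(const struct cpu_model *cpu, const struct mm *target)
{
    assert(cpu != NULL && target != NULL);
    return cpu->loaded_mm == target;
}

/*
 * Modeled new behavior: a CPU that is mid-switch may already be running
 * on the new CR3, so it is treated as needing the flush.
 */
static bool should_flush_new(const struct cpu_model *cpu, const struct mm *target)
{
    assert(cpu != NULL && target != NULL);
    if (cpu->loaded_mm == MODEL_MM_SWITCHING)
        return true;
    return cpu->loaded_mm == target;
}
```

In this model, a CPU whose loaded_mm still holds the switching marker is skipped by should_flush_old() but targeted by should_flush_new(). Once loaded_mm has been written with the new mm, both variants agree.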
Update the comment where LOADED_MM_SWITCHING is written since it grew yet another user.
Peter Z also raised a concern that should_flush_tlb() might not observe 'loaded_mm' and 'is_lazy' in the same order that switch_mm_irqs_off() writes them. Add a barrier to ensure that they are observed in the order they are written.
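The ordering concern can be sketched in portable C11, using fences in place of the kernel's barrier primitives. Everything here is an assumption made for illustration: the struct and function names are hypothetical, an integer id stands in for the mm pointer, and the choice to write 'is_lazy' before 'loaded_mm' is assumed rather than taken from the text above. The sketch shows the general pattern only: the reader loads the two fields in the opposite order from the writer's stores, with a fence between the loads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; the field names only echo the kernel's. */
struct cpu_state_model {
    atomic_bool is_lazy;
    atomic_int  loaded_mm;  /* integer id standing in for an mm pointer */
};

struct snapshot {
    int  loaded_mm;
    bool is_lazy;
};

/* Writer side (assumed order): store is_lazy first, then loaded_mm. */
static void writer_switch(struct cpu_state_model *s, int next_mm)
{
    assert(s != NULL);
    atomic_store_explicit(&s->is_lazy, false, memory_order_relaxed);
    /* Pairs with the acquire fence in reader_observe(). */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->loaded_mm, next_mm, memory_order_relaxed);
}

/*
 * Reader side: load loaded_mm first, then is_lazy. If the loaded_mm
 * load sees the writer's store, the fence pairing guarantees the
 * is_lazy load sees the writer's earlier is_lazy store as well.
 */
static struct snapshot reader_observe(struct cpu_state_model *s)
{
    struct snapshot snap;

    assert(s != NULL);
    snap.loaded_mm = atomic_load_explicit(&s->loaded_mm, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    snap.is_lazy = atomic_load_explicit(&s->is_lazy, memory_order_relaxed);
    return snap;
}
```

Without the fence between the two loads, a reader could in principle pair a new 'loaded_mm' value with a stale 'is_lazy' value, which is the kind of mismatch the added barrier is meant to rule out.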