In the Linux kernel, the following vulnerability has been resolved:

btrfs: fix a race between renames and directory logging

We have a race between a rename and directory inode logging. If it happens and we crash or lose power before the rename completes, then the next time the filesystem is mounted the log replay code will end up deleting the file that was being renamed.

This is best explained with a step-by-step analysis of an interleaving that leads to this situation.

Consider the initial conditions:

1) We are at transaction N;

2) We have directories A and B created in a past transaction (< N);

3) We have inode X corresponding to a file that has 2 hard links, one in directory A and the other in directory B, so we'll name them "A/foolink1" and "B/foolink2". Both hard links were persisted in a past transaction (< N);

4) We have inode Y corresponding to a file that has a single hard link and is located in directory A; we'll name it "A/bar". This file was also persisted in a past transaction (< N).
The steps leading to the file loss are the following, and for all of them we are under transaction N:

1) Link "A/foolink1" is removed, so inode X's last_unlink_trans field is updated to N, through btrfs_unlink() -> btrfs_record_unlink_dir();

2) Task A starts a rename for inode Y, with the goal of renaming from "A/bar" to "A/baz", so we enter btrfs_rename();

3) Task A inserts the new BTRFS_INODE_REF_KEY for inode Y by calling btrfs_insert_inode_ref();

4) Because the rename happens in the same directory, we don't set the last_unlink_trans field of directory A's inode to the current transaction id, that is, we don't call btrfs_record_unlink_dir();

5) Task A then removes the entries from directory A (BTRFS_DIR_ITEM_KEY and BTRFS_DIR_INDEX_KEY items) when calling __btrfs_unlink_inode() (actually the dir index item is added as a delayed item, but the effect is the same);

6) Now, before task A adds the new entry "A/baz" to directory A by calling btrfs_add_link(), another task, task B, is logging inode X;

7) Task B starts an fsync of inode X and, after logging inode X, at btrfs_log_inode_parent() it calls btrfs_log_all_parents(), since inode X has a last_unlink_trans value of N, set in step 1;

8) At btrfs_log_all_parents() we search for all parent directories of inode X using the commit root, so we find directories A and B and log them. But when logging directory A, we don't have a dir index item for inode Y anymore, neither for the old name "A/bar" nor for the new name "A/baz", since the rename has deleted the old name but has not yet inserted the new name: task A hasn't yet called btrfs_add_link() to do that.

   Note that logging directory A doesn't fall back to a transaction commit because its last_unlink_trans has a lower value than the current transaction's id (see step 4);

9) Task B finishes logging directories A and B and gets back to btrfs_sync_file(), where it calls btrfs_sync_log() to persist the log tree;

10) Task B successfully persists the log tree, btrfs_sync_log() completes with success, and a power failure happens.
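The interleaving above can be illustrated with a small toy model in Python. This is only a sketch and not btrfs code: the `Directory` class, `log_directory()` and the dir index values 2 and 3 are hypothetical, chosen to show that a directory logged between the two halves of a rename contains no entry at all for inode Y.

```python
# Toy model of directory A's index items. This is not btrfs code, and
# every name below (Directory, log_directory, ...) is hypothetical.


class Directory:
    def __init__(self, entries):
        # dir index -> (name, inode)
        self.entries = dict(entries)
        # Index values are never reused, so track the next free one.
        self.next_index = max(self.entries, default=1) + 1

    def unlink(self, name):
        """Remove the entry with the given name and return its inode."""
        for index, (entry_name, inode) in list(self.entries.items()):
            if entry_name == name:
                del self.entries[index]
                return inode
        raise FileNotFoundError(name)

    def add_link(self, name, inode):
        """Insert a new entry at the next free dir index."""
        self.entries[self.next_index] = (name, inode)
        self.next_index += 1


def log_directory(directory):
    """Snapshot the entries visible at the moment the directory is logged."""
    return dict(directory.entries)


# Initial conditions: "A/foolink1" (inode X) and "A/bar" (inode Y).
dir_a = Directory({2: ("foolink1", "X"), 3: ("bar", "Y")})

# Step 1: the link "A/foolink1" is removed.
dir_a.unlink("foolink1")

# Step 5: task A removes the old name as the first half of the rename.
renamed_inode = dir_a.unlink("bar")

# Steps 6 to 8: task B logs directory A before the new name exists.
logged_a = log_directory(dir_a)

# Task A completes the rename, too late for the log task B persists.
dir_a.add_link("baz", renamed_inode)
```

In this model `logged_a` ends up empty while directory A itself holds only "baz", which is the state task B persists in steps 9 and 10.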
We have a log tree without any directory entry for inode Y, so the log replay code deletes the entry for inode Y, name "A/bar", from the subvolume tree, since it doesn't exist in the log tree and the log tree is authoritative for its index (we logged a BTRFS_DIR_LOG_INDEX_KEY item that covers the index range for the dentry that corresponds to "A/bar").

Since there's no other hard link for inode Y and the log replay code deletes the name "A/bar", the file is lost.

The issue wouldn't happen if task B synced the log only after task A called btrfs_log_new_name(), which would update the log with the new name for inode Y ("A/baz").

Fix this by pinning the log root during renames before removing the old directory entry, and unpinning af ---truncated---
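The replay behaviour described above can be sketched with another toy model in Python. It is a simplification under stated assumptions, not the kernel's replay code: `replay_dir_log()`, the dir index values and the logged ranges are all hypothetical. It shows why an authoritative index range with no entry for inode Y removes "A/bar", and why a log synced after the new name was recorded would not lose the file.

```python
# Toy model of log replay for a single directory. This is not btrfs
# code; replay_dir_log() and the index values are hypothetical.


def replay_dir_log(subvol_dir, logged_entries, logged_range):
    """Replay a directory log on top of the last committed state.

    subvol_dir:     committed entries, dir index -> (name, inode)
    logged_entries: entries present in the log tree
    logged_range:   inclusive (first, last) dir index range for which
                    the log tree is authoritative
    """
    first, last = logged_range
    result = {}
    for index, entry in subvol_dir.items():
        covered = first <= index <= last
        if covered and index not in logged_entries:
            # The log is authoritative for this index and has no entry
            # for it, so the committed entry is deleted.
            continue
        result[index] = entry
    # Entries that exist in the log are (re)created.
    result.update(logged_entries)
    return result


def has_link(directory, inode):
    """Return True if any entry in the directory points at the inode."""
    return any(entry_inode == inode for _, entry_inode in directory.values())


# Directory A as of the last committed transaction (< N).
committed_a = {2: ("foolink1", "X"), 3: ("bar", "Y")}

# Log persisted in the middle of the rename: no entry for inode Y.
racy = replay_dir_log(committed_a, {}, (2, 3))

# Log persisted after the new name was recorded for inode Y.
safe = replay_dir_log(committed_a, {4: ("baz", "Y")}, (2, 4))

file_lost_in_racy_case = not has_link(racy, "Y")
file_kept_in_safe_case = has_link(safe, "Y")
```

In the racy case directory A comes back empty, so inode Y has no remaining name. In the safe case the only surviving entry is "baz", which matches the outcome the rename intended.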