overlayfs over xfs deadlock


Phenomenon

In our data center, some Kubernetes minion nodes run Ubuntu with a 5.4 kernel for testing. A number of containers share the same base image and start script. From time to time, many processes on a node end up stuck in D (uninterruptible sleep) state.
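
On a live node you can spot such tasks without taking a dump by scanning /proc; a small helper of mine (not part of the original workflow):

#!/usr/bin/env python3
# List tasks in D (uninterruptible sleep) state by scanning /proc.
import os

for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open(f"/proc/{pid}/stat") as f:
            head, tail = f.read().rsplit(") ", 1)  # comm may contain ") "
        if tail.split()[0] == "D":                 # field 3 is the state
            print(pid, head.split("(", 1)[1])
    except (FileNotFoundError, ProcessLookupError):
        pass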

Some of the processes stuck in D state, as seen from a crash dump:

crash> foreach UN ps -m |sort|tail
[0 16:55:33.313] [UN]  PID: 148201  TASK: ffff89610e331ec0  CPU: 33  COMMAND: "checkCustom"
[0 16:55:33.885] [UN]  PID: 147963  TASK: ffff894bd4845c40  CPU: 16  COMMAND: "network_mon"
[0 16:55:33.941] [UN]  PID: 147952  TASK: ffff89667c833d80  CPU: 3   COMMAND: "disk_mon"
[0 16:55:34.831] [UN]  PID: 147702  TASK: ffff896225998000  CPU: 34  COMMAND: "mv"
[0 16:55:36.268] [UN]  PID: 146103  TASK: ffff896bc51bdc40  CPU: 7   COMMAND: "checkCPU"
[0 16:55:36.579] [UN]  PID: 2690   TASK: ffff896e75769ec0  CPU: 10  COMMAND: "containerd"
[0 16:55:37.063] [UN]  PID: 16429  TASK: ffff896e77039ec0  CPU: 39  COMMAND: "runsv"
[0 16:55:37.537] [UN]  PID: 147220  TASK: ffff8968e5305c40  CPU: 32  COMMAND: "mv"
[0 16:55:37.537] [UN]  PID: 147221  TASK: ffff895d140d0000  CPU: 10  COMMAND: "mv"
[0 16:55:37.540] [UN]  PID: 147222  TASK: ffff896278ee3d80  CPU: 26  COMMAND: "mv"

These containers all use the same image and run mv /etc/security/access.conf /etc/security/access.conf.bak right after starting.

crash> ps -p 147222
PID: 0      TASK: ffffffff86a13780  CPU: 0   COMMAND: "swapper/0"
 PID: 1      TASK: ffff896737dedc40  CPU: 38  COMMAND: "systemd"
  PID: 25742  TASK: ffff895e24405c40  CPU: 13  COMMAND: "containerd"
   PID: 32242  TASK: ffff895337bb1ec0  CPU: 20  COMMAND: "containerd-shim"
    PID: 32317  TASK: ffff895c3b288000  CPU: 27  COMMAND: "run.sh"
     PID: 147222  TASK: ffff896278ee3d80  CPU: 26  COMMAND: "mv"

crash> ps -p 147221
PID: 0      TASK: ffffffff86a13780  CPU: 0   COMMAND: "swapper/0"
 PID: 1      TASK: ffff896737dedc40  CPU: 38  COMMAND: "systemd"
  PID: 17972  TASK: ffff8967372d0000  CPU: 14  COMMAND: "containerd"
   PID: 26289  TASK: ffff896e770d3d80  CPU: 16  COMMAND: "containerd-shim"
    PID: 26333  TASK: ffff8958a7c38000  CPU: 10  COMMAND: "run.sh"
     PID: 147221  TASK: ffff895d140d0000  CPU: 10  COMMAND: "mv"

crash> ps -p 147220
PID: 0      TASK: ffffffff86a13780  CPU: 0   COMMAND: "swapper/0"
 PID: 1      TASK: ffff896737dedc40  CPU: 38  COMMAND: "systemd"
  PID: 3315   TASK: ffff896ad717bd80  CPU: 28  COMMAND: "containerd"
   PID: 38825  TASK: ffff89654b693d80  CPU: 22  COMMAND: "containerd-shim"
    PID: 38950  TASK: ffff895c38c73d80  CPU: 32  COMMAND: "run.sh"
     PID: 147220  TASK: ffff8968e5305c40  CPU: 32  COMMAND: "mv"

crash> ps -p 16429
PID: 0      TASK: ffffffff86a13780  CPU: 0   COMMAND: "swapper/0"
 PID: 1      TASK: ffff896737dedc40  CPU: 38  COMMAND: "systemd"
  PID: 3315   TASK: ffff896ad717bd80  CPU: 28  COMMAND: "containerd"
   PID: 15723  TASK: ffff8958b5d23d80  CPU: 14  COMMAND: "containerd-shim"
    PID: 15747  TASK: ffff896e71c41ec0  CPU: 29  COMMAND: "my_init"
     PID: 16428  TASK: ffff896562fc8000  CPU: 34  COMMAND: "runsvdir"
      PID: 16429  TASK: ffff896e77039ec0  CPU: 39  COMMAND: "runsv"

The stacks look like this:

crash> bt -t 147222
PID: 147222  TASK: ffff896278ee3d80  CPU: 26  COMMAND: "mv"
              START: __schedule at ffffffff85ea4cc6
  [ffff9e33ab0df6c0] __schedule at ffffffff85ea4cc6
  [ffff9e33ab0df700] xfs_buf_find at ffffffffc0457435 [xfs]
  [ffff9e33ab0df720] schedule at ffffffff85ea5103
  [ffff9e33ab0df738] schedule_timeout at ffffffff85ea8a65
  [ffff9e33ab0df798] xfs_buf_find at ffffffffc0457435 [xfs]
  [ffff9e33ab0df7b8] __down at ffffffff85ea7756
  [ffff9e33ab0df810] down at ffffffff85500f91
  [ffff9e33ab0df830] xfs_buf_lock at ffffffffc0456fa7 [xfs]
  [ffff9e33ab0df858] xfs_buf_find at ffffffffc0457435 [xfs]
  [ffff9e33ab0df8e8] xfs_buf_get_map at ffffffffc0457743 [xfs]
  [ffff9e33ab0df938] xfs_buf_read_map at ffffffffc0457fab [xfs]
  [ffff9e33ab0df980] xfs_trans_read_buf_map at ffffffffc04906e9 [xfs]
  [ffff9e33ab0df9c0] xfs_read_agi at ffffffffc0439783 [xfs]
  [ffff9e33ab0dfa20] xfs_iunlink at ffffffffc046ca7b [xfs]
  [ffff9e33ab0dfa30] current_time at ffffffff856f9921
  [ffff9e33ab0dfa80] xfs_droplink at ffffffffc046cc24 [xfs]
  [ffff9e33ab0dfaa0] xfs_rename at ffffffffc087fab2 [livepatch_93597ae8]
  [ffff9e33ab0dfb70] xfs_vn_rename at ffffffffc046a0b3 [xfs]
  [ffff9e33ab0dfbe8] vfs_rename at ffffffff856e9955
  [ffff9e33ab0dfc60] d_lookup at ffffffff856f88da
  [ffff9e33ab0dfca0] ovl_do_rename at ffffffffc0d0e2bc [overlay]
  [ffff9e33ab0dfce0] ovl_rename at ffffffffc0d0f557 [overlay]
  [ffff9e33ab0dfd98] vfs_rename at ffffffff856e9955
  [ffff9e33ab0dfe10] __lookup_hash at ffffffff856e6c94
  [ffff9e33ab0dfe50] do_renameat2 at ffffffff856ed168
  [ffff9e33ab0dff20] __x64_sys_rename at ffffffff856ed3e0
  [ffff9e33ab0dff30] do_syscall_64 at ffffffff85404447
  [ffff9e33ab0dff50] entry_SYSCALL_64_after_hwframe at ffffffff8600008c
    RIP: 00007fdb71618367  RSP: 00007ffce72ad108  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000008000  RCX: 00007fdb71618367
    RDX: 0000000000000000  RSI: 00007ffce72af1aa  RDI: 00007ffce72af19d
    RBP: 00007ffce72ad4d0   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000640  R11: 0000000000000202  R12: 00007ffce72ad201
    R13: 00007ffce72ad5c0  R14: 00007ffce72af19d  R15: 0000000000000000
    ORIG_RAX: 0000000000000052  CS: 0033  SS: 002b
crash> bt -t  147221
PID: 147221  TASK: ffff895d140d0000  CPU: 10  COMMAND: "mv"
              START: __schedule at ffffffff85ea4cc6
  [ffff9e33ab0af6c0] __schedule at ffffffff85ea4cc6
  [ffff9e33ab0af700] xfs_buf_find at ffffffffc0457435 [xfs]
  [ffff9e33ab0af720] schedule at ffffffff85ea5103
  [ffff9e33ab0af738] schedule_timeout at ffffffff85ea8a65
  [ffff9e33ab0af750] select_task_rq_fair at ffffffff854de761
  [ffff9e33ab0af798] xfs_buf_find at ffffffffc0457435 [xfs]
  [ffff9e33ab0af7b8] __down at ffffffff85ea7756
  [ffff9e33ab0af7d8] arch_haltpoll_enable.cold.25 at ffffffff85477200
  [ffff9e33ab0af810] down at ffffffff85500f91
  [ffff9e33ab0af830] xfs_buf_lock at ffffffffc0456fa7 [xfs]
  [ffff9e33ab0af858] xfs_buf_find at ffffffffc0457435 [xfs]
  [ffff9e33ab0af8e8] xfs_buf_get_map at ffffffffc0457743 [xfs]
  [ffff9e33ab0af938] xfs_buf_read_map at ffffffffc0457fab [xfs]
  [ffff9e33ab0af948] kmem_zone_alloc at ffffffffc047d34c [xfs]
  [ffff9e33ab0af980] xfs_trans_read_buf_map at ffffffffc04906e9 [xfs]
  [ffff9e33ab0af9c0] xfs_read_agi at ffffffffc0439783 [xfs]
  [ffff9e33ab0afa20] xfs_iunlink_remove at ffffffffc046d205 [xfs]
  [ffff9e33ab0afa48] _cond_resched at ffffffff85ea5399
  [ffff9e33ab0afaa0] xfs_rename at ffffffffc087f9e0 [livepatch_93597ae8]
  [ffff9e33ab0afb70] xfs_vn_rename at ffffffffc046a0b3 [xfs]
  [ffff9e33ab0afbe8] vfs_rename at ffffffff856e9955
  [ffff9e33ab0afc60] d_lookup at ffffffff856f88da
  [ffff9e33ab0afca0] ovl_do_rename at ffffffffc0d0e2bc [overlay]
  [ffff9e33ab0afce0] ovl_rename at ffffffffc0d0f557 [overlay]
  [ffff9e33ab0afd98] vfs_rename at ffffffff856e9955
  [ffff9e33ab0afe10] __lookup_hash at ffffffff856e6c94
  [ffff9e33ab0afe50] do_renameat2 at ffffffff856ed168
  [ffff9e33ab0aff20] __x64_sys_rename at ffffffff856ed3e0
  [ffff9e33ab0aff30] do_syscall_64 at ffffffff85404447
  [ffff9e33ab0aff50] entry_SYSCALL_64_after_hwframe at ffffffff8600008c
    RIP: 00007f85457cf367  RSP: 00007ffeb63a9778  RFLAGS: 00000206
    RAX: ffffffffffffffda  RBX: 0000000000008000  RCX: 00007f85457cf367
    RDX: 0000000000000000  RSI: 00007ffeb63ac1ad  RDI: 00007ffeb63ac1a0
    RBP: 00007ffeb63a9b40   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000640  R11: 0000000000000206  R12: 00007ffeb63a9901
    R13: 00007ffeb63a9c30  R14: 00007ffeb63ac1a0  R15: 0000000000000000
    ORIG_RAX: 0000000000000052  CS: 0033  SS: 002b
crash> bt -t 147220
PID: 147220  TASK: ffff8968e5305c40  CPU: 32  COMMAND: "mv"
              START: __schedule at ffffffff85ea4cc6
  [ffff9e33ab0cf6c0] __schedule at ffffffff85ea4cc6
  [ffff9e33ab0cf700] xfs_buf_find at ffffffffc0457435 [xfs]
  [ffff9e33ab0cf720] schedule at ffffffff85ea5103
  [ffff9e33ab0cf738] schedule_timeout at ffffffff85ea8a65
  [ffff9e33ab0cf750] select_task_rq_fair at ffffffff854de761
  [ffff9e33ab0cf798] xfs_buf_find at ffffffffc0457435 [xfs]
  [ffff9e33ab0cf7b8] __down at ffffffff85ea7756
  [ffff9e33ab0cf810] down at ffffffff85500f91
  [ffff9e33ab0cf830] xfs_buf_lock at ffffffffc0456fa7 [xfs]
  [ffff9e33ab0cf858] xfs_buf_find at ffffffffc0457435 [xfs]
  [ffff9e33ab0cf8e8] xfs_buf_get_map at ffffffffc0457743 [xfs]
  [ffff9e33ab0cf938] xfs_buf_read_map at ffffffffc0457fab [xfs]
  [ffff9e33ab0cf980] xfs_trans_read_buf_map at ffffffffc04906e9 [xfs]
  [ffff9e33ab0cf9c0] xfs_read_agi at ffffffffc0439783 [xfs]
  [ffff9e33ab0cfa20] xfs_iunlink at ffffffffc046ca7b [xfs]
  [ffff9e33ab0cfa30] current_time at ffffffff856f9921
  [ffff9e33ab0cfa80] xfs_droplink at ffffffffc046cc24 [xfs]
  [ffff9e33ab0cfaa0] xfs_rename at ffffffffc087fab2 [livepatch_93597ae8]
  [ffff9e33ab0cfb70] xfs_vn_rename at ffffffffc046a0b3 [xfs]
  [ffff9e33ab0cfbe8] vfs_rename at ffffffff856e9955
  [ffff9e33ab0cfc60] d_lookup at ffffffff856f88da
  [ffff9e33ab0cfca0] ovl_do_rename at ffffffffc0d0e2bc [overlay]
  [ffff9e33ab0cfce0] ovl_rename at ffffffffc0d0f557 [overlay]
  [ffff9e33ab0cfd98] vfs_rename at ffffffff856e9955
  [ffff9e33ab0cfe10] __lookup_hash at ffffffff856e6c94
  [ffff9e33ab0cfe50] do_renameat2 at ffffffff856ed168
  [ffff9e33ab0cff20] __x64_sys_rename at ffffffff856ed3e0
  [ffff9e33ab0cff30] do_syscall_64 at ffffffff85404447
  [ffff9e33ab0cff50] entry_SYSCALL_64_after_hwframe at ffffffff8600008c
    RIP: 00007f0c25917367  RSP: 00007fff15e5e8f8  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000008000  RCX: 00007f0c25917367
    RDX: 0000000000000000  RSI: 00007fff15e60c91  RDI: 00007fff15e60c77
    RBP: 00007fff15e5ecc0   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000640  R11: 0000000000000202  R12: 00007fff15e5ea01
    R13: 00007fff15e5edb0  R14: 00007fff15e60c77  R15: 0000000000000000
    ORIG_RAX: 0000000000000052  CS: 0033  SS: 002b
crash> bt -t 16429
PID: 16429  TASK: ffff896e77039ec0  CPU: 39  COMMAND: "runsv"
              START: __schedule at ffffffff85ea4cc6
  [ffff9e33961d74c0] __schedule at ffffffff85ea4cc6
  [ffff9e33961d7500] xfs_buf_find at ffffffffc0457435 [xfs]
  [ffff9e33961d7520] schedule at ffffffff85ea5103
  [ffff9e33961d7538] schedule_timeout at ffffffff85ea8a65
  [ffff9e33961d7550] _xfs_trans_bjoin at ffffffffc0490328 [xfs]
  [ffff9e33961d7578] xfs_trans_read_buf_map at ffffffffc0490729 [xfs]
  [ffff9e33961d7598] xfs_buf_find at ffffffffc0457435 [xfs]
  [ffff9e33961d75b8] __down at ffffffff85ea7756
  [ffff9e33961d75e0] xfs_btree_lookup_get_block at ffffffffc0420932 [xfs]
  [ffff9e33961d7608] down at ffffffff85500f91
  [ffff9e33961d7628] xfs_buf_lock at ffffffffc0456fa7 [xfs]
  [ffff9e33961d7650] xfs_buf_find at ffffffffc0457435 [xfs]
  [ffff9e33961d76e0] xfs_buf_get_map at ffffffffc0457743 [xfs]
  [ffff9e33961d7730] xfs_buf_read_map at ffffffffc0457fab [xfs]
  [ffff9e33961d7778] xfs_trans_read_buf_map at ffffffffc04906e9 [xfs]
  [ffff9e33961d77b8] xfs_read_agi at ffffffffc0439783 [xfs]
  [ffff9e33961d7818] xfs_ialloc_read_agi at ffffffffc0439854 [xfs]
  [ffff9e33961d7850] xfs_dialloc at ffffffffc043a163 [xfs]
  [ffff9e33961d78c8] xfs_ialloc at ffffffffc046b8ab [xfs]
  [ffff9e33961d7948] xfs_dir_ialloc at ffffffffc046e1b5 [xfs]
  [ffff9e33961d79e0] xfs_create at ffffffffc046e556 [xfs]
  [ffff9e33961d7a78] xfs_generic_create at ffffffffc046b4b6 [xfs]
  [ffff9e33961d7af8] xfs_vn_mknod at ffffffffc046b5a4 [xfs]
  [ffff9e33961d7b08] xfs_vn_create at ffffffffc046b5e3 [xfs]
  [ffff9e33961d7b18] vfs_create at ffffffff856ea530
  [ffff9e33961d7b58] ovl_create_real at ffffffffc0d0ebab [overlay]
  [ffff9e33961d7b88] ovl_create_or_link at ffffffffc0d0fe8e [overlay]
  [ffff9e33961d7bb8] inode_init_always at ffffffff856f8e11
  [ffff9e33961d7be0] alloc_inode at ffffffff856f9226
  [ffff9e33961d7c00] ovl_fill_inode at ffffffffc0d0b99c [overlay]
  [ffff9e33961d7c68] ovl_create_object at ffffffffc0d105bb [overlay]
  [ffff9e33961d7cb8] ovl_create at ffffffffc0d10683 [overlay]
  [ffff9e33961d7cc8] path_openat at ffffffff856ec08f
  [ffff9e33961d7da8] do_filp_open at ffffffff856ed9d3
  [ffff9e33961d7e58] __alloc_fd at ffffffff856fd596
  [ffff9e33961d7eb0] do_sys_open at ffffffff856d6aa8
  [ffff9e33961d7f20] __x64_sys_openat at ffffffff856d6c40
  [ffff9e33961d7f30] do_syscall_64 at ffffffff85404447
  [ffff9e33961d7f50] entry_SYSCALL_64_after_hwframe at ffffffff8600008c
    RIP: 00007f8986361c8e  RSP: 00007ffd58c71540  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 000055a4fc1c27fe  RCX: 00007f8986361c8e
    RDX: 0000000000000a41  RSI: 000055a4fc1c27d6  RDI: 00000000ffffff9c
    RBP: 000055a4fc1c2828   R8: 00007ffd58c71678   R9: 00007ffd58c71670
    R10: 00000000000001a4  R11: 0000000000000246  R12: 000055a4fc1c27d6
    R13: 000055a4fc1c2815  R14: 000055a4fc1c27ec  R15: 000055a4fc3c5100
    ORIG_RAX: 0000000000000101  CS: 0033  SS: 002b
crash> bt -t 2690
PID: 2690   TASK: ffff896e75769ec0  CPU: 10  COMMAND: "containerd"
              START: __schedule at ffffffff85ea4cc6
  [ffff9e334e3e7ae0] __schedule at ffffffff85ea4cc6
  [ffff9e334e3e7b40] schedule at ffffffff85ea5103
  [ffff9e334e3e7b58] rwsem_down_read_slowpath at ffffffff85ea7ed3
  [ffff9e334e3e7bb0] lookup_fast at ffffffff856e6f1c
  [ffff9e334e3e7bd8] xfs_ilock_data_map_shared at ffffffffc046d5b0 [xfs]
  [ffff9e334e3e7bf0] down_read at ffffffff85ea8125
  [ffff9e334e3e7c08] xfs_ilock at ffffffffc046d4d2 [xfs]
  [ffff9e334e3e7c28] xfs_dir_open at ffffffffc045c0a0 [xfs]
  [ffff9e334e3e7c38] xfs_ilock_data_map_shared at ffffffffc046d5b0 [xfs]
  [ffff9e334e3e7c50] xfs_dir_open at ffffffffc045c0d9 [xfs]
  [ffff9e334e3e7c70] xfs_dir_open at ffffffffc045c0a0 [xfs]
  [ffff9e334e3e7c80] do_dentry_open at ffffffff856d4e43
  [ffff9e334e3e7cb8] vfs_open at ffffffff856d678d
  [ffff9e334e3e7cc8] path_openat at ffffffff856eb119
  [ffff9e334e3e7cd8] filename_lookup at ffffffff856ed500
  [ffff9e334e3e7da8] do_filp_open at ffffffff856ed9d3
  [ffff9e334e3e7e58] __alloc_fd at ffffffff856fd596
  [ffff9e334e3e7eb0] do_sys_open at ffffffff856d6aa8
  [ffff9e334e3e7f20] __x64_sys_openat at ffffffff856d6c40
  [ffff9e334e3e7f30] do_syscall_64 at ffffffff85404447
  [ffff9e334e3e7f50] entry_SYSCALL_64_after_hwframe at ffffffff8600008c
    RIP: 00005580fc05c8da  RSP: 000000c422afb540  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00005580fc05c8da
    RDX: 0000000000080000  RSI: 000000c423733bc0  RDI: ffffffffffffff9c
    RBP: 000000c422afb5c0   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000202  R12: ffffffffffffffff
    R13: 000000000000004b  R14: 000000000000004a  R15: 0000000000000055
    ORIG_RAX: 0000000000000101  CS: 0033  SS: 002b

This version of the Ubuntu kernel already carries the patch xfs: Fix deadlock between AGI and AGF with RENAME_WHITEOUT, but not xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename(). We applied the second patch to our kernel as well, but after a few days the issue happened again. Moreover, the stacks show every task blocked on an AGI buffer, with no AGF locking involved at all, so this should be a different problem from the ones those two patches fix.
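
As background for why these renames touch AGI buffers at all: the stacks show ovl_rename() calling vfs_rename() on the upper xfs, where overlayfs requests RENAME_WHITEOUT so the renamed-away name stays hidden in the merged view. On xfs the whiteout inode starts life on the AGI unlinked list and is taken off it inside xfs_rename() — the xfs_iunlink_remove() frames above — while dropping the existing target's link count is the xfs_droplink()/xfs_iunlink() path. The snippet below issues the same RENAME_WHITEOUT primitive directly; it is an illustration only, and the paths and the x86_64 syscall number are my assumptions, not part of the original report.

#!/usr/bin/env python3
# Illustration only: issue the RENAME_WHITEOUT primitive that overlayfs
# sends down to the upper filesystem. Assumes x86_64 (renameat2 = 316),
# root privileges, and that /mnt is an xfs mount; names are made up.
import ctypes, os

libc = ctypes.CDLL(None, use_errno=True)
SYS_renameat2 = 316        # x86_64 only
AT_FDCWD = -100
RENAME_WHITEOUT = 1 << 2   # from <uapi/linux/fs.h>

os.chdir("/mnt")
open("demo.src", "w").close()

# Rename demo.src -> demo.dst and atomically leave a whiteout
# (a 0/0 character device) where demo.src used to be.
if libc.syscall(SYS_renameat2, AT_FDCWD, b"demo.src",
                AT_FDCWD, b"demo.dst", RENAME_WHITEOUT) != 0:
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))

print(oct(os.stat("demo.src").st_mode))  # S_IFCHR bits: the whiteout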

Try to reproduce

The following script reproduces the issue:

#!/bin/bash

SCRATCH_MNT=/mnt
LOAD_FACTOR=1
TIME_FACTOR=1

mkfs.xfs -f /dev/sda
mount /dev/sda $SCRATCH_MNT

mkdir $SCRATCH_MNT/lowerdir
mkdir $SCRATCH_MNT/lowerdir1
mkdir $SCRATCH_MNT/lowerdir/etc
mkdir $SCRATCH_MNT/workers
echo salts > $SCRATCH_MNT/lowerdir/etc/access.conf
touch $SCRATCH_MNT/running

stop_workers() {
	test -e $SCRATCH_MNT/running || return
	rm -f $SCRATCH_MNT/running

	while [ "$(ls $SCRATCH_MNT/workers/ | wc -l)" -gt 0 ]; do
		wait
	done
}

worker() {
	local tag="$1"
	local mergedir="$SCRATCH_MNT/merged$tag"
	local l="lowerdir=$SCRATCH_MNT/lowerdir:$SCRATCH_MNT/lowerdir1"
	local u="upperdir=$SCRATCH_MNT/upperdir$tag"
	local w="workdir=$SCRATCH_MNT/workdir$tag"
	local i="index=off"

	touch $SCRATCH_MNT/workers/$tag
	while test -e $SCRATCH_MNT/running; do
		rm -rf $SCRATCH_MNT/merged$tag
		rm -rf $SCRATCH_MNT/upperdir$tag
		rm -rf $SCRATCH_MNT/workdir$tag
		mkdir $SCRATCH_MNT/merged$tag
		mkdir $SCRATCH_MNT/workdir$tag
		mkdir $SCRATCH_MNT/upperdir$tag

		mount -t overlay overlay -o "$l,$u,$w,$i" $mergedir
		mv $mergedir/etc/access.conf $mergedir/etc/access.conf.bak
		touch $mergedir/etc/access.conf
		mv $mergedir/etc/access.conf $mergedir/etc/access.conf.bak
		touch $mergedir/etc/access.conf
		umount $mergedir
	done
	rm -f $SCRATCH_MNT/workers/$tag
}

for i in $(seq 0 $((4 + LOAD_FACTOR))); do
	worker $i &
done

sleep $((30 * TIME_FACTOR))
stop_workers
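
If the deadlock triggers, the mv and touch workers simply never exit. echo w > /proc/sysrq-trigger dumps the blocked tasks to dmesg for a quick look; below they are inspected from a crash dump instead.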

Even on the newest kernel, this issue can still be reproduced.

Check the stacks after reproducing

 crash> foreach UN ps -m|sort|tail
 [0 00:00:23.271] [UN]  PID: 38033  TASK: ffff908cd4c8ba80  CPU: 20  COMMAND: "mv"
 [0 00:00:23.332] [UN]  PID: 38029  TASK: ffff908cd6093a80  CPU: 1   COMMAND: "mv"
 [0 00:00:23.332] [UN]  PID: 38037  TASK: ffff908cd606ba80  CPU: 31  COMMAND: "touch"
 [0 00:00:23.332] [UN]  PID: 38038  TASK: ffff908cd4c50000  CPU: 24  COMMAND: "mv"
 [0 00:00:23.333] [UN]  PID: 38032  TASK: ffff908cd5a0ba80  CPU: 33  COMMAND: "mv"
 [0 00:00:23.333] [UN]  PID: 38035  TASK: ffff908cd5110000  CPU: 8   COMMAND: "touch"
 [0 00:00:23.336] [UN]  PID: 38040  TASK: ffff908cd4cbba80  CPU: 2   COMMAND: "touch"
 [0 00:00:23.337] [UN]  PID: 38039  TASK: ffff908cd62cba80  CPU: 35  COMMAND: "touch"
 [0 00:00:23.338] [UN]  PID: 38030  TASK: ffff908cdbfc0000  CPU: 15  COMMAND: "mv"
 [0 00:00:23.339] [UN]  PID: 38036  TASK: ffff908cd4ce8000  CPU: 22  COMMAND: "mv"

The mv processes all have the same stack:

 crash> bt
 PID: 38029  TASK: ffff908cd6093a80  CPU: 1   COMMAND: "mv"
  #0 [ffffa848961b3558] __schedule at ffffffff8fa96c09
  #1 [ffffa848961b35e8] schedule at ffffffff8fa97235
  #2 [ffffa848961b3608] schedule_timeout at ffffffff8fa9ccc6
  #3 [ffffa848961b36c8] __down_common at ffffffff8fa9b2b3
  #4 [ffffa848961b3748] __down at ffffffff8fa9b311
  #5 [ffffa848961b3758] down at ffffffff8ed43be1
  #6 [ffffa848961b3778] xfs_buf_lock at ffffffffc06fab78 [xfs]
  #7 [ffffa848961b37a8] xfs_buf_find at ffffffffc06fb160 [xfs]
  #8 [ffffa848961b3850] xfs_buf_get_map at ffffffffc06fbf41 [xfs]
  #9 [ffffa848961b38a0] xfs_buf_read_map at ffffffffc06fcd67 [xfs]
  #10 [ffffa848961b38f0] xfs_trans_read_buf_map at ffffffffc0752496 [xfs]
  #11 [ffffa848961b3938] xfs_read_agi at ffffffffc06cdf02 [xfs]
  #12 [ffffa848961b39a0] xfs_iunlink at ffffffffc071c741 [xfs]
  #13 [ffffa848961b3a08] xfs_droplink at ffffffffc071c9e2 [xfs]
  #14 [ffffa848961b3a30] xfs_rename at ffffffffc07221b7 [xfs]
  #15 [ffffa848961b3b08] xfs_vn_rename at ffffffffc0717ec4 [xfs]
  #16 [ffffa848961b3b80] vfs_rename at ffffffff8eff9d65
  #17 [ffffa848961b3c40] ovl_do_rename at ffffffffc091d177 [overlay]
  #18 [ffffa848961b3c78] ovl_rename at ffffffffc091e885 [overlay]
  #19 [ffffa848961b3d10] vfs_rename at ffffffff8eff9d65
  #20 [ffffa848961b3dd8] do_renameat2 at ffffffff8effcc66
  #21 [ffffa848961b3ea8] __x64_sys_rename at ffffffff8effcdb9
  #22 [ffffa848961b3ec0] do_syscall_64 at ffffffff8ec04c54
  #23 [ffffa848961b3f50] entry_SYSCALL_64_after_hwframe at ffffffff8fc00091
     RIP: 00007f84ed829da7  RSP: 00007ffe6da90508  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 00007ffe6da9093f  RCX: 00007f84ed829da7
     RDX: 0000000000000000  RSI: 00007ffe6da92b94  RDI: 00007ffe6da92b7a
     RBP: 00007ffe6da908e0   R8: 0000000000000001   R9: 0000000000000000
     R10: 0000000000000001  R11: 0000000000000202  R12: 00007ffe6da909c0
     R13: 0000000000000000  R14: 0000000000000000  R15: 00007ffe6da92b7a
     ORIG_RAX: 0000000000000052  CS: 0033  SS: 002b

Check who holds the lock

From the stacks and the lock's definition alone I could not tell which process holds the lock, so I reverted the commit xfs: remove b_last_holder & associated macros to get that debugging aid back.
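
Reverting it brings back the b_last_holder field in struct xfs_buf: the pid of the last task to lock the buffer, cleared again on unlock. A toy model of the idea (Python purely for illustration; the real change is a couple of assignments in the xfs buffer lock/unlock paths):

import threading

class DbgBuf:
    """Toy xfs_buf with the restored b_last_holder debugging field."""
    def __init__(self):
        self.b_sema = threading.Lock()  # stands in for the buffer semaphore
        self.b_last_holder = -1         # -1 while unlocked

    def lock(self):
        self.b_sema.acquire()
        self.b_last_holder = threading.get_native_id()  # record the holder

    def unlock(self):
        self.b_last_holder = -1                         # clear on release
        self.b_sema.release()

Because the field is cleared on every unlock, a non-negative value on a still-locked buffer in the dump names the holder, which is exactly how it is used below.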

  • Check process 38036

 crash> set 38036
     PID: 38036
 COMMAND: "mv"
    TASK: ffff908cd4ce8000  [THREAD_INFO: ffff908cd4ce8000]
     CPU: 22
   STATE: TASK_UNINTERRUPTIBLE

 It is waiting for a lock that is held by 38029:

 crash> struct xfs_buf.b_last_holder ffff908f92a0a680
   b_last_holder = 38029

  • Check process 38029

 crash> set 38029
     PID: 38029
 COMMAND: "mv"
    TASK: ffff908cd6093a80  [THREAD_INFO: ffff908cd6093a80]
     CPU: 1
   STATE: TASK_UNINTERRUPTIBLE

 It is waiting for a lock that is held by 38030:

 crash> struct xfs_buf.b_last_holder ffff908f993cc780
   b_last_holder = 38030

  • Check process 38030

 crash> set 38030
     PID: 38030
 COMMAND: "mv"
    TASK: ffff908cdbfc0000  [THREAD_INFO: ffff908cdbfc0000]
     CPU: 15
   STATE: TASK_UNINTERRUPTIBLE

 It is waiting for the first lock again, which is held by 38029:

 crash> struct xfs_buf.b_last_holder ffff908f92a0a680
   b_last_holder = 38029
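
So 38036 waits on 38029, 38029 waits on 38030, and 38030 waits on 38029 again: the wait-for graph has a cycle. Walking the holder chain gathered above makes that explicit (data hard-coded from this crash session):

holder_of = {38036: 38029, 38029: 38030, 38030: 38029}  # waiter -> holder

seen, pid = set(), 38036
while pid not in seen:
    seen.add(pid)
    pid = holder_of[pid]
print("wait-for cycle closes at pid", pid)  # -> 38029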

Check where the lock was taken

Use a simple eBPF tool, the xfs lock trace script, to print a kernel stack each time the lock is acquired or released (a rough reconstruction of the idea is sketched below).
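
The linked script is not reproduced here, but it looks roughly like this bcc sketch (my reconstruction, not the original tool): kprobes on xfs_buf_trylock() and xfs_buf_unlock() emit the buffer address, the pid, and a kernel stack, and the top frame of each stack tells acquisitions apart from releases.

#!/usr/bin/env python3
from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>

BPF_STACK_TRACE(stacks, 8192);

struct event {
    u64 buf;        /* struct xfs_buf * (first argument) */
    u32 pid;
    int stack_id;
};
BPF_PERF_OUTPUT(events);

int on_buf_lock_op(struct pt_regs *ctx) {
    struct event ev = {};
    ev.buf = PT_REGS_PARM1(ctx);
    ev.pid = bpf_get_current_pid_tgid() >> 32;
    ev.stack_id = stacks.get_stackid(ctx, 0);
    events.perf_submit(ctx, &ev, sizeof(ev));
    return 0;
}
"""

b = BPF(text=prog)
for sym in ("xfs_buf_trylock", "xfs_buf_unlock"):
    b.attach_kprobe(event=sym, fn_name="on_buf_lock_op")

def handle(cpu, data, size):
    ev = b["events"].event(data)
    print("buf=0x%x pid=%d" % (ev.buf, ev.pid))
    for addr in b["stacks"].walk(ev.stack_id):
        print("   %s" % b.ksym(addr, show_offset=True))

b["events"].open_perf_buffer(handle)
while True:
    b.perf_buffer_poll()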

According to the trace, lock ffff908f92a0a680 was last acquired by 38029 in xfs_iunlink_remove() and never released:

   b'xfs_buf_trylock+0x1'
   b'xfs_buf_get_map+0x51'
   b'xfs_buf_read_map+0x47'
   b'xfs_trans_read_buf_map+0xf6'
   b'xfs_read_agi+0xd2'
   b'xfs_iunlink_remove+0x9a'
   b'xfs_rename+0x618'
   b'xfs_vn_rename+0x104'
   b'vfs_rename+0x6e5'
   b'ovl_do_rename+0x47'
   b'ovl_rename+0x5d5'
   b'vfs_rename+0x6e5'
   b'do_renameat2+0x576'
   b'__x64_sys_rename+0x29'
   b'do_syscall_64+0x84'
   b'entry_SYSCALL_64_after_hwframe+0x49'

Lock 0xffff908f993cc780 was likewise last acquired by 38030 in xfs_iunlink_remove() and never released:

   b'xfs_buf_trylock+0x1'
   b'xfs_buf_get_map+0x51'
   b'xfs_buf_read_map+0x47'
   b'xfs_trans_read_buf_map+0xf6'
   b'xfs_read_agi+0xd2'
   b'xfs_iunlink_remove+0x9a'
   b'xfs_rename+0x618'
   b'xfs_vn_rename+0x104'
   b'vfs_rename+0x6e5'
   b'ovl_do_rename+0x47'
   b'ovl_rename+0x5d5'
   b'vfs_rename+0x6e5'
   b'do_renameat2+0x576'
   b'__x64_sys_rename+0x29'
   b'do_syscall_64+0x84'
   b'entry_SYSCALL_64_after_hwframe+0x49'

This is an ABBA deadlock: per the stacks above, each task acquired one AGI buffer in xfs_iunlink_remove() and then blocked in xfs_iunlink() (via xfs_droplink()) waiting for the other. 38029 holds ffff908f92a0a680 and waits for ffff908f993cc780, while 38030 holds ffff908f993cc780 and waits for ffff908f92a0a680.
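
A minimal userspace model of the same pattern, with the two threads named after the pids above; the acquire timeout stands in for the forever-blocking down() so the demo reports instead of hanging:

#!/usr/bin/env python3
# ABBA model: 38029 holds AGI buffer A and wants B, 38030 holds B and wants A.
import threading, time

agi_a = threading.Lock()  # models xfs_buf ffff908f92a0a680
agi_b = threading.Lock()  # models xfs_buf ffff908f993cc780

def rename_worker(first, second, name):
    with first:                        # xfs_iunlink_remove() takes one AGI
        time.sleep(0.2)                # let the other worker take its AGI
        if second.acquire(timeout=2):  # xfs_iunlink() wants the other AGI
            second.release()
        else:
            print(name, "stuck: ABBA deadlock")

t1 = threading.Thread(target=rename_worker, args=(agi_a, agi_b, "38029"))
t2 = threading.Thread(target=rename_worker, args=(agi_b, agi_a, "38030"))
t1.start(); t2.start(); t1.join(); t2.join()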

Report the issue to the maintainers and get the fix

Details: https://www.spinics.net/lists/linux-xfs/msg47556.html

