🔧 Resolving XFS Filesystem “Input/Output Error” on a Critical Production Server

🧩 The Scenario

While working on a production build machine, I encountered a critical issue: the workspace_disk3 directory suddenly became inaccessible. Any operation such as ls or cd resulted in:

ls: cannot open directory .: Input/output error

This machine was hosting critical builds and scripts, so it needed a fix immediately, preferably without a reboot.


🚨 Initial Observations

Filesystem Mount Verification

mount | grep workspace_disk3
/dev/nvme0n1p3 on /home/build/workspace_disk3 type xfs (rw,_netdev)

The XFS filesystem was still mounted, but clearly malfunctioning.

Disk Layout Confirmation

lsblk -o NAME,MOUNTPOINT,SIZE,FSTYPE

This confirmed that /dev/nvme0n1p3 was mounted on workspace_disk3 and formatted as XFS.


📛 The Red Flags

Checking dmesg revealed:

XFS (nvme0n1p3): xfs_log_force: error -5 returned

Error -5 is a negated errno: 5 is EIO, the same Input/output error that ls reported. The filesystem was failing to flush its journal, a serious issue.
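
On a busy machine the kernel log is noisy, so it can help to narrow it to the affected device before reading it. The following is a minimal sketch, not the exact command used during the incident; the helper name xfs_log_lines is made up for illustration, and it reads log text from stdin so it also works on a saved log file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only XFS kernel messages for one device.
# Reads log text on stdin; the device name (e.g. nvme0n1p3) is the argument.
xfs_log_lines() {
    local dev="$1"
    # -F matches the literal string "XFS (<dev>)" instead of treating it as a regex.
    grep -F "XFS (${dev})"
}

# Typical use on a live system (may need root):
#   dmesg | xfs_log_lines nvme0n1p3
```
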
Then, checking space usage:

df -h
/dev/nvme0n1p3  1.7T  1.7T  20K  100% /home/build/workspace_disk3

The disk was 100% full, likely causing the journaling error.


❌ Diagnostic Tools Failing

Attempts to use diagnostic tools on the path failed due to I/O errors:

fuser -vm /home/build/workspace_disk3
lsof +D /home/build/workspace_disk3

Both returned:

Input/output error

This confirmed that the filesystem was too damaged to be inspected through its mount point.


🔍 Switching Focus to the Block Device

Using the device name directly:

sudo fuser -vm /dev/nvme0n1p3

Success!

/dev/nvme0n1p3:  build 84984 ..c.. bash

Several shell sessions were holding the mount as their current working directory, blocking clean recovery.
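
The c in the fuser access column marks a process whose current directory is on the filesystem. The same information can be cross-checked from /proc, since reading the /proc/PID/cwd link asks the kernel for the recorded path and should not need to walk the broken mount itself. This is a minimal sketch assuming Linux with /proc mounted; the function name pids_with_cwd_under is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print PIDs whose current working directory is
# the given mount point or anything beneath it.
pids_with_cwd_under() {
    local mount="${1%/}"    # drop a trailing slash so prefixes compare cleanly
    local link cwd pid
    for link in /proc/[0-9]*/cwd; do
        # readlink fails for processes we may not inspect or that just exited
        cwd="$(readlink "$link" 2>/dev/null)" || continue
        case "$cwd" in
            "$mount"|"$mount"/*)
                pid="${link#/proc/}"
                printf '%s\n' "${pid%/cwd}"
                ;;
        esac
    done
}

# Example (run as root to see other users' processes):
#   pids_with_cwd_under /home/build/workspace_disk3
```

Unlike fuser -m, this only finds working directories, not open files or memory mappings, so treat it as a second opinion rather than a replacement.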


✅ The Fix: Kill the Blocking Processes

sudo fuser -km /dev/nvme0n1p3

This command forcefully killed all processes using the device. As a result:

  • The filesystem recovered on its own
  • ls and cd worked again
  • I/O errors stopped appearing
  • No need for a reboot or xfs_repair
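
By default fuser -k sends SIGKILL, which gives processes no chance to clean up. A gentler variant, sketched here under the assumption that the PIDs have already been collected (for example from the fuser -vm output), sends SIGTERM first and escalates only for survivors. The function name and grace period are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask processes to exit, then force the stragglers.
# Usage: term_then_kill GRACE_SECONDS PID...
term_then_kill() {
    local grace="$1"; shift
    local pid
    # Polite request first; errors for PIDs that are already gone are ignored.
    for pid in "$@"; do
        kill -TERM "$pid" 2>/dev/null
    done
    sleep "$grace"
    # kill -0 sends no signal; it only tests whether the process still exists.
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            kill -KILL "$pid" 2>/dev/null
        fi
    done
    return 0
}

# Example with the PID from the fuser output (5 second grace period):
#   sudo bash -c "$(declare -f term_then_kill); term_then_kill 5 84984"
```

The psmisc version of fuser also accepts a signal name alongside -k (for example fuser -k -TERM -m), which covers the first half of this without a helper.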

🧠 Root Cause & Lessons Learned

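Issue                   | Explanation
------------------------|-------------------------------------------------------------
xfs_log_force: error -5 | Failed journaling due to full disk
Input/output error      | Kernel couldn’t stat the path
Shell processes (..c..) | Kept the broken mount busy as their current working directory
Fix                     | Kill those processes and allow the kernel to recover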

🧼 Final Recommendation

After fixing the issue, clean up the disk:

du -sh /home/build/workspace_disk3/* | sort -hr | head -n 20

Regularly monitor disk usage to avoid this situation.
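
One way to automate that monitoring is a small threshold check run from cron. This is a minimal sketch, not a full alerting setup; the function names and the 90% limit are illustrative assumptions, and the parsing assumes the device and mount names contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the used-space percentage (digits only) of the
# filesystem that contains the given path.
usage_pct() {
    # -P forces the POSIX one-line-per-filesystem layout; column 5 is the capacity.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed (exit 0) when usage is at or above the limit.
over_threshold() {
    local path="$1" limit="$2" pct
    pct="$(usage_pct "$path")"
    [ "$pct" -ge "$limit" ]
}

# Example for a cron job (path and limit are illustrative):
#   over_threshold /home/build/workspace_disk3 90 && echo "workspace_disk3 nearly full"
```

Alerting well below 100% leaves time to clean up before the filesystem runs out of room for its own metadata and log writes.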


📌 Conclusion

This incident reminded me of a key DevOps principle: before reaching for a reboot or running risky repair commands, check which processes are still holding the filesystem open. A simple fuser -km saved the day, and the uptime.