🧩 The Scenario
While working on a production build machine, I encountered a critical issue: the workspace_disk3 directory suddenly became inaccessible. Any operation on it, such as ls or cd, failed with an I/O error:
ls: cannot open directory .: Input/output error
This machine was hosting critical builds and scripts, so I needed a fix immediately, preferably without a reboot.
🚨 Initial Observations
Filesystem Mount Verification
mount | grep workspace_disk3
/dev/nvme0n1p3 on /home/build/workspace_disk3 type xfs (rw,_netdev)
The XFS filesystem was still mounted, but clearly malfunctioning.
Disk Layout Confirmation
lsblk -o NAME,MOUNTPOINT,SIZE,FSTYPE
This confirmed that /dev/nvme0n1p3 was mounted on workspace_disk3 and formatted as XFS.
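When the mount point itself returns I/O errors, the kernel's mount table can still be read without touching the broken filesystem. The following is a minimal sketch, assuming a Linux-style `/proc/mounts`; the function name `device_for_mount` and its optional second argument (an alternative mounts file, handy for testing) are illustrative and not part of any standard tool.

```shell
# Print "<device> <fstype>" for a mount point by reading the mount table
# directly, without stat-ing the (possibly broken) mount point itself.
# Note: /proc/mounts escapes spaces in paths as \040; this sketch does not
# handle that case.
device_for_mount() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}   # assumption: Linux-style mounts table
    awk -v mp="$mount_point" '$2 == mp { print $1, $3 }' "$mounts_file"
}
```

For the machine in this post, `device_for_mount /home/build/workspace_disk3` would be expected to print the device and filesystem type shown by lsblk.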
📛 The Red Flags
Checking dmesg revealed:
XFS (nvme0n1p3): xfs_log_force: error -5 returned
Error -5 is EIO: the filesystem was failing to flush its journal, a serious issue.
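Kernel log lines report errors as negative errno values, so it helps to translate them before drawing conclusions. Below is a small sketch that extracts the device and error number from XFS log lines and decodes a few common codes. The helper names `xfs_log_errors` and `decode_kernel_errno` are made up for this example, and the table covers only a handful of Linux errno values.

```shell
# Translate the negative errno values that appear in kernel log lines.
# Only a few codes relevant to filesystem trouble are listed here.
decode_kernel_errno() {
    case "${1#-}" in
        5)   echo "EIO (I/O error)" ;;
        28)  echo "ENOSPC (no space left on device)" ;;
        30)  echo "EROFS (read-only file system)" ;;
        117) echo "EUCLEAN (structure needs cleaning)" ;;
        *)   echo "unknown errno $1" ;;
    esac
}

# Read kernel log text on stdin and print "<device> <errno>" for lines such as
# "XFS (nvme0n1p3): xfs_log_force: error -5 returned".
xfs_log_errors() {
    sed -n 's/.*XFS (\([^)]*\)).*error \(-[0-9][0-9]*\).*/\1 \2/p'
}
```

A typical use would be `dmesg | xfs_log_errors`, followed by `decode_kernel_errno -5` for each code that shows up.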
Then, checking space usage:
df -h
/dev/nvme0n1p3 1.7T 1.7T 20K 100% /home/build/workspace_disk3
The disk was 100% full, likely causing the journaling error.
❌ Diagnostic Tools Failing
Attempts to use diagnostic tools on the path failed due to I/O errors:
fuser -vm /home/build/workspace_disk3
lsof +D /home/build/workspace_disk3
Both returned:
Input/output error
This confirmed that the filesystem was too broken to scan from the directory level.
🔍 Switching Focus to the Block Device
Using the device name directly:
sudo fuser -vm /dev/nvme0n1p3
Success!
/dev/nvme0n1p3: build 84984 ..c.. bash
Several shell sessions were holding the mount as their current working directory, blocking clean recovery.
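Another way to find processes parked inside a broken mount is to read the `cwd` symlinks under `/proc`, which should not require any I/O on the target filesystem. This is a rough sketch under that assumption; `procs_with_cwd_under` is a hypothetical helper name, and the optional second argument (an alternative proc root) exists only to make the function testable.

```shell
# List "<pid> <command> <cwd>" for every process whose current working
# directory is at or below the given path. Run as root to see all processes.
procs_with_cwd_under() {
    target=${1%/}
    proc_root=${2:-/proc}            # assumption: Linux procfs layout
    for dir in "$proc_root"/[0-9]*; do
        # Skip entries we cannot read (process gone, or permission denied).
        cwd=$(readlink "$dir/cwd" 2>/dev/null) || continue
        case "$cwd" in
            "$target"|"$target"/*)
                comm=$(cat "$dir/comm" 2>/dev/null)
                echo "${dir##*/} ${comm:-?} $cwd"
                ;;
        esac
    done
}
```

Calling `procs_with_cwd_under /home/build/workspace_disk3` would list the same shells that fuser marks with the c flag, without going through the mount point.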
✅ The Fix: Kill the Blocking Processes
sudo fuser -km /dev/nvme0n1p3
This command forcefully killed all processes using the device. As a result:
- The filesystem recovered on its own
- ls and cd worked again
- I/O errors stopped appearing
- No need for a reboot or xfs_repair
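Since fuser -k sends SIGKILL by default, a gentler variant is to list the holders first, send SIGTERM, and only then fall back to SIGKILL. The sketch below assumes the psmisc version of fuser, which accepts a signal name after -k; the function name `release_mount`, the `DRY_RUN` switch, and the 5-second grace period are choices made for this example, not something taken from the incident.

```shell
# Escalate gradually: show holders, ask them to exit, then force-kill
# whatever is left. Set DRY_RUN=1 to print the commands instead of running
# them. Needs root to signal other users' processes.
release_mount() {
    dev=$1
    grace=${2:-5}                    # seconds to wait between TERM and KILL
    run=${DRY_RUN:+echo}             # "echo" in dry-run mode, empty otherwise
    $run fuser -vm "$dev"            # list processes using the filesystem
    $run fuser -k -TERM -m "$dev"    # polite request first
    $run sleep "$grace"
    $run fuser -k -KILL -m "$dev"    # SIGKILL, the default signal of fuser -k
}
```

Running `DRY_RUN=1 release_mount /dev/nvme0n1p3` first shows exactly what would be executed. Keep in mind that your own shell will be killed too if its working directory is inside the mount.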
🧠 Root Cause & Lessons Learned
Issue | Explanation |
---|---|
xfs_log_force: error -5 | Failed journaling due to full disk |
Input/output error | Kernel couldn’t stat the path |
Shell processes (..c..) | Kept the broken mount busy as their current working directory |
Fix | Kill those processes and allow the kernel to recover |
🧼 Final Recommendation
After fixing the issue, clean up the disk:
du -sh /home/build/workspace_disk3/* | sort -hr | head -n 20
Regularly monitor disk usage to avoid this situation.
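One simple way to monitor usage is a threshold check that can be run from cron or a CI job. This is a minimal sketch using the POSIX output format of `df -P`; the function names and the 90% default limit are assumptions for illustration, so pick a threshold that suits your builds.

```shell
# Extract the "Capacity" value (without the % sign) from `df -P` output on stdin.
usage_pct_from_df() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a warning and return 1 when the mount point is at or above the limit.
# Returns 2 if usage could not be determined.
check_disk_usage() {
    mount_point=$1
    limit=${2:-90}                   # hypothetical default threshold, in percent
    pct=$(df -P "$mount_point" | usage_pct_from_df)
    [ -n "$pct" ] || return 2
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $mount_point is ${pct}% full (limit ${limit}%)"
        return 1
    fi
}
```

For example, `check_disk_usage /home/build/workspace_disk3 85 || notify_team` could drive an alert, where `notify_team` stands in for whatever alerting command you already use.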
📌 Conclusion
This incident reminded me of a key DevOps principle: before reaching for a reboot or running risky repair commands, investigate which processes are still holding the filesystem open. A simple fuser -km saved the day, and the uptime.
