December 2, 2024
My cache drive keeps going into read-only mode. If I reboot, it comes back as read-write, but as soon as I start any Docker containers it goes read-only again. I moved the NVMe from one slot to another, but it keeps happening. Diags attached. Thanks in advance for any help. nebula-diagnostics-20241202-2236.zip
December 3, 2024 Community Expert
This sounds like the drive is overheating and protecting itself. The issue you're describing, where the cache drive becomes read-only, often points to one of the following root causes:

Possible Causes
Filesystem Corruption
- The cache drive's filesystem might have errors, causing Unraid to remount it as read-only to prevent further damage.
Hardware Issues
- Faulty NVMe drive.
- Problems with the motherboard or specific M.2 slots.
- Overheating of the NVMe drive under load.
Insufficient Power or Driver Conflicts
- Power supply issues affecting the NVMe drive.
- Driver conflicts, especially with specific NVMe models.
Docker I/O Overload
- Excessive I/O from Docker containers could expose or exacerbate existing filesystem or hardware issues.

Steps to Diagnose and Resolve
1. Check Filesystem for Errors
- Run a filesystem check on the cache drive: stop the array, then navigate to Main > Cache Drive > Check Filesystem.
- If errors are detected, allow it to repair the filesystem.
2. Monitor SMART Data
- Go to Main > Cache Drive > SMART Report.
- Look for high reallocated sectors, pending sectors, and errors indicating potential drive failure.
- If the SMART report shows significant issues, consider replacing the NVMe drive.
3. Test NVMe in Another System or Slot
- You've already moved the NVMe to a different slot. If the issue persists, test the NVMe drive in another system to rule out motherboard or slot-specific problems.
- Alternatively, try a different NVMe drive in the same slot.
4. Review Logs for Errors
- Check Unraid's logs (the attached diagnostics should help) for I/O errors, warnings, or messages indicating why the drive was remounted as read-only.
- Common errors: "I/O Error" suggests a hardware or connection issue; "EXT4-fs error" or "BTRFS error" suggests a filesystem problem.
- If the logs show consistent errors tied to Docker, that might suggest an I/O overload or a corrupted image file.
5. Check Docker Image
- If Docker containers trigger the issue, stop Docker services (Settings > Docker > Disable Docker).
- Delete the current Docker image file (it could be corrupt).
- Recreate the Docker image and containers.
6. Monitor NVMe Temperature
- High temperatures during Docker load could cause instability.
- Use Main > Cache Drive > SMART Report to check temperatures.
- Consider adding cooling (e.g., heatsinks, fans) if the drive runs hot.
7. Update BIOS/Firmware
- Ensure the motherboard BIOS and NVMe drive firmware are up to date. Compatibility problems or firmware bugs can sometimes cause instability.
8. Recreate Cache Pool (As a Last Resort)
- Back up all critical data from the cache drive.
- Stop the array and reformat the cache drive.
- Add it back to the cache pool and restore your data.

Next Steps
Start by reviewing the diagnostics you attached to identify errors in the logs. Prioritize checking SMART data and filesystem integrity. If the issue persists and you'd like me to help analyze specific logs, share details from the diagnostics (key error messages or hardware info).
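For the log review in step 4, a small script can pull out the relevant lines instead of reading the whole syslog by eye. This is only a rough sketch: the search strings are the ones named in that step plus "read-only", the function names are made up for illustration, and real kernel messages may be worded differently on your system.

```python
import re
from typing import Iterable, List, Tuple

# Search strings taken from the "common errors" in step 4, plus "read-only".
# These are assumptions about wording; adjust them to what your syslog shows.
PATTERNS = {
    "io_error": re.compile(r"I/O error", re.IGNORECASE),
    "btrfs_error": re.compile(r"BTRFS error", re.IGNORECASE),
    "ext4_error": re.compile(r"EXT4-fs error", re.IGNORECASE),
    "read_only": re.compile(r"read[- ]only", re.IGNORECASE),
}


def scan_log(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, label, text) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.rstrip()))
    return hits


def scan_file(path: str) -> List[Tuple[int, str, str]]:
    """Scan a syslog file extracted from the diagnostics zip."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as handle:
        return scan_log(handle)
```

To use it, extract the diagnostics zip and call scan_file() with the path to the syslog inside it, then print the returned tuples. The first hit is usually the most interesting one, since later errors are often just fallout from it.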
December 3, 2024 Community Expert
The filesystem is going read-only due to filesystem corruption, and btrfs is also detecting data corruption. I suggest starting with memtest, since this is typically caused by bad RAM.
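Btrfs keeps per-device error counters, which is one place detected corruption shows up; they can be read with `btrfs device stats`. Below is a minimal sketch for checking them, assuming the pool is mounted at /mnt/cache (an assumption, substitute your own pool's mount point) and that each output line is a counter name followed by a number. The function names are hypothetical.

```python
import subprocess
from typing import Dict


def parse_device_stats(text: str) -> Dict[str, int]:
    """Parse 'name value' lines from `btrfs device stats` into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly "<counter name> <integer>".
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def nonzero_counters(counters: Dict[str, int]) -> Dict[str, int]:
    """Keep only the counters that have recorded at least one error."""
    return {name: value for name, value in counters.items() if value != 0}


def read_device_stats(mount_point: str = "/mnt/cache") -> Dict[str, int]:
    """Run `btrfs device stats` on the mount point (assumed path) and parse it."""
    result = subprocess.run(
        ["btrfs", "device", "stats", mount_point],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_device_stats(result.stdout)
```

Calling nonzero_counters(read_device_stats()) on the server returns an empty dict when no errors have been recorded. A non-zero corruption counter fits the diagnosis above, but it does not by itself say whether the RAM or the drive is at fault, so memtest is still the next step.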