dacmcbibs Posted March 25, 2018

Every few weeks I run into an issue with my cache drives that ends up taking the system offline. The disk log shows the following:

Mar 24 10:21:42 TheBeast kernel: BTRFS error (device nvme1n1p1): unable to find ref byte nr 505982324736 parent 0 root 5 owner 5166944 offset 0
Mar 24 10:21:42 TheBeast kernel: BTRFS: error (device nvme1n1p1) in __btrfs_free_extent:7073: errno=-2 No such entry
Mar 24 10:21:42 TheBeast kernel: BTRFS info (device nvme1n1p1): forced readonly
Mar 24 10:21:42 TheBeast kernel: BTRFS: error (device nvme1n1p1) in btrfs_run_delayed_refs:3089: errno=-2 No such entry
Mar 24 10:21:42 TheBeast kernel: BTRFS error (device nvme1n1p1): pending csums is 8060928

I also see BTRFS errors in the syslog around the same time. I've repaired and even completely re-created the cache. I have 2 drives in the cache pool, both Samsung SSD 960 256 GB drives, and every few weeks the issue recurs. What information can I share to help diagnose why this happens and how to keep it from occurring again?
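If you want to check a saved syslog for this pattern yourself, here is a minimal Python sketch that counts BTRFS error lines per device and records which devices were forced read-only. It assumes the message formats in the log excerpt above; the function and regex names are hypothetical and not part of Unraid or btrfs-progs.

```python
import re

# Matches both "BTRFS error (device X): ..." and "BTRFS: error (device X) in ..."
# as seen in the excerpt above, capturing the device name.
BTRFS_ERROR = re.compile(r"BTRFS:? error \(device (?P<dev>[^)]+)\)")
# Matches the message logged when the filesystem is switched to read-only.
FORCED_RO = re.compile(r"BTRFS info \(device (?P<dev>[^)]+)\): forced readonly")


def summarize_btrfs_log(lines):
    """Return (error count per device, set of devices forced read-only)."""
    errors = {}
    forced_readonly = set()
    for line in lines:
        m = BTRFS_ERROR.search(line)
        if m:
            dev = m.group("dev")
            errors[dev] = errors.get(dev, 0) + 1
        m = FORCED_RO.search(line)
        if m:
            forced_readonly.add(m.group("dev"))
    return errors, forced_readonly
```

Run against the five lines quoted above, this reports 4 error lines for nvme1n1p1 and flags nvme1n1p1 as forced read-only. It only summarizes the log; the cause still has to come from the diagnostics.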
JorgeB Posted March 25, 2018

Please post your diagnostics and the output of:

btrfs dev stats /mnt/cache
dacmcbibs Posted March 25, 2018

Thanks for the quick reply. Here is the output from the dev stats command:

[/dev/nvme1n1p1].write_io_errs 0
[/dev/nvme1n1p1].read_io_errs 0
[/dev/nvme1n1p1].flush_io_errs 0
[/dev/nvme1n1p1].corruption_errs 159
[/dev/nvme1n1p1].generation_errs 0
[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 159
[/dev/nvme0n1p1].generation_errs 0
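To pick out the non-zero counters from output like this, a small Python sketch can parse the `[device].counter value` entries into a per-device dictionary. It assumes only the format visible in the output above and works whether the entries are on separate lines or run together; `parse_dev_stats` and `nonzero_counters` are hypothetical helper names.

```python
import re

# Each entry looks like "[/dev/nvme1n1p1].corruption_errs 159".
STAT = re.compile(r"\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)")


def parse_dev_stats(text):
    """Map device -> {counter name: value} from `btrfs dev stats` output."""
    stats = {}
    for m in STAT.finditer(text):
        stats.setdefault(m.group("dev"), {})[m.group("name")] = int(m.group("value"))
    return stats


def nonzero_counters(stats):
    """Keep only devices with at least one non-zero counter, and only those counters."""
    return {
        dev: {name: value for name, value in counters.items() if value}
        for dev, counters in stats.items()
        if any(counters.values())
    }
```

For the output above, `nonzero_counters` leaves just `corruption_errs: 159` on both /dev/nvme1n1p1 and /dev/nvme0n1p1.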
JorgeB Posted March 25, 2018

Please also post your diagnostics (Tools -> Diagnostics).

The same number of corruption errors on both devices suggests a RAM issue; run memtest if you haven't yet.
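The reasoning here can be written down as a rough heuristic: identical non-zero `corruption_errs` on every member of a multi-device pool point first at something the devices share (such as RAM), while differing counts point at individual devices. The Python helper below is a hypothetical illustration of that rule of thumb only; `classify_corruption` is not part of btrfs or Unraid, and its labels are not a diagnosis.

```python
def classify_corruption(corruption_errs):
    """Label the pattern of per-device corruption_errs counts.

    Hypothetical heuristic:
      "clean"      - no corruption errors on any device
      "shared"     - identical non-zero counts on two or more devices,
                     so check shared hardware such as RAM first
      "per-device" - counts differ (or only one device), so look at
                     the individual devices reporting errors
    """
    counts = list(corruption_errs.values())
    if not any(counts):
        return "clean"
    if len(counts) > 1 and len(set(counts)) == 1:
        return "shared"
    return "per-device"
```

With the counts posted in this thread (159 on each NVMe device), the helper returns "shared", matching the suggestion to run memtest.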
dacmcbibs Posted March 25, 2018

Attached are the requested diagnostics. About to run a memtest.

unraid-diagnostics-20180325-1541.zip
dacmcbibs Posted March 25, 2018

FYI: Unraid's built-in memtest would not boot, but I created bootable media for memtest and it came back clean.
JorgeB Posted March 25, 2018

You need to let it run for some time, ideally 24 hours, and even then a run with no errors is not conclusive; it's only conclusive if it finds errors. If it's not memory, it's likely another hardware issue. Also look for a BIOS update.
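Why a clean run proves little can be shown with a quick calculation. This sketch rests on a loud assumption: each memtest pass independently catches an intermittent fault with the same probability `p_per_pass`, and that value is hypothetical. Real faults that depend on temperature or access pattern need not behave this way.

```python
def miss_probability(p_per_pass, passes):
    """Chance that an intermittent fault goes undetected in every pass.

    Assumes (hypothetically) that each pass independently detects the
    fault with probability p_per_pass.
    """
    if not 0.0 <= p_per_pass <= 1.0:
        raise ValueError("p_per_pass must be between 0 and 1")
    if passes < 0:
        raise ValueError("passes must be non-negative")
    return (1.0 - p_per_pass) ** passes
```

With a made-up `p_per_pass` of 0.3, one pass misses the fault 70% of the time and three passes still miss it about 34% of the time. That is the sense in which only a failing result is conclusive, and why several long runs may be needed.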
dacmcbibs Posted April 26, 2018

Thanks for the pointer. It took several 24-hour memtests before the fault actually showed up, but I did finally find a bad memory DIMM. I've replaced the faulty memory and run a clean 24-hour memtest. Hopefully the system will be stable now. I appreciate the help!