Shu Posted August 31, 2022
Hi Everyone,
I woke up this morning to these errors on my NVMe cache drive (my cache pool is two drives, one M.2 NVMe and the other a SATA SSD):
Aug 31 03:51:39 520unraid kernel: BTRFS critical (device nvme0n1p1): corrupt leaf: root=10 block=820606861312 slot=118, unexpected item end, have 3543154555 expect 16251
Aug 31 03:51:39 520unraid kernel: BTRFS info (device nvme0n1p1): leaf 820606861312 gen 46836 total ptrs 258 free space 9769 owner 10
Aug 31 03:51:39 520unraid kernel: BTRFS error (device nvme0n1p1): block=820606861312 write time tree block corruption detected
Aug 31 03:51:39 520unraid kernel: BTRFS: error (device nvme0n1p1) in btrfs_commit_transaction:2438: errno=-5 IO failure (Error while writing out transaction)
Aug 31 03:51:39 520unraid kernel: BTRFS info (device nvme0n1p1): forced readonly
Aug 31 03:51:39 520unraid kernel: BTRFS warning (device nvme0n1p1): Skipping commit of aborted transaction.
Aug 31 03:51:39 520unraid kernel: BTRFS: error (device nvme0n1p1) in cleanup_transaction:2011: errno=-5 IO failure
I also attached 3 screenshots (0 being the disk info, 1 & 2 being syslog relevant to this issue (a lot of item xxx key ... itemoff... errors)).
While researching this issue, I ran a Check Filesystem on this drive, but it came back clean (picture 3), and a scrub, which also came back clean (picture 4). I do believe something is wrong, however, because when I ran the mover, it came back with a read-only filesystem error on all files. They're mostly media files from qBit from last night, so not much is unrecoverable, I believe. I haven't checked the files that are kept on the cache pool, though, and those are probably more vital.
What is the proper guidance here? (Or which guide/thread should I try next?)
Edit 9:56am local: I re-ran the mover and it does appear to be moving files, albeit at a slow pace (~25 MB/s per drive, so ~50 MB/s for the pool). I would expect much faster reads from these fast drives. They are writing to a 4 TB WD Blue which is less than half full, so I'd expect at least a starting speed around 120 MB/s.
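For anyone triaging a similar syslog, here is a minimal Python sketch that tallies BTRFS kernel messages by device and severity. The function name `summarize_btrfs` and the regular expression are illustrative assumptions inferred only from the message format in the post above; this is not an Unraid or btrfs tool, and other kernel versions may word these lines differently.

```python
import re
from collections import Counter

# Matches both "BTRFS critical (device X)" and "BTRFS: error (device X)".
# The pattern is an assumption based on the log lines quoted in the post.
BTRFS_LINE = re.compile(r"BTRFS:? (\w+) \(device ([^)]+)\)")


def summarize_btrfs(lines):
    """Count BTRFS kernel messages per (device, severity) pair."""
    counts = Counter()
    for line in lines:
        match = BTRFS_LINE.search(line)
        if match:
            severity, device = match.groups()
            counts[(device, severity)] += 1
    return counts


# The seven log lines from the first post.
SAMPLE = [
    "Aug 31 03:51:39 520unraid kernel: BTRFS critical (device nvme0n1p1): corrupt leaf: root=10 block=820606861312 slot=118, unexpected item end, have 3543154555 expect 16251",
    "Aug 31 03:51:39 520unraid kernel: BTRFS info (device nvme0n1p1): leaf 820606861312 gen 46836 total ptrs 258 free space 9769 owner 10",
    "Aug 31 03:51:39 520unraid kernel: BTRFS error (device nvme0n1p1): block=820606861312 write time tree block corruption detected",
    "Aug 31 03:51:39 520unraid kernel: BTRFS: error (device nvme0n1p1) in btrfs_commit_transaction:2438: errno=-5 IO failure (Error while writing out transaction)",
    "Aug 31 03:51:39 520unraid kernel: BTRFS info (device nvme0n1p1): forced readonly",
    "Aug 31 03:51:39 520unraid kernel: BTRFS warning (device nvme0n1p1): Skipping commit of aborted transaction.",
    "Aug 31 03:51:39 520unraid kernel: BTRFS: error (device nvme0n1p1) in cleanup_transaction:2011: errno=-5 IO failure",
]

for (device, severity), n in sorted(summarize_btrfs(SAMPLE).items()):
    print(f"{device} {severity}: {n}")
```

To run it against a real log, pass an open file object instead of `SAMPLE`; the path to the live syslog depends on the system and is not given in this thread.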
JorgeB Posted August 31, 2022
6 minutes ago, Shu said: "write time tree block corruption detected"
This usually means bad RAM or other kernel corruption; start by running memtest.
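The two numbers in the "unexpected item end" log line from the first post can be compared bit by bit. The short Python sketch below does only that arithmetic; the variable names are illustrative. It shows that the low 16 bits of the value found match the expected value exactly and only the upper 16 bits differ. That looks more like a correct value with garbage written over part of it than a wholly wrong one, which fits the bad-RAM suggestion, but it does not by itself identify the cause.

```python
# Values copied from the "unexpected item end" log line in the first post.
have = 3543154555
expect = 16251

diff = have ^ expect                                  # bits that differ
low16_match = (have & 0xFFFF) == (expect & 0xFFFF)    # compare lower halves
differing_bits = bin(diff).count("1")                 # how many bits differ

print(f"have   = {have:#010x}")    # 0xd3303f7b
print(f"expect = {expect:#010x}")  # 0x00003f7b
print(f"xor    = {diff:#010x}")    # 0xd3300000
print(f"low 16 bits identical: {low16_match}, differing bits: {differing_bits}")
```

Seven bits differ, all in the upper half, so this is not a single-bit flip. A memtest pass is still worth running, as suggested, since only a hardware test can confirm or rule out the RAM.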
Shu Posted August 31, 2022
9 minutes ago, JorgeB said: "This usually means bad RAM or other kernel corruption, start by running memtest."
Will run a test and update when it finishes.
Shu Posted August 31, 2022
2 hours ago, JorgeB said: "This usually means bad RAM or other kernel corruption, start by running memtest."
I haven't had a chance to run a test yet, unfortunately. But I did run a memtest on August 4th and it passed (I'm not 100% confident I ran the right test, though). I attached a photo of those tests.
JorgeB Posted August 31, 2022
You should run another one. It's also a good idea to post the full diags to see if there are known hardware issues.
Shu Posted August 31, 2022
1 hour ago, JorgeB said: "You should run another one, also a good idea to post the full diags to see if there are known hardware issues."
It's running right now. I expect it to take about 3 more hours (Pass 1 finished with no errors in just short of an hour; it's on Pass 2 of 4 now). Since I had already shut down my server, will I be able to recover the diagnostics file on reboot? Or was it cleared? Sorry about that.
trurl Posted August 31, 2022
Diagnostics after reboot won't tell us anything about what happened in the past, since syslog resets, but they will tell us how things are now, including your hardware.
Shu Posted August 31, 2022
23 minutes ago, trurl said: "Diagnostics after reboot won't tell us anything about what happened in the past since syslog resets, but it will tell us how things are now including your hardware."
I will post my diagnostics once the test is done and I boot back into Unraid. On pass 3 of 4 now (going faster than I initially expected).
Shu Posted August 31, 2022
3 hours ago, JorgeB said: "You should run another one, also a good idea to post the full diags to see if there are known hardware issues."
Okay, 3 hours in and it just finished: all passed with no errors. I also booted into Unraid and downloaded the diags. Attached here: 520unraid-diagnostics-20220831-1647.zip
JorgeB Posted September 1, 2022
Still pretty convinced that was from a hardware issue, but keep using the server normally and see if there are more issues.
Shu Posted September 1, 2022
I'll revive the thread with a reply if it comes up again. I did notice my Plex docker isn't showing the web UI anymore. That could be a coincidence, or it could be related to appdata being on the cache drives. (I'm going to change my cache pool to a raid 1 with two NVMe drives once the new one comes in tomorrow, for more redundancy.)
JorgeB Posted September 1, 2022
23 minutes ago, Shu said: "(I'm going to change my cache pool to a raid 1 with two nvme's once it comes in tomorrow - for more redundancy)"
That's fine in case a device fails/drops, but raid1 won't help with this type of issue.
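A toy Python model may help show why mirroring does not protect against data that is already damaged in memory. Everything here is a simplified assumption for illustration: `write_mirrored` and `read_mirrored` are hypothetical helpers, and `zlib.crc32` is only a stand-in checksum, not btrfs code. The point is that the checksum is computed over whatever is in RAM at write time, so both mirrors receive the same bytes and the same matching checksum.

```python
import zlib


def write_mirrored(buffer: bytes):
    """Toy RAID1 write: checksum the buffer as given, then store two copies."""
    checksum = zlib.crc32(buffer)  # computed on whatever is in RAM right now
    return [
        {"data": buffer, "csum": checksum},
        {"data": buffer, "csum": checksum},
    ]


def read_mirrored(copies):
    """Return the first copy whose checksum verifies, or None if all are bad."""
    for copy in copies:
        if zlib.crc32(copy["data"]) == copy["csum"]:
            return copy["data"]
    return None


good = b"metadata block"

# Case 1: one device returns damaged data; the mirror still holds a valid copy.
copies = write_mirrored(good)
copies[0]["data"] = b"mXtadata block"
recovered = read_mirrored(copies)

# Case 2: the buffer is damaged in RAM before the write; both copies carry the
# damage and both checksums verify, so a read does not flag anything.
corrupted_in_ram = b"metadXta block"
silently_bad = read_mirrored(write_mirrored(corrupted_in_ram))

print("device fault recovered:", recovered == good)
print("RAM fault recovered:", silently_bad == good)
```

In this thread the kernel's write-time check caught the bad block and aborted the transaction before it reached the disk, which is consistent with the scrub coming back clean. The sketch only illustrates the general case where such damage goes undetected.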