Teg Posted May 7, 2020 (edited)

My beloved Unraid server "Akira" is sick and I've been pulling my hair out for days. I'm running Unraid v6.8.3 with 14 WD 2TB Green drives plus two parity drives and a WD Black cache drive. 99.9% of the cycles go to a Plex server via Docker, and 99.9% of the content is Plex media. Diagnostic zip attached.

Stuck at home during the covid pandemic, I decided to try setting up a Time Machine share to back up a MacBook. I had previously tried it on AFP, experienced only failure and gave up, but I had read that Time Machine backups work well on the newer versions of Unraid now that it has migrated to SMB, so I was excited to try again. I created a new share, locked it to a single empty drive (Drive 1), disabled cache and started my backup, and it was working. I was happy. I mention this not because I think it's the problem, but because I think that was my previous mistake and maybe that info will help someone.

The problem started during the backup, when I noticed a SMART alert on disk 7 indicating the drive was starting to fail. I wasn't worried about it, as the backup was to Drive 1 and I have double parity. My Unraid server has been running in one form or another for an entire decade, so I am confident I've replaced all of the drives at least once, and I was used to it. I ordered another one and slapped it in for a rebuild after the backup to Drive 1 finished… but that's when everything went sour.

During the rebuild of Disk 7, Disk 3 suddenly vanished and showed up as "Unmountable: No file system"… which I'd never seen before. There were no warnings of an impending failure, so I assumed the parity rebuild of Disk 7 simply accessed disk 3 enough that a problem was exposed... but I found it odd that it wasn't being emulated via the dual parity. I figured maybe the system can recover two drives but not emulate two drives?

Either way, once the parity rebuild finished I slapped in another spare drive to try and rebuild disk 3… but it only offered me the option to format it, which had me worried about data loss, so I didn't. It was about here that I started looking for help. I found this thread that taught me how to fix the file system on an unmountable drive (I see a fair number of threads on this popping up), but I am scared to do that when I'm down two drives. https://forums.unraid.net/topic/69765-solved-unmountable-no-file-system/

Meanwhile my rebuild of Disk 7, which I thought completed without issue, is actually not working at all. Despite the parity rebuild completing, the drive continues to say contents emulated and device disabled.

I worry that my power supply has simply aged to the point where it's not providing enough juice for stable operation, that this has caused the problems, and that if I keep trying it will only get worse. (I have been thinking it was time to rebuild the entire system with fewer drive bays now that there are 10-14 TB drives on sale anyway.) I also assumed that if I wasn't adding anything to the drives my parity would save me, but I have discovered that my Time Machine backups just kept going throughout this entire process, so now I'm worried the parity has gotten mangled.

So now I'm down two drives and concerned about data loss, and I don't want to take any action without the advice of someone better at this than me. I've also now stupidly lost track of which of these old drives were #3 and #7, assuming there's even a way to try and salvage as much off them as possible.

Please help. I would happily pay Limetech for support here if such a service was offered.

Question 1: Why is disk 7 still saying it's disabled and being emulated after a rebuild?
Question 2: Is it safe to try and fix the file system of Disk 3 when down a drive? Or two?
Question 3: How can I determine if my power supply isn't providing enough juice?
Question 4: How can I find out which drive was which?
Question 5: How can I get any data off the old drives? I have Windows machines and Mac machines I could throw the drives in, but I don't think they can be read by either?

akira-diagnostics-20200507-0717.zip

Edited May 7, 2020 by Teg
JorgeB Posted May 7, 2020

You should have saved the diags before rebooting; without them we can't see what happened. For the current situation, first fix the file system on disk3. If that's successful, rebuild both disks; you can do them at the same time since you have dual parity.
Teg (Author) Posted May 7, 2020

Thanks for chiming in, johnnie.black. I did the fix on the file system for disk 3 and it completed, and then the content started showing up emulated. I am rebuilding the drives now.
Teg (Author) Posted May 7, 2020 (edited)

The rebuild failed with a read error. In the logs I see a bunch of:

May 7 12:24:39 AKIRA kernel: md: disk3 write error, sector=2911057832

I am assuming the disk I'm rebuilding onto is probably defective? Unless this is indeed my hunch about power issues? I'll try another disk. Would power issues cause this? Is there any way to check?

Diagnostic download from the time of the failure attached.

akira-diagnostics-20200507-1556.zip

Edited May 7, 2020 by Teg
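For anyone wanting to tally lines like the `md: disk3 write error` entry quoted above, here is a minimal sketch in Python. The pattern is modeled only on that one quoted line, so the assumption that read errors are logged with the same wording is mine, and the function name `summarize_md_errors` is hypothetical rather than part of any Unraid tool.

```python
import re
from collections import defaultdict

# Pattern modeled on the single log line quoted in the post. That other md
# driver messages follow the same wording is an assumption.
MD_ERROR = re.compile(r"kernel: md: (disk\d+) (read|write) error, sector=(\d+)")


def summarize_md_errors(lines):
    """Return {(disk, kind): [sector, ...]} for every matching log line."""
    errors = defaultdict(list)
    for line in lines:
        match = MD_ERROR.search(line)
        if match:
            disk, kind, sector = match.groups()
            errors[(disk, kind)].append(int(sector))
    return dict(errors)


# The one line quoted in the thread, used here as sample input.
sample = ["May 7 12:24:39 AKIRA kernel: md: disk3 write error, sector=2911057832"]
summary = summarize_md_errors(sample)
for (disk, kind), sectors in summary.items():
    print(f"{disk}: {len(sectors)} {kind} error(s), first sector {sectors[0]}")
```

Feeding it the lines of the syslog text from a diagnostics zip would show at a glance whether errors are confined to one disk or spread across several, which is the kind of pattern that points at a controller, cable or power problem rather than a single bad drive.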
Teg (Author) Posted May 7, 2020 (edited)

Ok wow, I just shut it down to try swapping the drives and now disk 3 is back to being unmountable... but so are disks 5, 6 and 7!!! WTF is going on! Diagnostic file saved at the time attached.

akira-diagnostics-20200507-1632.zip

Edited May 7, 2020 by Teg (adding diagnostic zip)
JorgeB Posted May 8, 2020

Disk3 dropped offline, so there's no SMART report (there also wasn't one in the original diags). Check the cables to see if it comes back online, and post new diags. Also, you're still using SASLP controllers; those can go wrong at any time and cause issues with multiple disks.
Teg (Author) Posted May 18, 2020

Hey Johnnie, I see you're someone who helps a lot of people on these forums, so I wanted to say thank you for your time. I got new drives and was able to rebuild the file system and get back up and running. It does appear that I had two drives fail seemingly at once, which is probably because I didn't always order my drives from different manufacturers and different places, and they were probably from the same batch.

I will replace my SASLP controllers. I have seen that you recommend the LSI chipset, and it looks like this one will slap right into where my Marvells are and I shouldn't miss a beat.

Question 1, can you confirm this? https://www.amazon.ca/gp/product/B002RL8I7M/ref=ox_sc_mini_detail?ie=UTF8&psc=1&smid=A1UMPX8HZ468X0

I also did the math on the number of drives I have, and since I have added more after the initial build and the power supply has aged, I strongly suspect I don't have enough juice. I have noticed the system won't even boot if I add a Red/Green drive to the empty slot on the row where I have the Black parity and cache drives. I know the rows are how the power is distributed, and I had to use splitters as the power supply didn't have enough Molex ports.

Question 2, how can I know if my PSU isn't providing enough juice? Neither the dashboard nor the logs ever seem to say anything about power dropping.
JorgeB Posted May 18, 2020

3 hours ago, Teg said: Question 1, can you confirm this? https://www.amazon.ca/gp/product/B002RL8I7M/ref=ox_sc_mini_detail?ie=UTF8&psc=1&smid=A1UMPX8HZ468X0

Should be fine, assuming it's genuine.

3 hours ago, Teg said: Question 2, how can I know if my PSU isn't providing enough juice? Neither the dashboard nor the logs ever seem to say anything about power dropping.

It's not easy to diagnose power issues from the logs; the easiest way is just to try a different PSU if one is available.
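As a rough paper check before swapping hardware, the drive-count math Teg mentions can be sketched as a 12 V budget in Python. It rests on the general fact that hard drives draw their peak 12 V current while spinning up. Every number below except the drive count (14 data + 2 parity + 1 cache) is a hypothetical placeholder and not a figure from this thread or any datasheet; real values would come from the drive labels and the PSU's rating sticker. The function name `rail_headroom_amps` is also made up for illustration.

```python
def rail_headroom_amps(rail_amps, drive_count, spinup_amps_per_drive,
                       other_load_amps=0.0):
    """Amps left on the 12 V rail if every drive spins up at the same time."""
    demand = drive_count * spinup_amps_per_drive + other_load_amps
    return rail_amps - demand


# Drive count from the thread: 14 data + 2 parity + 1 cache = 17.
# All current figures are hypothetical placeholders.
headroom = rail_headroom_amps(
    rail_amps=40.0,             # placeholder: 12 V rating printed on the PSU
    drive_count=17,
    spinup_amps_per_drive=2.0,  # placeholder: peak 12 V draw per drive
    other_load_amps=8.0,        # placeholder: CPU, fans, controllers
)
print(f"12 V headroom at spin-up: {headroom:.1f} A")
```

A negative or very small result would support the suspicion that the supply is undersized, but it is only an estimate; it cannot account for an aged PSU delivering less than its label, which is why trying a different PSU remains the more reliable test.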