JasenHicks Posted April 10, 2021
I'm absolutely gutted right now. I have a 9 disk array (8+1 parity drive) and one drive failed. Conveniently, the other drives on that controller card decided not to be available after I rebooted (I thought this "failed drive" after 2 years might be a random Unraid issue that required a reboot to fix). After I rebooted, the array was completely offline and 4 drives on my LSI card were no longer seen by UNRAID.
I did some troubleshooting and decided to replace the card with another LSI in IT Mode (9811i, new, flashed successfully right before install). On boot, the LSI card sees the 4 drives attached. I get into Unraid, and it sees the 3 original drives and the new drive that I popped in to replace the failed drive. I started the array rebuild and went to bed.
It finally finished this morning and there were A LOT OF ERRORS (screen grab attached). I checked some folders I knew I had files in from the past few days, and those files are not there. I clicked the folder icons next to each drive in the array: the disks attached to my motherboard SATA controller all show directories, etc., but the drives on the new card all show as empty. Additionally, the drives on the card aren't reporting temperatures.
I had shut down all my Docker containers during the array rebuild/check process and went to start them up after it finished. They start, but none of the GUIs load in the web browser. I did notice my log file is at 100%, I assume due to the read errors during the parity check.
Anything you can think of? I have a new server coming that I was planning on migrating the drives, networking card, and Quadro card over to, but now I am concerned that I may have just lost 28TB of data.
zeus-diagnostics-20210410-1026.zip zeus-syslog-20210410-0128.zip
John_M Posted April 10, 2021
16 minutes ago, JasenHicks said: I did some troubleshooting and decided to replace the card with another LSI in IT Mode
Why did you replace the card? The replacement doesn't seem to be working: there are masses of PCIe bus errors. Is there any reason not to use the original one, which might well have been working properly?
JasenHicks Posted April 10, 2021
The card was replaced because it wasn't being seen by the system, i.e. none of the drives on it were visible and they were being reported as "missing". I couldn't find the card in the device listings in UNRAID, which led me to replace it. Based on the PCIe bus errors, it looks like something else may be afoot?
John_M Posted April 10, 2021
The current card isn't being seen either. Try a different slot or maybe reseat the CPU in its socket.
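One way to check from the Unraid terminal whether the kernel sees the card at all (independently of the card's own boot-time screen) is to look for it in the PCI device list and to look for bus errors in the kernel log. The following is only a minimal sketch, assuming `lspci` and `dmesg` are available; the helper names and the search patterns are illustrative assumptions and may need adjusting for a given card.

```shell
# Hypothetical helpers; the match patterns below are assumptions, not an
# exhaustive list of what an LSI HBA or a PCIe error can look like.

# Reads `lspci` output on stdin and prints lines that look like a SAS HBA.
find_hba() {
    grep -iE 'LSI|SAS|Broadcom' || echo "no HBA-like device found"
}

# Reads kernel log text on stdin and prints lines mentioning PCIe/AER errors.
find_pcie_errors() {
    grep -iE 'pcie bus error|aer:' || echo "no PCIe bus errors found"
}

# Usage on the server (commented out so that sourcing this file runs nothing):
# lspci | find_hba
# dmesg | find_pcie_errors
```

If the first command prints nothing card-related while the card's boot screen still lists the drives, that would point at the slot or the board rather than at Unraid itself.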
JasenHicks Posted April 10, 2021
Could it be an UNRAID-specific thing? I see the card on boot: it loads up and shows the drives before getting to the UNRAID boot screen. The system is in a rack and has been for about 2 years, and the CPU hasn't been touched. I'll try a different slot to see what that yields, but I did move the card around a few times last night to get it recognized.
JorgeB Posted April 10, 2021
5 hours ago, JasenHicks said: It finally finished this morning and there were A LOT OF ERRORS (screen grab attached). I checked some folders I knew I had files in from the past few days, and those files are not there.
That many errors suggest the controller dropped (or all the disks on it dropped; the syslog doesn't show what happened). Rebooting or fixing the issue should bring all your data back.
JasenHicks Posted April 10, 2021
This day is getting worse and worse. I think my USB key died in the middle of troubleshooting everything. Do I have a backup? NO, because I am an idiot. Am I completely screwed now and looking at a full reconfigure and rebuild?
JorgeB Posted April 10, 2021
Create a new USB, assign all the disks like the screenshot above, check "parity is already valid" before array start, then run a parity check.
ChatNoir Posted April 10, 2021
Did you install the My Servers plugin?
JasenHicks Posted April 11, 2021
@ChatNoir - I didn't. I will once I recover.
@JorgeB - I think the PCI-E slot on my X399 motherboard failed or something. I spent the last 4 hours migrating my i7700K board, memory and CPU to my UNRAID system, managed to get all drives assigned (except my cache pool), checked "parity valid" and started the array. It looks like everything is being seen and there is at least data on each of the drives now.
I am running the parity check now. I did notice that it has found quite a few errors already (perhaps it's fixing the replacement drive that's blank). Looking at the "MAIN page", there are no errors yet. I think at this point last time there were a ton. We might be on the right track!
Once this is done, I suppose I have to troubleshoot the cache pool so that the dockers and such start running properly. I appreciate all the help so far, everyone. The community spirit is great.
JasenHicks Posted April 12, 2021
So, what do I do when the parity check finishes but I am missing files? It seems pointless to have a parity drive that's supposed to save you from a single drive failure when it doesn't actually work. Or am I not finished?
For awareness, when I click the folder icon for Drive 7 (highlighted in pink) it's empty, but it shows over 4TB used. I suspect that's where my missing files are.
tower-diagnostics-20210412-1713.zip
JasenHicks Posted April 12, 2021
tower-syslog-20210412-0813.zip
JorgeB Posted April 12, 2021
36 minutes ago, JasenHicks said: So, what do I do when the Parity check finishes but you are missing files?
That suggests there were errors on another drive(s) during the rebuild, and the sync errors found after the parity check also support that.
JasenHicks Posted April 12, 2021
So, I'm screwed then and lost those files?
JorgeB Posted April 12, 2021
Is the drive that failed still intact? And are you sure it actually failed, i.e., is it dead or still detected? Also keep in mind that Unraid (or any RAID) is not a backup; you should still have backups of any important data.
JasenHicks Posted April 12, 2021
6 minutes ago, JorgeB said: Is the drive that failed still intact? And are you sure it actually failed, i.e., is is dead or still detected? Also keep in mind that Unraid (or any RAID) is not a backup, you should still have backups of any important data.
1. It did report as failed. However, I do still have it, and I may just shut down and put it back in to see if I can access it. Nothing to lose at this point.
2. Yes, I understand that. Nothing mission critical was lost, but it's still not going to be fun recovering it from old backup drives, etc. if I can't make that work.
I guess ultimately it's probably OK that this happened. It's made me re-evaluate some architecture in the system and gave me an excuse to splurge on another server.
I assume I can just shut down the system, swap in the old HDD #7, and boot it back up, re-assigning the old drive as #7?
Edited April 12, 2021 by JasenHicks
JorgeB Posted April 12, 2021
3 minutes ago, JasenHicks said: I assume I can just shut down the system, swap in the old HDD #7, and boot it back up re-assigning the old drive as #7?
First try to mount it with the UD plugin (with the array stopped).
JasenHicks Posted April 12, 2021
1 hour ago, JorgeB said: First try to mount it with the UD plugin (with the array stopped).
Holy shit.... when I mounted it I was able to see files on it again! Looks like the SMART passed on it too...
Now, I think I changed too much to figure out exactly what happened. That being said, I am not risking this long term and will still bring another server online to be safe and start a scratch UNRAID install on it.
So, what's next? You're on this journey with me now @JorgeB and we're close to the end!
tower-smart-20210412-0331.zip
JorgeB Posted April 12, 2021
Change that filesystem's UUID (Settings -> Unassigned devices). Then you can have the array started together with that disk and compare both disks' contents; you can, for example, use rsync.
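The comparison suggested above could be sketched roughly as follows. This is only an illustration, not a confirmed procedure from the thread: the function name is hypothetical and both mount points in the example call are assumptions that need to be replaced with the real ones. The `-n` flag makes it a dry run, so nothing is copied or changed.

```shell
# Hypothetical helper: list files that exist on the old disk but are missing
# or different on the rebuilt array disk.
#   -r  recurse into directories
#   -c  compare files by checksum rather than by size and time
#   -n  dry run: report only, copy nothing
#   -i  itemize: print one line per file that would be transferred
compare_disks() {
    old="$1"   # mount point of the original disk
    new="$2"   # mount point of the rebuilt array disk
    rsync -rcni "$old"/ "$new"/
}

# Example call; both paths are assumptions, adjust to the actual mount points:
# compare_disks /mnt/disks/OLD_DISK /mnt/disk7
```

Checksumming several terabytes is slow, so dropping `-c` gives a much quicker first pass that compares only size and modification time.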
JasenHicks Posted April 12, 2021
8 minutes ago, JorgeB said: Change that filesystem's UUID (Settings -> Unassigned devices) then you can have the array started together with that disk and compare both disks content, you can for example use rsync.
Click this guy? Should it "do" anything that I can see?
Once I get that done, I want to make sure I do this right:
1. Start the Array
2. Mount the ZAD8EYTY drive
3. rsync between the two?
Should I do that in the UNRAID terminal?
JasenHicks Posted April 12, 2021
I clicked it.... I saw this in the Syslog
JorgeB Posted April 12, 2021
Run xfs_repair first:
xfs_repair -v /dev/sdd1
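Before letting xfs_repair change anything, it can be run in no-modify mode (`-n`), which only reports what it would fix. xfs_repair also expects the filesystem to be unmounted. The wrapper below is a minimal sketch under those assumptions; the function name and the `XFS_REPAIR` override variable are hypothetical, and the device name should be verified first, since device letters can change between boots.

```shell
# Hypothetical wrapper: run a read-only XFS check, but only if the partition
# does not appear in /proc/mounts. XFS_REPAIR is an assumed override variable
# that lets a different binary (or a test stub) be substituted.
check_xfs() {
    dev="$1"
    if grep -q "^$dev " /proc/mounts 2>/dev/null; then
        echo "refusing: $dev is mounted" >&2
        return 1
    fi
    "${XFS_REPAIR:-xfs_repair}" -n "$dev"   # -n: report problems, change nothing
}

# Example (device name taken from the post above; verify it first):
# check_xfs /dev/sdd1
```

If the report looks reasonable, the actual repair is the command given in the post, without `-n`.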