otherworldview · Posted June 30

Morning all, I'm running a Dell R720 with an MD1200 shelf attached. While the system has been fine over the past 4 years, the 4TB HDDs that came with the MD1200 are starting to fail. Unfortunately, I can't stop my array to build a new configuration as I remove these failed drives. Any assistance would be appreciated. Thanks,

Attachment: server-diagnostics-20240630-1056.zip
JorgeB · Posted July 1

Type reboot in the CLI; if it doesn't reboot after 5 minutes you will need to force it.
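Roughly, from the Unraid terminal it looks like this; the SysRq fallback is a generic Linux mechanism rather than anything Unraid-specific, and it skips the clean shutdown, so only use it if the normal reboot hangs:

reboot                           # normal path: stops the array, then restarts
# if the box is still up after ~5 minutes, force it:
echo 1 > /proc/sys/kernel/sysrq  # make sure magic SysRq is enabled
echo b > /proc/sysrq-trigger     # immediate reboot, no sync/unmount (last resort)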
otherworldview (Author) · Posted July 4

I tried rebooting through both the GUI and the CLI; neither gave me the usual feedback that it was performing the action. Figured I'd look at it when I got back from vacation, and found that it did reboot, but uncleanly. I still can't stop the array when I click Stop on the Main page. I'm actually getting more concerned, because with more of the second-hand hard drives erroring, I don't want to lose our data. I'm worried that I'll just have to bite the bullet and piece together another temporary backup server to keep the data secure until I build another dedicated machine.

Attachment: server-diagnostics-20240704-1039.zip
JorgeB · Posted July 4

Looks like disk7 dropped offline after a lot of errors. Power down, check/replace cables, and post new diags after array start.
otherworldview (Author) · Posted July 4

Shut down Unraid properly, but the MD1200 doesn't have a power button, so it has to be physically unplugged. Pulled the drives and all looks good (the systems are in a rack in the basement that doesn't get touched). Plugged everything back in and powered up to a lovely "improper shutdown detected", which is pretty much typical at this point each time I reboot or shut down. The errors from that drive are cleared, but the array still won't stop when I tell it to.

Attachment: server-diagnostics-20240704-1158.zip
itimpi · Posted July 4

Looks like you have file system level corruption on disk9 and disk16:

Jul 4 11:56:23 Server kernel: XFS (md16p1): Metadata CRC error detected at xfs_agi_read_verify+0x85/0xfa [xfs], xfs_agi block 0x2
Jul 4 11:56:23 Server kernel: XFS (md16p1): Unmount and run xfs_repair
Jul 4 11:56:23 Server kernel: XFS (md16p1): Metadata CRC error detected at xfs_agi_read_verify+0x85/0xfa [xfs], xfs_agi block 0x2
Jul 4 11:56:23 Server kernel: XFS (md16p1): Unmount and run xfs_repair

You should run a file system check on these and then a repair. You may want to post the output of the checks to get feedback before proceeding with the repair.
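For reference, a rough sketch of the check-then-repair sequence from the command line, assuming the array is started in maintenance mode and the mdXp1 device nodes match the ones in the log (the GUI's per-disk Check Filesystem option runs the same tool):

xfs_repair -n /dev/md16p1   # -n = no-modify check; reports problems without fixing
xfs_repair /dev/md16p1      # actual repair; run only after reviewing the check output
# if xfs_repair refuses because of a dirty log, -L zeroes the log, which can
# discard the most recent metadata changes, so treat that as a last resort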
otherworldview (Author) · Posted July 4

Disks 9 and 16 were pulled since they were failing. This is why I'm trying to stop the array and set a new configuration, so my parity can be rebuilt.
JorgeB · Posted July 4 (Solution)

Jul 4 11:58:11 Server emhttpd: device /dev/sdy problem getting id
Jul 4 11:58:11 Server emhttpd: device /dev/sdw problem getting id
Jul 4 11:58:11 Server emhttpd: device /dev/sdt problem getting id

These mean Unraid is seeing some disks twice. Note that you can only have one cable from the server to the enclosure; Unraid does not support SAS multipath. After those, though I'm not sure it's because of them, emhttpd crashed:

Jul 4 11:58:11 Server kernel: emhttpd[6209]: segfault at 67c ip 0000000000418336 sp 00007ffd1f8cc330 error 4 in emhttpd[404000+1a4000] likely on CPU 15 (core 7, socket 1)

So you will need to reboot again, but fix the issue above first.
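One quick way to confirm the duplicate paths before rebooting; this uses stock Linux tooling rather than anything Unraid-specific, so consider it a suggestion:

lsblk -d -o NAME,WWN | sort -k 2   # each physical drive has one WWN; the same WWN
                                   # appearing on two different sdX nodes means the
                                   # enclosure is being reached over two SAS paths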
otherworldview (Author) · Posted July 4

That's something I've noticed since, I want to say, I added the MD1200: it shows those drives again even though they're part of the array. I would imagine hitting "clear disk" or "remove partition" would be detrimental?
JorgeB · Posted July 4

10 minutes ago, otherworldview said:
"I would imagine hitting 'clear disk' or 'remove partition' would be detrimental?"

Don't do that, just disconnect one of the cables.
otherworldview (Author) · Posted July 4

Pulled the cable and it cleared up. Thanks, JorgeB and itimpi, for the assist and for walking me through the troubleshooting. Final question: am I good with the new-config options for rebuilding without loss? Thanks again, and happy holidays if you're in the USA!

Attachment: server-diagnostics-20240704-1439.zip
JorgeB · Posted July 5

Before doing a new config, make sure the actual disks 9 and 16 are mounting; you can test with the UD (Unassigned Devices) plugin.
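If you prefer the terminal over the UD plugin, a minimal read-only mount test would look something like this; /dev/sdX1 is a placeholder for whichever node each pulled disk shows up as:

mkdir -p /mnt/test
mount -o ro /dev/sdX1 /mnt/test   # read-only, so nothing on the disk is modified
ls /mnt/test                      # spot-check that the data is actually visible
umount /mnt/test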
otherworldview (Author) · Posted July 5

7 hours ago, JorgeB said:
"Before doing a new config, make sure the actual disks 9 and 16 are mounting; you can test with the UD plugin."

Unfortunately, I got a little excited after waiting a few hours for your response and ran the new config. I'm going to have to run the parity build again because disk7 is erroring up a storm (63k errors), so I will pull that and put the other two back. Considering these were drives that came with the MD1200, I'm not surprised they're pooping out.
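For anyone hitting the same thing, a quick way to judge whether a drive like disk7 is really dying, as opposed to a cabling problem, assuming smartctl is available (it ships with Unraid) and sdX is the suspect drive:

smartctl -a /dev/sdX
# on SATA drives, growing Reallocated_Sector_Ct / Current_Pending_Sector values
# point at the drive itself; on SAS drives, look at the grown defect list and the
# error counter log instead. Clean SMART data alongside thousands of array errors
# usually points back at cables or the enclosure.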