DBSilvers Posted November 6, 2023 Hello, I woke up to a drive in my array being in an error state and am looking for some help on what my next steps should be. As far as I can tell the drive is still in good condition. I feel I do need to get an additional disk; 94% utilization feels like a possible sin. Is there a way to level out the data across all drives? Does it work like that? I've attached my diagnostics to give some extra info on the situation. This happened to another drive a year or so back; I stopped the array, removed the drive, and then re-added it so parity could rebuild it. Is that the correct step? I appreciate your time and any help I can get. Thank you! tower-diagnostics-20231106-0958.zip
itimpi Posted November 6, 2023 Looking at the logs, it appears that disk1 started getting errors just after a paused parity check was resumed. At that point all spun-down drives are spun up simultaneously, so are you sure your power supply can handle spinning up all drives at the same time?
DBSilvers (Author) Posted November 6, 2023 Hey! Thank you for the plugin! Hmm, good question, it's a 750 watt, 80+ Gold. I don't normally have any issues. Do you think it would be wise to upgrade it? I eventually want to upgrade my whole setup, but that might be a year or so away; upgrading the power supply could be a start.
itimpi Posted November 6, 2023 32 minutes ago, DBSilvers said: "Hmm good question, it's a 750 watt, 80+ Gold. I don't normally have any issues. Do you think it would be wise to upgrade it?" Sounds as if it should be enough. However, too many drives on a single cable can sometimes cause intermittent problems, as can power splitters. You might also just have one drive whose power connector is not well seated.
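To put rough numbers on the point about drives sharing a cable, here is a minimal Python sketch. The 2 A per-drive spin-up figure and the drive counts are assumptions chosen for illustration, not measurements from this server, and the function names are hypothetical; a drive's datasheet gives the real peak current.

```python
# Rough spin-up budget. All per-drive figures are ASSUMPTIONS for
# illustration only; check each drive's datasheet for real values.
SPINUP_AMPS_12V = 2.0   # assumed peak 12 V current per 3.5" drive at spin-up
RAIL_VOLTAGE = 12.0     # drive motors are fed from the 12 V rail


def spinup_watts(num_drives, amps_per_drive=SPINUP_AMPS_12V):
    """Peak 12 V power if every drive spins up at the same moment."""
    return num_drives * amps_per_drive * RAIL_VOLTAGE


def cable_amps(drives_on_cable, amps_per_drive=SPINUP_AMPS_12V):
    """Peak 12 V current carried by a single power cable or splitter chain."""
    return drives_on_cable * amps_per_drive


if __name__ == "__main__":
    # Seven drives is only an example count.
    print(f"7 drives spinning up together: ~{spinup_watts(7):.0f} W on 12 V")
    for n in (2, 4, 7):
        print(f"{n} drives on one cable: ~{cable_amps(n):.0f} A peak")
```

Under these assumptions the total spin-up load stays far below 750 W, while the current through any one cable grows with every drive hung off it, which is consistent with the supply being adequate but the cabling still being worth checking.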
DBSilvers (Author) Posted November 6, 2023 OK, I'll shut it down and do a little rewiring to make sure the power draw is more evenly distributed and everything is seated correctly. What would you suggest after that?
DBSilvers (Author) Posted November 6, 2023 OK, I shut down the array and then the server, reconfigured the power, and re-seated all the plugs. After booting up, the drive is still disabled; everything else seems normal. Is the next step to remove the drive and then re-add it so parity can rebuild it? Is that the correct step, or is there a better way?
JorgeB Posted November 7, 2023 If the emulated disk is mounting and its contents look correct, you can rebuild on top of the existing disk. I would recommend replacing or swapping cables first to rule them out.
DBSilvers (Author) Posted November 7, 2023 Thank you for the follow-up, got it rebuilding now.
DBSilvers (Author) Posted November 8, 2023 OK, the drive has been rebuilt, but it looks like another drive got errors; it didn't become disabled, though. It does appear that some files are missing, which is concerning. Edit: Browsing the disks without errors, the files seem to still be there. I'm unable to browse the files on the drive with the errors; it says 'No listing: Too many files'. tower-diagnostics-20231108-0642.zip Edited November 8, 2023 by DBSilvers
JorgeB Posted November 8, 2023 Looks more like a power/connection issue with disk4, but because of these errors the rebuilt disk can have some corruption.
DBSilvers (Author) Posted November 8, 2023 Browsing the disks without errors, the files seem to still be there. I'm unable to browse the files on the drive with the errors; it says 'No listing: Too many files'. I will fully replace the cables now. Would the next step be to rebuild the drive again after I replace the cables?
JorgeB Posted November 8, 2023 Reboot and post new diags after array start. 38 minutes ago, DBSilvers said: "Browsing the disks without errors the files seem to still be there." They should all appear, but likely some will be corrupt, unless by luck the read errors all coincided with empty disk space.
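Why read errors on another disk can corrupt a rebuilt disk is easier to see with a toy model. The sketch below is a simplification that assumes single XOR parity and uses made-up four-byte "disks"; the function names are hypothetical and this is not Unraid's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(parity, surviving_disks):
    """Reconstruct the missing disk as parity XOR every surviving disk."""
    return xor_blocks([parity, *surviving_disks])


# Made-up contents for three data disks (illustrative values only).
disk1 = b"\x10\x20\x30\x00"
disk2 = b"\x01\x02\x03\x04"
disk4 = b"\xaa\xbb\xcc\xdd"
parity = xor_blocks([disk1, disk2, disk4])

# With clean reads from the other disks, disk1 comes back intact.
clean = rebuild(parity, [disk2, disk4])

# Simulate disk4 returning wrong data for byte 1 during the rebuild.
bad_disk4 = bytes([disk4[0], disk4[1] ^ 0xFF, *disk4[2:]])
damaged = rebuild(parity, [disk2, bad_disk4])

if __name__ == "__main__":
    print("clean rebuild matches disk1:  ", clean == disk1)
    print("damaged rebuild matches disk1:", damaged == disk1)
    print("bytes that differ:", [i for i in range(len(disk1)) if damaged[i] != disk1[i]])
```

Only the positions where the other disk misread come back wrong on the rebuilt disk. If those positions hold no file data, nothing visible is lost, which is the "empty disk space" case mentioned above.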
DBSilvers (Author) Posted November 8, 2023 A few things have happened: I replaced the cables and then I couldn't get it to boot. I think those cables might be bad too, or the power supply is bad. I could only get it to boot when I disconnected all the drives. I thought my battery backup had died, but it turns out the outlet has stopped working. I'm thinking that might be where the intermittent power issue was coming from. I'm trying to get the outlet issue handled. Crossing my fingers that everything will be fine once I can get it to boot.
DBSilvers (Author) Posted November 9, 2023 Man, what a day... Well, the power outlets got fixed. It turned out to be a loose wire on a different outlet that made two separate outlets go out. I replaced the power supply and used the new cables. I booted up and half my drives said "no device", so I shut down and checked the cables. I booted back up and now it says "no device" on all devices. I think my Mini SAS to 4 SATA cables are bad, so I ordered some new ones. Hopefully the new cables will fix it. tower-diagnostics-20231108-1955.zip
DBSilvers (Author) Posted November 11, 2023 Replaced the Mini SAS to 4 SATA cables. I cleared the BIOS and re-enabled the LSI HP SAS Expander Card. The 1 parity drive is detected, 2/5 drives are detected, and 0/1 cache drives are detected. I'm at a loss for words right now. Any help would be greatly appreciated. tower-diagnostics-20231111-1134.zip
DBSilvers (Author) Posted November 11, 2023 I've double-checked all the drives that aren't being detected and confirmed that the 3rd pin is taped. It's like the drives aren't even receiving power. Is it possible that the power issue shorted out the hard drive boards? I'm trying to stay calm. I just can't believe all of this is happening. Edited November 12, 2023 by DBSilvers
JorgeB Posted November 12, 2023 Try connecting one of those drives to the onboard SATA with a Molex to SATA power adapter to rule out any 3.3 V issues. If it doesn't spin up or is not detected by the board BIOS, then the drive is likely dead.