
Multiple drives failing in the last 2 weeks


Solved by JorgeB


I had an array of 11 disks total: 1 parity and 10 data disks. One of the drives failed on Sep 22. I didn't think much of it at the time and didn't really troubleshoot it. I had been running out of space, so I decided to use this as an opportunity to upgrade to a bigger drive and, while I was at it, add some more. So I bought three 14 TB drives (my parity is a 14 TB). I used the three drives to first replace the bad drive, then I added one more drive as a parity so I'd have 2 parity drives, and finally I added the third drive as an additional disk, increasing my total drive space.

 

Everything was fantastic until this Tuesday, October 3, when another drive started reporting errors, so the disk was taken offline and its data was being emulated. I figured, now I gotta start investigating. But, as life happens, I was a bit too busy. Anyway, I woke this morning and now I have a second drive failing. So I'm here now, in desperate need of some guidance, because 1 drive failure can be expected, 2 can be a coincidence, 3 though... well, that's a pattern, and likely a problem (or at least I think it is).

 

I've attached the diagnostics taken this morning while the array was running, along with a screenshot of how it looked on Tuesday (10/03/23) and one from this morning (10/05/23).

I'm taking the array offline because, well, I don't know, I just feel safer that way. But of course, I'm here, so I'm not an expert.

 

Any guidance would be helpful and if any additional logs / diagnostics, etc are needed, please let me know.

Screenshot from 2023-10-03 11-54-42.png

Screenshot from 2023-10-05 09-26-54.png

server-name-diagnostics-20231005-0922.zip

Link to comment

The diags were taken after rebooting, so we can't see what happened, but both disks look healthy, so this is most likely a power/connection problem. Disk9 is showing some ATA issues after this latest reboot. Since both disabled disks are on different controllers, a power issue would be the first suspect, such as a splitter if shared, or the PSU in general.

Link to comment
1 hour ago, dv310p3r said:

So, I'm thinking the drives are OK, but now they have red X's on them, and I can't figure out how to rebuild them. At least from my searching, that's what it seems I have to do. Anyone got any advice?


Have you tried the instructions from the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page? The Unraid OS->Manual section covers most features of the current Unraid release.

 

Link to comment

You know, you could link me. Pointing me to the manual doesn't help very much considering that there's a lot of stuff there, and all I'm looking to do is find out how to re-enable a drive. I've searched for that in the manual and I'm not easily able to find it.

Now I remember why I dislike using forums so much.

Edited by dv310p3r
Link to comment

Ok, so for anyone else out there looking for a simple answer to the question... how do I enable (re-enable) disks on my array that I think are fine but now have the Red X on them, here's what I found on Reddit where someone answered someone else's exact same question. 

 

  1. Stop the array
  2. Unassign the disk
  3. Start the array in maintenance mode
  4. Now stop the array again
  5. Reassign the drive to the same slot
  6. Then start the array back up again in maintenance mode and it'll start to rebuild the disk
    1. I'd leave the rebuild to happen in maintenance mode, at least that's what worked for me (see edit note below)

 

Image of the post on Reddit for reference.

Screenshot from 2023-10-07 18-01-17.png

 

EDIT: I was having issues with the system freezing up after following the instructions above. I'm not sure what it was exactly, but I decided to start the array in maintenance mode to do the rebuild, and it seems to be working just fine now. So I edited the steps above to include that info.

Edited by dv310p3r
Link to comment
4 hours ago, dv310p3r said:

You know, you could link me. Pointing me to the manual doesn't help very much considering that there's a lot of stuff there, and all I'm looking to do is find out how to re-enable a drive. I've searched for that in the manual and I'm not easily able to find it.

You probably would have received a link if you just asked.  Remember, people are volunteering their time to try to help you; a positive attitude goes a long way.  The manual is at docs.unraid.net and is laid out rather well.  Follow these clicks: Unraid OS > Manual > Storage Management > Replacing disks > Rebuilding a drive onto itself

 

Link to comment
10 hours ago, dv310p3r said:

You know, you could link me. Pointing me to the manual doesn't help very much considering that there's a lot of stuff there, and all I'm looking to do is find out how to re-enable a drive. I've searched for that in the manual and I'm not easily able to find it.

I was not in a position to provide the link, so I wanted to know whether you had looked, as many people have not. If you did look and did not find the answer (which is there), a suggestion from you on what would have helped guide you to the correct section would help improve the documentation.

Link to comment

Look, I get it... I asked a question in a forum and got the classic, RTFM response. I should have known better. Either way, I think something that can be taken away here is that whatever UnRAID did to their manual, was kind of silly. As you scroll through past forum posts, and click on links to the manual, they don't take you, anymore, to where the poster thinks their link was taking people. UnRAID made a major change to the manuals online and it's caused a lot of confusion. As moderators maybe you can help them get that message.

Link to comment
10 minutes ago, dv310p3r said:

Look, I get it... I asked a question in a forum and got the classic, RTFM response. I should have known better. Either way, I think something that can be taken away here is that whatever UnRAID did to their manual, was kind of silly. As you scroll through past forum posts, and click on links to the manual, they don't take you, anymore, to where the poster thinks their link was taking people. UnRAID made a major change to the manuals online and it's caused a lot of confusion. As moderators maybe you can help them get that message.

That is one reason why it is best to go to the top level of the documentation via the links at the top/bottom of the forum or GUI, as they tend to be kept up to date. Links from individual forum posts are always liable to break after some time has passed.

 

Suggestions on improving the actual documentation to help make it easier to find specific items are always welcomed.   I have fed some of my personal ideas back to Limetech but other people’s viewpoints will be useful to see if they have the same idea.

 

 

Link to comment

Well, after a day of rebuilding in maintenance mode, I turned on the array in normal mode, and everything seemed fine, all the drives are reporting correctly, but after a minute or two, it stopped responding. 

I can't get diagnostics from the Command Line because when I plug in my monitor and keyboard, I see the following error. I haven't shut down in case someone can give me some directions to get the diagnostics from the system.

 

Any help would be appreciated.

error.jpg

Edited by dv310p3r
Link to comment

Nope, no ability to SSH. 

So, I did a bit of searching and saw a random post somewhere on this forum where a person said this error was a generic error (he had basically the same one). He said his problem was a Docker container. So I restarted, quickly went into the docker tab and turned off all dockers and made sure they don't come back up on restart. Then I started them all one by one. Turns out my deluge container was causing the problem somehow. So I deleted it and now I'm all good... no issues all night. 

Not sure what that was all about.
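
For anyone who would rather script the one-by-one approach than click through the Docker tab, here is a rough Python sketch of the same idea. It is only an illustration under some assumptions: `docker_start` wraps the standard `docker start` CLI command, while `find_faulty_container` and the `is_healthy` check are hypothetical helpers I made up for this example. In my case the whole server froze, so a real health check would probably have to run from another machine (or just be "wait a few minutes and see").

```python
import subprocess
from typing import Callable, Iterable, Optional


def docker_start(name: str) -> None:
    """Start one container using the standard `docker start` CLI command."""
    subprocess.run(["docker", "start", name], check=True)


def find_faulty_container(
    names: Iterable[str],
    start: Callable[[str], None],
    is_healthy: Callable[[], bool],
) -> Optional[str]:
    """Start containers one at a time.

    Returns the name of the first container after which the health check
    fails, or None if every container starts without trouble.
    `is_healthy` is a placeholder for whatever check fits your setup.
    """
    for name in names:
        start(name)
        if not is_healthy():
            return name
    return None
```

You would call it with your own container names, something like `find_faulty_container(my_containers, docker_start, my_check)`, after stopping everything and turning off autostart. Passing the start and health-check functions in as arguments also makes it easy to dry-run the logic with fakes before pointing it at a real server.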

Link to comment
On 10/5/2023 at 1:41 PM, JorgeB said:

I would start by checking/replacing cables, but if you can try a different PSU it might be worth it, and if it happens again save the diags before rebooting.

 

 

 

With all the other stuff happening after getting the system back up, I haven't had a chance to thank JorgeB for his help. Thanks a million for this. I wouldn't have thought to look at this as a possible problem, and replacing the PSU and adding some additional SATA power cables did the trick. So, THANKS A MILLION!

Link to comment
