
Large Disk Failure Help

  • Author

Wow, that's surprising that after all this time, now I lose 2 drives.  The thing is, Drive 9 still shows up in unraid as available for use.

 

What should I do then, since rebuilding my parity goes super slow when I try it?

 

The only thing that makes me slightly worried is the 2 connectors connecting Drive 9/10 are at the very end of my line of power connectors for HDDs.  Previously, they were connected to the out of use drives.  When I recabled, I used those 2 connectors as there were 2 dedicated connectors for drives 9/10 with no splitters there before, and those were easier to move away for my SSDs.  

 

Oh well, if I lose drives I lose drives, what's my next step here?

 

Edit: Maybe I should recable and take the Marvell controller out of the equation?

Edited by tential

  • Community Expert

You should try to rebuild disk10, since that's the worst one. There will be some corruption because of disk9, but it's the best option you have.

When that is done replace disk9, and only after that resync parity1.

 

Bad/failing power can damage disks, so it's best to make sure all is good before proceeding. Also, like I mentioned, 500W is IMO on the low side for 15 disks: I don't use more than 12 disks on 500W, and for 14/15 disks I use 550W.
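To put rough numbers on why disk count matters for PSU sizing, here is a minimal sketch of a 12 V spin-up budget. Every figure in it is an assumption for illustration (the 2 A per-disk spin-up current, the PSU's 12 V capacity and the "other load" are hypothetical, not taken from this server or from any datasheet), so check the label on your drives and PSU before relying on it.

```python
# Rough 12 V power budget for simultaneous disk spin-up.
# All constants are illustrative assumptions, not measured values.

ASSUMED_SPINUP_AMPS_12V = 2.0  # hypothetical peak 12 V current per 3.5" disk


def spinup_watts(num_disks: int, amps_per_disk: float = ASSUMED_SPINUP_AMPS_12V) -> float:
    """Worst-case 12 V load if every disk spins up at the same moment."""
    return num_disks * amps_per_disk * 12.0


def headroom_watts(psu_12v_watts: float, num_disks: int, other_load_watts: float) -> float:
    """12 V capacity left after disk spin-up and the rest of the system.

    A negative result means the assumed load exceeds the assumed capacity.
    """
    return psu_12v_watts - spinup_watts(num_disks) - other_load_watts


if __name__ == "__main__":
    # Hypothetical PSU with 450 W available on 12 V and 120 W of non-disk load.
    for disks in (12, 15):
        print(disks, "disks:", headroom_watts(450.0, disks, 120.0), "W headroom")
```

With these made-up inputs, going from 12 to 15 disks is enough to turn a small positive margin into a negative one, which is the kind of situation where spreading spin-up or moving to a larger PSU would help.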

  • Author
31 minutes ago, johnnie.black said:

You should try to rebuild disk10, that's the worst one, there will be some corruptions because of disk9 but it's the best option you have.

When that is done replace disk9, and only after that resync parity1.

 

Bad/failing power can damage disks, so best to make sure all is good before proceeding, also like I mentioned 500W is IMO on the low side for 15 disks, I don't use more than 12 disks on 500W, for 14/15 disks I use 550W.

Maybe it's a power issue, it only started when I tried to use all 15 of my drives at once.

I came across this thread here where the error I'm having went away when the person upgraded their PSU.

https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous

 

If I can't rebuild due to the process being very very slow, should I try upgrading my PSU next then?

 

tower-diagnostics-20180207-0935.zip

Latest Diagnostic.

Edited by tential

  • Community Expert
36 minutes ago, tential said:

should I try upgrading my PSU next then?

I would, or disconnect the unassigned disks you currently have connected.

  • Author
1 minute ago, johnnie.black said:

I would, or disconnect the unassigned disks you currently have connected.

Just was having the SAME thought, and was going to ask.  Alright next step!  I'm going to enjoy watching an episode or two first of TV before I try.

 

I'm going to keep the two power cables that were connected to ATA9/10 disconnected (The way they were before/connected to unassigned drives) and use different power cables in their place as well as disconnect the unassigned disks. 

 

Thanks, wish me luck!

  • Community Expert

Good luck!

  • Author

tower-diagnostics-20180207-1404.zip

 

That's my latest diagnostic now.  What's the damage here?

 

Should I next swap the ATA9/ATA10 data cables with my SSDs' cables to rule that out?  That way those 2 drives would be connected to my motherboard, and I could move the 2 SSDs to the Marvell controller to isolate it and see whether I'm having trouble with the controller or with those 2 drives specifically.

 

Edit: I'm guessing that diagnostic won't be very good. I started the array and am now getting a ton of read errors on Disk 7/8.  I think I'm making this worse.

 

I'm not sure why, Transmission is starting off ON, too, even though autostart is off.  I'm sure that's not helping either when that turns on at the beginning each time.

Edited by tential

  • Community Expert

The latest diags don't show any issues, but they only cover a couple of minutes and the array didn't even start yet.

I would mention that most drives will lock the drive heads after a relatively short period of time being idle. There is a SMART attribute called load cycle count (LCC) that tells you how many times the heads have locked, and I think you'll find it is a pretty big number. This head locking helps prevent the heads from slamming into the disk surface and causing damage.
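If you want to check that number yourself, here is a minimal sketch that pulls the raw Load_Cycle_Count (SMART attribute 193) out of the text printed by `smartctl -A`. The sample table is made up for illustration and is not from the drives in this thread; the parser also assumes the usual ten-column attribute layout with a plain integer raw value.

```python
from typing import Optional


def load_cycle_count(smartctl_output: str) -> Optional[int]:
    """Return the raw Load_Cycle_Count from `smartctl -A` text, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID followed by the attribute name;
        # the raw value is the tenth column.
        if len(fields) >= 10 and fields[0] == "193" and fields[1] == "Load_Cycle_Count":
            return int(fields[9])
    return None


# Made-up sample in the usual smartctl attribute-table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       21534
193 Load_Cycle_Count        0x0032   134   134   000    Old_age   Always       -       198765
"""

if __name__ == "__main__":
    print(load_cycle_count(SAMPLE))
```

On a live system you would feed it the output of `smartctl -A` for the device in question (the SMART reports in the diagnostics zip have the same table).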

  • Author

Ok, well I'm hoping it isn't bad.  My last array start had Unraid hard lock up.  I've never had that happen before.  Tons of errors were spit out at me, which I couldn't get out of the diagnostic :(

tower-diagnostics-20180208-0604.zip

 

I'm just going to wait for y'all to give me a clear set of steps for what I need to do next, so I don't mess this up any more.  I feel like at this point, I don't even know how to get myself running stably again.  Maybe I need that new PSU, I don't know. I'm sad.

  • Community Expert

Nothing on the diags, are they from after rebooting?

  • Author
4 minutes ago, johnnie.black said:

Nothing on the diags, are they from after rebooting?

Yes, I couldn't get anything from Unraid before.  It was just locked up.  I've never had that happen to me.  I couldn't cancel parity or anything.  I issued a shutdown, but the PC stayed on all night and never powered off, even though the server was inaccessible.  I couldn't pull a diagnostic sadly, and I eventually had to hard power off.

 

This is after the reboot now. I haven't started the array, since last time I got a ton of read errors (like a million), and the parity check kept going but was going to take 30 days.

 

Not sure what to do here, this is now after the reboot.  

 

Sorry to have made this so massively confusing.

 

 

  • Community Expert

There are no syslog errors but you have the known failing disks 9 and 10 still assigned, were you trying to sync parity1?

 

If so, like I said, that won't work. You should try to rebuild disk10 to a new disk, since that's the worst one. There will be some corruption because of disk9, but it's the best option you have. When that is done, replace disk9, and only after that resync parity1.

 

 

  • Author

Ok, will try that now.

 

Any thoughts as to how I lost disk 9/10?  

 

I'm getting errors everywhere now though.

 

Feb 8 06:43:00 Tower kernel: md: disk7 read error, sector=14552776

 

That can't be good?  Over a million of them.

On Disk 6/7/8.  Rebuild is still going, says it will take 11 hours now.
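For anyone trying to see which disks those errors land on, here is a minimal sketch that tallies `md: diskN read error` lines per disk from syslog text. The regular expression is modeled on the single line quoted above; the extra sample lines are made up for illustration, and the real syslog may contain other message variants this does not catch.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern modeled on: "Tower kernel: md: disk7 read error, sector=14552776"
READ_ERROR = re.compile(r"kernel: md: (disk\d+) read error, sector=(\d+)")


def count_read_errors(lines: Iterable[str]) -> Counter:
    """Count md read-error lines per disk name."""
    counts: Counter = Counter()
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# First line is the one quoted in the post; the rest are invented examples.
SAMPLE_LINES = [
    "Feb 8 06:43:00 Tower kernel: md: disk7 read error, sector=14552776",
    "Feb 8 06:43:00 Tower kernel: md: disk7 read error, sector=14552784",
    "Feb 8 06:43:01 Tower kernel: md: disk6 read error, sector=14552776",
    "Feb 8 06:43:01 Tower kernel: some unrelated message",
]

if __name__ == "__main__":
    for disk, total in sorted(count_read_errors(SAMPLE_LINES).items()):
        print(disk, total)
```

To run it against a real log, pass an open file object (for example the syslog inside the diagnostics zip) instead of `SAMPLE_LINES`. If the counts cluster on disks that share one cable or controller port, that points away from the disks themselves.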

 

I'm just mind-boggled as to what's going on here. It feels like things are getting worse, not better.

 

MwC12oe.png

Edited by tential

  • Community Expert

 O.o Post current diags

  • Community Expert

Good news is that SMART for disks 6, 7 and 8 looks OK, so it's probably a cable issue. They are likely sharing the same miniSAS cable on the LSI. It can also be a controller, controller port or power issue.

 

You can try canceling the rebuild and replacing the miniSAS cable, or swap ports with the other one, to try and rule things out, then restart the rebuild.

 

  • Author

Ok, I'm trying that now.  With the number of times power is being mentioned, and considering I'm at the power limit with 15 drives on a CX500 (500 W) PSU, should I just order a new PSU now?  Looks like I'm screwed on that side either way, especially since I was going to upgrade to a Xeon and more GPUs this year too.

  • Community Expert

Errors stayed with the disks, did you swap the cable or HBA port?

  • Author
Just now, johnnie.black said:

Errors stayed with the disks, did you swap the cable or HBA port?

 

I swapped the cables attached to ports on the LSI Card around.  I didn't swap the SATA cables attached to the HDDs around, I simply replugged them.

  • Community Expert

OK, this rules out the port, problem still might be the miniSAS cable.

  • Author

Should I swap around the SATA cables attached to the miniSAS cable?

  • Community Expert

You can do that if you don't have another cable.

Seriously, I'd go for a new power supply. I know from experience that inadequate or failing power supplies can be responsible for some very obscure and frustrating fault conditions. My personal favourites are the Corsair AX and RMx series but there are other decent brands. You need a single +12 volt rail and I'd go for a power rating in the region of 650 to 750 watts, depending on your future expansion plans.

Archived

This topic is now archived and is closed to further replies.
