February 7, 2018 Author Wow, it's surprising that after all this time I lose 2 drives now. The thing is, Drive 9 still shows up in unraid as available for use. What should I do, then, since rebuilding my parity goes super slow? The only thing that makes me slightly worried is that the 2 power connectors for Drives 9/10 are at the very end of my line of power connectors for HDDs. Previously, they were connected to the out-of-use drives. When I recabled, I used those 2 connectors, as there were 2 dedicated connectors for Drives 9/10 with no splitters there before, and those were easier to move away for my SSDs. Oh well, if I lose drives I lose drives. What's my next step here? Edit: Maybe I should recable and take the Marvell controller out of the equation? Edited February 7, 2018 by tential
February 7, 2018 Community Expert You should try to rebuild disk10, that's the worst one. There will be some corruption because of disk9, but it's the best option you have. When that is done, replace disk9, and only after that resync parity1. Bad/failing power can damage disks, so best to make sure all is good before proceeding. Also, like I mentioned, 500W is IMO on the low side for 15 disks. I don't use more than 12 disks on 500W; for 14/15 disks I use 550W.
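To put the "500W for 15 disks" concern into numbers, here is a rough sketch of the +12V load when every disk spins up at once. The 2 A per disk and 10 A for the rest of the system are assumed ballpark figures, not measurements from this server, and the 40 A rail rating in the usage note is hypothetical; check the drive datasheets and the PSU label for real values.

```python
# Rough +12 V spin-up estimate. Every default below is an assumption
# chosen for illustration, not a value taken from this thread.

def spinup_watts(num_disks: int, amps_12v_per_disk: float = 2.0) -> float:
    """Peak +12 V power (watts) if all disks spin up simultaneously."""
    return num_disks * amps_12v_per_disk * 12.0


def rail_headroom_amps(
    psu_12v_amps: float,
    num_disks: int,
    other_12v_amps: float = 10.0,   # assumed CPU/board/fan load on +12 V
    amps_12v_per_disk: float = 2.0,  # assumed spin-up current per 3.5" disk
) -> float:
    """Amps left on the +12 V rail during spin-up (negative means overloaded)."""
    return psu_12v_amps - other_12v_amps - num_disks * amps_12v_per_disk


if __name__ == "__main__":
    print(spinup_watts(15))
    print(rail_headroom_amps(40.0, 15))
```

With these assumed numbers, 15 disks leave no spare current on a hypothetical 40 A rail, while 12 disks leave some margin, which is in line with the advice above. Staggered spin-up on an HBA would lower the peak.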
February 7, 2018 Author 31 minutes ago, johnnie.black said: You should try to rebuild disk10, that's the worst one. There will be some corruption because of disk9, but it's the best option you have. When that is done, replace disk9, and only after that resync parity1. Bad/failing power can damage disks, so best to make sure all is good before proceeding. Also, like I mentioned, 500W is IMO on the low side for 15 disks. I don't use more than 12 disks on 500W; for 14/15 disks I use 550W. Maybe it's a power issue. It only started when I tried to use all 15 of my drives at once. I came across this thread, where the error I'm having went away when the person upgraded their PSU: https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous If I can't rebuild because the process is very slow, should I try upgrading my PSU next then? Latest diagnostic: tower-diagnostics-20180207-0935.zip Edited February 7, 2018 by tential
February 7, 2018 Community Expert 36 minutes ago, tential said: should I try upgrading my PSU next then? I would, or disconnect the unassigned disks you currently have connected.
February 7, 2018 Author 1 minute ago, johnnie.black said: I would, or disconnect the unassigned disks you currently have connected. I was just having the SAME thought, and was going to ask. Alright, next step! I'm going to enjoy watching an episode or two of TV first before I try. I'm going to keep the two power cables that were connected to ATA9/10 disconnected (the way they were before, connected to unassigned drives), use different power cables in their place, and disconnect the unassigned disks. Thanks, wish me luck!
February 7, 2018 Author tower-diagnostics-20180207-1404.zip That's my latest diagnostic now. What's the damage here? Should I next swap the ATA9/ATA10 data cables with my SSDs' to rule that out? That way those 2 drives would be connected to my MOBO, and I can move the 2 SSDs to the Marvell controller to isolate it and see whether I'm having trouble with the controller or with those 2 drives specifically. Edit: I'm guessing that diagnostic won't be very good. I started the array and am now getting a ton of read errors on Disk 7/8. I think I'm making this worse. I'm not sure why, but Transmission is starting off ON, too, even though autostart is off. I'm sure it's not helping when that turns on at the beginning each time. Edited February 7, 2018 by tential
February 7, 2018 Community Expert The latest diags don't show any issues, but they only cover a couple of minutes and the array hadn't even started yet.
February 8, 2018 I would mention that most drives will park the drive heads after a relatively short period of being idle. There is a SMART attribute called load cycle count (LCC) that tells you how many times the heads have been parked, and I think you'll find it is a pretty big number. This head parking helps prevent the heads from slamming into the disk surface and causing damage.
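For anyone who wants to check the load cycle count mentioned above, here is a minimal sketch that pulls it out of `smartctl -A` output. It assumes smartmontools is installed and that the drive reports the attribute under the name `Load_Cycle_Count`, which not every drive does. The sample text and the device path `/dev/sdX` are made up for illustration.

```python
import subprocess
from typing import Optional

# Made-up sample in the usual `smartctl -A` table layout (illustration only).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       29500
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       612345
"""


def parse_load_cycle_count(smart_text: str) -> Optional[int]:
    """Return the raw Load_Cycle_Count value, or None if it is not listed."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the raw value is the last one.
        if len(fields) >= 10 and fields[1] == "Load_Cycle_Count":
            return int(fields[9])
    return None


def read_load_cycle_count(device: str) -> Optional[int]:
    """Run `smartctl -A <device>` and parse its output (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    )
    return parse_load_cycle_count(result.stdout)


if __name__ == "__main__":
    print(parse_load_cycle_count(SAMPLE))
    # On a real system: print(read_load_cycle_count("/dev/sdX"))
```

The parsing is kept separate from the `smartctl` call so it can be tried on saved SMART reports, such as the ones inside a diagnostics zip.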
February 8, 2018 Author Ok, well I'm hoping it isn't bad. My last array start made unraid hard lock up. I've never had that happen before. Tons of errors were spit out at me, which I couldn't get into the diagnostic. tower-diagnostics-20180208-0604.zip I'm just going to wait for y'all to give me a clear set of steps for what I need to do next, so I don't mess this up any more. I feel like at this point I don't even know how to get myself running stably again. Maybe I need that new PSU. I don't know. I'm sad.
February 8, 2018 Author 4 minutes ago, johnnie.black said: Nothing on the diags, are they from after rebooting? Yes, I couldn't get anything from unraid before. It was just locked up. I've never had that happen to me. I couldn't cancel the parity check or anything. I started a shutdown, but the PC stayed on all night and never managed to power off, even though the server was inaccessible. I couldn't pull a diagnostic, sadly, and I eventually had to hard power off. These diagnostics are from after the reboot. I haven't started the array, since last time I got a ton of read errors, like a million, and the parity check was still going but was going to take 30 days. Not sure what to do here. Sorry to have made this so massively confusing.
February 8, 2018 Community Expert There are no syslog errors, but you have the known failing disks 9 and 10 still assigned. Were you trying to sync parity1? If so, like I said, that won't work. You should try to rebuild disk10 to a new disk, that's the worst one. There will be some corruption because of disk9, but it's the best option you have. When that is done, replace disk9, and only after that resync parity1.
February 8, 2018 Author Ok, will try that now. Any thoughts as to how I lost disks 9/10? I'm getting errors everywhere now, though. Feb 8 06:43:00 Tower kernel: md: disk7 read error, sector=14552776 That can't be good? There are over a million of them, on Disks 6/7/8. The rebuild is still going and says it will take 11 hours now. I'm just mind-boggled as to what's going on here. It feels like things are getting worse, not better. Edited February 8, 2018 by tential
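With over a million of these lines, a per-disk tally is easier to read than the raw log. Here is a minimal sketch that counts `md: diskN read error` lines, using the line format quoted in the post above. The second and third sample lines are made up, and the file name `syslog.txt` in the comment is an assumption about where the log was saved.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
#   Feb 8 06:43:00 Tower kernel: md: disk7 read error, sector=14552776
READ_ERROR = re.compile(r"kernel: md: (disk\d+) read error, sector=(\d+)")


def count_read_errors(lines: Iterable[str]) -> Counter:
    """Count md read-error lines per disk name (e.g. 'disk7')."""
    counts: Counter = Counter()
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Feb 8 06:43:00 Tower kernel: md: disk7 read error, sector=14552776",
        "Feb 8 06:43:00 Tower kernel: md: disk7 read error, sector=14552784",  # made up
        "Feb 8 06:43:01 Tower kernel: md: disk6 read error, sector=14552776",  # made up
    ]
    for disk, n in count_read_errors(sample).most_common():
        print(disk, n)
    # On a saved log (hypothetical path):
    # with open("syslog.txt", errors="replace") as fh:
    #     print(count_read_errors(fh))
```

If the errors cluster on a few disks, as they do here on 6/7/8, that points at something those disks share (a cable, a controller port or a power lead).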
February 8, 2018 Author tower-diagnostics-20180208-0706.zip Didn't realize I could pull that then. Here ya go.
February 8, 2018 Community Expert Good news is that SMART for disks 6, 7 and 8 looks OK, so it's probably a cable issue. They are likely sharing the same miniSAS cable on the LSI. It can also be a controller, controller port or power issue. You can try canceling the rebuild and replacing the miniSAS cable, or swapping ports with the other one, to try and rule things out, then restart the rebuild.
February 8, 2018 Author Ok, I'm trying that now. With the number of times power is being mentioned, and considering I'm at the power limit with 15 drives on a CX500 W PSU, should I just order a new PSU now? Looks like I'm screwed on that side either way, especially since I was going to upgrade to a Xeon and more GPUs this year too.
February 8, 2018 Author tower-diagnostics-20180208-0734.zip Same story. A bunch of errors at the start.
February 8, 2018 Community Expert Errors stayed with the disks. Did you swap the cable or the HBA port?
February 8, 2018 Author Just now, johnnie.black said: Errors stayed with the disks, did you swap the cable or HBA port? I swapped around the cables attached to the ports on the LSI card. I didn't swap the SATA cables attached to the HDDs; I simply replugged them.
February 8, 2018 Community Expert OK, this rules out the port. The problem still might be the miniSAS cable.
February 8, 2018 Seriously, I'd go for a new power supply. I know from experience that inadequate or failing power supplies can be responsible for some very obscure and frustrating fault conditions. My personal favourites are the Corsair AX and RMx series, but there are other decent brands. You need a single +12 volt rail, and I'd go for a power rating in the region of 650 to 750 watts, depending on your future expansion plans.