tential Posted February 7, 2018 (edited)

Wow, it's surprising that after all this time I'm now losing 2 drives. The thing is, Drive 9 still shows up in unraid as available for use. What should I do, since rebuilding my parity goes super slow?

The only thing that makes me slightly worried is that the 2 power connectors now feeding Drives 9/10 are at the very end of my line of power connectors for HDDs. Previously, they were connected to the out-of-use drives. When I recabled, I used those 2 connectors because Drives 9/10 used to have 2 dedicated connectors with no splitters, and those were easier to move over to my SSDs.

Oh well, if I lose drives I lose drives. What's my next step here?

Edit: Maybe I should recable and take the Marvell controller out of the equation?

Edited February 7, 2018 by tential
JorgeB Posted February 7, 2018

You should try to rebuild disk10, that's the worst one. There will be some corruption because of disk9, but it's the best option you have. When that is done, replace disk9, and only after that resync parity1.

Bad/failing power can damage disks, so best to make sure all is good before proceeding. Also, like I mentioned, 500W is IMO on the low side for 15 disks: I don't use more than 12 disks on 500W, and for 14/15 disks I use 550W.
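To put rough numbers on the PSU sizing remark above, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration, not a measurement from this server: roughly 2 A per 3.5" disk on the +12 V rail at spin-up, 100 W of other +12 V load (CPU, fans, controllers), and a hypothetical 450 W of +12 V capacity. Check the actual drive datasheets and the PSU label before relying on any of it.

```python
# Rough +12 V budget at spin-up. All default values are illustrative
# assumptions, not figures taken from this thread or from any datasheet.

def spinup_12v_watts(num_disks, amps_per_disk=2.0, volts=12.0):
    """Peak +12 V draw if every disk spins up at the same time."""
    return num_disks * amps_per_disk * volts


def headroom_watts(psu_12v_watts, num_disks, other_12v_watts=100.0,
                   amps_per_disk=2.0):
    """+12 V capacity left after the disks and the rest of the system.

    A negative result means the assumed load exceeds the assumed capacity.
    """
    disks = spinup_12v_watts(num_disks, amps_per_disk)
    return psu_12v_watts - other_12v_watts - disks


if __name__ == "__main__":
    for disks in (12, 15):
        # 450.0 is a hypothetical +12 V rating, used only for the example.
        print(disks, "disks:", headroom_watts(450.0, disks), "W headroom")
```

Under these assumed numbers, 12 disks leave some margin while 15 disks do not, which is consistent with the advice to treat 500W as marginal for 15 disks.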
tential Posted February 7, 2018 (edited)

31 minutes ago, johnnie.black said:
"You should try to rebuild disk10, that's the worst one, there will be some corruptions because of disk9 but it's the best option you have. When that is done replace disk9, and only after that resync parity1. Bad/failing power can damage disks, so best to make sure all is good before proceeding, also like I mentioned 500W is IMO on the low side for 15 disks, I don't use more than 12 disks on 500W, for 14/15 disks I use 550W."

Maybe it's a power issue: it only started when I tried to use all 15 of my drives at once. I came across this thread, where the error I'm having went away when the person upgraded their PSU:
https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous

If I can't rebuild because the process is very, very slow, should I try upgrading my PSU next?

Latest diagnostic: tower-diagnostics-20180207-0935.zip

Edited February 7, 2018 by tential
JorgeB Posted February 7, 2018

36 minutes ago, tential said:
"should I try upgrading my PSU next then?"

I would, or disconnect the unassigned disks you currently have connected.
tential Posted February 7, 2018

1 minute ago, johnnie.black said:
"I would, or disconnect the unassigned disks you currently have connected."

I was just having the SAME thought and was going to ask. Alright, next step! I'm going to enjoy an episode or two of TV before I try. I'm going to keep the two power cables that were connected to ATA9/10 disconnected (the way they were before, when they were connected to unassigned drives) and use different power cables in their place, as well as disconnect the unassigned disks. Thanks, wish me luck!
tential Posted February 7, 2018 (edited)

tower-diagnostics-20180207-1404.zip

That's my latest diagnostic. What's the damage here? Should I next swap the ATA9/ATA10 data cables with my SSDs' to rule that out? That way those 2 drives would be connected to my motherboard, and I could move the 2 SSDs to the Marvell controller to isolate it and see whether I'm having trouble with the controller or with those 2 drives specifically.

Edit: I'm guessing that diagnostic won't be very good. I started the array and am now getting a ton of read errors on Disks 7/8. I think I'm making this worse. I'm not sure why, but Transmission is also starting up ON even though autostart is off. I'm sure it's not helping when that turns on at the beginning each time.

Edited February 7, 2018 by tential
JorgeB Posted February 7, 2018

The latest diags don't show any issues, but they only cover a couple of minutes and the array hadn't even started yet.
SSD Posted February 8, 2018

I would mention that most drives will park (lock) the drive heads after a relatively short period of being idle. There is a SMART attribute called load cycle count (LCC) that tells you how many times the heads have been parked, and I think you'll find it is a pretty big number. This head parking helps prevent the heads from slamming into the disk surface and causing damage.
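For anyone who wants to check the load cycle count mentioned above, here is a small sketch that pulls it out of the attribute table printed by `smartctl -A` (device path omitted on purpose). It assumes the drive reports the attribute under the usual name `Load_Cycle_Count` with a plain integer in the last column; some drives name or format it differently, in which case this returns `None`. The function name is made up for the example.

```python
def load_cycle_count(smart_text):
    """Return the raw Load_Cycle_Count from `smartctl -A` output, or None.

    Assumes the standard ten-column attribute table:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smart_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Load_Cycle_Count":
            try:
                return int(fields[9])
            except ValueError:
                # Raw value was not a plain integer on this drive.
                return None
    return None
```

Save the `smartctl -A` output for a disk to a text file, read it into a string, and pass that string to the function.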
tential Posted February 8, 2018

Ok, well, I'm hoping it isn't bad. On my last array start, unraid hard locked up. I've never had that happen before. Tons of errors were spit out at me, which I couldn't capture in the diagnostic:

tower-diagnostics-20180208-0604.zip

I'm just going to wait for y'all to give me a clear set of steps for what to do next, so I don't mess this up any more. At this point, I don't even know how to get myself running stably again. Maybe I need that new PSU, I don't know. I'm sad.
JorgeB Posted February 8, 2018

Nothing on the diags. Are they from after rebooting?
tential Posted February 8, 2018

4 minutes ago, johnnie.black said:
"Nothing on the diags are they from after rebooting?"

Yes, I couldn't get anything from unraid before. It was just locked up. I've never had that happen to me. I couldn't cancel the parity check or anything. I issued a shutdown, but the PC stayed on all night and never managed to power off, even though the server was inaccessible. I couldn't pull a diagnostic, sadly, and I eventually had to hard power off.

These diags are from after the reboot. I didn't start the array, since last time I got a ton of read errors (like a million), and the parity check was still going but was going to take 30 days. Not sure what to do here. Sorry to have made this so massively confusing.
JorgeB Posted February 8, 2018

There are no syslog errors, but you have the known failing disks 9 and 10 still assigned. Were you trying to sync parity1? If so, like I said, that won't work. You should try to rebuild disk10 to a new disk, that's the worst one. There will be some corruption because of disk9, but it's the best option you have. When that is done, replace disk9, and only after that resync parity1.
tential Posted February 8, 2018 (edited)

Ok, will try that now. Any thoughts as to how I lost disks 9/10? I'm getting errors everywhere now, though:

Feb 8 06:43:00 Tower kernel: md: disk7 read error, sector=14552776

That can't be good? There are over a million of them, on Disks 6/7/8. The rebuild is still going and says it will take 11 hours now. I'm just mind-boggled as to what's going on here. It feels like things are getting worse, not better.

Edited February 8, 2018 by tential
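With over a million lines like the one quoted above, a per-disk tally is easier to read than the raw syslog. The sketch below counts `md: diskN read error` lines per disk, assuming every error line follows exactly the format shown in the post. The function name and the syslog path in the usage comment are assumptions for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   Feb 8 06:43:00 Tower kernel: md: disk7 read error, sector=14552776
READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def count_read_errors(lines):
    """Return a Counter mapping disk name to its number of read-error lines."""
    counts = Counter()
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Assumed location of the syslog; adjust to wherever your log lives.
    with open("/var/log/syslog", errors="replace") as log:
        for disk, total in sorted(count_read_errors(log).items()):
            print(disk, total)
```

If the counts cluster on disks that share one cable or controller port, that points toward the shared component rather than the individual drives.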
JorgeB Posted February 8, 2018

Post current diags.
tential Posted February 8, 2018

tower-diagnostics-20180208-0706.zip

Didn't realize I could pull that then. Here ya go.
JorgeB Posted February 8, 2018

Good news is that SMART for disks 6, 7 and 8 looks OK, so it's probably a cable issue: they are likely sharing the same miniSAS cable on the LSI. It can also be a controller, controller port, or power issue. You can try canceling the rebuild and replacing the miniSAS cable, or swapping ports with the other one, to try and rule things out, then restart the rebuild.
tential Posted February 8, 2018

Ok, I'm trying that now. With the number of times power is being mentioned, and considering I'm at the power limit with 15 drives on a CX500 W PSU, should I just order a new PSU now? Looks like I need one either way, especially since I was going to upgrade to a Xeon and more GPUs this year too.
tential Posted February 8, 2018

tower-diagnostics-20180208-0734.zip

Same story. A bunch of errors at the start.
JorgeB Posted February 8, 2018

Errors stayed with the disks. Did you swap the cable or the HBA port?
tential Posted February 8, 2018

Just now, johnnie.black said:
"Errors stayed with the disks, did you swap the cable or HBA port?"

I swapped around the cables attached to the ports on the LSI card. I didn't swap the SATA cables attached to the HDDs; I simply replugged them.
JorgeB Posted February 8, 2018

OK, this rules out the port. The problem still might be the miniSAS cable.
tential Posted February 8, 2018

Should I swap around the SATA connectors on the miniSAS cable?
JorgeB Posted February 8, 2018

You can do that if you don't have another cable.
John_M Posted February 8, 2018

Seriously, I'd go for a new power supply. I know from experience that inadequate or failing power supplies can be responsible for some very obscure and frustrating fault conditions. My personal favourites are the Corsair AX and RMx series but there are other decent brands. You need a single +12 volt rail and I'd go for a power rating in the region of 650 to 750 watts, depending on your future expansion plans.