Large Disk Failure Help


tential

 

32 minutes ago, johnnie.black said:

You can do that if you don't have another cable.

tower-diagnostics-20180208-0840.zip

 

Same issue

 

@John_M

https://www.newegg.com/Product/Product.aspx?item=N82E16817438029

 

This is about to be ordered at this rate since I dunno what else to do.  I figure that should manage a Xeon + flexible GPU options.

Been eyeing this upgrade anyway, so I'll need a new PSU to get there when I do upgrade.

Edited by tential
Link to comment
Don't know what more to tell you; call baba ji for some black magic help. It doesn't make sense that the issue persists on the same disks using a different HBA port and a different miniSAS cable. The power supply should be upgraded, but that also doesn't explain the problem remaining on the same disks, unless there's a problem on those power connectors.

Link to comment

Should I bother trying to let it rebuild the Disk 10 fully?  

I guess next step is the new PSU then.  Ah, I can't believe I had everything running smoothly too!

Oh well, I just want to get to a stable place again.

 

Alright, I'm going to order the new PSU then.  If there are any other ideas for me to do please let me know.  

Link to comment

Alright thanks a lot for the help!  When that comes in I'll update ya and we'll see what I got!

Edit: Jesus, EVERYTHING is sold out if it has a high power rating.  It's all "Hey, want a bunch of stuff for mining?" as the add-ons.  Tried to switch to a newer EVGA model that had "only" 1000W but was the G3 model, but so much of the stock is sold out.  Looks like EVGA has the best ratings, so keeping my fingers crossed it's a power problem.  Please.....

Edited by tential
Link to comment

I don't really understand PSUs, Rails, and how to ensure I have wired everything off the right rail.  Is there any video I can watch/advice to ensure I use my new PSU optimally?

 

https://www.amazon.com/EVGA-SuperNOVA-PLATINUM-Crossfire-220-P2-1200-X1/dp/B00KYK1CKI

 

That's what I ordered.  I know it's overkill but I saw this:

 

And I also wanted to be able to upgrade to a GTX 2070-80ti potentially, or two.  I'm hoping that's a large enough PSU for any of my needs.  Should I be grouping my HDDs and my fans separately and stuff like that?  

 

If it's not the PSU and I still have errors, what happens next to me?  My PSU comes Tuesday.

Edited by tential
Link to comment

PSU came. I hooked everything up, flipped the PSU on, and it sparked, and the breaker tripped, I guess, since everything near the unit is off.

 

RMA the PSU, I'm guessing? I didn't turn on the PC, only the PSU, and the PSU was still in the off position, as it has off, economy, and on. Could that have helped at all?

 

Jesus, trying to fix this just potentially screwed up all my hardware. This has been a whirlwind of no fun. 

Link to comment
1 minute ago, jonathanm said:

Did you, by any chance, happen to have a modular power supply already? If so, did you replace ALL the cables with the ones for the new unit? They probably aren't interchangeable.

 

Great care is needed if reusing modular cables. If two brands use different wiring, lots of magic smoke can be released. And no one has yet managed to find a way to let the magic smoke back into the chips. It's an advantage to decide on a favorite brand to get multiple sets of interchangeable cables.

Link to comment
5 minutes ago, jonathanm said:

Disconnect the PSU from all but the wall power and turn it on again. If it shorts out with nothing hooked up, definitely RMA time.

 

Did you, by any chance, happen to have a modular power supply already? If so, did you replace ALL the cables with the ones for the new unit? They probably aren't interchangeable.

 

My last power supply was not modular.  However, I've made that mistake before, and I know to not mix and match power supply cables.  Destroyed my last SSD like that.  

 

Will try that again.

Link to comment
11 minutes ago, jonathanm said:

Disconnect the PSU from all but the wall power and turn it on again. If it shorts out with nothing hooked up, definitely RMA time.

 

Did you, by any chance, happen to have a modular power supply already? If so, did you replace ALL the cables with the ones for the new unit? They probably aren't interchangeable.

 

Hooked it up with everything unplugged and with my main pc off, nothing happened.

 

Should I continue? 

Link to comment
16 minutes ago, tential said:

 

Hooked it up with everything unplugged and with my main pc off, nothing happened.

 

Should I continue? 

Without knowing your technical background, knowledge and pain tolerance, I can't in all good conscience tell you to hook it back up without more tests, including a PSU tester. It's likely something was hooked up wrong or a hot wire was touching the case somewhere to cause the spark and breaker trip, but troubleshooting that kind of thing remotely is not easy.

 

Sorry.

Link to comment
2 minutes ago, jonathanm said:

Without knowing your technical background, knowledge and pain tolerance, I can't in all good conscience tell you to hook it back up without more tests, including a PSU tester. It's likely something was hooked up wrong or a hot wire was touching the case somewhere to cause the spark and breaker trip, but troubleshooting that kind of thing remotely is not easy.

 

Sorry.

 

Understandable,

I put in a return request already with Amazon just in case.  It's through a third party seller sadly so waiting to hear back.

 

Well, that was climactic in the completely wrong way.

 

Except for the PSU not working, everything else about it looks pretty nice.  Hopefully I don't have to wait too long for a replacement.  

 

Seems safest to just try to replace it instead of getting it to work, and potentially having it fail later down the line when I add dual CPUs/GPUs.

Link to comment
  • 2 weeks later...

Only errors I see are from the already known failing disk:

 

Feb 28 08:02:43 Tower kernel: ata9: softreset failed (1st FIS failed)
Feb 28 08:02:43 Tower kernel: ata9: softreset failed (1st FIS failed)
Feb 28 08:02:43 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 28 08:02:43 Tower kernel: ata9.00: qc timeout (cmd 0x27)
Feb 28 08:02:43 Tower kernel: ata9.00: failed to read native max address (err_mask=0x4)
Feb 28 08:02:43 Tower kernel: ata9.00: HPA support seems broken, skipping HPA handling
Feb 28 08:02:43 Tower kernel: ata9: softreset failed (1st FIS failed)
Feb 28 08:02:43 Tower kernel: ata9: softreset failed (device not ready)
Feb 28 08:02:43 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 28 08:02:43 Tower kernel: ata9.00: NCQ Send/Recv Log not supported
Feb 28 08:02:43 Tower kernel: ata9.00: ATA-9: ST5000DM000-1FK178,             W4J1LSZ0, CC49, max UDMA/133
Feb 28 08:02:43 Tower kernel: ata9.00: 9767541168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Feb 28 08:02:43 Tower kernel: ata9.00: NCQ Send/Recv Log not supported
Feb 28 08:02:43 Tower kernel: ata9.00: configured for UDMA/133
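
For anyone wanting to check a syslog for the same pattern, here is a minimal Python sketch that counts error-looking kernel lines per ATA port. The regular expression and the list of error markers are assumptions based only on the lines quoted above, not an exhaustive list of libata messages, and `count_ata_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Feb 28 08:02:43 Tower kernel: ata9.00: qc timeout (cmd 0x27)
# and captures the port ("ata9") and the message text.
LINE_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")

# Substrings treated as error indicators (an assumption, not exhaustive).
ERROR_MARKERS = ("failed", "timeout", "error")


def count_ata_errors(lines):
    """Return a Counter mapping ATA port to the number of error-like lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an ataN kernel message
        port, message = match.groups()
        if any(marker in message.lower() for marker in ERROR_MARKERS):
            counts[port] += 1
    return counts


# Four lines taken from the log excerpt above.
sample = [
    "Feb 28 08:02:43 Tower kernel: ata9: softreset failed (1st FIS failed)",
    "Feb 28 08:02:43 Tower kernel: ata9.00: qc timeout (cmd 0x27)",
    "Feb 28 08:02:43 Tower kernel: ata9.00: failed to read native max address (err_mask=0x4)",
    "Feb 28 08:02:43 Tower kernel: ata9.00: configured for UDMA/133",
]
print(count_ata_errors(sample))
```

If every error lands on a single port, as it does in the excerpt, that points at one disk or its connection rather than the whole controller.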

 

Link to comment
45 minutes ago, tential said:

Ok, and you don't think it's power/plug related, I guess, since it's been multiple times.

That disk and another one of yours have pending sectors and need to be replaced. I already told you that; a new power supply won't fix failing disks.

 

Let me reread the thread so I remember where we were.
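
As a rough illustration of how pending sectors can be spotted, the sketch below pulls a few sector-health attributes out of the attribute table that `smartctl -A` prints. It assumes the usual ten-column table layout with the raw value in the last column; the sample text and its values are made up for illustration and are not taken from the diagnostics in this thread. `sector_report` is a hypothetical helper name.

```python
# Attribute names commonly watched when deciding whether a disk is failing.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def sector_report(smartctl_output):
    """Map each watched SMART attribute found in the text to its raw value."""
    report = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit():
            report[name] = int(raw)
    return report


# Illustrative sample only; the numbers are invented.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

report = sector_report(SAMPLE)
print(report)
```

A non-zero `Current_Pending_Sector` raw value is the kind of reading being referred to here as "pending sectors".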

Link to comment
29 minutes ago, johnnie.black said:

That disk and another one of yours have pending sectors and need to be replaced. I already told you that; a new power supply won't fix failing disks.

 

Let me reread the thread so I remember where we were.

Ah sorry to repeat it!  So surprised that during the process of this thread those 2 drives failed.  Oh well though, will wait patiently to figure out what the next step is!

Link to comment
1 hour ago, tential said:

Should I start by trying to rebuild Disk 10 first or is there a way I can save Disk 9 by adding back in another Parity first and then removing Disk 9 and then Disk 10?

Since disk10 is disabled and there's currently only one valid parity, it's not possible to sync the other without errors, so you might as well rebuild disk10 first, and if all goes well this time, then rebuild disk9 (you can sync parity at the same time if you want).
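
To show why one valid parity can only cover one missing disk, here is a toy sketch that assumes plain XOR parity across equal-sized blocks. That matches how single parity is usually described; the calculation used for a second parity disk is different and is not shown. The three "disks" are made-up byte strings, and `xor_blocks` and `rebuild_missing` are hypothetical helper names.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three toy data disks (made-up contents) and their XOR parity.
disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(disks)

# Pretend the middle disk is disabled and rebuild it from the rest.
rebuilt = rebuild_missing([disks[0], disks[2]], parity)
print(rebuilt == disks[1])
```

With the one valid parity already standing in for the disabled disk, any read error on another disk during the rebuild has nothing left to correct it, which is why the order of operations matters.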

Link to comment
1 minute ago, tential said:

So surprised that during the process of this thread those 2 drives failed.

They might have failed because of power issues, but it could also be bad luck; it's impossible to say for sure. W4J1LSZ0 is currently unassigned (it was disk10, IIRC) and disk9 is also failing. Hopefully, with the new power supply, the rebuild will now go OK.

Link to comment
