(SOLVED) How should I proceed? - Two drives failed during a network transfer, crashed unraid, caused unclean shutdown.


Go to solution Solved by JorgeB,

What is the best practice on how to proceed?

 

During a network transfer, two drives reported errors. There were enough errors that the array seemed to crash entirely, along with most of the Unraid GUI/web interface. I tried to stop the array and reboot from the GUI, but it was mostly unresponsive. I used the terminal to try to reboot, but it didn't work. Shutdown did work; however, it was instant and the shutdown was unclean. Upon boot-up the two degraded disks are being emulated, but once I start the array a parity check/sync will start automatically.

 

I have two parity drives. I have replacement drives I can swap in. Should I allow the automatic parity sync operation to proceed with the emulated drives, or should I swap in one new drive at a time, allowing parity to rebuild my drives?

 

My concerns are the integrity of my parity and what multiple parity sync / drive rebuild operations could do to the other drives. And of course, the proper order of operations to restore my array to working order.

 

Cabling has been checked and verified to be working without issue. The two drives in question are reporting SMART errors.

 

I have my system powered off for now while I wait for input. I can grab diagnostics if it's absolutely necessary; however, I'm pretty sure I just lucked out and suffered two hard drive failures at the same time.

 

After reading through some documentation and forum posts I haven't found guidance for this specific scenario.

 

Please advise

Thank you.

 

 

Link to comment

I think you should post the diagnostics so we can check what the actual state is.   You do not mention what sort of SMART errors you were getting so we have no idea if they are ones that indicate a likely drive failure or not.   Posting diagnostics would allow us to see the SMART reports for all drives.

 

If you have two drives being emulated and two parity drives then it will not actually be doing a parity sync - just a read check of the remaining drives so probably not a lot of point in continuing it.
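The reasoning here can be sketched in a few lines. This is only an illustration, assuming the usual rule that each parity drive covers one missing data disk; the function name is made up for the example and is not part of Unraid.

```python
def failures_still_tolerated(parity_drives: int, emulated_drives: int) -> int:
    """Return how many more disk failures the array could absorb.

    Assumption for this sketch: each parity drive covers exactly one
    missing data disk, so the margin is simply parity minus emulated.
    A negative result would mean more disks are missing than parity covers.
    """
    if parity_drives < 0 or emulated_drives < 0:
        raise ValueError("drive counts cannot be negative")
    return parity_drives - emulated_drives


# The situation in this thread: two parity drives, two emulated disks.
print(failures_still_tolerated(2, 2))
```

With two parity drives and two emulated disks the margin is zero, which is why there is nothing for a parity sync to correct against and only a read check of the remaining drives is left.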

Link to comment
Posted (edited)

Diagnostics attached.

 

Further information:

 

During the network transfer, reconstruct write was enabled. Data was only being written to either disk 7 or disk 8; the other disks were being read to calculate parity, and read errors occurred on disk 3 and/or 6. Either way, disks 3 and 6 are now disabled, but only disk 6 has SMART errors (that are concerning).

 

Disk 4 and cache 2 show a nonzero UDMA CRC error count due to a loose connection.

unraid-diagnostics-20240513-0536.zip

Edited by username34793
grammar
Link to comment

Both disks dropping offline at the same time suggests a power/connection issue, see if they share something, like a power splitter.

 

Then, since the emulated disks are mounting, and assuming their contents look correct, you can rebuild on top.

 

 

Link to comment
Posted (edited)

I will re-verify the power and data connections but previous checks have confirmed everything is/was fine as far as my eyes can see. 

 

I have an LSI 9206-16e with 1 connection feeding 2 SAS parity drives and 2 connections feeding the inputs on an IBM ServeRAID 16-Port SAS-2 expander (46M0997 Firmware 634A). Disks 1-4 are on the first SAS expander output and disks 5-8 are on the second SAS expander output. How likely is it for 2 drives on 2 different ports of the SAS expander to exhibit read and write errors due to a power issue that is related to the SAS expander, while the 6 other drives on said SAS expander ports did not have any issues?

 

I'm definitely not trying to come off as rude. I apologize if it reads that way.  To the best of my capabilities all connections are and were properly seated and all cabling perfectly intact as far as my eyes are capable of seeing.  It seems like at least one other drive should have had some kind of error if there was a bad connection.

 

I appreciate the assistance and input thus far.  I will check everything again and go from there.  I do have reservations when it comes to rebuilding onto a drive with uncorrected errors so I will probably at minimum replace one drive once I can re-confirm all connections are good to go.

 

Thank you so far!

Edited by username34793
grammar
Link to comment

All drives are powered individually. 

 

Specifically for my array drives I made sure to run them from the PSU cable to a 1 in 4 out sata power splitter adapter.  So in the case of drives 1-4, the power is coming from 1 peripheral/sata port on the PSU to a sata power splitter cable.  Drives 5-8 are powered from a separate PSU port to a separate sata power splitter... and so on.

 

So from my point of view, a power issue or connection issue at the sas expander should be affecting more than 2 drives.

A power/connection issue at the PSU level should affect 4 drives; the same goes for the SATA splitter. That leaves the obvious culprit: the individual SATA power connectors for drive 3 and drive 6 respectively, which of course could very well be the issue. Unfortunately I can only rely on visual and tactile inspection for these cables and the individual ports on the drives themselves. I don't have or know of a SATA or similar power continuity testing device, and I am not well versed in using a multimeter or similar tools... perhaps I should be...

 

The best I have is my eyes and hands for this one, but this system has been deployed for quite a while now, undisturbed and in regular use. Of course now, I will methodically go through each connection as I prepare to swap out drive 6 and maybe drive 3.

Link to comment

If you mean the splitters are SATA->4xSATA, then they are often not able to handle the current for more than 2 drives without voltage sags, which can cause intermittent drive issues. Molex->4xSATA splitters are normally fine with 4 drives.

 

Link to comment
Posted (edited)

A picture of the sata power splitter I am using for the array drives is attached.  Seems to be exactly what I should not have used...

 

At minimum I should reevaluate how I am powering my drives.  I was under the impression molex to sata was a bad idea or something like that... Not to mention the lack of molex cables for my power supply.

 

Any recommendations for proper extensions/splitters to accommodate 12+ total hdds in the array while taking into consideration ssds and fans and everything else?

 

PSU is a Seasonic VERTEX GX-1200 with 5 peripheral - SATA/molex connections.  I can make use of the PCIe power connections if that is possible and not ill-advised.

 

 

61tmqKq3wVL._AC_SL1500_.jpg

17-320-023-04.png

Edited by username34793
smaller images
Link to comment
Posted (edited)
27 minutes ago, JorgeB said:

Yep.

I appreciate your candor.  Any thoughts or suggestions to accomplish proper powering of 13 hard drives using reliable extensions and splitters within the constraints of 5 available PSU power connections?

Edited by username34793
correct number of hdds
Link to comment

I would try to see if it's possible to get separate modular cables for that PSU; that's what I did for most of my servers. I use Corsair semi-modular PSUs and got additional 4 x SATA cables, so I can directly connect 12 devices for smaller servers and 16 for larger ones. I still use a few splitters sometimes, but only to split one SATA port into two, never more than that for HDDs.

 

P.S. Modular cables are not standard, even for PSUs from the same brand. The cables can differ and fry your disks if you try to use the wrong ones.

Link to comment

Normally the modular cables for power supplies come with multiple connectors on the one cable, so you do not always need splitters. If you think you need to use SATA->SATA splitters, avoid splitting them more than 2 ways.

Link to comment
Posted (edited)

Thank y'all for the feedback regarding this conundrum.  I think I have a few possible ways forward regarding proper power distribution to the hdds. 

 

Just to clarify, typically, how many drives in total can be powered by each PSU connection for peripherals/SATA taking into consideration all potential splitters and connectors already being used on the PSU supplied power cables?  In my case my SATA power cables already have 4 connectors on them.  My PSU has 5 connections for SATA and/or molex cables.

 

As far as the disabled drives are concerned, the best way to sort this out would be to verify the emulated content is correct and simply rebuild on top or in this case probably swap the drive with SMART errors.  Is this correct? 

 

I expect it would be most wise to handle one rebuild on top/swap at a time.  Any thoughts on that?

117-320-023-04.png

Edited by username34793
clarification/s
Link to comment
  • Solution
11 hours ago, username34793 said:

In my case my SATA power cables already have 4 connectors on them.  My PSU has 5 connections for SATA and/or molex cables.

That allows you to connect 20 devices without using splitters. If you need more, split the main SATA plugs into two at most, so if you really needed 40 devices you could get there using 20 splitters, but you should avoid as many splitters as possible.
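The counting in this reply can be written out as a small sketch. The function names and parameters are hypothetical, and the two-way split limit is the rule of thumb from this thread rather than a PSU specification; it says nothing about the actual current draw of any particular drive.

```python
def direct_plugs(psu_ports: int, plugs_per_cable: int) -> int:
    """Number of SATA power plugs available without any splitters."""
    return psu_ports * plugs_per_cable


def splitters_needed(devices: int, psu_ports: int, plugs_per_cable: int) -> int:
    """Two-way splitters required to power `devices`.

    Each two-way splitter turns one plug into two, so it adds one plug.
    Raises ValueError if even splitting every plug in two is not enough.
    """
    plugs = direct_plugs(psu_ports, plugs_per_cable)
    if devices > 2 * plugs:
        raise ValueError("not possible without splitting a plug more than two ways")
    return max(0, devices - plugs)


# Figures from this thread: 5 PSU connections, 4 plugs per cable.
print(direct_plugs(5, 4))          # plugs available with no splitters
print(splitters_needed(13, 5, 4))  # the 13 HDDs mentioned earlier
print(splitters_needed(40, 5, 4))  # the worst case described above
```

For the 13 drives discussed earlier, the count comes out to zero splitters, which matches the advice to rely on the PSU's own cables where possible.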

 

 

11 hours ago, username34793 said:

I expect it would be most wise to handle one rebuild on top/swap at a time.  Any thoughts on that?

Since both are already disabled, I would rebuild both at the same time.

Link to comment
  • username34793 changed the title to (SOLVED) How should I proceed? - Two drives failed during a network transfer, crashed unraid, caused unclean shutdown.

I replaced the sata power cables for all drives with custom cables for my power supply and case, which should cover proper power distribution without overloading the cables/psu. I replaced both disabled drives one at a time.  After both data rebuilds and a parity sync for a goof there were and still are no errors.  I'll consider this problem solved and likely due to bad power distribution to the drives. 

 

Thanks for the advice and insight.

Link to comment
