parity drive got disabled


Solved by JorgeB

A few days ago I ran into a problem with my Unraid server and was thankfully able to get it sorted out with the help of those of you on here, especially JorgeB. Now, a day later, I'm running into even worse issues. The problem started with my system no longer showing any content, and I had to reboot to fix it. One of my hard drives was also disabled at the time, so I went out and bought a new drive. I was in the process of rebuilding when a second hard drive failed, and then one of the parity drives failed during the rebuild. This sounds really bad and I could use any help you guys have to offer. I'll post both sets of diagnostics below so you can see what changed in just a couple of days. The 1844 one is the most recent.

tower-diagnostics-20230429-1844.zip tower-diagnostics-20230426-1226.zip

Link to comment
6 hours ago, JorgeB said:

You are having what look like power/connection issues with multiple disks. This is most likely a PSU or cable problem: check/replace cables and/or use a new PSU if available, and post new diags after array start.

I was able to get Unraid to boot back up, but this time the two drives that were disabled no longer showed up as even plugged in. I tried to do as you said and shut it down to check the power and cables, but the shutdown froze and wouldn't go any further. I let it run for around 6 hours, not wanting to interrupt it with another unclean shutdown, but to no avail: the hard drives stayed spun up the entire time and CPU usage bounced back and forth between 60 and 80 percent. I eventually went over and hit the power button and watched as the system tried a forced shutdown. Another hour later it still had failed to shut down, so I held the button until it finally powered off. I know, I know, this probably made things even worse, but I felt there wasn't much choice at that point, as it was no longer responding to any input and was simply spinning its wheels at full speed, most likely causing damage in its own right. I have the diagnostics from before I tried to shut down and from during the shutdown while it was still responsive, as well as a picture of the screen as it tried to forcibly shut down. I fear the worst at the moment but am trying to stay hopeful. Any help would be greatly appreciated.

My system consists of a Dell R710, a NetApp DS4246, and a CyberPower UPS.

20230430_101216[1].jpg

20230430_101057[1].jpg

tower-diagnostics-20230430-0419.zip tower-diagnostics-20230430-1005.zip

Link to comment
15 hours ago, JorgeB said:

The screenshot shows that you need to check filesystem on disk28, but before that post new diags after a fresh boot and array start.

Sorry it took so long. I spent most of the night checking the connections and making sure the NIC didn't come unseated, as I read on another forum that this was a known issue. I didn't have any spare power supplies, but I did swap them with the ones from another Dell R710 I have running, and it seems they are good. I took a diag but am hesitant to shut the server down again and try to rebuild the two drives it dropped until I hear back from you, so I will just let it run for a bit and see how it goes till then.

 

 

Capture1.PNG

tower-diagnostics-20230501-1852.zip

Link to comment

Looks like emulated disk28 is mounted now. I assume (hope) that parity check was non-correcting. Is disk28 one of those unassigned? Which one? And the other was parity2?

 

SMART reports for both look OK, though neither has had any self-tests.
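For anyone wanting to script the self-tests mentioned here, the following is a minimal sketch rather than a recommended procedure. It assumes a machine with Python 3.9+ and the smartmontools `smartctl` binary on the PATH; the helper names and the `/dev/sdX` device path are placeholders, not anything taken from the diagnostics in this thread.

```python
import subprocess


def selftest_command(device: str, kind: str = "short") -> list[str]:
    """Build the smartctl command that starts a self-test on one drive."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    return ["smartctl", "-t", kind, device]


def selftest_log_command(device: str) -> list[str]:
    """Build the smartctl command that prints the drive's self-test log."""
    return ["smartctl", "-l", "selftest", device]


def run(command: list[str]) -> str:
    """Run a command and return its standard output as text.

    check=False because smartctl uses its exit status as a bitmask of
    conditions, so a non-zero status is not necessarily a failure to run.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout
```

A typical use would be `run(selftest_command("/dev/sdX"))` to start a short test, followed later by `print(run(selftest_log_command("/dev/sdX")))` to read the result once the drive has finished.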

 

Somewhat safer to rebuild to a spare if you have one and keep the original as is.

 

Do you have backups of anything important and irreplaceable?

 

Link to comment
25 minutes ago, trurl said:

Looks like emulated disk28 is mounted now. I assume (hope) that parity check was non-correcting. Is disk28 one of those unassigned? Which one? And the other was parity2?

 

SMART reports for both look OK, though neither has had any self-tests.

 

Somewhat safer to rebuild to a spare if you have one and keep the original as is.

 

Do you have backups of anything important and irreplaceable?

 

I'm not 100% sure which of the unmounted drives was disk 28, and yes, the other was the parity2 drive. Unfortunately I don't have any spare drives of equal size, but if you think it would be better to wait to rebuild, I will order one. To answer your other question, I don't have any backups of the data on the Unraid server. There are lots of family videos and pictures that aren't replaceable, but I understand that when it comes to computers some things can't be avoided. Should I run a self-test on the drives?

Link to comment

Oh, I just noticed that one of the unmounted drives is marked xfs and one is not, just like all the drives above except the parity drives, so this may help tell them apart. Also, one can be mounted and one cannot, so it looks like the drive 11b2dao 2mhd91jb is the parity one.

Edited by AC-Gamer
Link to comment
11 hours ago, AC-Gamer said:

I don't have any backups of the data on the Unraid server. There are lots of family videos and pictures that aren't replaceable

Parity is NOT a substitute for backups. Parity contains NONE of your data.

https://wiki.unraid.net/Manual/Overview#Parity-Protected_Array

 

Parity just allows you to keep going when a disk fails (or just gets kicked out of the array when a failed write makes it out of sync) and recover the data for that disk.
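To make the idea concrete, here is a minimal sketch of how a single parity disk lets one missing disk be recovered. It assumes simple XOR parity over equal-sized blocks; the function name and the sample byte strings are made up for illustration, and it does not model parity2 or Unraid's actual implementation.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together, position by position."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one tiny block.
data_disks = [b"family", b"videos", b"photos"]

# The parity disk stores only the XOR of the data disks, not a copy of any file.
parity = xor_blocks(data_disks)

# Pretend the second disk has dropped out of the array.
survivors = [data_disks[0], data_disks[2]]

# Its contents can be recomputed only by reading every surviving disk plus parity.
rebuilt = xor_blocks(survivors + [parity])
```

Here `rebuilt` equals the lost block, but only because every other disk was still readable. With a second data disk missing, this single parity block alone would not be enough, which is why it cannot stand in for a backup.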

 

Plenty of more common ways to lose data including user error. You must always have another copy of anything important and irreplaceable.

 

How much data do you need to back up?

Link to comment
51 minutes ago, trurl said:

Parity is NOT a substitute for backups. Parity contains NONE of your data.

https://wiki.unraid.net/Manual/Overview#Parity-Protected_Array

 

Parity just allows you to keep going when a disk fails (or just gets kicked out of the array when a failed write makes it out of sync) and recover the data for that disk.

 

Plenty of more common ways to lose data including user error. You must always have another copy of anything important and irreplaceable.

 

How much data do you need to back up?

Unfortunately quite a bit, as most of the server is filled with that kind of data. Should I try to rebuild the two drives and see what happens? I'm starting to get the feeling you think the information on those drives will be lost at this point, which, if it happens, is my own fault for not building a backup as you said. I just thought the RAID would protect me from this, but it sounds like this may be a hard lesson. At the beginning of all this I thought I would lose all the information on the server with how badly this spun out of control. If it turns out I lose only the information on one drive, that will hurt a great deal, but I would count myself lucky at this point if I can save the rest. If there is a better way that will save it all, though, I am all for that solution. I know you mentioned buying another disk to rebuild to; should I do that, or rebuild with what I have? Or should I begin copying all the information to a backup before proceeding? Can the array handle that at this point? I'll wait till I hear back from you before taking any action.

Edited by AC-Gamer
Link to comment

If you rebuild the data disk to a spare, then you will still have the original data disk with its contents in case of problems with rebuild.

 

Doesn't matter whether parity2 is rebuilt to its same disk since it has no data anyway.

 

You could back up the data from the emulated disk by copying it off the array, but that will just make all the other disks work, since all disks are read to emulate a disabled disk.

 

Some people react to a failed disk by copying it to other disks in the array. Not only would that read all disks to get the emulated data, it would also write to other disks in the already compromised array, including parity updates. (Or, even worse, they move instead of copy. A move deletes the data from the source and, of course, deletes are also write operations that update parity.) That just digs a deeper hole when a rebuild is what is really needed.
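The point about writes and deletes touching parity can be sketched as follows. This is a minimal illustration assuming single XOR parity and a read/modify/write style update; the function names and sample bytes are hypothetical, and it does not model parity2 or Unraid's actual driver.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Remove the old block's contribution from parity and add the new one."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Three hypothetical data disks and their XOR parity.
disks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_bytes(xor_bytes(disks[0], disks[1]), disks[2])

# Writing to the third disk (a copy landing there, or a delete rewriting
# metadata) also forces a write to the parity disk.
new_block = b"ZZZZ"
parity_after_write = update_parity(parity, disks[2], new_block)
disks[2] = new_block

# Recomputing parity from scratch gives the same result as the incremental update.
recomputed = xor_bytes(xor_bytes(disks[0], disks[1]), disks[2])
write_changed_parity = parity_after_write != parity
```

Every such write is one more chance for parity and the data disks to drift apart if a disk drops mid-operation, which is the argument for leaving the array alone until the rebuild.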

 

Many of us don't have complete backups. Many of our files aren't important and irreplaceable enough. You get to decide what qualifies.

 

While you wait on a spare disk to rebuild disk28, you could rebuild parity2 (make sure you use the correct disk, the one that isn't xfs). That would be a good test that all is working well and won't change anything on other disks.

 

 

Link to comment
3 hours ago, trurl said:

If you rebuild the data disk to a spare, then you will still have the original data disk with its contents in case of problems with rebuild.

 

Doesn't matter whether parity2 is rebuilt to its same disk since it has no data anyway.

 

You could back up the data from the emulated disk by copying it off the array, but that will just make all the other disks work, since all disks are read to emulate a disabled disk.

 

Some people react to a failed disk by copying it to other disks in the array. Not only would that read all disks to get the emulated data, it would also write to other disks in the already compromised array, including parity updates. (Or, even worse, they move instead of copy. A move deletes the data from the source and, of course, deletes are also write operations that update parity.) That just digs a deeper hole when a rebuild is what is really needed.

 

Many of us don't have complete backups. Many of our files aren't important and irreplaceable enough. You get to decide what qualifies.

 

While you wait on a spare disk to rebuild disk28, you could rebuild parity2 (make sure you use the correct disk, the one that isn't xfs). That would be a good test that all is working well and won't change anything on other disks.

 

 

Thank you for explaining it to me so clearly. This has been a stressful process at the thought of losing so much, but you've made it bearable with all your help, and I can't thank you enough. I know I'm far from done and things might still go badly, but I just wanted to say that I appreciate the help thus far. I've taken your advice and started the rebuild on the parity2 drive while I wait for a new drive to come in. Here's hoping things go well. I'll post again when it finishes and the new drive arrives.

Capture3.PNG

Link to comment
10 hours ago, JorgeB said:

Disk 25 dropped offline, looks more like a power/connection issue but since it dropped there's no SMART.

I have another Dell R710 that I use for a different server. I pulled the power supplies out of that one and tried them last time, and they both seemed to work, although I didn't have any spare cables to try, so I will order some and give that a go when the parity rebuild is finished. The issue seems to affect only disk 25, though, as this is the 3rd or 4th time that disk has been unmounted. I had planned to replace it but hadn't gotten around to it. I'm not sure whether that's more of a power issue or a cabling problem; I tend to lean toward cabling at this point, but I'm by no means an expert. Any idea what type of cabling I need to order for these just to be safe, for example a brand that's better than others?

Link to comment
14 minutes ago, itimpi said:

You mention you think the power is fine, but do you use power splitter cables, as you have a lot of drives? They can be problematic sometimes, especially if you are trying to hang too many drives off a single cable from the PSU.

I wasn't sure what power splitters were, so I had to look them up, but to answer your question, I do not use power splitter cables. I'm sure there are reasons for them, but I'm not sure why I would want to split power from one plug to multiple PSUs; it just sounds like a bad idea. I did, however, order two more PSUs just in case: ( https://www.ebay.com/itm/323843716543?hash=item4b6696f5bf:g:u3YAAOSw5ZRdEjNu&amdata=enc%3AAQAIAAAA8Nyn55mUcGJ%2FHfugfwi70LM2raLFjntScbHq6wfAaLYCtxQslt4kw9Ql6PzZe0zx6F%2B6FkJ23X8FcuQ4CTX047A1osAEXeJQJ10aRAU%2F9iFRT2XB99%2Bx9oUG7WXlAuQjBZS9IUxW5QYothleeVnNnyYKNNhC%2FIB1NU%2BGSGo5iiNEpTMtE3VzAMy49zit6uQdqHfLZoKhC%2FDV0a6%2BBZkb5AI03G4yDbB4ZrJ5uyvPs2X8RYVI9q2MgY4TiEH4UQmzYdmFc4wtMlxB2JXvUZKCqOSXvPfBICI7sTaH69j9q8%2BztG3Jb5pEQEfPI0AVjLCrCw%3D%3D|tkp%3ABFBMjP3N1vxh ). The disk shelf originally came with only two and has spots for four in total, so I figure this might help if it's a power issue. The server is still rebuilding parity, so there isn't much new I can post on that front, and the drives come in Monday, so I will give an update as soon as I can. I went ahead and added a more recent diag just in case, but not much has happened since the parity rebuild started.

 

 

tower-diagnostics-20230504-0147.zip

Link to comment
1 minute ago, AC-Gamer said:

I wasn't sure what power splitters were, so I had to look them up, but to answer your question, I do not use power splitter cables. I'm sure there are reasons for them, but I'm not sure why I would want to split power from one plug to multiple PSUs; it just sounds like a bad idea

Not sure where you found a reference to multiple PSUs. The term power splitter cable typically refers to when you add it to a single cable from the PSU so you can attach more drives to that cable. 

Link to comment
2 minutes ago, itimpi said:

Not sure where you found a reference to multiple PSUs. The term power splitter cable typically refers to when you add it to a single cable from the PSU so you can attach more drives to that cable. 

Now that you say that, I have seen things like that in videos, but no, I haven't installed anything like that in this system. What I was seeing when I searched for power splitter cables were things like this ( https://www.newegg.com/black-c2g-6-ft-cable-connectors/p/N82E16812196290 ), which is something completely different from what you were referring to, so my apologies for the misunderstanding there.

Link to comment
14 minutes ago, AC-Gamer said:

Now that you say that, I have seen things like that in videos, but no, I haven't installed anything like that in this system. What I was seeing when I searched for power splitter cables were things like this ( https://www.newegg.com/black-c2g-6-ft-cable-connectors/p/N82E16812196290 ), which is something completely different from what you were referring to, so my apologies for the misunderstanding there.

Well at least that means you do not have the sort of power splitter problem I was thinking of :)   Just for interest how is power supplied to that many drives - are they in some sort of enclosure where the power is applied via the backplane?

Link to comment
1 hour ago, itimpi said:

Well at least that means you do not have the sort of power splitter problem I was thinking of :)   Just for interest how is power supplied to that many drives - are they in some sort of enclosure where the power is applied via the backplane?

My system consists of a Dell R710, a NetApp DS4246, and a CyberPower UPS. The NetApp has a capacity of 24 drives, plus the 6 in the Dell R710; they are attached via an SFP cable. I'm only using about half the available slots in the NetApp at the moment. The NetApp is a 3.5-inch hard drive enclosure with a SATA backplane. I ordered two more power supplies for the NetApp, as it originally came with only 2 and has space for four. I'm not sure if this will make any difference, but since power or cabling could be the problem, I figured it was worth a try. The hard drives and PSUs don't get here till Monday and the parity rebuild is still finishing, so till then I'm left hoping. Oh, I linked a picture of my setup above; I apologize ahead of time for the messy cable management.

Link to comment
1 hour ago, AC-Gamer said:

I ordered two more power supplies for the NetApp, as it originally came with only 2 and has space for four.

As long as the power supplies are good enough to handle the drives, I doubt that adding extra ones is going to help (you did not mention the PSU ratings), so you may not want to spend the money on the additional PSUs. The NetApp is normally aimed at production environments where uptime is crucial and the cost of extra PSUs is marginal.

Link to comment
On 5/4/2023 at 4:44 AM, itimpi said:

As long as the power supplies are good enough to handle the drives, I doubt that adding extra ones is going to help (you did not mention the PSU ratings), so you may not want to spend the money on the additional PSUs. The NetApp is normally aimed at production environments where uptime is crucial and the cost of extra PSUs is marginal.

Just got home from work and the parity rebuild had finished, so I took a snip of the screen to ask about the insane number of errors it found and what I could do to fix this problem, since it seems to be persisting, and ran a diagnostic of the system. When the diag was done I switched back to the main menu to find that disk 25 had dropped again and become unmounted for something like the 7th time now, but parity is fine for the moment. I ran another diag after the disk dropped, so both are below along with the screenshots. I am at a loss as to what to do next. The replacement drives are still on the way, and the PSUs as well; even though you said it's doubtful they will be needed, they were relatively cheap. Any help would be appreciated, as I feel like I'm going in circles here and could lose it all at any moment.

Capture5.PNG

Capture6.PNG

tower-diagnostics-20230505-1913.zip tower-diagnostics-20230505-1916.zip

Link to comment
