Parity device disabled


Why can't we have useful error messages? I got a message saying "Parity device disabled". No other meaningful information was given, and it refuses to re-enable. Assuming a parity error was detected, why is disabling parity the correct response instead of redoing parity? Honestly, very frustrating.

 

Running SMART reveals no errors and no CRC errors.
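
For anyone wanting to repeat that CRC check from a script, here is a minimal sketch. It assumes the text comes from `smartctl -A /dev/sdX` and that the drive reports attribute 199 (UDMA_CRC_Error_Count) in the usual ten-column table. The function name and the sample rows are made up for illustration and are not taken from the poster's drive.

```python
import re


def crc_error_count(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is column 10.
        if len(fields) >= 10 and fields[0] == "199":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


# Illustrative sample only (invented values).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4380
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(crc_error_count(SAMPLE))
```

A non-zero count usually points at cabling or the controller rather than the disk itself, which is why it is worth checking separately from the overall SMART health result.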

 

I tried creating a new configuration and copying the prior config. It still refuses to re-enable parity.

 

image.png.a97e6839057b8a31df782492f96dac03.png

 

image.thumb.png.668ff96229c8e5c70fdb41e4f970f01a.png

Edited by xokia
Link to comment

Have you followed the procedure documented here in the online documentation, accessible via the Manual link at the bottom of the Unraid GUI? In addition, every forum page has a DOCS link at the top and a Documentation link at the bottom. If you think the drive is fine, you can rebuild it to itself.

Link to comment
6 hours ago, itimpi said:

Have you followed the procedure documented here in the online documentation, accessible via the Manual link at the bottom of the Unraid GUI? In addition, every forum page has a DOCS link at the top and a Documentation link at the bottom. If you think the drive is fine, you can rebuild it to itself.

Yes, I read it. It still doesn't explain to me why it believes my parity failed. Why must one go digging for the error? You flash a notification that parity failed. Why not include the reason Unraid believes the drive failed in the error message you are flashing on the screen?

 

For me, unassigning and reassigning the parity drive did absolutely nothing; it still refused to rebuild parity. I could not force the parity drive back into the array. It would always say drive disabled. I had to take the drive offline, clear the drive, then reattach the drive, and parity finally restarted. I changed nothing on the system. None of the drives are showing any sign of a drive failure. SMART check shows nothing on any of the drives. Reads/writes appear to be working fine at normal speeds of 200MB/s.

 

It is also unclear to me why disabling the parity drive is the correct response to a parity error. Would you not want to try to restart parity if you detected an error, and only disable a drive once some error threshold is reached? I'm not seeing what benefit offlining a drive provides.

Edited by xokia
Link to comment

All you would get is an indication that a write failed (which is what leads to a disk being disabled), not why that write failed.   Nothing more would be available without digging into logs.

 

4 minutes ago, xokia said:

For me unassigning and reassigning the parity drive did absolutely nothing it still refused to rebuild parity



Did you start the array after unassigning the drive to commit the drive as missing, and then stop it again to reassign the drive before starting it again to rebuild parity?

Link to comment
27 minutes ago, itimpi said:

All you would get is an indication that a write failed (which is what leads to a disk being disabled), not why that write failed.   Nothing more would be available without digging into logs.

 



Did you start the array after unassigning the drive to commit the drive as missing, and then stop it again to reassign the drive before starting it again to rebuild parity?

Why not include the information from the log in the error message, is the question. Why must the user dig into the logs? The OS created the error message, so the OS knows the information. Give the user a meaningful message (I know this is easier said than done). I would think some error threshold had to be met before offlining a drive. One write fails, and offlining a drive is the solution? How does that help me? I would think multiple attempts would be tried so the OS is sure that it's dealing with a real fault. It seems too easy to kick a drive out of the array. Maybe some routine that pauses the array on error to prevent further writes and does some basic self-test. If it detects additional faults, offline the drive. Give the user the results of that self-diagnostic, then restore the array with the drive disabled to keep uptime. IDK, but it seems like some improvements could be made here.

Yes, I tried starting the array with the drive deselected, then stopping the array again and re-adding the "disabled" drive. It still showed as disabled. Only after clearing the drive could I get it back into the array.

 

Edited by xokia
Link to comment
23 minutes ago, xokia said:

The OS created the error message the OS knows the information

But it does not!    All the OS knows is that the write failed.  It does not understand why, only that all attempts at a retry have failed.   All that could be added is an explicit statement that a write has failed.  This might add some value, as some users think that the red 'x' means a drive has failed at the hardware level, not that a write to it failed and could not be recovered via retries.   You still need to dig into the logs to try to determine why.
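
The "retry, then disable" behaviour described above can be pictured with a small sketch. This is purely illustrative and is not Unraid's actual code: the `Disk` class, the `handle_write` function, and the retry count of 3 are all hypothetical. It only shows why the caller ends up knowing that a write failed without knowing the underlying reason.

```python
from typing import Callable


def write_with_retries(write: Callable[[], None], max_attempts: int = 3) -> bool:
    """Try a write up to max_attempts times; True on success, False otherwise."""
    for _ in range(max_attempts):
        try:
            write()
            return True
        except OSError:
            # The lower layer only signals failure; keep retrying.
            continue
    return False


class Disk:
    """Hypothetical stand-in for an array member."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.disabled = False


def handle_write(disk: Disk, write: Callable[[], None]) -> str:
    """Disable the disk once every retry has failed; report only the outcome."""
    if write_with_retries(write):
        return "ok"
    disk.disabled = True
    return f"{disk.name}: write failed after retries, disk disabled"
```

In this picture the only fact available for a notification is the final message string; anything more specific would have to come from the kernel log.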

Link to comment
39 minutes ago, itimpi said:

But it does not!    All the OS knows is that the write failed.  It does not understand why, only that all attempts at a retry have failed.   All that could be added is an explicit statement that a write has failed.  This might add some value, as some users think that the red 'x' means a drive has failed at the hardware level, not that a write to it failed and could not be recovered via retries.   You still need to dig into the logs to try to determine why.

Ummm... if it's in the log file, did the OS not write the error out to the log file? Or maybe what you are implying here is that the underlying Linux kernel is writing the error message, and all "Unraid" knows is that some error occurred. I would think some subroutine would be useful here: pause the array, run a self-diagnostic, output the results, determine whether offlining is appropriate, and then give some message with the results of that diagnostic.

Edited by xokia
Link to comment

Any error messages in the logs are generated deep within the Linux kernel, well below the Unraid I/O level.   They can also vary significantly, so without looking both at them and at the various other files in the diagnostics it is hard to determine the true cause: many things can cause similar symptoms, and the difference in the logs can be minimal.   Even after looking at all the information available in the diagnostics, one can often only say something went wrong and give a recommended course of action to pin down the underlying cause.   The only thing that is definitive is that a write to the drive went wrong, so parity is now out of step with the data drives.
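
As a starting point for that kind of log digging, a first pass can simply pull out the lines that mention disk trouble. The sketch below is an assumption-laden filter, not a diagnosis tool: the keyword list is only a guess at strings that commonly appear around disk or link problems and is neither exhaustive nor authoritative, and the sample log lines are invented for illustration.

```python
import re

# Hypothetical keyword list; adjust it to what your own syslog contains.
PATTERNS = (r"I/O error", r"write error", r"read error", r"hard resetting link")
_REGEX = re.compile("|".join(PATTERNS), re.IGNORECASE)


def suspicious_lines(log_text):
    """Return the log lines that match any of the keyword patterns."""
    return [line for line in log_text.splitlines() if _REGEX.search(line)]


# Invented sample lines, shaped like syslog entries.
SAMPLE_LOG = """\
Mar 12 20:40:01 Tower kernel: ata5: hard resetting link
Mar 12 20:40:07 Tower kernel: md: disk0 write error, sector=123456
Mar 12 20:41:00 Tower emhttpd: spinning down /dev/sdc
"""

if __name__ == "__main__":
    for line in suspicious_lines(SAMPLE_LOG):
        print(line)
```

A filter like this only narrows down where to look. As noted above, similar-looking lines can have very different causes, so the matches still need to be read in context.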

Link to comment
  • 3 weeks later...

OK, so this is the second time parity has failed with Unraid. Nothing apparent was wrong, and there were no error indicators. What I noticed was that when I went to write to one of my drives, it said it was locked and writing was not possible. So I reset the system. It rebooted and says parity disabled. One day prior it ran a parity check and everything was good. In fact it still says parity valid. However, parity is obviously not valid since the parity disk is disabled. SMART check doesn't show anything out of the ordinary. There should have been no writes to any of these drives when the failure happened, as everything is cached on a 1TB SSD and only written to the drives when the cache is 60% full.

 

image.png.cf78f9c22cb814ddf89fe0cd37087fee.png

 

image.png.c42650666bfe5bb4eb9e42d9e874abcf.png

ST20000NM007D-20240312-2049.txt

Edited by xokia
Link to comment

Since parity is disabled, last check would have been a read-check only so I agree it should not say parity is valid.

 

You might want to consider installing the Parity Check Tuning plugin. It has been designed to make parity checks less intrusive when you have large disks, where checks can take a long time. Even if you do not use its other features, it will also enhance Parity History entries to give more information, such as why the check was run and what type of check it was.

Link to comment
17 minutes ago, itimpi said:

Since parity is disabled, last check would have been a read-check only so I agree it should not say parity is valid.

 

You might want to consider installing the Parity Check Tuning plugin. It has been designed to make parity checks less intrusive when you have large disks, where checks can take a long time. Even if you do not use its other features, it will also enhance Parity History entries to give more information, such as why the check was run and what type of check it was.

I actually have that already installed

Link to comment

I believe the parity disk is attached to an IO Crest 582 board, if that matters. I ordered a JMB585 with heatsink. Not sure it will help, but it should be here Thursday and it's something to try, I guess, since I am getting random failures. It's always the parity drive that fails. These are relatively new 20TB Seagate Exos drives (around 6 months old).

Edited by xokia
Link to comment

OK at that point parity WAS valid I think (unless I have introduced a bug into the plugin so it is not recognising read-checks) and that is what the message you showed is saying, and parity has been disabled since then.  Why is not clear.
 

Since you reset the system, we have no log from when the drive was disabled unless you happen to have the syslog server enabled with the mirror to flash option set.

Link to comment
1 minute ago, itimpi said:

Since you reset the system, we have no log from when the drive was disabled unless you happen to have the syslog server enabled with the mirror to flash option set.

No, unfortunately. You folks only have that set up so it can go to the OS drive, and that tends to corrupt the OS drive, so I no longer do that. It would be awesome if you folks made it possible to send the syslog to a separate USB drive. I would leave it always set.

Link to comment
4 hours ago, xokia said:

you folks

Who is this 'you folks'?   I am just an Unraid user.

 

You could set it to write to a share that is on a pool instead.    Might not catch the initial start-up sequence but should still help with crashes.   I notice that the drop-down for locations has a 'Custom' option but unfortunately it does not look like that is selectable.   Seems a bit pointless having it if it is not.

 

 

Link to comment
4 hours ago, itimpi said:

Who is this 'you folks'?   I am just an Unraid user.

 

You could set it to write to a share that is on a pool instead.    Might not catch the initial start-up sequence but should still help with crashes.   I notice that the drop-down for locations has a 'Custom' option but unfortunately it does not look like that is selectable.   Seems a bit pointless having it if it is not.

 

 

Sorry, since you are an Unraid "moderator" I assumed, maybe incorrectly, that you were part of the Unraid team.

 

I am rebuilding the parity. This issue isn't something that shows up in a day or two. The last parity failure was almost a month ago, so I'd need to leave syslog running. I haven't found a reliable way to do that long term.

Edited by xokia
Link to comment
42 minutes ago, xokia said:

I haven't found a reliable way to do that long term

 

I wonder if manually editing the config/rsyslog.cfg file on the flash drive so the local folder is a path something like /mnt/disks/SyslogServer, i.e. a device handled by the Unassigned Devices plugin, will work. I think I will try testing it out.   Of course, even if it does, it will not be officially supported at the moment, and it will mean you cannot view/alter the setting via the GUI, but it might be a viable short-term solution.
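
A different, cruder way to get persistent logs in the meantime would be to copy the live syslog to another device on a schedule. The sketch below is not the rsyslog.cfg edit described above. It is a separate workaround, and it assumes the live log is at /var/log/syslog and that /mnt/disks/SyslogServer (the example path from this post) is mounted and writable. The function name and the file naming scheme are made up.

```python
import shutil
import time
from pathlib import Path


def snapshot_syslog(source: Path, dest_dir: Path) -> Path:
    """Copy the current syslog into dest_dir under a timestamped name."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest_dir / f"syslog-{stamp}.txt"
    shutil.copy2(source, target)  # copies contents and file metadata
    return target


if __name__ == "__main__":
    # Assumed paths; run periodically, e.g. from a scheduled job.
    print(snapshot_syslog(Path("/var/log/syslog"), Path("/mnt/disks/SyslogServer")))
```

The obvious limitation is that anything logged between the last snapshot and a crash is lost, so this is weaker than a real syslog mirror. Old snapshots would also need pruning by hand.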

Link to comment
5 hours ago, itimpi said:

 

I wonder if manually editing the config/rsyslog.cfg file on the flash drive so the local folder is a path something like /mnt/disks/SyslogServer, i.e. a device handled by the Unassigned Devices plugin, will work. I think I will try testing it out.   Of course, even if it does, it will not be officially supported at the moment, and it will mean you cannot view/alter the setting via the GUI, but it might be a viable short-term solution.

I'd be fine with any solution that helped debug.

Link to comment
