
6.11.5 Crash During Parity Check



I have been dealing with intermittent crashes and slowly trying different suggestions from various threads to fix it. Here is what I have tried so far:

- Changed home network (moved house)

- Tried safe mode with GUI

- Upgraded to 6.11.5 from 6.9 and 6.10

- Enabled a fix for 11th Gen iGPUs

- Disabled XMP

Last night the server crashed during a parity check. When I rebooted and checked it this morning, the parity check showed 170,000 errors. I read on the forum that the advice for a similar issue was to run the check again with corrections enabled, so I did and got 270,000 errors. I then ran another check without corrections and it is still finding errors. From further research I now think I probably need to rebuild my parity drives, but because I didn't understand the issue or the situation well enough, I suspect I have overwritten my data with invalid parity: a fair portion of my media library is now unplayable, and there are other signs of corruption.

Is this assumption correct, meaning I am SOL for the data on the array? I tried to get new diagnostics, but the CPU is at 100% and the server won't respond to anything other than navigating the UI. I have attached the syslog I managed to retrieve from the flash drive. Any suggestions on how to get back to a functioning state, even if that means the data is lost?

Also, is there a way to get a list of the file names on each share? Anything important is recoverable elsewhere, but I want to make sure I remember to put everything back if the array is reset, and to make note of the damaged data. I ended up doing this from my desktop by opening the shares and using the dir command.
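For anyone who wants the same listing without the Windows dir command, here is a minimal Python sketch. It assumes the shares are reachable as ordinary paths from the machine running it (for example mapped SMB shares from a desktop, or /mnt/user/<share> on the server itself); the helper names list_share_files and write_listings are hypothetical, not part of Unraid.

```python
import os
from pathlib import Path


def list_share_files(share_root: str) -> list[str]:
    """Return every file under share_root as a path relative to it, sorted."""
    root = Path(share_root)
    files = []
    # os.walk silently skips directories it cannot read (its default onerror=None).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            files.append(str((Path(dirpath) / name).relative_to(root)))
    return sorted(files)


def write_listings(shares: dict[str, str], out_dir: str) -> None:
    """Write one <share name>.txt listing per share into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for share_name, share_root in shares.items():
        listing = list_share_files(share_root)
        (out / f"{share_name}.txt").write_text("\n".join(listing) + "\n", encoding="utf-8")
```

For example, write_listings({"media": r"\\TOWER\media"}, "share_listings"), where TOWER and media are placeholder names, would write share_listings/media.txt. It only records file names, so it won't tell you which files are damaged.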

syslog

Edited by AngryPig

A parity check never writes to data drives; it only reads from them, so in that sense it does not touch your data. However, if you run a correcting parity check while one or more data drives are playing up, you can end up with invalid parity, so that if a data drive then fails you cannot recover it without data loss.

You should post your system's diagnostics zip file so we can get a better idea of the current state of your system.
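The first point above can be illustrated with a toy model of single (XOR) parity in Python. This is a simplified sketch, not Unraid's actual implementation: it assumes one parity disk, treats each "disk" as a short byte string, counts mismatched bytes rather than sectors, and all names are hypothetical. It shows that a correcting check only rewrites parity, but if the data comes back wrong during that check, parity ends up matching the bad reads, so a later rebuild produces wrong data.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Single (P) parity: byte-wise XOR across equal-length data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(parity: bytes, surviving: list[bytes]) -> bytes:
    """Reconstruct one missing data block from parity plus all surviving blocks."""
    return xor_parity([parity, *surviving])


def parity_check(data_reads: list[bytes], parity: bytes, correct: bool) -> tuple[int, bytes]:
    """Compare stored parity with parity recomputed from what the data disks returned.

    Returns (mismatch_count, parity_after_check). Only parity is ever rewritten,
    and only when correct=True; the data blocks are never modified.
    """
    expected = xor_parity(data_reads)
    mismatches = sum(1 for stored, fresh in zip(parity, expected) if stored != fresh)
    return mismatches, (expected if correct else parity)


# Two hypothetical data disks with valid parity.
disk1 = bytes([0x10, 0x20, 0x30])
disk2 = bytes([0x0A, 0x0B, 0x0C])
parity = xor_parity([disk1, disk2])

# During a correcting check, disk2's data comes back wrong in one byte
# (e.g. a flaky drive, cable, controller or RAM); the disk itself is unchanged.
bad_read_of_disk2 = bytes([0x0A, 0xFF, 0x0C])
mismatches, parity = parity_check([disk1, bad_read_of_disk2], parity, correct=True)
print("mismatches found:", mismatches)  # 1

# Parity now matches the bad read, so if disk1 later fails,
# rebuilding it from the "corrected" parity yields wrong data.
rebuilt_disk1 = rebuild_missing(parity, [disk2])
print("rebuild matches original:", rebuilt_disk1 == disk1)  # False
```

With correct=False the same check would only report the mismatch and leave the original (valid) parity in place, which is why a non-correcting check is the safer choice until the hardware is known to be good.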

4 minutes ago, itimpi said:

A parity check never writes to data drives; it only reads from them, so in that sense it does not touch your data. However, if you run a correcting parity check while one or more data drives are playing up, you can end up with invalid parity, so that if a data drive then fails you cannot recover it without data loss.

You should post your system's diagnostics zip file so we can get a better idea of the current state of your system.

 

I have attached the diagnostics. I had to regain access first, since the GUI had crashed, so I booted into safe mode manually.

As for the parity check: it runs a parity check on array start, and the correction option was ticked, so I assumed it corrected by default (the 170k errors)? Then I manually ran it again with corrections and got the 270k.

warham-nas-diagnostics-20230628-0048.zip


On a slightly different issue: did you manually start a parity check just before taking the diagnostics? An unclean shutdown was detected, which would have triggered an automatic parity check, but (unless things have changed) that check should be non-correcting. The only parity check I see in the diagnostics is a correcting one, and if this was the automatic check it may be a bug that it is correcting :(

1 hour ago, JorgeB said:

Multiple seemingly unrelated call traces; I would start by running memtest.

I tried rebooting into memtest from the flash drive and it failed (it kept boot looping). My motherboard has memtest built in, though, so I am running that now.

Edit: I looked into the boot loop issue while memtest runs. The most likely cause is that UNRAID's bundled memtest is legacy (BIOS) only, and my motherboard is probably set to boot in UEFI mode.

1 hour ago, itimpi said:

On a slightly different issue: did you manually start a parity check just before taking the diagnostics? An unclean shutdown was detected, which would have triggered an automatic parity check, but (unless things have changed) that check should be non-correcting. The only parity check I see in the diagnostics is a correcting one, and if this was the automatic check it may be a bug that it is correcting :(

My last actions before taking the diagnostics were as follows (this was while the CPU was at 100% and the GUI was not responding properly):

- Changed the flash config to reboot into safe mode (no plugins)

- Accessed the flash share to copy the syslog (attached in the OP)

- Tried to reboot from the GUI, which failed

- Held the power button to turn it off, then turned it back on

- Logged in to the GUI

- Started the array and clicked cancel on the parity check

- Accessed the shares from Windows to make a list of files

- Downloaded the diagnostics from the GUI

I also have a webhook set up to send UNRAID notifications to my Discord, but I only see the 270k parity check there, not the 170k one, so I'm not sure why I only got pinged for some of the checks.

Edited by AngryPig
2 hours ago, AngryPig said:

Edit: I looked into the boot loop issue while memtest runs. The most likely cause is that UNRAID's bundled memtest is legacy (BIOS) only, and my motherboard is probably set to boot in UEFI mode.

If you are booting in UEFI mode, the normal recommendation is to download the latest version from memtest86.com, which is UEFI compatible (but for licencing reasons cannot be included in the Unraid distribution). I suspect it is more thorough than the version built into your motherboard's BIOS.

8 minutes ago, itimpi said:

If you are booting in UEFI mode, the normal recommendation is to download the latest version from memtest86.com, which is UEFI compatible (but for licencing reasons cannot be included in the Unraid distribution). I suspect it is more thorough than the version built into your motherboard's BIOS.

I'm pretty sure this is the same one; it's integrated into my BIOS: https://rog.asus.com/motherboards/rog-strix/rog-strix-z590-a-gaming-wifi-ii-model/

IMG20230628025414.jpg

Edited by AngryPig
12 minutes ago, JorgeB said:

Since memtest is only definitive when it finds errors, and you have two sticks of RAM, try with just one; if the problem persists, try the other one. That will basically rule out RAM issues.

What's the benefit of testing individual sticks if memtest passed with both of them installed?

And after ruling out RAM, what would be the best way to try to get the system back to a normal operating state?

On 6/28/2023 at 4:58 AM, JorgeB said:

In case I wasn't clear, I wasn't saying to run memtest on each stick individually, but to use the server with just one stick and see whether the same issues occur.

Ah, gotcha, I thought you meant memtest, not the server. Are you saying switching to a single stick "may" solve the parity errors or the crashing?
