Parity errors and crashes after 6.11.5 upgrade


JKY

At the start of last week, I upgraded the Unraid OS from 6.10.3 to 6.11.5. 

 

Over the next few days I did some extra tasks. I added a second cache pool with an NVMe drive. 
I added a VM running Win10 with a GTX 1060, which pointed to the NVMe drive. 
I changed my RAM from the default 2100 MHz to the native XMP speed of 3600 MHz. 

 

My parity check runs on the last Friday of every month. On Saturday morning my server was on but the UI was inaccessible. I rebooted and started the parity check again. A few hours later, it had frozen up again. After another try and another crash, I noticed that you could see from the dashboard that there was an issue: after an hour a few cores would lock at 100% and the time remaining on the check would freeze. 

 

I read over the forums and saw someone had an issue kind of like this in the past. It was determined there was a hardware issue… so I started rolling back and trying a parity check after each step. As you can guess this took all weekend. 

 

Blew away the VM, still crashed on parity check.
Removed the NVMe cache pool, still crashed on parity check.
Rolled back XMP, still crashed on parity check.
Lastly… rolled back the OS. This seems to have worked; I have about three hours to go until the check is complete. 

 

Anyway, it would seem as if the OS was the issue, but is there a way I can check the points of failure? 

 

The parity check says there are 90 sync errors. I unticked the box to correct them because I don’t know what they are. Can I untick correction by default?
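
For context on what a sync error is: at a given position, the parity disk is expected to hold a value computed from the data disks at that same position, and a sync error is counted when the stored value does not match. A minimal sketch, assuming single parity computed as a plain XOR; the function names and byte values below are made up for illustration and are not Unraid's actual code:

```shell
#!/bin/sh
# Toy model: one byte per "disk" at the same offset (values are made up).

# XOR all arguments together, as single XOR parity does across data disks.
expected_parity() {
    p=0
    for byte in "$@"; do
        p=$(( p ^ byte ))
    done
    echo "$p"
}

# First argument is the stored parity byte, the rest are the data bytes.
check_stripe() {
    stored=$1
    shift
    if [ "$(expected_parity "$@")" -eq "$stored" ]; then
        echo "in sync"
    else
        echo "sync error"
    fi
}

check_stripe 6 5 9 10   # 5 ^ 9 ^ 10 = 6, so the stored parity matches
check_stripe 7 5 9 10   # stored parity differs from the computed value
```

A non-correcting check only counts such mismatches. A correcting check would also rewrite the parity value, which is only the right thing to do if the data disks are the ones holding good data.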

  • 2 weeks later...

So, I did a correcting parity check, then another parity check, which found no errors. I then added the NVMe cache back in and did yet another check. Everything was good. 

However, last night 4-5 of my 8 cores were at 100% and the system was slow. I tried to spin down the array and it would not let me. I tried to reboot via the UI and again it would not let me. From the terminal, sudo shutdown didn't respond... I had no idea what it was stuck on. 

When I hooked up a monitor I could see it was doing a checksum... fine, I left it running all night. This morning it has a spinlock on the screen. I took some photos with my phone, but I can't do anything like scroll up to follow the output. The GUI no longer loads in a browser. I have no idea what is going on... can anyone help or advise?

There is a lot of hex, but some of the lines in the output include:

TASK
queued spin lock
list lru
inode lru list del
task work run
do syscall 64
RIP
RSP
RAX
RDX
RBP
R10
R13
/TASK
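
Since the console cannot be scrolled, the same trace can usually be searched for in the kernel log instead, if a copy of it can be read from a file. A rough sketch, assuming the log is plain text; the function name and the pattern list are my own guesses at what is worth matching, not an official list:

```shell
#!/bin/sh
# Print lines (with line numbers) that usually accompany a kernel lockup.
# The pattern list is an assumption; extend it to match what the console shows.
find_lockup_lines() {
    grep -n -i -E 'soft lockup|hard lockup|call trace|spin_lock|RIP:' "$1"
}
```

Usage would be something like `find_lockup_lines /var/log/syslog`, or the path of a saved copy. The line numbers make it easier to open the file and read the full trace around each hit.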

  • Solution
On 12/10/2022 at 10:34 AM, JorgeB said:

See if you get the syslog at least

cp /var/log/syslog /boot/syslog.txt
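
The log under /var/log does not survive a reboot, so it has to be copied to the flash drive (/boot) before the server is power-cycled. If this needs doing after every crash, a small wrapper that adds a timestamp avoids overwriting the previous copy. This is only a sketch: the function name and file naming are hypothetical, and the defaults simply mirror the command above:

```shell
#!/bin/sh
# Copy a log file to a destination directory under a timestamped name.
# Defaults follow the cp command above; the function name is hypothetical.
save_syslog() {
    src=${1:-/var/log/syslog}
    dest_dir=${2:-/boot}
    dest="$dest_dir/syslog-$(date +%Y%m%d-%H%M%S).txt"
    # Print the new path only if the copy succeeded.
    cp "$src" "$dest" && echo "$dest"
}
```

Running `save_syslog` with no arguments copies /var/log/syslog to /boot and prints the name of the file it wrote.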

 

 

 

EDIT: I think it's the XMP profile. I have two 8 GB sticks of 3600 MHz RAM. I just set my BIOS to default, which puts them back to 2100 MHz, and I have had no issues in the past 4-5 hours. I enabled the XMP profile when I tried to add a Windows gaming VM, back when all the issues started. Looking around the forums, I can see other people have had issues with XMP. I will give it a few days and report back.
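
To confirm from the command line what speed the RAM is actually running at after a BIOS change, the output of dmidecode can be filtered down to the speed lines. A minimal sketch, assuming dmidecode is installed and run as root; the function name is made up, and the field is labelled "Configured Memory Speed" or "Configured Clock Speed" depending on the dmidecode version:

```shell
#!/bin/sh
# Filter the speed-related lines out of `dmidecode --type memory` output.
# Reads from stdin so it can also be tried on saved output without root.
memory_speeds() {
    grep -E '^[[:space:]]*(Speed|Configured (Memory|Clock) Speed):' |
        sed 's/^[[:space:]]*//'
}

# On the server (needs root):
#   dmidecode --type memory | memory_speeds
```

"Speed" is what the stick is rated for, and the "Configured" line is what it is currently running at, so with XMP disabled the two values should differ.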

- Regards. 

