First Parity Check Slow After Hardware Swap


Posted

I changed hardware about 2 weeks ago and this is the first parity check since. Around the start of the parity check, one of the disks gave a hot disk warning (which never happened on my old hardware). I did get warm disk warnings when I first changed hardware, so I adjusted the fan curves in the BIOS and have not had any warnings since. I didn't think much of this latest warning because it was gone by the time I woke up.

 

I logged in today and saw the parity check was only at 19%; it typically finishes within 24 hours. I took a quick look at the logs and saw a lot of errors that I can't make sense of. Could this be an issue with the new hardware? Or is this all due to the hot disk warning I got when the parity check first started? I'm not sure how to proceed or what my next steps should be. Diag attached. Thanks in advance for any help.

edi-diagnostics-20240602-1204.zip

Posted

I've tried to both pause and cancel the parity check with no success. My server seems to be in a halfway state: it shows the option to pause or stop the parity check, but at the bottom it no longer says the parity check is running. I've also attached a new diag that hopefully shows why the parity check didn't pause or cancel.

 

edi-diagnostics-20240602-1448.zip

Posted
5 hours ago, JorgeB said:

Log is being spammed with what look like BIOS related issues, look for a BIOS update, reboot and post new diags after the parity check starts.

I had updated the BIOS when I swapped everything over to the new hardware, but upon checking it looks like Asus just released a new BIOS on May 31st. I'll get that one installed and see if the problems persist.

Posted
13 hours ago, JorgeB said:

Log is being spammed with what look like BIOS related issues, look for a BIOS update, reboot and post new diags after the parity check starts.

The BIOS update did not seem to do anything. I also forgot that BIOS updates wipe the current BIOS config, so I matched the settings up as best I could. I'll need to go back in later and readjust my fan curves, but that's an issue for a later date.

 

I'm still seeing quite a few ACPI warnings and errors in the logs, not sure what I'm missing here. The parity check does seem to be running at its normal speed... for now anyway.

 

edi-diagnostics-20240603-1746.zip

Posted
9 hours ago, SamuraiMarv said:

I'm still seeing quite a few ACPI warnings and errors in the logs,

Try booting in safe mode, they can also be plugin related, or try removing the Nvidia GPU.

Posted (edited)
On 6/4/2024 at 3:30 AM, JorgeB said:

Try booting in safe mode, they can also be plugin related, or try removing the Nvidia GPU.

I believe I've narrowed it down to the Nvidia driver plugin. I'm not sure why this is all of a sudden causing an issue. Is there a BIOS setting or something I'm missing for this to work properly? I'm going to try reinstalling the plugin and see if the issue returns. The only Docker container utilizing the GPU is Plex.

Edited by SamuraiMarv
spelling
Posted

I was wrong: it wasn't the Nvidia driver, it was the GPU statistics plugin. I removed that and haven't seen the error since. Going to reboot and run a no-fix parity check just to see if it completes. Also, it was weird that those errors would only show up in the logs when I was on the "Dashboard" screen, not sure why.

Posted

Disregard again: when starting Plex up, the ACPI BIOS errors flood the log again. Seems to be the same issue as this user, but it doesn't seem like they ever found a solution. Going to try deleting the Plex codecs like the last post suggests.

Posted

Still wrong. It seems there is something wrong with either the Nvidia plugin or my GPU. This is the same GPU I used in my last build before swapping the motherboard and CPU out, so I'm not sure why it's having issues here. Anytime I run Plex or access anything that requires the GPU, like going into the Nvidia driver setup page, the errors fill up the logs. That also explains why going to the dashboard page made the errors show, since the GPU statistics plugin talks to the Nvidia driver plugin. Unfortunately the Nvidia driver plugin is required for Docker usage of the GPU. Not sure what to do here. I'll go over my BIOS settings again to see if there's something I'm overlooking.

Posted

For now I've uninstalled the Nvidia driver plugin and am trying to do a clean parity check for some peace of mind. I did see this in the logs though.

Jun  5 19:12:51 EDI kernel: pcieport 0000:00:1c.4: AER: Corrected error message received from 0000:09:00.0
Jun  5 19:12:51 EDI kernel: pcieport 0000:09:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Jun  5 19:12:51 EDI kernel: pcieport 0000:09:00.0:   device [8086:1136] error status/mask=00001180/00002000
Jun  5 19:12:51 EDI kernel: pcieport 0000:09:00.0:    [ 7] BadDLLP               
Jun  5 19:12:51 EDI kernel: pcieport 0000:09:00.0:    [ 8] Rollover              
Jun  5 19:12:51 EDI kernel: pcieport 0000:09:00.0:    [12] Timeout    

Not sure what this means. The parity check is still going strong with 0 errors found. I'll let that finish overnight before trying any more troubleshooting to get the Nvidia plugin to work. I'm guessing that since this is a PCI error there's something wrong with one of my devices. New diag attached.
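For anyone trying to read the `error status/mask=00001180/00002000` line above, here is a minimal sketch of how those hex values map to the bracketed bit names that follow it. The bit table is my reading of the PCIe AER Correctable Error Status register, with the short labels the kernel prints; the function name `decode_correctable` is made up for illustration and is not part of any tool.

```python
# Bit positions in the PCIe AER Correctable Error Status register,
# labelled with the short names the Linux kernel prints in the log.
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_correctable(value_hex: str) -> list[str]:
    """Return the names of the bits set in a hex status or mask value."""
    value = int(value_hex, 16)
    return [
        name
        for bit, name in sorted(CORRECTABLE_BITS.items())
        if value & (1 << bit)
    ]


# Values taken from the log line "error status/mask=00001180/00002000".
print(decode_correctable("00001180"))  # ['BadDLLP', 'Rollover', 'Timeout']
print(decode_correctable("00002000"))  # ['NonFatalErr']
```

The decoded status matches the `[ 7] BadDLLP`, `[ 8] Rollover` and `[12] Timeout` lines in the log, and all three are link-level errors reported with severity=Corrected.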

edi-diagnostics-20240605-2028.zip

Posted
8 hours ago, JorgeB said:

I don't think this is the same issue I'm having. They're saying that their logs are filled with the errors, while I've had 2 of those errors in the last 24 hours. So it seems like a separate issue. Is there a way I can use this to tell which PCI device it's complaining about? Or is there a BIOS setting that needs to be changed?

Posted

That can still help with the error, if it doesn't there's also an option to try and suppress them.

 

12 minutes ago, SamuraiMarv said:

Is there a way I can use this to tell which PCI device it's complaining about?

It's 

09:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
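As a rough sketch of how to get from the AER log lines to that device: the first line of each report names the address that raised the error, and the `device [vendor:device]` line gives the numeric ID, which is the same `[8086:1136]` shown in the listing above (that line looks like `lspci -nn` style output). The helper name `aer_sources` and the regexes are my own assumptions based only on the log lines posted earlier in the thread.

```python
import re

# Log lines copied from the earlier post in this thread.
LOG = """\
Jun  5 19:12:51 EDI kernel: pcieport 0000:00:1c.4: AER: Corrected error message received from 0000:09:00.0
Jun  5 19:12:51 EDI kernel: pcieport 0000:09:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Jun  5 19:12:51 EDI kernel: pcieport 0000:09:00.0:   device [8086:1136] error status/mask=00001180/00002000
"""

# PCI address in domain:bus:device.function form, e.g. 0000:09:00.0
BDF = r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]"

SOURCE_RE = re.compile(rf"AER: .*error message received from ({BDF})")
DEVICE_RE = re.compile(
    rf"pcieport ({BDF}):\s+device \[([0-9a-f]{{4}}):([0-9a-f]{{4}})\]"
)


def aer_sources(log_text: str) -> dict:
    """Map each address that raised an AER report to its vendor:device ID."""
    sources = set(SOURCE_RE.findall(log_text))
    ids = {
        addr: f"{vendor}:{device}"
        for addr, vendor, device in DEVICE_RE.findall(log_text)
    }
    # None means the log had no "device [...]" line for that address.
    return {addr: ids.get(addr) for addr in sources}


print(aer_sources(LOG))  # {'0000:09:00.0': '8086:1136'}
```

Dropping the leading `0000:` domain from the address gives `09:00.0`, which is the entry to look for in the PCI device list from the diagnostics.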

 

Posted
1 hour ago, JorgeB said:

That can still help with the error, if it doesn't there's also an option to try and suppress them.

 

It's 

09:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)

 

Hmm, I'm not using any Thunderbolt devices. Do you think I should just disable Thunderbolt support in the BIOS?

Posted

Circling back to the ACPI errors: I reinstalled the Nvidia plugin and the errors are back. Unfortunately I need this plugin for GPU transcoding, but for the life of me I can't figure out what's different with my new system that's causing these issues. Searching the forum, I see others with the Asus Hero line of boards having similar issues, with no real consensus on a true solution. Most, like me, are on the latest BIOS. My old motherboard was an Asus board as well, albeit an older one. I've combed through the BIOS with no luck finding the root cause of the errors. There also seem to be lots of reports of the same issue in the Nvidia plugin support thread, with no real solutions there either besides updating the BIOS, which doesn't seem to work for anyone.

Posted

I ended up just giving up and removing the GPU. My 14900K can transcode better than the 1080 Ti I was trying to use anyway. I'm still seeing the AER errors about twice a day. For now I think I'll just ignore those too, as they're not filling up the logs.
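If anyone wants to keep an eye on whether the AER rate stays at "about twice a day", here is a small sketch that counts reports per day from syslog lines. It assumes the line format shown earlier in this thread, and it only counts the first line of each report (the one containing `AER:`) so a multi-line report is counted once. The function name `aer_reports_per_day` is made up for illustration.

```python
import re
from collections import Counter

# Matches only the first line of an AER report, capturing month and day.
AER_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2}) \d{2}:\d{2}:\d{2} \S+ kernel: pcieport \S+ AER: "
)


def aer_reports_per_day(lines) -> Counter:
    """Count AER reports per 'Mon D' date across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = AER_RE.match(line)
        if match:
            counts[f"{match.group(1)} {match.group(2)}"] += 1
    return counts


# Two lines of the same report, copied from the earlier post.
sample = [
    "Jun  5 19:12:51 EDI kernel: pcieport 0000:00:1c.4: AER: Corrected error message received from 0000:09:00.0",
    "Jun  5 19:12:51 EDI kernel: pcieport 0000:09:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)",
]
print(aer_reports_per_day(sample))  # Counter({'Jun 5': 1})
```

To run it against a real log, pass an open file object instead of `sample`; the path to the syslog depends on the system, so it is left out here.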
