System completely unresponsive (several times) and disk-related issues, which may or may not be related.

Pjhal · February 12, 2022

My system froze/hung several times, requiring a forced shutdown. It was completely unresponsive to everything (no web UI, no SSH, no SMB shares) but clearly still powered on. I also tried pinging the system, and I think that got no response either (not 100% sure; it's been a few days).

Quote

2022-01-31 18:21:12 Warning Silverstone kern kernel Code: ff 48 8b 15 ef 6a 00 00 89 c0 48 8d 04 c2 48 8b 10 48 85 d2 74 80 48 81 ea 98 00 00 00 48 85 d2 0f 84 70 ff ff ff 8a 44 24 46 <38> 42 46 74 09 48 8b 92 98 00 00 00 eb d9 48 8b 4a 20 48 8b 42 28

2022-01-31 18:21:12 Warning Silverstone kern kernel RIP: 0010:nf_nat_setup_info+0x129/0x6aa [nf_nat]

2022-01-31 18:21:12 Warning Silverstone kern kernel Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U, BIOS P3.30 11/04/2019

2022-01-31 18:21:12 Warning Silverstone kern kernel CPU: 9 PID: 15782 Comm: python3 Tainted: G D W 5.10.28-Unraid #1

2022-01-31 18:21:12 Warning Silverstone kern kernel general protection fault, probably for non-canonical address 0xa52fb99018bdb8aa: 0000 [#2] SMP NOPTI

The full local logs are lost, but the quote above is part of the log as captured by my log server on a Synology NAS (I tried to format it; I hope it's readable). This looked to me like it might be the issue, so I wanted to include the full log as captured over the network by the Synology.

But I think the external copy of the log file is too big at 50 MB; the upload fails.

I did repair the cache file system (XFS) several times because of the dirty shutdowns, using this guide by Spaceinvader One on YouTube.

I also ran a parity check on the array.

To a layman like myself, the external log made it look like it might be a memory issue, so I ran a memtest.

I ran the built-in memtest on my 2 × 16 GB ECC UDIMMs. It ran for well over 150 hours, partly because I didn't have the time or energy to keep troubleshooting and "fixing" the system, so I just left it to its own devices.

I forgot to take a picture of the screen, but trust me, it was a lot of passes with zero errors!

On a side note, somewhere in the midst of all this I replaced two 8 TB disks (shucked WD white label), one data and one parity, with two 18 TB disks (also shucked WD white label).

Since then, the S.M.A.R.T. section of the web GUI has stopped working: the disks do pass the checks, but the data isn't displayed properly in the GUI. I did find a post on the forum about this, but none of the fixes worked for me (changing the default SMART controller type, etc.; I tried all of them).

All the new disks have recently passed short and extended SMART tests at least twice, and they have also been precleared with the binhex-preclear Docker image.

Edit: I have added a cut-down version (older entries removed) of the externally captured log file: All_2022-2-12-21 28 6 - Copy.csv
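
For anyone else who hits the upload size limit with a large external log: below is a minimal Python sketch of cutting the log down to recent kernel-crash entries instead of only dropping older ones. It assumes each entry begins with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the lines quoted above; the exact column layout of the Synology export, the `KEYWORDS` list, and the `filter_log` helper are all hypothetical, not part of Unraid or Synology.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, Sequence

# Assumption: every log entry starts with "YYYY-MM-DD HH:MM:SS", optionally
# preceded by a double quote if the CSV export quotes its first field.
TIMESTAMP = re.compile(r'^"?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')

# Hypothetical keyword list, taken from the messages quoted in this thread.
KEYWORDS = (
    "general protection fault",
    "nf_nat_setup_info",
    "exception Emask",
)


def filter_log(lines: Iterable[str], since: datetime,
               keywords: Sequence[str] = KEYWORDS) -> Iterator[str]:
    """Yield entries stamped at or after `since` that contain any keyword."""
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # skip CSV headers and lines without a leading timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if stamp >= since and any(key in line for key in keywords):
            yield line.rstrip("\n")
```

To use it on the exported file, open it with `open(path, encoding="utf-8", errors="replace")` so that odd bytes don't stop the run, pass the file object as `lines`, and write the yielded entries to a smaller file for upload. Note that keyword filtering drops the surrounding context (for example the `Call Trace` lines), so the full diagnostics zip is still the better source for a real diagnosis.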

Logs of the system as it is "now":

silverstone-diagnostics-20220212-2021_anon.zip

Edited February 12, 2022 by Pjhal

Squid · February 12, 2022

To get the obvious out of the way,

Pjhal · February 12, 2022

16 minutes ago, Squid said:

To get the obvious out of the way,

Thanks for your response; I will try the BIOS options.

I should note, though, that it was never an issue until recently, and I have had this server running Unraid since 2019.

Also, the post suggests updating the BIOS, but I cannot update the BIOS on this system.

I updated to a P version, which was an issue-fix/beta BIOS released because of problems with the IPMI KVM, and apparently you cannot upgrade from that version.

And the motherboard has been end of life for a while now.

The memory part of that post also doesn't apply to me, seeing as my RAM passed such a long memtest.

My hardware, by the way:

Motherboard: ASRock Rack X470D4U
BIOS: American Megatrends Inc., version P3.30
BIOS date: Monday, 2019-11-04

CPU: AMD Ryzen 7 3700X 8-Core @ 3600 MHz

RAM: 2 × 16 GB ECC UDIMM, both from the motherboard's supported memory list.

Edited February 12, 2022 by Pjhal

Pjhal · February 14, 2022

So I applied the BIOS setting that Squid linked to.

Quote

look for "Power Supply Idle Control" (or similar) and set it to "typical current idle"

But I just got this: "ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen"

See the short remote log:

All_2022-2-14-20 11 1.html

Full Unraid logs:

silverstone-diagnostics-20220214-2013_anon.zip

The system hasn't crashed (yet), but I am a bit concerned that it might do so soon.

Edit: this seems like the same issue to me:

6.9.0/6.9.1 - KERNEL PANIC DUE TO NETFILTER (NF_NAT_SETUP_INFO) - DOCKER STATIC IP (MACVLAN)

At least, it is the same one I am seeing in the logs right now; I'm not sure whether it is the same as my original issue.

I have now deactivated my second physical Ethernet port.

How should I change my setup to fix the above?

I am using at least 7 different IP addresses on Docker containers, sometimes more, via the macvlan feature (custom br0).

Several more containers use the default bridge network.

Edited February 14, 2022 by Pjhal
