July 11, 2023 Hi folks, I'm scratching my head over this one. Every few hours I lose the network connection completely, and the system needs a forced restart to recover. I have not made any hardware changes, and I don't recall whether it started before or after updating to 6.12.2. Any help will be much appreciated. Edited August 8, 2023 by Lightman32: removed diagnostic files
July 11, 2023 Try switching to IPv4 only, or update to v6.12.3-rc2 to test; you need to switch to the Next branch to see that update. In any case, if the issue continues, enable the syslog server and post the log after a crash.
July 11, 2023 Author Thanks for the help. The network protocol is already set to IPv4 only, unless there is another option somewhere that I'm missing? I have moved up to v6.12.3-rc2 and enabled syslog. I will test further and report back.
August 8, 2023 Author Right, so I'm still having the same issue. I have turned on syslog and noticed the following:

1. Several AER errors. I have tried fixing them by adding the pcie_aspm=off flag to the syslinux config file. The errors are gone, but I'm not sure whether this flag has actually solved the issue or just suppressed the errors?

Aug 3 10:05:58 Tower kernel: pcieport 0000:00:1c.2: AER: Multiple Corrected error received: 0000:04:00.0
Aug 3 10:05:58 Tower kernel: pci 0000:04:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Transmitter ID)
Aug 3 10:05:58 Tower kernel: pci 0000:04:00.0: device [168c:003e] error status/mask=00001001/00006000
Aug 3 10:05:58 Tower kernel: pci 0000:04:00.0: [ 0] RxErr
Aug 3 10:05:58 Tower kernel: pci 0000:04:00.0: [12] Timeout

2. A general protection fault. Could this be due to illegal memory access? I'm not too sure about this one. I will add the pci=nommconf flag to see if it solves these errors.

Aug 5 18:06:37 Tower kernel: general protection fault, probably for non-canonical address 0xe7ff88816ac91020: 0000 [#3] PREEMPT SMP PTI
Aug 5 18:06:37 Tower kernel: CPU: 4 PID: 19704 Comm: shfs Tainted: P D O 6.1.38-Unraid #2
Aug 5 18:06:37 Tower kernel: Hardware name: System manufacturer System Product Name/STRIX Z270E GAMING, BIOS 1009 07/23/2017
Aug 5 18:06:37 Tower kernel: RIP: 0010:get_free_stripe+0x3c/0x8e [md_mod]
Aug 5 18:06:37 Tower kernel: Code: 8b 8f d0 04 00 00 48 8d 97 d0 04 00 00 31 c0 48 39 d1 74 63 53 48 8b 9f d0 04 00 00 83 3d 85 68 00 00 03 48 8b 13 48 8b 43 08 <48> 89 42 08 48 89 10 48 89 1b 48 89 5b 08 7e 10 48 8b 73 20 48 c7
Aug 5 18:06:37 Tower kernel: RSP: 0018:ffffc9000bf13910 EFLAGS: 00010093

There are some other errors similar to the above. All of this makes me think I might be having hardware issues? A loose connection somewhere, maybe? Edited August 8, 2023 by Lightman32: removed diagnostic files
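To judge whether the errors really stopped after a flag change, it can help to count them in a saved syslog rather than eyeball it. The following is a minimal sketch, not an Unraid tool: the function name `summarize` is made up for illustration, and the patterns are assumptions based only on the two kinds of log lines quoted in the post above, so other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Assumed pattern, derived from the quoted line:
#   "... AER: Multiple Corrected error received: 0000:04:00.0"
AER_RE = re.compile(r"AER: (?:Multiple )?Corrected error received: (\S+)")
GPF_TEXT = "general protection fault"


def summarize(lines):
    """Return (corrected AER count per PCI device, number of GPF lines)."""
    aer_per_device = Counter()
    gpf_count = 0
    for line in lines:
        match = AER_RE.search(line)
        if match:
            aer_per_device[match.group(1)] += 1
        elif GPF_TEXT in line:
            gpf_count += 1
    return aer_per_device, gpf_count


# Example usage with a saved log file (the path is hypothetical):
# with open("syslog.txt", errors="replace") as f:
#     aer, gpf = summarize(f)
#     print(dict(aer), gpf)
```

Running it on logs captured before and after a change gives a simple comparison per device; a count of zero only shows that no matching lines were logged, not why.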
August 8, 2023 Solution

32 minutes ago, Lightman32 said: The errors are gone, but I'm not sure whether this flag has actually solved the issue or just suppressed the errors?

It solves them; suppressing them would be a different flag.

33 minutes ago, Lightman32 said: Could this be due to illegal memory access? I'm not too sure about this one. I will add the pci=nommconf flag to see if it solves these errors.

That looks more like a hardware-caused issue; run memtest.
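Before drawing conclusions from a flag such as pcie_aspm=off, it is worth confirming that the running kernel actually booted with it. A minimal sketch, assuming a Linux system where the boot parameters are readable from /proc/cmdline; the helper names are hypothetical and the sample command line in the comment is invented for illustration.

```python
def has_kernel_flag(cmdline: str, flag: str) -> bool:
    """True if `flag` appears as a whole, space-separated boot parameter."""
    return flag in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the boot parameters of the currently running kernel."""
    with open(path) as f:
        return f.read().strip()


# Example usage on the server itself:
# print(has_kernel_flag(read_cmdline(), "pcie_aspm=off"))
#
# Example with an invented command line:
# has_kernel_flag("BOOT_IMAGE=/bzimage initrd=/bzroot pcie_aspm=off", "pcie_aspm=off")
```

Matching on whole parameters rather than substrings avoids a false positive when one flag is a prefix of another.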
August 8, 2023 Author Thanks for the help, running memtest was a good shout. It produced errors within the first 5 minutes. The only causes I could think of were bad RAM or the XMP profile, so I disabled XMP and ran the full test again. Every test passed without XMP, so that's what I'm going with for now. Once again, I appreciate your help.