February 20, 2024
Dear Unraid Users, this weekend I upgraded my mainboard, CPU, NVMe and RAM. After the upgrade I had some problems with my HBA, which at first did not see the drives. I had to disable fast boot; after that the box booted up and all was fine. Two hours later the server suddenly shut off, just like that. I checked all the cables, unplugged everything and plugged it back in. After that it was working. Now, after several hours, the web UI became unresponsive. I ssh'd into the server and everything looked fine, so I rebooted the box, and last night the same thing happened again. I attached the diagnostics. Since I changed a lot of hardware, I have no idea where to begin. Any help is appreciated. Thank you! igor-diagnostics-20240220-0535.zip
February 20, 2024 · Community Expert
Enable the syslog server and post that after a crash, but if the issues started after changing the hardware there's a good chance it's a hardware problem, and those usually don't leave anything logged.
February 20, 2024 · Author
It is enabled. Is there anything I can do to check? Right now it runs and all Docker containers are up, but the Docker page won't load. Nothing in particular is broken, but nothing works properly either.
Edit:
Feb 20 18:32:37 Igor kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:07:00.0
Feb 20 18:32:37 Igor kernel: atlantic 0000:07:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Feb 20 18:32:37 Igor kernel: atlantic 0000:07:00.0: device [1d6a:07b1] error status/mask=00000001/0000a000
My log is spammed with that. I think atlantic is my 10 Gbit NIC from ASUS. I added the section from this thread:
Edited February 20, 2024 by gandalf15
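To gauge how heavily a log is being spammed by messages like the ones above, the "PCIe Bus Error" lines can be tallied per device. The following is a minimal Python sketch, not a tool from this thread: the function name `count_aer_errors` and the regular expression are illustrative assumptions based only on the three log lines quoted in the post, and other kernels or drivers may format these messages differently.

```python
import re
from collections import Counter

# Matches lines shaped like the quoted example:
#   "... kernel: atlantic 0000:07:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, ..."
# The pattern is an assumption derived from that one sample line.
AER_LINE = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<type>[^,]+)"
)


def count_aer_errors(lines):
    """Count PCIe Bus Error lines per (driver, device, severity, type)."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            key = (match["driver"], match["device"],
                   match["severity"], match["type"])
            counts[key] += 1
    return counts


# The three lines quoted in the post; only the second is a "PCIe Bus Error" line.
sample = [
    "Feb 20 18:32:37 Igor kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:07:00.0",
    "Feb 20 18:32:37 Igor kernel: atlantic 0000:07:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)",
    "Feb 20 18:32:37 Igor kernel: atlantic 0000:07:00.0: device [1d6a:07b1] error status/mask=00000001/0000a000",
]

for key, n in count_aer_errors(sample).most_common():
    print(n, *key)
```

To run it against a real log, pass an open file object instead of `sample`, for example `count_aer_errors(open("/var/log/syslog", errors="replace"))`. The path is an assumption and may differ on a given system. A count that stops growing after a configuration change suggests the messages were at least suppressed, which by itself does not prove the underlying link is healthy.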
February 20, 2024 · Community Expert
10 minutes ago, gandalf15 said: "I added the section from this thread"
If that didn't help, try this one to see if it can at least suppress the errors: https://forums.unraid.net/topic/111161-pcie-errors/?do=findComment&comment=1013378
February 20, 2024 · Author
Thank you, I will try. So far there are no errors in the log, but I don't know yet whether it runs stable. I also saw that the boot log said to check the flash drive (boot drive) for errors. I ran chkdsk on it and it corrected / restored about 5 files. I will report back if it still crashes or freezes.
February 23, 2024 · Author
It runs stable now, but the parity check reported 166 errors. I thought that could be because it crashed a few times. But a new parity check, started just after the first one, again reports 15 errors. I never had an error before. I guess something is still not right. Does anyone have tips on what to do?
February 24, 2024 · Community Expert
If a parity check found new errors right after a correcting check, it suggests there's still an issue. I would start by running memtest.
February 24, 2024 · Author
2 hours ago, JorgeB said: "If a parity check found new errors right after a correcting check, it suggests there's still an issue, I would start by running memtest"
This is exactly what I did. With XMP 2 / 1 I got FAILED. I deactivated it and the test PASSED. I guess the new mainboard and RAM don't like each other with XMP. Since it passed, I started a parity check again. Once it finishes I will run another one and see if there are still errors. Maybe this was the reason for the crashes.
Edit: A full parity check takes 2.5 days. I will report back once done!
Edited February 24, 2024 by gandalf15
February 24, 2024 · Community Expert
33 minutes ago, gandalf15 said: "Maybe this was the reason for the crashes."
Most likely.
February 29, 2024 · Author
OK, so without XMP the first parity check corrected another 150 or so errors. Another one after that showed 0 errors. It is stable now and runs very well. Thank you for the support!