April 26, 2025

Hi, my Unraid server keeps breaking: the GUI crashes, or somehow it can't find any of my shares. I've tried everything I can think of. When I opened the logs, the window would open and close within a second, but from what I saw it reported no memory from nginx. I managed to SSH into it and get it to save the diagnostics: tower-diagnostics-20250426-2338.zip

I am running a full SMART test on all of the drives and will add the results when it finishes. Mem64 tested: passed. I removed a lot of plugins that were probably causing problems. I can switch PCs and use a different GPU, but not different RAM, to see if that's what is causing the problems.

Normally after it fails the GUI doesn't work, but Docker mostly still works. I can't restart it, though (in person or over SSH), so I have to do an unclean shutdown. All the drives are kind of old: I got 20x3TB off eBay, all around 6 years old, made 2013-14, but I'm planning to add some SSDs as cache. The longest I've gotten it to run for is 7 days.

Edited April 27, 2025 by dylan21: added more info
April 27, 2025 Author

Just checked the logs and found this (after a restart):

Apr 27 00:09:20 Tower kernel: pcieport 0000:00:1c.6: AER: Multiple Corrected error message received from 0000:04:00.0
Apr 27 00:09:20 Tower kernel: nvidia 0000:04:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Apr 27 00:09:20 Tower kernel: nvidia 0000:04:00.0: device [10de:21c4] error status/mask=00000001/0000a000
Apr 27 00:09:20 Tower kernel: nvidia 0000:04:00.0: [ 0] RxErr
Apr 27 00:09:21 Tower kernel: pcieport 0000:00:1c.6: AER: Corrected error message received from 0000:04:00.0
Apr 27 00:09:21 Tower kernel: nvidia 0000:04:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Apr 27 00:09:21 Tower kernel: nvidia 0000:04:00.0: device [10de:21c4] error status/mask=00000001/0000a000
Apr 27 00:09:21 Tower kernel: nvidia 0000:04:00.0: [ 0] RxErr (First)
Apr 27 00:09:27 Tower kernel: pcieport 0000:00:1c.6: AER: Multiple Corrected error message received from 0000:04:00.0
Apr 27 00:09:27 Tower kernel: nvidia 0000:04:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Apr 27 00:09:27 Tower kernel: nvidia 0000:04:00.0: device [10de:21c4] error status/mask=00000001/0000a000
Apr 27 00:09:27 Tower kernel: nvidia 0000:04:00.0: [ 0] RxErr (First)
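To judge whether these corrected errors are occasional or flooding the log, it can help to tally them per device. The following is only a rough sketch, not an Unraid tool: the function name `count_aer_errors` and the regular expression are my own assumptions, written to match the "PCIe Bus Error" lines in the format quoted above, and other kernel versions may word these lines differently.

```python
import re
from collections import Counter

# Assumed pattern, based only on the log lines quoted in this thread, e.g.:
#   "... kernel: nvidia 0000:04:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)"
AER_LINE = re.compile(
    r"kernel: \S+ (?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<type>[^,]+)"
)


def count_aer_errors(lines):
    """Count 'PCIe Bus Error' lines per (device, severity, type).

    `lines` is any iterable of strings, such as an open log file.
    Lines that do not match the assumed pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            key = (match["device"], match["severity"], match["type"])
            counts[key] += 1
    return counts


# Sample lines copied from the log excerpt above.
SAMPLE = [
    "Apr 27 00:09:20 Tower kernel: pcieport 0000:00:1c.6: AER: Multiple Corrected error message received from 0000:04:00.0",
    "Apr 27 00:09:20 Tower kernel: nvidia 0000:04:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)",
    "Apr 27 00:09:20 Tower kernel: nvidia 0000:04:00.0: device [10de:21c4] error status/mask=00000001/0000a000",
    "Apr 27 00:09:21 Tower kernel: nvidia 0000:04:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)",
]

sample_counts = count_aer_errors(SAMPLE)
```

Because the function accepts any iterable of lines, an open file handle for the syslog (or the syslog extracted from the diagnostics zip) can be passed in directly. A count that keeps climbing after a change suggests the change did not help.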
April 27, 2025 Solution

For the PCIe errors, try this first: https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/?do=findComment&comment=1165009
April 29, 2025 Author

Thanks. I have tried that and switched servers, so now I need to wait and see if it throws more errors.