July 6, 2024: Please help, this is driving me insane. Please note I used to have much more frequent crashes when I was running lots of containers... I clicked scrub in anger and wiped out all my appdata, so I've got to rebuild everything :''( I have done the following:
- Attached both the persistent syslog and the anonymised diagnostics.
- Ran memtest for 3 passes and it came back clean.
- All disks pass the SMART test.
- Noticed the system clock is off (will sort) and also noticed .
- Made the change from macvlan to ipvlan.
The only things I can think of are that 3 passes of memtest is not enough, OR that the networking setup is causing issues. Just as an FYI, my Unraid box is hooked up to pfSense through a single 1Gb port. I have 3 VLANs to manage different subnets. All the subnets, outbound NAT, routing, etc. are handled by pfSense.
Attachments: tower-smart-20240708-0148 (2).zip, tower-smart-20240708-0148 (1).zip, tower-smart-20240708-0148.zip, tower-smart-20240708-0147.zip, tower-smart-20240708-0146.zip, tower-diagnostics-20240706-1135.zip, syslog-1720306751
July 6, 2024 (Community Expert): If you mean that the server is rebooting by itself, that is almost always a hardware problem.
July 6, 2024 (Author): JorgeB said: "If you mean that the server is rebooting by itself, that is almost always a hardware problem." So I assume the most usual culprit is RAM? Then drives? I've not had a hardware issue before and am struggling to diagnose it. Got any pointers on common tests to run, in order of probability? Thank you
July 7, 2024 (Community Expert): RAM, PSU and board/CPU would be the main suspects. If you have multiple sticks, try using the server with just one; if it still crashes, try with a different one. That will basically rule out bad RAM.
July 16, 2024 (Author): Hey @JorgeB, how would you test the board/CPU? So far:
- I have run memtest for over 24h with no errors.
- I have checked my PSU with a power supply tester: no issues.
- I have run a SMART report on all drives: no issues.
The crashes are not tied to anything in particular, such as heavy file moves or stressing the GPU. They occurred even when not running Docker containers, just idling. The frequency varies: sometimes once in a 24-48h window, other times 5 times in a row, then again 6 hours later. I only had these issues after upgrading from Unraid 5.x to Unraid 6.x.
July 16, 2024 (Community Expert): toughiv said: "how would you test board/CPU?" You'd need to swap it with a different one.
July 24, 2024 (Author): On 7/16/2024 at 9:12 AM, JorgeB said: "You'd need to swap with a different one." Okay, so I am still experiencing crashes. I have done:
- 36-hour memtest: 0 errors
- PSU test: nothing wrong with the PSU
- Swapped the CPU for a new one
- Swapped the motherboard for a new one
All that expense for naught, as it crashed again 1 hour after booting up with the new motherboard and CPU...
July 24, 2024 (Community Expert): Is the server crashing, or still rebooting on its own? Does it still do that if you don't start the VM and Docker services?
July 24, 2024 (Author): Should I run it in safe mode for a while, without turning on VMs/Docker, to see if the restarts occur? Then incrementally bring services back online until the crashes occur? Planned order:
1. Safe mode
2. Normal (non-safe) mode, no Docker
3. Normal (non-safe) mode with Docker
July 24, 2024 (Community Expert): toughiv said: "Should I run it in safe mode for a while to see if the restarts occur? ..." It's worth a try.
July 30, 2024 (Author): On 7/24/2024 at 11:09 AM, JorgeB said: "It's worth a try." So I did the following:
1. Safe mode with no GUI: 3 days with no crashes, so I proceeded to the next step.
2. GUI mode with no array: 2 days with no crashes, so I proceeded to the next step.
3. GUI mode with the array started, but no Docker and no VMs: 1 day, then a crash.
All disks are healthy. What do you think this could be?
July 30, 2024 (Author): JorgeB said: "Try with docker enabled but VMs disabled." I meant both no Docker and no VMs, just idling with the GUI and the array started.
July 30, 2024 (Community Expert): Not sure I follow; I understood it only crashed with VMs running, not when idling.
July 30, 2024 (Author): JorgeB said: "Not sure I follow, I understood it only crashed with VMs running, not idling." Originally it was crashing with Docker running. I had one VM, but it was barely doing anything and was a new addition to the stack. This has been ongoing for quite some time; I made sure the array started automatically and just let it restart and do its thing. However, the crashes have become more frequent lately, to the point where they were blocking me from getting anything done. That's when I decided to start this thread and get serious about diagnosing the issue. So, after the checks mentioned before:
1) RAM memtest for 36 hours
2) PSU test
3) SMART reports for all drives
you said that a system halting and restarting on its own is, more often than not, a hardware issue, and that it may be a CPU/motherboard problem. So I went on eBay and spent a couple hundred on those new parts. However, the issue persists. That's why I then ran the series of tests:
- Running both [Safe Mode] and [GUI w/o Array] didn't cause any crashes.
- Turning on the array seems to cause crashes, and they happen more frequently if I run all my Docker containers.
My gut is telling me two things:
1) Once started, the array cannot be stopped; it always says "retry unmounting disk shares".
2) Maybe there is a driver/storage issue somewhere (given that the crashes happen with the array started).
However, nothing shows up in the persistent syslog.
(Edited July 30, 2024 by toughiv)
July 30, 2024 (Community Expert): OK, I think I misunderstood your previous post; I thought VMs were running when it crashed. IMHO, if it crashes with just the array running, without Docker and VMs, it's almost certainly hardware. You should still try to figure out what is preventing the array from stopping, even though that is most likely unrelated to the crashing.
July 30, 2024 (Author): JorgeB said: "if it crashes with just the array running, without docker and VMs, it's almost certainly hardware..." It has just crashed again, and even while booting it'll crash a couple more times until it manages to stay up and running. Given that it crashes before the array has started in those instances, what do you think that could be? In fact, it just happened while I was watching the screen: it crashed while the third-party drivers (Nvidia plugin) were being installed on boot... maybe a coincidence. It is booting back up now; let's see if it happens again. Yep, it happened there twice!! Maybe it is the Nvidia plugin?
July 30, 2024 (Author): Following up on my post above: I just tried to boot into Safe Mode (GUI, no plugins) and it still crashed... maybe the Nvidia angle is a red herring.
July 30, 2024 (Author): Following up again: it actually still tries to install the Nvidia third-party driver plugin even when that safe mode option is selected.
July 30, 2024 (Community Expert): toughiv said: "what do you think that could be?" I think it still points to hardware, but it's difficult to say which component exactly, since several can cause that. If you have already swapped the board and CPU, RAM and PSU would be the main remaining suspects. If you have multiple sticks of RAM, try just one; if it still crashes, try the other one. That will basically rule out the RAM.
July 31, 2024 (Author): In reply to JorgeB's post above: I just ran the box without the GPU plugged in, and it crashed. Even though I ran memtest for 36h, could it still be the RAM? If so, I'll buy a small 16GB kit and give it a go... Feels like such an odd issue.
July 31, 2024 (Community Expert): toughiv said: "Even though I ran memtest for 36h, it could still be the RAM?" Yep, memtest is only definitive if it finds errors.
July 31, 2024 (Author): JorgeB said: "Yep, memtest is only definitive if it finds errors." Okay, please don't close this thread. I'll buy some RAM and post the results (fingers crossed!)
August 23, 2024 (Author): @JorgeB, I have replaced the RAM and Unraid still randomly restarted. So by now I have:
- Confirmed the PSU is fine
- Replaced the motherboard
- Replaced the CPU
- Replaced the RAM
- Run it without the GPU
It still crashes every time. Surely this is an Unraid issue?