
Unraid Crashes after upgrading to 6.12.6 and starting a ZFS array



Hi,

 

I recently upgraded to 6.12.6 and decided to give ZFS a try with a 2-disk array. Soon after that, Unraid began to stop responding after a few hours or days of uptime. The system remained powered on and the drive power lights stayed on, but there was no disk activity and no network access, and a file transfer to a local unassigned disk also stopped. This has happened twice now.

 

Here are the diags; let me know what else I can do. I've temporarily plugged in a monitor to see what happens during the next crash. I don't plan to start the array yet, though.

 

Unfortunately, the only way out seems to be a hard reboot, which triggers a parity check. 

 

Thanks. 

homecloudv2-diagnostics-20231218-2251.zip

Link to comment
2 minutes ago, artkingjw said:

ok, i created a cache preferred share called 'syslog' and pointed the syslog server there. 

 

it looks like my next step is to just, let my server run, and wait for a crash? 

screenshot.2610.jpg

Not quite. With the settings shown you will only have the syslog server in listening mode with nothing being written. As mentioned in the link, you need to put the IP of your Unraid server into the Remote syslog Server field to start getting the server to log to itself. There is also the mirror to flash field if you want it to also (or instead) log to the logs folder on the flash drive.
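If you want to confirm that the syslog server is actually writing what it receives, one option is to send it a test message from another machine and then look for it in the log file on the share. This is only a rough sketch: it assumes the syslog server is listening on UDP port 514 (check the protocol and port shown on your settings page), and the function name and message text are made up for illustration.

```python
import socket


def send_test_syslog(host: str, port: int = 514,
                     message: str = "unraid syslog test") -> bytes:
    """Send one syslog-style UDP datagram and return the bytes that were sent.

    Assumption: the syslog server accepts UDP on `port`. The "<14>" prefix is
    the syslog priority: facility user (1) * 8 + severity info (6).
    """
    payload = f"<14>{message}".encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

For example, `send_test_syslog("192.168.1.10")`, where the address is a placeholder for your own server's IP. UDP gives no delivery confirmation, so the only proof is the message showing up in the log file.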

Link to comment
11 minutes ago, itimpi said:

Not quite. With the settings shown you will only have the syslog server in listening mode with nothing being written. As mentioned in the link, you need to put the IP of your Unraid server into the Remote syslog Server field to start getting the server to log to itself. There is also the mirror to flash field if you want it to also (or instead) log to the logs folder on the flash drive.

Gotcha, I'll do both just in case. 

screenshot.2612.jpg

Link to comment
  • 1 month later...

Well it's not censored, it had the names of my shares and my email on it. I've redacted them manually, hopefully didn't miss anything.

 

The attached syslog is only of the session that ultimately crashed. It recorded prior sessions which did not crash, so I deleted them.
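For anyone who wants to do the same trimming without editing by hand, here is a minimal sketch. It assumes each boot shows up in the saved syslog as a kernel banner line containing "Linux version"; that marker is my assumption and may not be present in every setup, so check your own file first. The function name is hypothetical.

```python
def last_session(lines, boot_marker="Linux version"):
    """Return the lines from the most recent boot marker to the end.

    Assumption: every boot logs a line containing `boot_marker`.
    If no marker is found, the input is returned unchanged.
    """
    start = 0
    for index, line in enumerate(lines):
        if boot_marker in line:
            start = index  # remember the latest boot seen so far
    return lines[start:]
```

Read the file with `open(path).read().splitlines()`, pass the list in, and write the result back out to a new file so the original stays intact.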

 

In the end, I could not access the server to issue a shutdown command, not through the webgui, not through SSH. I tried to access it on console, locally (connected a monitor + keyboard to the server), and it didn't let me log in. Clicking the shutdown button didn't initiate a 'graceful shutdown...'. Then 'forcing shutdown...' also did not shut the server down. 

syslog censored.log

Link to comment
16 minutes ago, JorgeB said:

Try booting in safe mode and/or closing any browser windows open to the GUI, only open when you need to use it then close again.

Ok, I'll keep this in mind. Does this include any docker things like Krusader, Tdarr, etc? I got into a habit of having my most used tools open in pinned tabs. 

I think my server crashed again a few hours ago, after I retrieved the syslog file. I'll reboot and try safe mode later. 

Link to comment
9 hours ago, JorgeB said:

Just the Unraid GUI, at least for now.

Unraid came back! Without a reboot into safe mode. I just closed the browser tabs as you suggested, and waited a few hours as I did other stuff. Came back to check out of curiosity and it returned. Should I boot into safe mode anyway? What would I do in safe mode? Just use it as a file server and see if it crashes in safe mode too? 

Edited by artkingjw
Link to comment

Seems like it's still not stable. When I tried it a few hours after your message, the webGUI became inaccessible again. 

 

I was busy, so I didn't have time to look into it, but I left the server on in case it came back. Fast forward a day or two, and the power went out while I was working away from home. My UPS Settings on Unraid were to shut down after 15 seconds on UPS power, but according to people at home, it did not shut down. Instead, they tried shutting it down by pressing the power button once - which did not work after a few minutes. Then they forced it to shut down by holding the power button down. I took this to mean that the server was still crashed during the power outage?

 

After the power outage, the server didn't boot using the USB drive. I did a check disk repair operation in Windows, which allowed Unraid to boot again. 

 

I'm not sure what went wrong, but have attached the diags and syslog here. 

 

Thanks in advance. 

homecloudv2-diagnostics-20240127-1452.zip syslog censored 2.log

Link to comment
4 minutes ago, trurl said:

This is probably not long enough. Have you timed how long your server actually needs to shut down?

 

OH? Thanks for the tip, I'll read through that link soon. 

 

My impression was that the 'clean shutdown' signal would start AFTER 15 seconds on UPS battery power. What I've been told in the past is to keep that number quite small, since my UPS uses an SLA battery, which does not like to run down to empty. It sounds like that time should instead be set slightly greater than the typical time it takes for the server to perform a clean shutdown?
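To get my head around the timing, here is a rough sketch of the budget as I understand it: the battery has to last through the delay before the shutdown starts plus however long the clean shutdown itself takes. All the numbers in the example are placeholders, not measurements from my server, and the function name is just for illustration.

```python
def shutdown_margin(runtime_s: float, delay_s: float, shutdown_s: float) -> float:
    """Seconds of battery runtime left after a clean shutdown completes.

    runtime_s:  estimated UPS runtime on battery at the current load
    delay_s:    time on battery before the shutdown is triggered
    shutdown_s: measured time the server needs to shut down cleanly
    A negative result means the battery would run out mid-shutdown.
    """
    return runtime_s - (delay_s + shutdown_s)
```

With a hypothetical 600 seconds of runtime, a 15 second delay and a 120 second shutdown, `shutdown_margin(600, 15, 120)` gives 465 seconds to spare. The part I still need to measure is the actual shutdown time.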

Link to comment

So I finally had time to look at the server today. Forced a power down etc. I took the opportunity to inspect the hardware. In short, I found 1 of the RAM sticks loose for some reason - one side didn't have the tab locked. I have no idea how this happened; I've built/rebuilt many computers, many times over several years, and never had this happen once. I have a fondness for clicks and would have known if I installed a RAM stick which did not click twice. I'm not sure if this would have caused the aforementioned issues. Remember that my issues involve a failure to access the Unraid GUI and directories, yet some functions such as VMs remained totally functional.

 

In theory, the RAM stick could have also come loose while I was inspecting the hardware - I removed some nearby cables and could have bumped it. If this were the case, then a loose RAM stick would not be the cause for my crashes. 

 

In any case, I'll be running the server in safe mode, with no dockers or VMs, for the next while to see if the issues repeat. I noticed that in recent days the issue happened more quickly. Months ago, the server would run for weeks without problems; recently, it would become inaccessible within a few hours, even though VMs could still be remotely accessed via VNC.

Link to comment
