Dealing with unclean shutdowns



On 8/31/2021 at 3:51 PM, trurl said:

What have you already tried that was discussed in this thread you have posted in?

So far, before each soft reboot of the server, I have manually:

- stopped the two VMs before rebooting

- tried rebooting with the external NVMe-to-USB drive both attached and unplugged

- unmounted the SMB shares

- stopped the Docker containers

- increased the timer in Settings -> Disk Settings -> Shutdown time-out to 420s

- increased the VM shutdown time-out to 300s

 

Right now I am running a Parity-Sync / data rebuild that started this morning, when I had to replace an old data drive, so I am unable to try a reboot at the moment.

Edited by TIE Fighter
Link to comment

@trurl 

The parity rebuild completed, and I manually stopped the VMs and Docker containers.

One container could not be stopped, and that prevented the array from being taken offline.

For future reference, it was a container that I added from a GitHub repo that manages the Adaptec 71605 HBA card.

I only use it to monitor the temperature of the card.

I hard rebooted and disabled autostart for that container.

 

I have now tried a couple of shutdowns and reboots and everything seems OK.

 

cheers

Link to comment
  • 1 month later...
9 hours ago, sirhotness said:

I'm glad I found this topic. Thank you for posting this. I am going to try this, as I get intermittent unclean shutdowns and then have to start the array back up, which triggers a parity check. I am going to attach my syslog; is this helpful? I am new, so I am wondering if I need to do anything else to prevent this from happening in the future.

 

Thank you for all your help, it is much appreciated.

Please don't post about the same thing in multiple threads. It makes it impossible to coordinate replies.

 

Since this user already posted about this on another thread and received a reply, please go there if you have anything to add:

 

 

Link to comment

Hello  all! This is quite a convenient thread, though I didn't find the answer to my question so I hope I'm not asking the same thing as somebody else.

 

I experienced an unclean shutdown due to a power outage, and when the power returned and the server came back online, it mounted the disks and began a parity check automatically (as expected).

 

I would like to ask: when starting a parity check manually, the "write corrections to parity" option is ticked by default. When the parity check starts automatically, does it start with this option ticked or unticked? How do I find out and control this behaviour?

 

Furthermore, I am under the impression that unless I want to correct the parity data, I should untick this option when doing parity checks. This is so that any corrupt data on the data disks can be fixed using the parity data. As this is likely when there is an unexpected power outage, I believe that it would be proper practice to perform the automatic parity check without writing corrections to parity. Is my assessment correct?

 

Thank you all!

 

Link to comment
  • 3 weeks later...

My Unraid server was off for a few weeks. Yesterday I installed new HDDs to rebuild (create a new) array, and suddenly I started getting unclean shutdowns. Before that it was stable, running for a month or more with no issues. The installed HDDs are IronWolf NAS 4TB drives; SMART didn't report any issues.

 

Can you please take a look at my logs and advise what could be causing my issues?

unraid-diagnostics-20211116-1308.zip

Link to comment

Well, difficult to say. My server is running headless, so I don't know exactly what's happening there. But when I log in after such a random 'restart', I see that a parity check is in progress. Also, in Fix Common Problems I see:

 

Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged

 

EDIT:

 

I have connected a GPU to see what happens with Unraid once it has crashed, but nothing is displayed on the screen. Also, the system now doesn't reboot by itself; I need to use the power button to turn it off and on again.

 

I have configured a syslog server, but I'm not sure if there's anything interesting in the output.

syslog

Edited by ataman
Link to comment
On 11/16/2021 at 10:09 AM, ataman said:

syslog however not sure if there's anything interesting here

Any idea what this is about?

192.168.1.222	Nov 17 12:34:50	unraid	cron	notice	crond[1910]	time disparity of 440 minutes detected

440 minutes = 7 hours 20 minutes
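
For anyone wanting to double-check that conversion, a minimal Python sketch follows. The variable name is only illustrative; the value 440 is taken from the crond line quoted above.

```python
# Convert the "time disparity" reported by crond into hours and minutes.
disparity_minutes = 440  # value copied from the crond log line above

hours, minutes = divmod(disparity_minutes, 60)
print(f"{disparity_minutes} minutes = {hours} hours {minutes} minutes")
```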

 

Not clear there was any reboot during that syslog. Can you get that syslog server to put dates on the timestamps? Or just send it to somewhere on Unraid, it will look more like the syslogs we are familiar with.

Link to comment
6 hours ago, trurl said:

Or just send it to somewhere on Unraid

 

I've managed to save the logs to the flash drive. I brought the server next to my desk and connected it to a screen so I can troubleshoot it better.

 

Another restart/unclean shutdown happened around 22:13. What is interesting is that the server didn't reboot; it just stopped responding in the WebGUI (and to ping, shares, etc.). On the connected screen I could still see the boot-up lines with a flashing unraid login prompt:

 

After 2 or 3 minutes I was able to log in to the WebGUI again, and as always I saw the message about an unclean shutdown and a parity check in progress.

 

Is it possible that this is more of a software issue, or maybe a failing flash drive?

syslog_flash.txt

Link to comment
1 minute ago, trurl said:

Have you done memtest?

 

Yes I did just today, passed 2 times.

 

I also checked the Ryzen thread before. What is interesting is that Unraid was working fine for a couple of months, and the problem started just after installing the new HDDs and applying "New config".

 

By the way, is it normal to see the lines below in the logs? I've noticed them each time Unraid stops responding.

 

192.168.1.222	Nov 17 22:14:21	unraid	user	info	emhttpd	shcmd (38): /etc/rc.d/rc.samba restart
192.168.1.222	Nov 17 22:14:21	unraid	daemon	err	nmbd[4207]	[2021/11/17 22:14:21.962095,  0] ../../source3/nmbd/nmbd.c:59(terminate)
192.168.1.222	Nov 17 22:14:21	unraid	daemon	err	nmbd[4207]	  Got SIGTERM: going down...
192.168.1.222	Nov 17 22:14:21	unraid	daemon	err	winbindd[4217]	[2021/11/17 22:14:21.962147,  0] ../../source3/winbindd/winbindd.c:244(winbindd_sig_term_handler)
192.168.1.222	Nov 17 22:14:21	unraid	daemon	err	winbindd[4217]	  Got sig[15] terminate (is_parent=1)
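
For picking such entries out of a longer remote syslog export, here is a rough Python sketch. It assumes the export is tab-separated with seven fields in the order seen above (source IP, timestamp, host, facility, severity, tag, message); the field names, the `SyslogEntry` type and both helper functions are hypothetical and not part of Unraid or any syslog tool. It only filters lines that mention a termination signal and does not diagnose anything.

```python
from typing import Iterable, List, NamedTuple


class SyslogEntry(NamedTuple):
    # Field names are assumptions based on the column order in the excerpt.
    source_ip: str
    timestamp: str
    host: str
    facility: str
    severity: str
    tag: str
    message: str


def parse_line(line: str) -> SyslogEntry:
    # Split into at most 7 fields so any tabs inside the message survive.
    parts = line.rstrip("\n").split("\t", 6)
    if len(parts) != 7:
        raise ValueError(f"unexpected field count: {len(parts)}")
    return SyslogEntry(*parts)


def termination_entries(lines: Iterable[str]) -> List[SyslogEntry]:
    # Keep only entries whose message mentions SIGTERM (signal 15).
    markers = ("SIGTERM", "sig[15]")
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if any(m in e.message for m in markers)]


# Three of the lines quoted above, with tabs written explicitly.
SAMPLE = [
    "192.168.1.222\tNov 17 22:14:21\tunraid\tuser\tinfo\temhttpd\t"
    "shcmd (38): /etc/rc.d/rc.samba restart",
    "192.168.1.222\tNov 17 22:14:21\tunraid\tdaemon\terr\tnmbd[4207]\t"
    "  Got SIGTERM: going down...",
    "192.168.1.222\tNov 17 22:14:21\tunraid\tdaemon\terr\twinbindd[4217]\t"
    "  Got sig[15] terminate (is_parent=1)",
]

for entry in termination_entries(SAMPLE):
    print(entry.timestamp, entry.tag, entry.message.strip())
```

Reading a real file would just mean passing the open file object to `termination_entries` instead of `SAMPLE`, provided the export really is tab-separated.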

 

Link to comment
  • 2 weeks later...
1 hour ago, dchamb said:

What should the Disk Settings be set to if you do not have VM Manager enabled? 

On 8/22/2021 at 1:54 PM, itimpi said:

try hitting the button to Stop the array and time how long it takes.   You need to make sure the Disk Settings -> shutdown timeout setting is longer than that.
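
The check described in that quote can be sketched in a few lines of Python. Everything here is hypothetical: the function names, the 60-second safety margin and the 95-second measurement are illustrative assumptions, and only the 420s value comes from earlier in this thread. The measured time is whatever you clock by hand after pressing Stop.

```python
import math


def recommended_timeout(measured_stop_seconds: float, margin_seconds: float = 60) -> int:
    # Smallest whole-second timeout that covers the measured stop time
    # plus a safety margin (the margin is an assumption, not an Unraid rule).
    return math.ceil(measured_stop_seconds + margin_seconds)


def timeout_is_sufficient(configured_seconds: float, measured_stop_seconds: float) -> bool:
    # The configured shutdown timeout must be longer than the time
    # the array actually takes to stop.
    return configured_seconds > measured_stop_seconds


measured = 95      # hypothetical: seconds timed by hand after pressing Stop
configured = 420   # the Disk Settings value mentioned earlier in the thread

print(timeout_is_sufficient(configured, measured))
print(recommended_timeout(measured))
```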

  

 

Link to comment
  • 1 month later...
15 minutes ago, gamerkonks said:

Does anyone know what would cause an unclean shutdown when shutting down with the array stopped?

That sounds like a bug although not at all clear what could trigger it.  Maybe the flash drive had dropped offline so that Unraid could not update the array status on it to say it was successfully stopped?

Link to comment
17 hours ago, itimpi said:

That sounds like a bug although not at all clear what could trigger it.  Maybe the flash drive had dropped offline so that Unraid could not update the array status on it to say it was successfully stopped?

That must be it.

I did get an error saying the flash drive was read only last time I booted, but it seems fine now.

Link to comment
  • 2 weeks later...

Hey guys,

I've been dealing with unclean shutdowns for a while and I'm not sure why. In all honesty, I haven't really had the time to deal with it.

However, this last time it caused the XFS filesystem on one of my SSDs to become corrupt, and it ended up being my Plex SSD, so now I'm making time.

I have attached a diagnostics file (that is where I saw which SSD is corrupted - sdu), but I have yet to find the root of the problem.

 

To be clear, I've fixed the SSD and Plex is back up and running. Fortunately it wasn't the DB that was corrupted, but I REALLY want to avoid this happening again.

 

Thank you.

 

Any help is greatly appreciated.

tower-diagnostics-20220120-0514.zip

Link to comment
