Unraid server locking up weekly


For a few weeks now, my server has been locking up every weekend. At first I didn't notice the regularity, but this week I saw that the uptime was 6 days and 20 hours when I was dealing with it. Since I woke up around 3:30 AM and started working on it, the uptime would have been right at 7 days if I had waited until morning to take care of it, like usual.

The lockup first becomes apparent because the network shares become unavailable to File Explorer and any other applications using the shares. While the shares are unavailable, the webGUI is still partially working. The exact state of the webGUI has been different each week. Some pages load fully, while others load only partly. For example, for the past few weeks the MAIN page would load, except that the Array Operation section at the bottom would be blank. In each case I have been able to reach the page for downloading diagnostics, but until this week the diagnostics would never actually download. This week the diagnostics finally did download, so I have something to upload.

Recovering from the lockup always ends with me shutting down manually and restarting, which of course is followed by a parity check. I have tried shutting down via the webGUI and the terminal window without success. With a monitor connected to the server, when I try powering down from the terminal window I can see the shutdown process start, but it never finishes and never actually powers off the hardware.

Since the webGUI was a little more complete this week (i.e. Array Operation was loading), I got to see a little more info than in past episodes. One interesting thing is that it indicated Mover was running, but no actual disk activity was shown. I don't know if that is significant; it's just something I saw. The regularity of this happening every Saturday night/Sunday morning made me look for a corresponding scheduled event. I have a number of things that run overnight, such as application update checks that run daily, but the only weekly item I found was SSD Trim (enabled for my cache SSDs), set for Sunday at 2 AM. I am going to disable Trim for now and see if that solves the problem.
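
As a rough sanity check on the timing, here is a minimal Python sketch that works backwards from the numbers above. It assumes the timestamp in the diagnostics file name (20200906-0331) is local time and is close to when I read the uptime; `last_weekly_run` is just a helper name made up for this example, not anything Unraid provides.

```python
from datetime import datetime, timedelta

def last_weekly_run(before, weekday, hour):
    """Most recent weekly run at or before `before`.

    weekday follows datetime.weekday(): Monday=0 ... Sunday=6.
    """
    candidate = before.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate -= timedelta(days=(before.weekday() - weekday) % 7)
    if candidate > before:
        candidate -= timedelta(days=7)
    return candidate

# Assumption: diagnostics file name timestamp, taken as local time.
noticed = datetime(2020, 9, 6, 3, 31)
# Uptime reported at that point.
uptime = timedelta(days=6, hours=20)

boot_time = noticed - uptime
trim_run = last_weekly_run(noticed, weekday=6, hour=2)  # Sunday 02:00

print("Approximate boot time:", boot_time)
print("Last scheduled Trim:  ", trim_run)
print("Trim to diagnostics:  ", noticed - trim_run)
```

With these inputs the last boot works out to the morning of Sunday, August 30, and the most recent Trim run falls about an hour and a half before the diagnostics were taken. That only shows the schedule lines up with the lockups; it does not prove Trim is the cause.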

Any thoughts on Trim locking up the system?    

UNBUCKET Main 09052020c.pdf unbucket-diagnostics-20200906-0331.zip

I would guess (@JorgeB could tell for sure) that there are problems with the cache drive, which may be causing this. The syslog shows the cache pool being mounted degraded, with one device missing:

Aug 30 07:54:18 UNBUCKET emhttpd: shcmd (109): mount -t btrfs -o noatime,nodiratime,degraded -U 6bd4f0c7-7a8d-4c23-b9cb-8fbb05f39307 /mnt/cache
Aug 30 07:54:18 UNBUCKET kernel: BTRFS info (device sds1): allowing degraded mounts
Aug 30 07:54:18 UNBUCKET kernel: BTRFS info (device sds1): disk space caching is enabled
Aug 30 07:54:18 UNBUCKET kernel: BTRFS info (device sds1): has skinny extents
Aug 30 07:54:18 UNBUCKET kernel: BTRFS warning (device sds1): devid 2 uuid b450540a-bb2e-4508-96db-e3f32dc9ad66 is missing
Aug 30 07:54:18 UNBUCKET kernel: BTRFS info (device sds1): bdev (null) errs: wr 251066129, rd 74242592, flush 2853347, corrupt 0, gen 0
Aug 30 07:54:19 UNBUCKET kernel: BTRFS info (device sds1): enabling ssd optimizations
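
To make the error counters in the `errs:` line easier to read, here is a minimal Python sketch that pulls them out of a syslog line. The regular expression is written against the format shown above and is only an assumption about how such lines look in general; `parse_btrfs_errs` is a name made up for this example.

```python
import re

# Matches the counter list in a BTRFS "errs:" kernel log line.
ERRS_RE = re.compile(
    r"errs: wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def parse_btrfs_errs(line):
    """Return the counters as a dict of ints, or None if the line has none."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

# Line copied from the syslog excerpt above.
line = (
    "Aug 30 07:54:18 UNBUCKET kernel: BTRFS info (device sds1): "
    "bdev (null) errs: wr 251066129, rd 74242592, flush 2853347, "
    "corrupt 0, gen 0"
)

counters = parse_btrfs_errs(line)
nonzero = {name: count for name, count in counters.items() if count > 0}
print("Non-zero counters:", nonzero)
```

For this line the write, read and flush counters are all non-zero, while corrupt and gen are zero. Together with the "devid 2 ... is missing" warning, that fits one pool member having dropped out rather than data being corrupted on the remaining device, but someone more familiar with btrfs should confirm.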

 
