jkBuckethead Posted September 7, 2020

For a few weeks now, my server has been locking up every weekend. At first I didn't notice the regularity, but this week I noticed the uptime was 6 days and 20 hours when I was dealing with it. Considering I woke up around 3:30 AM and started messing with it, the uptime would have been right at 7 days if I had waited until morning to take care of it, like usual.

The lockup first becomes apparent because the network shares become unavailable to File Explorer and any other applications using the shares. While the shares are unavailable, the webGUI is still partially working. The exact state of the webGUI has been different each week. Some pages load fully, while others only load partly. For example, the past few weeks the MAIN page would load, except the Array Operation section at the bottom would be blank. In each case I have been able to access the page to download diagnostics, but until this week the diagnostics never would actually download. This week the diagnostics finally did download, so I have something to upload.

Recovering from the lockup always ends in me shutting down manually and restarting, which of course is followed by a parity check. I have tried shutting down via the webGUI and the terminal window without success. With a monitor connected to the server, when I try powering down from the terminal window I can see the process starting, but it never finishes and never actually shuts off the hardware.

Since the webGUI was a little more complete this week (i.e. Array Operation was loading), I got to see a little more info than in past episodes. One interesting thing is that it indicated Mover was running, but no actual disk activity was indicated. I don't know if that is significant; it's just something I saw.

The regularity of this happening every Saturday night/Sunday morning made me look for a corresponding scheduled event. I have a number of things that check overnight, such as application updates that check daily, but the only weekly item I found was SSD Trim (enabled for my cache SSDs), set for Sunday at 2 AM. I am going to disable Trim for now and see if it solves the problem. Any thoughts on Trim locking up the system?

UNBUCKET Main 09052020c.pdf unbucket-diagnostics-20200906-0331.zip
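The timing described in the post can be checked with a little date arithmetic. This is only a rough sketch: it assumes the timestamp in the diagnostics filename (20200906-0331) is local server time, and it treats the reported uptime of 6 days and 20 hours as exact, although it was probably rounded. The variable names are illustrative.

```python
from datetime import datetime, timedelta

# Assumption: the diagnostics filename timestamp is local server time.
diagnostics_time = datetime(2020, 9, 6, 3, 31)
# Uptime as reported in the post (likely rounded).
uptime = timedelta(days=6, hours=20)

# Work backwards to the approximate time of the previous restart.
boot_time = diagnostics_time - uptime
print("Approximate last boot:", boot_time.strftime("%A %Y-%m-%d %H:%M"))

# When the uptime would have reached a full 7 days.
seven_day_mark = boot_time + timedelta(days=7)
print("Seven days of uptime at:", seven_day_mark.strftime("%A %Y-%m-%d %H:%M"))

# The weekly SSD Trim mentioned in the post: Sunday at 2 AM.
trim_time = datetime(2020, 9, 6, 2, 0)
print("Time between Trim and diagnostics:", diagnostics_time - trim_time)
```

Under those assumptions the previous restart falls on the morning of Sunday, August 30, which fits the pattern of a lockup overnight followed by a manual restart the next morning.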
Squid Posted September 7, 2020

I would guess (@JorgeB could tell for sure) that there are problems with the cache drive, which may possibly be causing this:

Aug 30 07:54:18 UNBUCKET emhttpd: shcmd (109): mount -t btrfs -o noatime,nodiratime,degraded -U 6bd4f0c7-7a8d-4c23-b9cb-8fbb05f39307 /mnt/cache
Aug 30 07:54:18 UNBUCKET kernel: BTRFS info (device sds1): allowing degraded mounts
Aug 30 07:54:18 UNBUCKET kernel: BTRFS info (device sds1): disk space caching is enabled
Aug 30 07:54:18 UNBUCKET kernel: BTRFS info (device sds1): has skinny extents
Aug 30 07:54:18 UNBUCKET kernel: BTRFS warning (device sds1): devid 2 uuid b450540a-bb2e-4508-96db-e3f32dc9ad66 is missing
Aug 30 07:54:18 UNBUCKET kernel: BTRFS info (device sds1): bdev (null) errs: wr 251066129, rd 74242592, flush 2853347, corrupt 0, gen 0
Aug 30 07:54:19 UNBUCKET kernel: BTRFS info (device sds1): enabling ssd optimizations
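For anyone who wants to spot lines like these in their own syslog, here is a minimal sketch that pulls the error counters out of a BTRFS "errs:" line such as the one quoted above. The regular expression is written only against the format shown in this thread, and the function name parse_btrfs_errs is hypothetical; other kernel versions may word the message differently.

```python
import re

# Pattern based solely on the log line quoted in this thread (an assumption
# about the format, not a kernel-defined contract).
ERRS_RE = re.compile(
    r"BTRFS info \(device (?P<device>\w+)\): bdev (?P<bdev>.+?) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

COUNTER_NAMES = ("wr", "rd", "flush", "corrupt", "gen")


def parse_btrfs_errs(line):
    """Return the device, bdev and error counters from one log line, or None."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    result = {"device": match.group("device"), "bdev": match.group("bdev")}
    for name in COUNTER_NAMES:
        result[name] = int(match.group(name))
    return result


sample = (
    "Aug 30 07:54:18 UNBUCKET kernel: BTRFS info (device sds1): "
    "bdev (null) errs: wr 251066129, rd 74242592, flush 2853347, corrupt 0, gen 0"
)
errs = parse_btrfs_errs(sample)
nonzero = {name: errs[name] for name in COUNTER_NAMES if errs[name] > 0}
print(errs["device"], errs["bdev"], nonzero)
```

Any non-zero counter is worth a closer look. In the line from this thread, the write, read and flush counters are all very large and the device is reported as "(null)", alongside the "devid 2 ... is missing" warning.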
JorgeB Posted September 7, 2020

It's possible. There's a cache device missing, and Unraid is failing to remove it because there's filesystem corruption. Since there are filesystem issues, they can also cause other problems. In any case this needs to be fixed; the best bet is to back up, then reformat the cache.