6.10.3 Unraid freezes



Unraid freezes multiple times a week and I have to cold reboot it. (This isn't just the http server going down -- it won't respond to key presses from a keyboard directly connected to the server.) This configuration used to work for months at a time, but it seems something has gone wrong. I've run memtest86 overnight with no findings. I'm not sure what else to try. Any ideas?

 

truesource-diagnostics-20220620-1353.zip


Thanks for the help!

 

I was able to set the RAM speed to 1866, but I couldn't find an option for power supply idle control or for C-states in general. This is what I tried instead:

 

Started here and picked "CPU Configuration"

 IMG-1030.thumb.jpg.0217ca81e7a752275991b844fc8f34e2.jpg

 

Once in there, I changed C6 mode from "enabled" to "disabled."

 

IMG-1031.thumb.jpg.26624fb39f40e47c0a4aaee2b80579a1.jpg

 

Also, a btrfs scrub found some uncorrectable checksum errors. From the syslog:

2022-06-25 17:42:53    Kernel.Info    172.16.100.100    Jun 25 17:42:53 Truesource kernel: BTRFS info (device nvme0n1p1): device stats zeroed by btrfs (25391)
2022-06-25 17:42:53    Kernel.Info    172.16.100.100    Jun 25 17:42:53 Truesource kernel: BTRFS info (device nvme0n1p1): device stats zeroed by btrfs (25391)
2022-06-25 17:42:56    Kernel.Info    172.16.100.100    Jun 25 17:42:56 Truesource kernel: BTRFS info (device nvme0n1p1): device stats zeroed by btrfs (25402)
2022-06-25 17:42:56    Kernel.Info    172.16.100.100    Jun 25 17:42:56 Truesource kernel: BTRFS info (device nvme0n1p1): device stats zeroed by btrfs (25402)
2022-06-25 17:43:10    Kernel.Info    172.16.100.100    Jun 25 17:43:09 Truesource kernel: BTRFS info (device nvme0n1p1): scrub: started on devid 1
2022-06-25 17:43:10    Kernel.Info    172.16.100.100    Jun 25 17:43:09 Truesource kernel: BTRFS info (device nvme0n1p1): scrub: started on devid 2
2022-06-25 17:44:00    Kernel.Warning    172.16.100.100    Jun 25 17:44:00 Truesource kernel: BTRFS warning (device nvme0n1p1): checksum error at logical 932456402944 on dev /dev/nvme0n1p1, physical 222713057280, root 5, inode 6909694, offset 2324074496, length 4096, links 1 (path: PRIVATE)
2022-06-25 17:44:00    Kernel.Error    172.16.100.100    Jun 25 17:44:00 Truesource kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
2022-06-25 17:44:00    Kernel.Error    172.16.100.100    Jun 25 17:44:00 Truesource kernel: BTRFS error (device nvme0n1p1): unable to fixup (regular) error at logical 932456402944 on dev /dev/nvme0n1p1
2022-06-25 17:44:40    Kernel.Warning    172.16.100.100    Jun 25 17:44:40 Truesource kernel: BTRFS warning (device nvme0n1p1): checksum error at logical 1262362894336 on dev /dev/nvme0n1p1, physical 574094385152, root 5, inode 13617139, offset 347021312, length 4096, links 1 (path: PRIVATE)
2022-06-25 17:44:40    Kernel.Error    172.16.100.100    Jun 25 17:44:40 Truesource kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
2022-06-25 17:44:40    Kernel.Error    172.16.100.100    Jun 25 17:44:40 Truesource kernel: BTRFS error (device nvme0n1p1): unable to fixup (regular) error at logical 1262362894336 on dev /dev/nvme0n1p1
2022-06-25 17:44:41    Kernel.Warning    172.16.100.100    Jun 25 17:44:40 Truesource kernel: BTRFS warning (device nvme0n1p1): checksum error at logical 1280042659840 on dev /dev/nvme0n1p1, physical 591774150656, root 5, inode 14185170, offset 340320256, length 4096, links 1 (path: PRIVATE)
2022-06-25 17:44:41    Kernel.Error    172.16.100.100    Jun 25 17:44:40 Truesource kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
2022-06-25 17:44:41    Kernel.Error    172.16.100.100    Jun 25 17:44:40 Truesource kernel: BTRFS error (device nvme0n1p1): unable to fixup (regular) error at logical 1280042659840 on dev /dev/nvme0n1p1
2022-06-25 17:44:41    Kernel.Info    172.16.100.100    Jun 25 17:44:40 Truesource kernel: BTRFS info (device nvme0n1p1): scrub: finished on devid 1 with status: 0
2022-06-25 17:46:02    Kernel.Warning    172.16.100.100    Jun 25 17:46:02 Truesource kernel: BTRFS warning (device nvme0n1p1): checksum error at logical 932456402944 on dev /dev/nvme1n1p1, physical 222692085760, root 5, inode 6909694, offset 2324074496, length 4096, links 1 (path: PRIVATE)
2022-06-25 17:46:02    Kernel.Error    172.16.100.100    Jun 25 17:46:02 Truesource kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
2022-06-25 17:46:02    Kernel.Error    172.16.100.100    Jun 25 17:46:02 Truesource kernel: BTRFS error (device nvme0n1p1): unable to fixup (regular) error at logical 932456402944 on dev /dev/nvme1n1p1
2022-06-25 17:48:19    Kernel.Warning    172.16.100.100    Jun 25 17:48:19 Truesource kernel: BTRFS warning (device nvme0n1p1): checksum error at logical 1262362894336 on dev /dev/nvme1n1p1, physical 574073413632, root 5, inode 13617139, offset 347021312, length 4096, links 1 (path: PRIVATE)
2022-06-25 17:48:19    Kernel.Error    172.16.100.100    Jun 25 17:48:19 Truesource kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
2022-06-25 17:48:19    Kernel.Error    172.16.100.100    Jun 25 17:48:19 Truesource kernel: BTRFS error (device nvme0n1p1): unable to fixup (regular) error at logical 1262362894336 on dev /dev/nvme1n1p1
2022-06-25 17:48:22    Kernel.Warning    172.16.100.100    Jun 25 17:48:21 Truesource kernel: BTRFS warning (device nvme0n1p1): checksum error at logical 1280042659840 on dev /dev/nvme1n1p1, physical 591753179136, root 5, inode 14185170, offset 340320256, length 4096, links 1 (path: PRIVATE)
2022-06-25 17:48:22    Kernel.Error    172.16.100.100    Jun 25 17:48:21 Truesource kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
2022-06-25 17:48:22    Kernel.Error    172.16.100.100    Jun 25 17:48:21 Truesource kernel: BTRFS error (device nvme0n1p1): unable to fixup (regular) error at logical 1280042659840 on dev /dev/nvme1n1p1
2022-06-25 17:48:22    Kernel.Info    172.16.100.100    Jun 25 17:48:22 Truesource kernel: BTRFS info (device nvme0n1p1): scrub: finished on devid 2 with status: 0
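
To make a scrub log like the one above easier to read, here is a minimal Python sketch that groups the "checksum error" warnings by root, inode, and offset, and lists which devices reported each one. The function name `summarize_checksum_errors` and the regular expression are my own and only assume the message layout shown in these log lines; other kernel versions may word the message differently.

```python
import re
from collections import defaultdict

# Matches the "checksum error" warnings in the layout shown in the log above.
CHECKSUM_RE = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>[^,\s]+), "
    r"physical (?P<physical>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+)"
)


def summarize_checksum_errors(lines):
    """Map each (root, inode, offset) to the set of devices that reported it."""
    hits = defaultdict(set)
    for line in lines:
        match = CHECKSUM_RE.search(line)
        if match:
            key = (int(match["root"]), int(match["inode"]), int(match["offset"]))
            hits[key].add(match["dev"])
    return dict(hits)


# Message portion of two of the warnings from the log above.
sample = [
    "BTRFS warning (device nvme0n1p1): checksum error at logical 932456402944 "
    "on dev /dev/nvme0n1p1, physical 222713057280, root 5, inode 6909694, "
    "offset 2324074496, length 4096, links 1 (path: PRIVATE)",
    "BTRFS warning (device nvme0n1p1): checksum error at logical 932456402944 "
    "on dev /dev/nvme1n1p1, physical 222692085760, root 5, inode 6909694, "
    "offset 2324074496, length 4096, links 1 (path: PRIVATE)",
]

for (root, inode, offset), devices in summarize_checksum_errors(sample).items():
    print(f"root {root} inode {inode} offset {offset}: {sorted(devices)}")
```

For the two sample lines, the same inode and offset is reported on both devices. The same holds for all three errors in the log above, so it looks like neither copy has good data for those blocks, which would match the "unable to fixup" messages.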

 


Hi,

 

I also have this issue after upgrading to 6.10.3. Typically:

  • The server becomes unreachable on the network
  • The server is still powered on but unresponsive, even to a directly attached keyboard and mouse
  • I had to hard reset it a couple of times, but it wouldn't last 8 hours before freezing again; even the parity checks didn't get a chance to complete

I have reverted to the previous version, as my backups are stored on the Unraid server and I couldn't afford to be without them. Since the rollback, the server has been stable for just over 24 hours. I'll have a look at the BIOS settings recommended above and change them if needed.

 

  • 3 weeks later...

I still get intermittent reboots a few times a week. The latest crash and reboot happened just before 0500. Auto-starting the array after an unexpected shutdown is disabled, so the logs show nothing after the reboot until I logged in to the web interface at 1123.

 

I have noticed that the automated mover exits with a return code of 1. When I run it manually, it completes with a return code of 0. Any thoughts on whether this could indicate why the system subsequently reboots?
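
One way to find out why the scheduled run returns 1 would be to wrap the command so that its exit code and any error output end up in a file that survives a reboot. Below is a rough Python sketch of that idea. It assumes Python 3 is available on the server (it is not part of a stock Unraid install, as far as I know); the helper name `run_and_log` is hypothetical, and the command and log path you pass in are placeholders to fill in for your own setup.

```python
import subprocess
from datetime import datetime


def run_and_log(cmd, log_path):
    """Run cmd, append its exit code and output to log_path, return the exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(log_path, "a") as log:
        stamp = datetime.now().isoformat(timespec="seconds")
        log.write(f"{stamp} cmd={cmd!r} rc={result.returncode}\n")
        if result.stdout:
            log.write("stdout:\n" + result.stdout.rstrip("\n") + "\n")
        if result.stderr:
            log.write("stderr:\n" + result.stderr.rstrip("\n") + "\n")
    return result.returncode
```

Calling it from the schedule with the mover command as a list of arguments, plus a log file on storage that persists across reboots, would show whether the scheduled run prints an error that the manual run does not.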

 

Logs attached.

 

SyslogCatchAll-2022-07-17.txt
