bjsmith911 Posted November 13, 2023 Folks, syslog and diagnostics attached. Symptom: the server runs for some number of days or weeks and then crashes. This has happened maybe 3 times, but I only just moved the syslog to a different machine. Dockers stop running, VMs stop running, the webGUI is non-responsive, and the server does not respond to ping; the machine itself remains powered with fans and lights on. It's on a UPS with auto-shutdown, so I do not think it's a power blip (I had one of those a week ago, where the machine died and did not reboot when power came back). I see the warnings/errors at 23:21:42 but do not know how to interpret them. The server is back online now, hence the diagnostics file. It isn't a major issue, but I would prefer not to have to run parity checks all the time lol. Attachments: vesper-diagnostics-20231112-1717.zip, syslog.txt
JorgeB Posted November 13, 2023 See if this applies to you:
domrockt Posted November 13, 2023 I had a similar error with my ARC770. I passed it through to a VM, and I think it's kernel related because the Unraid kernel does not fully support the ARC. I removed my ARC card and have had no errors since. So it could be a passed-through device.
bjsmith911 Posted December 14, 2023 I moved my VMs and converted my Unassigned Devices NVMe drives to a Cache Pool. Still running qBit, but uninstalled the Unassigned Devices plugin. Current uptime 7 days 10 hours (since the 6.12.6 upgrade). It appears to be an issue with Unassigned Devices.
bjsmith911 Posted January 29 Not resolved. Unassigned Devices is still uninstalled; qBit runs in a Windows 10 VM (I have it shut down now, but didn't before). Two crashes in the past week: one after nearly a month of uptime, the second less than 24h after the first's parity check finished. Same symptoms: machine still running (fans and lights on, HDDs spinning) but no access to Shares/Web/Dockers (Tailscale, Homebridge)/VMs; requires a hard shutdown and reboot. I only managed to capture the syslog from crash 2 (the syslog server is on my backup OMV NAS, which needed some attention). Attachments: syslog (2).txt, vesper-diagnostics-20240129-0831.zip
JorgeB Posted January 29 One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
bjsmith911 Posted January 30 And it crashed again today, running in bare-bones mode with no VMs and no Dockers other than Homebridge (nice for cameras). I'll try safe mode next; I'll run a memtest on the 7th if it's crashed by then. Nothing about the crash in the syslog at all.
bjsmith911 Posted February 1 (edited) Edit 2: Unable to reboot; logs stalled at "Array stopping: unmounting disks" with no further unmounting attempts. Tried "reboot" from the CLI and the GUI; it did not reboot. This is getting ridiculous. Is there no way to force a reboot or force an unmount?!? Edit 1: Found this one: a corrupt file wouldn't transfer and hung up doing something; it also prevented the array from unmounting. Data point: removed nearly all my plugins, all VMs off, only Plex and Homebridge running. Caught the server with random CPUs pinned while idle (I had a sync running from another VM pulling files onto my backup NAS; I terminated that and shut down the VM, no change). Not sure if it's related, but it doesn't seem right. No crash yet. Nothing in "Processes" shows any significant CPU usage; only 2 are above 0.1% (6% and 2%), and I'm not sure how else to determine what is using the CPUs. Attachment: vesper-diagnostics-20240131-2055.zip
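One possible way to look into "pinned" CPUs when no process shows matching usage is to read the kernel's own per-CPU counters from /proc/stat and see whether the time is going to user, system, or iowait. The following is only a minimal sketch, assuming Python is available on the server and that the dashboard load may include time spent waiting on I/O; the function names are made up for illustration and are not part of Unraid.

```python
import time

# Order of the first eight counters on each "cpuN" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Return {cpu_name: {field: ticks}} for the per-CPU lines of /proc/stat."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep "cpu0", "cpu1", ...; skip the aggregate "cpu" line and non-CPU lines.
        if len(parts) > len(FIELDS) and parts[0].startswith("cpu") and parts[0] != "cpu":
            values = map(int, parts[1:1 + len(FIELDS)])
            cpus[parts[0]] = dict(zip(FIELDS, values))
    return cpus


def busy_breakdown(before, after):
    """Percentage of elapsed ticks per field for each CPU between two snapshots."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        deltas = {field: new[field] - old[field] for field in FIELDS}
        total = sum(deltas.values())
        if total > 0:
            result[cpu] = {field: 100.0 * delta / total for field, delta in deltas.items()}
    return result


def sample(interval=5.0, path="/proc/stat"):
    """Take two snapshots of /proc/stat `interval` seconds apart and compare them."""
    with open(path) as fh:
        first = parse_proc_stat(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        second = parse_proc_stat(fh.read())
    return busy_breakdown(first, second)
```

Calling `sample()` on the server and printing the result per CPU would show, for example, whether a core that looks busy is mostly in `iowait` (stuck waiting on a disk) rather than in `user` or `system` time, which is one explanation for high load with no matching process.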
JorgeB Posted February 1
Jan 31 17:13:43 Vesper kernel: unraidd+0x51a/0x1140 [md_mod]
Jan 31 17:13:43 Vesper kernel: md_thread+0xf4/0x122 [md_mod]
The Unraid driver is crashing; this is almost always a hardware issue. Start by running memtest.
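For readers unsure how to read call-trace lines like the two quoted above: each frame has the form symbol+offset/size, optionally followed by the kernel module in square brackets (here md_mod, the Unraid array driver). A rough sketch of pulling those fields out of a saved syslog follows; the regular expression and helper names are assumptions for illustration, not an official tool.

```python
import re
from collections import Counter

# Matches the "symbol+0xOFFSET/0xSIZE [module]" part of a kernel call-trace line.
# The "[module]" suffix is absent for code built into the kernel itself.
FRAME_RE = re.compile(
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(lines):
    """Extract symbol, offset, size and module from call-trace lines."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "symbol": match["symbol"],
                "offset": int(match["offset"], 16),
                "size": int(match["size"], 16),
                "module": match["module"] or "(built-in)",
            })
    return frames


def modules_in_trace(lines):
    """Count how often each module appears in a call trace."""
    return Counter(frame["module"] for frame in parse_frames(lines))
```

Run over a full trace, `modules_in_trace` gives a quick view of which modules were on the stack when the kernel faulted. That only shows where the fault surfaced, not necessarily what caused it.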
bjsmith911 Posted February 8 No issues since last boot; took it offline yesterday to run a memtest. I'll upload the full report later (I don't have the memtest USB with me), but the memory checks out (possibly vulnerable to high-frequency row hammer bit flips, but no errors). I've removed several, but not all, plugins, including the Nvidia plugin; I also removed the GT1030 that was in there, left over from my quad-monitor workstation days.
bjsmith911 Posted July 31 I keep seeing posts about intermittent crashing. These folks are lambasted by people saying it isn't a problem, but then why do the posts keep coming? Anyway. I am not a developer, and I have only inconsistently downloaded and examined my syslogs, but they do consistently show a kernel BUG as the last entry before the system goes unresponsive. E.g.:
Nov 11 23:21:42 Vesper kernel: BUG: unable to handle page fault for address: 00000200636d12d0
Nov 11 23:21:42 Vesper kernel: #PF: supervisor read access in kernel mode
Nov 11 23:21:42 Vesper kernel: #PF: error_code(0x0000) - not-present page
Nov 11 23:21:42 Vesper kernel: PGD 0 P4D 0
Nov 11 23:21:42 Vesper kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Nov 11 23:21:42 Vesper kernel: CPU: 2 PID: 163 Comm: kswapd0 Tainted: P U O 6.1.49-Unraid #1
Nov 11 23:21:42 Vesper kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D06/MPG Z590 GAMING CARBON WIFI (MS-7D06), BIOS 1.B0 06/12/2023
and
Jul 29 3:05:44 Vesper kernel: BUG: kernel NULL pointer dereference, address: 0000000000000081
Jul 29 3:05:44 Vesper kernel: #PF: supervisor read access in kernel mode
Jul 29 3:05:44 Vesper kernel: #PF: error_code(0x0000) - not-present page
Jul 29 3:05:44 Vesper kernel: PGD 15fcd5067 P4D 15fcd5067 PUD 15fcd4067 PMD 0
Jul 29 3:05:44 Vesper kernel: Oops: 0000 [#2] PREEMPT SMP NOPTI
Jul 29 3:05:44 Vesper kernel: CPU: 7 PID: 15899 Comm: shfs Tainted: P UD O 6.1.79-Unraid #1
Jul 29 3:05:44 Vesper kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D06/MPG Z590 GAMING CARBON WIFI (MS-7D06), BIOS 1.B0 06/12/2023
These posts seem to be layered with animosity from both the Unraid faithful and the "victims", but what is getting lost in the dialogue is that there is an issue. I understand it may ultimately be related to some pairing of Linux kernel and specific hardware, and not an Unraid issue, but it is an issue nonetheless.
Examples: https://www.facebook.com/groups/217132562182318/posts/1593856457843248/ https://www.facebook.com/groups/217132562182318/posts/1594972577731636/ https://www.facebook.com/groups/217132562182318/posts/1598660484029512/
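For anyone wanting to compare crashes across several saved syslogs, the BUG entries like those quoted in this post can be collected with a short script instead of by eye. This is a rough sketch only: it assumes the usual "Mon DD HH:MM:SS host kernel: ..." syslog layout and the "CPU: ... Comm: ..." line format seen in this post, and the function name is hypothetical.

```python
import re

# "Nov 11 23:21:42 Vesper kernel: BUG: <message>"
BUG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s*BUG: (?P<message>.*)$"
)
# "... kernel: CPU: 2 PID: 163 Comm: kswapd0 Tainted: ... 6.1.49-Unraid #1"
CPU_RE = re.compile(
    r"kernel:\s*CPU: \d+ PID: \d+ Comm: (?P<comm>\S+) .*?(?P<version>\d+\.\d+\.\d+\S*) #"
)


def summarize_crashes(lines):
    """Return one record per kernel BUG line, with the task and kernel version
    taken from the first following "CPU: ... Comm: ..." line, if any."""
    crashes = []
    for line in lines:
        line = line.rstrip("\n")
        bug = BUG_RE.match(line)
        if bug:
            crashes.append({
                "stamp": bug["stamp"],
                "message": bug["message"],
                "comm": None,
                "kernel": None,
            })
            continue
        cpu = CPU_RE.search(line)
        if cpu and crashes and crashes[-1]["comm"] is None:
            crashes[-1]["comm"] = cpu["comm"]
            crashes[-1]["kernel"] = cpu["version"]
    return crashes
```

Passing an open syslog file to `summarize_crashes` yields the timestamp, fault message, faulting task, and kernel version for each BUG entry. Comparing those across crashes (different tasks, different fault types, different kernel versions) is one way to judge whether the failures share a software path or look more like random hardware faults.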