louij2 Posted December 5, 2023
Hi, I had a pair of disks with UDMA CRC errors, so I decided to swap them out for some WD Golds, which has not caused an issue before. This time, after leaving the Unraid server unattended, it crashes, I think (there is no syslog to tell me what happened apart from the entries below):
2023-12-05T02:41:06+00:00 Tower nginx: 2023/12/05 02:41:06 [error] 11248#11248: *40758 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.0.88, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower.local", referrer: "http://tower.local/Main"
2023-12-05T03:54:42+00:00 Tower Parity Check Tuning: Manual Correcting Parity-Check: Manually paused
2023-12-05T04:00:01+00:00 Tower crond[1457]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Fix Common Problems reports nothing. Could it be permission-related, since I had to use mover to get the data off the disks?
Thanks,
Luca
louij2 Posted December 5, 2023 (edited)
Well, the web GUI is also not accessible, so I have to restart.
louij2 Posted December 5, 2023
2023-12-05T08:32:56+00:00 Tower avahi-daemon[16634]: *** WARNING: Detected another IPv4 mDNS stack running on this host. This makes mDNS unreliable and is thus not recommended. ***
louij2 Posted December 6, 2023
It seems to be doing okay now; it hasn't crashed overnight.
louij2 Posted December 7, 2023
Sadly, I ate my own words: it had some sort of crash. I managed to capture these errors and messages from the kernel. Could it be a hardware issue?
2023-12-07T01:31:13+00:00 Tower kernel: traps: smartctl[24308] general protection fault ip:14fe53a92894 sp:7ffe61edaae8 error:0 in libc-2.37.so[14fe539d0000+169000]
2023-12-07T03:56:49+00:00 Tower kernel: Linux agpgart interface v0.103
2023-12-07T03:56:49+00:00 Tower kernel: ACPI: bus type drm_connector registered
2023-12-07T03:56:50+00:00 Tower kernel: [drm] amdgpu kernel modesetting enabled.
2023-12-07T03:56:50+00:00 Tower kernel: amdgpu: Ignoring ACPI CRAT on non-APU system
2023-12-07T03:56:50+00:00 Tower kernel: amdgpu: Virtual CRAT table created for CPU
2023-12-07T03:56:50+00:00 Tower kernel: amdgpu: Topology: Add CPU node
2023-12-07T03:56:54+00:00 Tower root: # supported by the Linux kernel. Apple provides mkfs and fsck for
2023-12-07T03:57:05+00:00 Tower kernel: spl: loading out-of-tree module taints kernel.
2023-12-07T03:57:05+00:00 Tower kernel: znvpair: module license 'CDDL' taints kernel.
2023-12-07T03:57:05+00:00 Tower kernel: Disabling lock debugging due to kernel taint
2023-12-07T03:57:06+00:00 Tower kernel: ZFS: Loaded module v2.1.14-1, ZFS pool version 5000, ZFS filesystem version 5
2023-12-07T03:57:06+00:00 Tower kernel: md: unRAID driver 2.9.27 installed
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (1): import 0 sde 64 3907018532 0 WDC_WD40EMAZ-11LW3B0_WD-WX21D690LK0Y
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk0: (sde) WDC_WD40EMAZ-11LW3B0_WD-WX21D690LK0Y size: 3907018532
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (2): import 1 sdf 64 3907018532 0 WDC_WD4002FYYZ-01B7CB0_N8GDKUUY
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk1: (sdf) WDC_WD4002FYYZ-01B7CB0_N8GDKUUY size: 3907018532
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (3): import 2 sdg 64 3907018532 0 WDC_WD4002FYYZ-01B7CB0_N8GBPDDY
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk2: (sdg) WDC_WD4002FYYZ-01B7CB0_N8GBPDDY size: 3907018532
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (4): import 3 sdd 64 3907018532 0 WDC_WD40EMAZ-11LW3B0_WD-WX31D79H311X
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk3: (sdd) WDC_WD40EMAZ-11LW3B0_WD-WX31D79H311X size: 3907018532
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (5): import 4 sdi 64 1953514552 0 WDC_WD20EARX-00PASB0_WD-WCAZAC337276
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk4: (sdi) WDC_WD20EARX-00PASB0_WD-WCAZAC337276 size: 1953514552
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (6): import 5 sdj 64 1953514552 0 WDC_WD20EARX-00PASB0_WD-WCAZAC367766
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk5: (sdj) WDC_WD20EARX-00PASB0_WD-WCAZAC367766 size: 1953514552
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (7): import 6 sdc 64 3907018532 0 WDC_WD4002FYYZ-01B7CB0_N8GA996Y
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk6: (sdc) WDC_WD4002FYYZ-01B7CB0_N8GA996Y size: 3907018532
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (8): import 7 sdh 64 3907018532 0 WDC_WD4002FYYZ-01B7CB0_N8GAA9EY
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk7: (sdh) WDC_WD4002FYYZ-01B7CB0_N8GAA9EY size: 3907018532
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (9): import 8
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (10): import 9
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (11): import 10
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (12): import 11
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (13): import 12
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (14): import 13
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (15): import 14
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (16): import 15
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (17): import 16
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (18): import 17
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (19): import 18
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (20): import 19
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (21): import 20
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (22): import 21
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (23): import 22
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (24): import 23
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (25): import 24
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (26): import 25
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (27): import 26
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (28): import 27
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (29): import 28
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (30): import 29
2023-12-07T03:57:06+00:00 Tower kernel: md: import_slot: 29 empty
2023-12-07T03:57:09+00:00 Tower kernel: RPC: Registered named UNIX socket transport module.
2023-12-07T03:57:09+00:00 Tower kernel: RPC: Registered udp transport module.
2023-12-07T03:57:09+00:00 Tower kernel: RPC: Registered tcp transport module.
2023-12-07T03:57:09+00:00 Tower kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
2023-12-07T03:57:11+00:00 Tower kernel: NFSD: Using UMH upcall client tracking operations.
2023-12-07T03:57:11+00:00 Tower kernel: NFSD: starting 90-second grace period (net f0000000)
2023-12-07T04:02:56+00:00 Tower kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
2023-12-07T04:02:56+00:00 Tower kernel: rcu: 6-...!: (1 GPs behind) idle=e750/0/0x0 softirq=18565/18565 fqs=1 (false positive?)
2023-12-07T04:02:56+00:00 Tower kernel: (detected by 8, t=60006 jiffies, g=42277, q=860 ncpus=12)
2023-12-07T04:02:56+00:00 Tower kernel: Sending NMI from CPU 8 to CPUs 6:
2023-12-07T04:02:56+00:00 Tower kernel: rcu: rcu_preempt kthread timer wakeup didn't happen for 70004 jiffies! g42277 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
2023-12-07T04:02:56+00:00 Tower kernel: rcu: Possible timer handling issue on cpu=6 timer-softirq=2508
2023-12-07T04:02:56+00:00 Tower kernel: rcu: rcu_preempt kthread starved for 70007 jiffies! g42277 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=6
2023-12-07T04:02:56+00:00 Tower kernel: rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
2023-12-07T04:02:56+00:00 Tower kernel: rcu: RCU grace-period kthread stack dump:
2023-12-07T04:02:56+00:00 Tower kernel: task:rcu_preempt state:I stack:0 pid:15 ppid:2 flags:0x00004000
2023-12-07T04:02:56+00:00 Tower kernel: Call Trace:
2023-12-07T04:02:56+00:00 Tower kernel: <TASK>
2023-12-07T04:02:56+00:00 Tower kernel: __schedule+0x5b2/0x612
2023-12-07T04:02:56+00:00 Tower kernel: ? _raw_spin_unlock_irqrestore+0x24/0x3a
2023-12-07T04:02:56+00:00 Tower kernel: ? __mod_timer+0x207/0x232
2023-12-07T04:02:56+00:00 Tower kernel: ? rcu_gp_init+0x494/0x494
2023-12-07T04:02:56+00:00 Tower kernel: schedule+0x8e/0xcc
2023-12-07T04:02:56+00:00 Tower kernel: schedule_timeout+0x9d/0xd7
2023-12-07T04:02:56+00:00 Tower kernel: ? __next_timer_interrupt+0xf6/0xf6
2023-12-07T04:02:56+00:00 Tower kernel: rcu_gp_fqs_loop+0x12d/0x475
2023-12-07T04:02:56+00:00 Tower kernel: rcu_gp_kthread+0x151/0x16d
2023-12-07T04:02:56+00:00 Tower kernel: kthread+0xe7/0xef
2023-12-07T04:02:56+00:00 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
2023-12-07T04:02:56+00:00 Tower kernel: ret_from_fork+0x22/0x30
2023-12-07T04:02:56+00:00 Tower kernel: </TASK>
2023-12-07T04:02:56+00:00 Tower kernel: rcu: Stack dump where RCU GP kthread last ran:
2023-12-07T04:02:56+00:00 Tower kernel: Sending NMI from CPU 8 to CPUs 6:
2023-12-07T04:03:56+00:00 Tower kernel: rcu: INFO: rcu_preempt self-detected stall on CPU
2023-12-07T04:03:56+00:00 Tower kernel: rcu: 8-....: (60000 ticks this GP) idle=191c/1/0x4000000000000000 softirq=15683/15683 fqs=11999
2023-12-07T04:03:56+00:00 Tower kernel: (t=60000 jiffies g=42281 q=1095 ncpus=12)
2023-12-07T04:03:56+00:00 Tower kernel: CPU: 8 PID: 241 Comm: kworker/8:1 Tainted: P O 6.1.64-Unraid #1
2023-12-07T04:03:56+00:00 Tower kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F62 01/24/2022
2023-12-07T04:03:56+00:00 Tower kernel: Workqueue: events once_deferred
2023-12-07T04:03:56+00:00 Tower kernel: RIP: 0010:smp_call_function_many_cond+0x26a/0x283
2023-12-07T04:03:56+00:00 Tower kernel: Code: d0 48 89 df e8 64 fa ff ff 3b 05 d0 e0 2a 01 73 1f 48 63 c8 48 8b 55 00 48 03 14 cd 00 9b 16 82 8b 4a 08 80 e1 01 74 04 f3 90 <eb> f4 ff c0 eb c8 48 83 c4 38 5b 5d 41 5c 41 5d 41 5e 41 5f e9 4c
2023-12-07T04:03:56+00:00 Tower kernel: RSP: 0018:ffffc90004b3fd70 EFLAGS: 00000202
2023-12-07T04:03:56+00:00 Tower kernel: RAX: 0000000000000000 RBX: ffff88842ea2e608 RCX: 0000000000000001
2023-12-07T04:03:56+00:00 Tower kernel: RDX: ffff88842e833080 RSI: 0000000000000020 RDI: ffff88842ea2e608
2023-12-07T04:03:56+00:00 Tower kernel: RBP: ffff88842ea2e600 R08: 0000000000000000 R09: 0000000000000000
2023-12-07T04:03:56+00:00 Tower kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: 0000000000000001
2023-12-07T04:03:56+00:00 Tower kernel: R13: 0000000000000000 R14: ffffffff8102a991 R15: 0000000000000002
2023-12-07T04:03:56+00:00 Tower kernel: FS: 0000000000000000(0000) GS:ffff88842ea00000(0000) knlGS:0000000000000000
2023-12-07T04:03:56+00:00 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-12-07T04:03:56+00:00 Tower kernel: CR2: 000020872bd82000 CR3: 000000000220a000 CR4: 00000000003506e0
2023-12-07T04:03:56+00:00 Tower kernel: Call Trace:
2023-12-07T04:03:56+00:00 Tower kernel: <IRQ>
2023-12-07T04:03:56+00:00 Tower kernel: ? rcu_dump_cpu_stacks+0x95/0xb9
2023-12-07T04:03:56+00:00 Tower kernel: ? rcu_sched_clock_irq+0x345/0xa45
2023-12-07T04:03:56+00:00 Tower kernel: ? do_set_msr+0x12/0x12 [kvm]
2023-12-07T04:03:56+00:00 Tower kernel: ? notifier_call_chain+0x38/0x5a
2023-12-07T04:03:56+00:00 Tower kernel: ? timekeeping_update+0xe8/0x117
2023-12-07T04:03:56+00:00 Tower kernel: ? tick_init_jiffy_update+0x7c/0x7c
2023-12-07T04:03:56+00:00 Tower kernel: ? update_process_times+0x62/0x81
2023-12-07T04:03:56+00:00 Tower kernel: ? tick_sched_timer+0x43/0x71
2023-12-07T04:03:56+00:00 Tower kernel: ? __hrtimer_run_queues+0xeb/0x190
2023-12-07T04:03:56+00:00 Tower kernel: ? hrtimer_interrupt+0x9c/0x16e
2023-12-07T04:03:56+00:00 Tower kernel: ? __sysvec_apic_timer_interrupt+0xc5/0x12f
2023-12-07T04:03:56+00:00 Tower kernel: ? sysvec_apic_timer_interrupt+0x80/0xa6
2023-12-07T04:03:56+00:00 Tower kernel: </IRQ>
2023-12-07T04:03:56+00:00 Tower kernel: <TASK>
2023-12-07T04:03:56+00:00 Tower kernel: ? asm_sysvec_apic_timer_interrupt+0x16/0x20
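A rough way to find the entries that matter in a long syslog-server capture is to scan it for the markers that appear in this thread: the smartctl general protection fault, the rcu_preempt stall reports, and the nginx upstream timeout. The Python sketch below is not from the thread or from Unraid; the marker list and the scan_syslog helper are hypothetical, and it only assumes that each entry starts with an ISO-8601 timestamp, as in the logs above.

```python
import re
from collections import Counter

# Hypothetical marker list, built from the log lines quoted in this thread.
# It is an illustration, not an official Unraid diagnostic rule set.
CRASH_MARKERS = {
    "gpf": re.compile(r"general protection fault"),
    "rcu_stall": re.compile(r"rcu_preempt (detected stalls|self-detected stall)"),
    "upstream_timeout": re.compile(r"upstream timed out"),
}

# Matches the ISO-8601 timestamp the syslog server prefixes to each entry.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})")


def scan_syslog(lines):
    """Count marker hits and record the first timestamp seen for each marker."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        for name, pattern in CRASH_MARKERS.items():
            if pattern.search(line):
                counts[name] += 1
                match = TIMESTAMP.match(line)
                if match and name not in first_seen:
                    first_seen[name] = match.group(1)
    return counts, first_seen


if __name__ == "__main__":
    sample = [
        "2023-12-07T01:31:13+00:00 Tower kernel: traps: smartctl[24308] general protection fault ip:14fe53a92894",
        "2023-12-07T04:02:56+00:00 Tower kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:",
        "2023-12-07T04:03:56+00:00 Tower kernel: rcu: INFO: rcu_preempt self-detected stall on CPU",
    ]
    counts, first_seen = scan_syslog(sample)
    print(dict(counts))  # prints {'gpf': 1, 'rcu_stall': 2}
    print(first_seen)
```

Because scan_syslog accepts any iterable of lines, an open file handle for a saved capture can be passed in directly. The first timestamp per marker is mainly useful for lining the events up with the time the server stopped responding.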
JorgeB Posted December 7, 2023
Please post the diagnostics.
louij2 Posted December 7, 2023
tower-diagnostics-20231207-1459.zip
JorgeB Posted December 7, 2023
Nothing relevant, but they were taken soon after a reboot. Enable the syslog server and post that log after the issue occurs.
louij2 Posted December 8, 2023
All my logs above are from the syslog server around the time of the failure. I didn't get much more than that.
itimpi Posted December 8, 2023
3 hours ago, louij2 said:
All my logs above are from the sys log server around the time of failure. Didn’t get much more than that.
The diagnostics you posted will not include the syslog file from the syslog server; that needs to be posted separately.
louij2 Posted December 8, 2023
I have the syslog server running on a separate device. I captured this error just before the crash. The full syslog is attached. Many thanks,
2023-12-08T01:37:27+00:00 Tower kernel: traps: smartctl[20375] general protection fault ip:1504d1413b0d sp:7ffde2ccf920 error:0 in libc-2.37.so[1504d1344000+169000]
Tower.log
JorgeB Posted December 9, 2023 (marked as solution)
Make sure this has been taken care of: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
louij2 Posted December 11, 2023
On 12/9/2023 at 9:01 AM, JorgeB said:
Make sure this been taken care of: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
Many thanks, I'll double-check it again. This could have been turned off when I swapped the RAM around, etc.
louij2 Posted December 14, 2023
It seems okay now! I do recall having this issue before. It would be good if Fix Common Problems picked up on that.
louij2 Posted December 16, 2023
Sadly, it crashed again today, but it lasted longer this time. I had to restart the NAS, and on boot the array did not start. I might just have to book premium support.
trurl Posted December 16, 2023
I'm late to this thread, but thought I would comment on the first post:
On 12/5/2023 at 3:17 AM, louij2 said:
disks with UDMA CRC errors so I decided to swap out
CRC errors are connection problems, not disk problems.
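Drives report interface CRC errors in SMART attribute 199 (UDMA_CRC_Error_Count). The counter is cumulative and does not reset, so a non-zero value on its own says little; what matters is whether it keeps rising after the cable or port has been changed. The Python sketch below is not from the thread: it pulls the raw value out of the attribute table printed by smartctl -A. The sample table is hypothetical, and the parser assumes the usual ten-column layout with RAW_VALUE in the last column.

```python
from typing import Optional

# Hypothetical excerpt of `smartctl -A` output, for illustration only.
SAMPLE_ATTRIBUTES = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""

UDMA_CRC_ERROR_COUNT = 199


def read_raw_attribute(smartctl_output: str, attr_id: int) -> Optional[int]:
    """Return the RAW_VALUE of a SMART attribute, or None if it is absent.

    Assumes the ten-column ATA attribute table printed by `smartctl -A`.
    Only the first whitespace-separated token of RAW_VALUE is parsed.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Data rows start with the numeric attribute ID; the header starts with "ID#".
        if len(fields) >= 10 and fields[0] == str(attr_id):
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


if __name__ == "__main__":
    crc = read_raw_attribute(SAMPLE_ATTRIBUTES, UDMA_CRC_ERROR_COUNT)
    print(f"UDMA CRC error count: {crc}")  # prints: UDMA CRC error count: 12
```

On a live system the text could come from subprocess.run(["smartctl", "-A", "/dev/sdX"], capture_output=True, text=True).stdout, with /dev/sdX standing in for the disk in question. Record the value, reseat or replace the SATA cable, and compare later readings: a count that stops increasing points at the connection rather than the disk.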
trurl Posted December 16, 2023
21 minutes ago, louij2 said:
crashed again today
Syslog from the syslog server?