Array stopping after swapping out failing disks


louij2
Go to solution Solved by JorgeB,


Hi,

 

I had a pair of disks with UDMA CRC errors, so I decided to swap them out for some WD Golds. Swapping disks has not been an issue before.

This time, after the Unraid server is left unattended, it crashes, I think. (There is no syslog to tell me what happened apart from the entries below.)

 

Quote

2023-12-05T02:41:06+00:00 Tower nginx: 2023/12/05 02:41:06 [error] 11248#11248: *40758 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.0.88, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "tower.local", referrer: "http://tower.local/Main"

2023-12-05T03:54:42+00:00 Tower Parity Check Tuning: Manual Correcting Parity-Check: Manually paused
2023-12-05T04:00:01+00:00 Tower crond[1457]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
 

Fix Common Problems reports nothing.

Could it be permission-related, as I had to use the mover to get the data off the disks?

 

Thanks,

Luca


Sadly, I ate my own words and had some sort of crash. I managed to capture these errors and messages from the kernel. Could it be a hardware issue?

 

2023-12-07T01:31:13+00:00 Tower kernel: traps: smartctl[24308] general protection fault ip:14fe53a92894 sp:7ffe61edaae8 error:0 in libc-2.37.so[14fe539d0000+169000]
2023-12-07T03:56:49+00:00 Tower kernel: Linux agpgart interface v0.103
2023-12-07T03:56:49+00:00 Tower kernel: ACPI: bus type drm_connector registered
2023-12-07T03:56:50+00:00 Tower kernel: [drm] amdgpu kernel modesetting enabled.
2023-12-07T03:56:50+00:00 Tower kernel: amdgpu: Ignoring ACPI CRAT on non-APU system
2023-12-07T03:56:50+00:00 Tower kernel: amdgpu: Virtual CRAT table created for CPU
2023-12-07T03:56:50+00:00 Tower kernel: amdgpu: Topology: Add CPU node
2023-12-07T03:56:54+00:00 Tower root: # supported by the Linux kernel.  Apple provides mkfs and fsck for 
2023-12-07T03:57:05+00:00 Tower kernel: spl: loading out-of-tree module taints kernel.
2023-12-07T03:57:05+00:00 Tower kernel: znvpair: module license 'CDDL' taints kernel.
2023-12-07T03:57:05+00:00 Tower kernel: Disabling lock debugging due to kernel taint
2023-12-07T03:57:06+00:00 Tower kernel: ZFS: Loaded module v2.1.14-1, ZFS pool version 5000, ZFS filesystem version 5
2023-12-07T03:57:06+00:00 Tower kernel: md: unRAID driver 2.9.27 installed
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (1): import 0 sde 64 3907018532 0 WDC_WD40EMAZ-11LW3B0_WD-WX21D690LK0Y
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk0: (sde) WDC_WD40EMAZ-11LW3B0_WD-WX21D690LK0Y size: 3907018532 
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (2): import 1 sdf 64 3907018532 0 WDC_WD4002FYYZ-01B7CB0_N8GDKUUY
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk1: (sdf) WDC_WD4002FYYZ-01B7CB0_N8GDKUUY size: 3907018532 
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (3): import 2 sdg 64 3907018532 0 WDC_WD4002FYYZ-01B7CB0_N8GBPDDY
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk2: (sdg) WDC_WD4002FYYZ-01B7CB0_N8GBPDDY size: 3907018532 
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (4): import 3 sdd 64 3907018532 0 WDC_WD40EMAZ-11LW3B0_WD-WX31D79H311X
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk3: (sdd) WDC_WD40EMAZ-11LW3B0_WD-WX31D79H311X size: 3907018532 
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (5): import 4 sdi 64 1953514552 0 WDC_WD20EARX-00PASB0_WD-WCAZAC337276
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk4: (sdi) WDC_WD20EARX-00PASB0_WD-WCAZAC337276 size: 1953514552 
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (6): import 5 sdj 64 1953514552 0 WDC_WD20EARX-00PASB0_WD-WCAZAC367766
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk5: (sdj) WDC_WD20EARX-00PASB0_WD-WCAZAC367766 size: 1953514552 
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (7): import 6 sdc 64 3907018532 0 WDC_WD4002FYYZ-01B7CB0_N8GA996Y
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk6: (sdc) WDC_WD4002FYYZ-01B7CB0_N8GA996Y size: 3907018532 
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (8): import 7 sdh 64 3907018532 0 WDC_WD4002FYYZ-01B7CB0_N8GAA9EY
2023-12-07T03:57:06+00:00 Tower kernel: md: import disk7: (sdh) WDC_WD4002FYYZ-01B7CB0_N8GAA9EY size: 3907018532 
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (9): import 8
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (10): import 9
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (11): import 10
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (12): import 11
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (13): import 12
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (14): import 13
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (15): import 14
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (16): import 15
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (17): import 16
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (18): import 17
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (19): import 18
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (20): import 19
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (21): import 20
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (22): import 21
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (23): import 22
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (24): import 23
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (25): import 24
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (26): import 25
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (27): import 26
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (28): import 27
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (29): import 28
2023-12-07T03:57:06+00:00 Tower kernel: mdcmd (30): import 29
2023-12-07T03:57:06+00:00 Tower kernel: md: import_slot: 29 empty
2023-12-07T03:57:09+00:00 Tower kernel: RPC: Registered named UNIX socket transport module.
2023-12-07T03:57:09+00:00 Tower kernel: RPC: Registered udp transport module.
2023-12-07T03:57:09+00:00 Tower kernel: RPC: Registered tcp transport module.
2023-12-07T03:57:09+00:00 Tower kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
2023-12-07T03:57:11+00:00 Tower kernel: NFSD: Using UMH upcall client tracking operations.
2023-12-07T03:57:11+00:00 Tower kernel: NFSD: starting 90-second grace period (net f0000000)
2023-12-07T04:02:56+00:00 Tower kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
2023-12-07T04:02:56+00:00 Tower kernel: rcu:    6-...!: (1 GPs behind) idle=e750/0/0x0 softirq=18565/18565 fqs=1 (false positive?)
2023-12-07T04:02:56+00:00 Tower kernel:         (detected by 8, t=60006 jiffies, g=42277, q=860 ncpus=12)
2023-12-07T04:02:56+00:00 Tower kernel: Sending NMI from CPU 8 to CPUs 6:
2023-12-07T04:02:56+00:00 Tower kernel: rcu: rcu_preempt kthread timer wakeup didn't happen for 70004 jiffies! g42277 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
2023-12-07T04:02:56+00:00 Tower kernel: rcu:    Possible timer handling issue on cpu=6 timer-softirq=2508
2023-12-07T04:02:56+00:00 Tower kernel: rcu: rcu_preempt kthread starved for 70007 jiffies! g42277 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=6
2023-12-07T04:02:56+00:00 Tower kernel: rcu:    Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
2023-12-07T04:02:56+00:00 Tower kernel: rcu: RCU grace-period kthread stack dump:
2023-12-07T04:02:56+00:00 Tower kernel: task:rcu_preempt     state:I stack:0     pid:15    ppid:2      flags:0x00004000
2023-12-07T04:02:56+00:00 Tower kernel: Call Trace:
2023-12-07T04:02:56+00:00 Tower kernel: <TASK>
2023-12-07T04:02:56+00:00 Tower kernel: __schedule+0x5b2/0x612
2023-12-07T04:02:56+00:00 Tower kernel: ? _raw_spin_unlock_irqrestore+0x24/0x3a
2023-12-07T04:02:56+00:00 Tower kernel: ? __mod_timer+0x207/0x232
2023-12-07T04:02:56+00:00 Tower kernel: ? rcu_gp_init+0x494/0x494
2023-12-07T04:02:56+00:00 Tower kernel: schedule+0x8e/0xcc
2023-12-07T04:02:56+00:00 Tower kernel: schedule_timeout+0x9d/0xd7
2023-12-07T04:02:56+00:00 Tower kernel: ? __next_timer_interrupt+0xf6/0xf6
2023-12-07T04:02:56+00:00 Tower kernel: rcu_gp_fqs_loop+0x12d/0x475
2023-12-07T04:02:56+00:00 Tower kernel: rcu_gp_kthread+0x151/0x16d
2023-12-07T04:02:56+00:00 Tower kernel: kthread+0xe7/0xef
2023-12-07T04:02:56+00:00 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
2023-12-07T04:02:56+00:00 Tower kernel: ret_from_fork+0x22/0x30
2023-12-07T04:02:56+00:00 Tower kernel: </TASK>
2023-12-07T04:02:56+00:00 Tower kernel: rcu: Stack dump where RCU GP kthread last ran:
2023-12-07T04:02:56+00:00 Tower kernel: Sending NMI from CPU 8 to CPUs 6:
2023-12-07T04:03:56+00:00 Tower kernel: rcu: INFO: rcu_preempt self-detected stall on CPU
2023-12-07T04:03:56+00:00 Tower kernel: rcu:    8-....: (60000 ticks this GP) idle=191c/1/0x4000000000000000 softirq=15683/15683 fqs=11999
2023-12-07T04:03:56+00:00 Tower kernel:         (t=60000 jiffies g=42281 q=1095 ncpus=12)
2023-12-07T04:03:56+00:00 Tower kernel: CPU: 8 PID: 241 Comm: kworker/8:1 Tainted: P           O       6.1.64-Unraid #1
2023-12-07T04:03:56+00:00 Tower kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F62 01/24/2022
2023-12-07T04:03:56+00:00 Tower kernel: Workqueue: events once_deferred
2023-12-07T04:03:56+00:00 Tower kernel: RIP: 0010:smp_call_function_many_cond+0x26a/0x283
2023-12-07T04:03:56+00:00 Tower kernel: Code: d0 48 89 df e8 64 fa ff ff 3b 05 d0 e0 2a 01 73 1f 48 63 c8 48 8b 55 00 48 03 14 cd 00 9b 16 82 8b 4a 08 80 e1 01 74 04 f3 90 <eb> f4 ff c0 eb c8 48 83 c4 38 5b 5d 41 5c 41 5d 41 5e 41 5f e9 4c
2023-12-07T04:03:56+00:00 Tower kernel: RSP: 0018:ffffc90004b3fd70 EFLAGS: 00000202
2023-12-07T04:03:56+00:00 Tower kernel: RAX: 0000000000000000 RBX: ffff88842ea2e608 RCX: 0000000000000001
2023-12-07T04:03:56+00:00 Tower kernel: RDX: ffff88842e833080 RSI: 0000000000000020 RDI: ffff88842ea2e608
2023-12-07T04:03:56+00:00 Tower kernel: RBP: ffff88842ea2e600 R08: 0000000000000000 R09: 0000000000000000
2023-12-07T04:03:56+00:00 Tower kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: 0000000000000001
2023-12-07T04:03:56+00:00 Tower kernel: R13: 0000000000000000 R14: ffffffff8102a991 R15: 0000000000000002
2023-12-07T04:03:56+00:00 Tower kernel: FS:  0000000000000000(0000) GS:ffff88842ea00000(0000) knlGS:0000000000000000
2023-12-07T04:03:56+00:00 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-12-07T04:03:56+00:00 Tower kernel: CR2: 000020872bd82000 CR3: 000000000220a000 CR4: 00000000003506e0
2023-12-07T04:03:56+00:00 Tower kernel: Call Trace:
2023-12-07T04:03:56+00:00 Tower kernel: <IRQ>
2023-12-07T04:03:56+00:00 Tower kernel: ? rcu_dump_cpu_stacks+0x95/0xb9
2023-12-07T04:03:56+00:00 Tower kernel: ? rcu_sched_clock_irq+0x345/0xa45
2023-12-07T04:03:56+00:00 Tower kernel: ? do_set_msr+0x12/0x12 [kvm]
2023-12-07T04:03:56+00:00 Tower kernel: ? notifier_call_chain+0x38/0x5a
2023-12-07T04:03:56+00:00 Tower kernel: ? timekeeping_update+0xe8/0x117
2023-12-07T04:03:56+00:00 Tower kernel: ? tick_init_jiffy_update+0x7c/0x7c
2023-12-07T04:03:56+00:00 Tower kernel: ? update_process_times+0x62/0x81
2023-12-07T04:03:56+00:00 Tower kernel: ? tick_sched_timer+0x43/0x71
2023-12-07T04:03:56+00:00 Tower kernel: ? __hrtimer_run_queues+0xeb/0x190
2023-12-07T04:03:56+00:00 Tower kernel: ? hrtimer_interrupt+0x9c/0x16e
2023-12-07T04:03:56+00:00 Tower kernel: ? __sysvec_apic_timer_interrupt+0xc5/0x12f
2023-12-07T04:03:56+00:00 Tower kernel: ? sysvec_apic_timer_interrupt+0x80/0xa6
2023-12-07T04:03:56+00:00 Tower kernel: </IRQ>
2023-12-07T04:03:56+00:00 Tower kernel: <TASK>
2023-12-07T04:03:56+00:00 Tower kernel: ? asm_sysvec_apic_timer_interrupt+0x16/0x20
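
For anyone triaging a similar capture, here is a minimal Python sketch that pulls the notable events out of a syslog in the format shown above (ISO timestamp, host, tag, message). The signature names, the `scan` function, and the choice of patterns are my own assumptions based only on the lines in this thread; they are not an Unraid tool or an exhaustive list of crash indicators.

```python
import re
import sys

# Assumed line layout, taken from the excerpts above:
#   <ISO timestamp> <host> <tag>: <message>
LINE_RE = re.compile(r"^(\S+) (\S+) ([^:]+): (.*)$")

# Hypothetical signature list built from messages seen in this thread.
SIGNATURES = {
    "gpf": re.compile(r"general protection fault"),
    "rcu_stall": re.compile(r"rcu: INFO: \S+ (?:self-)?detected stalls? on CPU"),
    "mover_failure": re.compile(r"exit status \d+ from user root /usr/local/sbin/mover"),
}


def scan(lines):
    """Return (timestamp, kind, message) for every line matching a signature."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip blank or truncated lines
        timestamp, _host, _tag, message = m.groups()
        for kind, pattern in SIGNATURES.items():
            if pattern.search(message):
                hits.append((timestamp, kind, message))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_syslog.py <path-to-saved-syslog>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for timestamp, kind, message in scan(fh):
            print(f"{timestamp}  {kind:14s} {message}")
```

Run against the capture above, this should flag the smartctl general protection fault at 01:31:13 and both RCU stall reports after the reboot, which makes it easier to see what preceded each stall. It only filters lines; it does not say whether the cause is hardware.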

 
