[SOLVED] disk disabled during parity check



Yes, those are the tunables I recommend, and they generally perform better than the defaults. Until recently there were no call traces with them, but over the last few releases a few users have started getting some. Usually just lowering the sync_thresh tunable is enough to get rid of them: lower it little by little and stop when there are no more traces. You can change the value during a parity check.

3 hours ago, johnnie.black said:

Yes, those are the tunables I recommend, and they generally perform better than the defaults. Until recently there were no call traces with them, but over the last few releases a few users have started getting some. Usually just lowering the sync_thresh tunable is enough to get rid of them: lower it little by little and stop when there are no more traces. You can change the value during a parity check.

Adjusted sync_thresh down by 20; the check is 25% done so far with no call traces yet.

 

However, the hard link reset on my parity disk is back. It was gone during the first parity check after replacing the card and SATA card, but now, on the second check with no other changes, it has returned.

 

Dec  5 08:16:46 VOID kernel: mdcmd (134): check nocorrect
Dec  5 08:16:46 VOID kernel: md: recovery thread: check P ...
Dec  5 08:16:46 VOID kernel: md: using 8192k window, over a total of 5860522532 blocks.
Dec  5 08:16:46 VOID kernel: md: recovery thread: P incorrect, sector=0
Dec  5 08:16:46 VOID kernel: md: recovery thread: P incorrect, sector=24
Dec  5 08:16:46 VOID kernel: md: recovery thread: P incorrect, sector=32
Dec  5 08:16:46 VOID kernel: md: recovery thread: P incorrect, sector=96
Dec  5 08:16:53 VOID kernel: md: recovery thread: P incorrect, sector=1474968
Dec  5 08:18:01 VOID emhttpd: req (13): csrf_token=****************&title=System+Log&cmd=%2FwebGui%2Fscripts%2Ftail_log&arg1=syslog
Dec  5 08:18:01 VOID emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
Dec  5 08:19:03 VOID kernel: ata8.00: exception Emask 0x10 SAct 0x0 SErr 0x90002 action 0xe frozen
Dec  5 08:19:03 VOID kernel: ata8.00: irq_stat 0x00400000, PHY RDY changed
Dec  5 08:19:03 VOID kernel: ata8: SError: { RecovComm PHYRdyChg 10B8B }
Dec  5 08:19:03 VOID kernel: ata8.00: failed command: READ DMA EXT
Dec  5 08:19:03 VOID kernel: ata8.00: cmd 25/00:78:20:12:cd/00:03:01:00:00/e0 tag 30 dma 454656 in
Dec  5 08:19:03 VOID kernel:         res 50/00:00:1f:12:cd/00:00:01:00:00/e0 Emask 0x10 (ATA bus error)
Dec  5 08:19:03 VOID kernel: ata8.00: status: { DRDY }
Dec  5 08:19:03 VOID kernel: ata8: hard resetting link
Dec  5 08:19:13 VOID kernel: ata8: softreset failed (1st FIS failed)
Dec  5 08:19:13 VOID kernel: ata8: hard resetting link
Dec  5 08:19:19 VOID kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec  5 08:19:19 VOID kernel: ata8.00: configured for UDMA/133
Dec  5 08:19:19 VOID kernel: ata8: EH complete

 

19 minutes ago, johnnie.black said:

If ATA8 is on the Asmedia then it's likely a cable/connection problem, swap cables/backplane with another disk.

It is, and its cable was swapped last night for a new SATA III cable fresh out of the plastic. The error was gone on the first parity run after that (last night), but on the second run this morning it is back.

 

I have a modular PSU but am out of molex connectors so I have been using sata to molex splitters to power two of the Norco bays. I'll have to buy more as I don't have any more splitters on hand.

 

Edit: sometimes my reading comprehension is horrid lol. I'll try swapping its spot with the other asmedia data disk and see if the errors follow it. 
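
To compare before and after a swap, a small script can tally the hard resets per port from the syslog. This is only a sketch: the regex is modeled on the `hard resetting link` lines in this thread, and `count_link_resets` is a made-up helper name.

```python
import re
from collections import Counter

# Matches lines such as "... kernel: ata8: hard resetting link".
RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")


def count_link_resets(lines):
    """Count 'hard resetting link' events per ATA port in syslog lines."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the ata8 excerpt earlier in the thread.
sample = [
    "Dec  5 08:19:03 VOID kernel: ata8: hard resetting link",
    "Dec  5 08:19:13 VOID kernel: ata8: softreset failed (1st FIS failed)",
    "Dec  5 08:19:13 VOID kernel: ata8: hard resetting link",
    "Dec  5 08:19:19 VOID kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
]
print(count_link_resets(sample))  # Counter({'ata8': 2})
```

Keep in mind that the `ataN` label identifies a port rather than a specific disk, so after a swap the same disk may show up under a different number. To run it against a full log, pass an open file object instead of the sample list.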

Edited by weirdcrap
8 hours ago, weirdcrap said:

It is, and its cable was swapped last night for a new SATA III cable fresh out of the plastic. The error was gone on the first parity run after that (last night), but on the second run this morning it is back.

 

I have a modular PSU but am out of molex connectors so I have been using sata to molex splitters to power two of the Norco bays. I'll have to buy more as I don't have any more splitters on hand.

 

Edit: sometimes my reading comprehension is horrid lol. I'll try swapping its spot with the other asmedia data disk and see if the errors follow it. 

While I was waiting for that parity check to finish, UnRAID hung (before I got to try swapping anything). The WebUI, dockers, and SSH/Telnet were all unresponsive, but for some reason my terminal still worked.

 

When I logged in it immediately showed me a message about a "hang time value being exceeded" or something along those lines. I managed to reboot it and capture the diagnostics to the flash drive which I will attach after the server finishes booting back up all the way.

 

Edit: Shut the server down and swapped the parity and data disk on the Asmedia controller to continue my troubleshooting.

 

void-diagnostics-20181205-2026.zip

Edited by weirdcrap
14 hours ago, johnnie.black said:

You need to lower sync thresh more, still call traces during the check, try around 1500.

I have lowered it further but am going out of town for a long weekend so I will come back and post when I return and have tested the new settings.

 

Thanks for all your help so far Johnnie!

Edited by weirdcrap

Returned from vacation and started a parity check.

 

Now the disk that is resetting is ata4, a 2TB Seagate drive that is NOT in the top Norco 5-bay enclosure where my other suspect disks started out (it is in the one below it).

 

Dec 10 07:51:54 VOID kernel: ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x40d0202 action 0xe frozen
Dec 10 07:51:54 VOID kernel: ata4.00: irq_stat 0x00400000, PHY RDY changed
Dec 10 07:51:54 VOID kernel: ata4: SError: { RecovComm Persist PHYRdyChg CommWake 10B8B DevExch }
Dec 10 07:51:54 VOID kernel: ata4.00: failed command: READ DMA EXT
Dec 10 07:51:54 VOID kernel: ata4.00: cmd 25/00:78:28:14:cd/00:01:01:00:00/e0 tag 0 dma 192512 in
Dec 10 07:51:54 VOID kernel:         res 50/00:00:27:80:11/00:00:02:00:00/e0 Emask 0x10 (ATA bus error)
Dec 10 07:51:54 VOID kernel: ata4.00: status: { DRDY }
Dec 10 07:51:54 VOID kernel: ata4: hard resetting link
Dec 10 07:51:59 VOID kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 10 07:51:59 VOID kernel: ata4.00: configured for UDMA/133
Dec 10 07:51:59 VOID kernel: ata4: EH complete
Dec 10 07:52:03 VOID kernel: ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Dec 10 07:52:03 VOID kernel: ata4.00: irq_stat 0x00400000, PHY RDY changed
Dec 10 07:52:03 VOID kernel: ata4: SError: { RecovComm Persist PHYRdyChg 10B8B }
Dec 10 07:52:03 VOID kernel: ata4.00: failed command: READ DMA EXT
Dec 10 07:52:03 VOID kernel: ata4.00: cmd 25/00:00:a8:49:dc/00:04:01:00:00/e0 tag 3 dma 524288 in
Dec 10 07:52:03 VOID kernel:         res 50/00:00:a7:b5:20/00:00:02:00:00/e0 Emask 0x10 (ATA bus error)
Dec 10 07:52:03 VOID kernel: ata4.00: status: { DRDY }
Dec 10 07:52:03 VOID kernel: ata4: hard resetting link
Dec 10 07:52:13 VOID kernel: ata4: softreset failed (1st FIS failed)
Dec 10 07:52:13 VOID kernel: ata4: hard resetting link
Dec 10 07:52:20 VOID kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 10 07:52:20 VOID kernel: ata4.00: configured for UDMA/133
Dec 10 07:52:20 VOID kernel: ata4: EH complete
Dec 10 07:54:03 VOID kernel: ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Dec 10 07:54:03 VOID kernel: ata4.00: irq_stat 0x00400000, PHY RDY changed
Dec 10 07:54:03 VOID kernel: ata4: SError: { RecovComm Persist PHYRdyChg 10B8B }
Dec 10 07:54:03 VOID kernel: ata4.00: failed command: READ DMA EXT
Dec 10 07:54:03 VOID kernel: ata4.00: cmd 25/00:00:20:d0:33/00:04:03:00:00/e0 tag 1 dma 524288 in
Dec 10 07:54:03 VOID kernel:         res 50/00:00:1f:3c:78/00:00:03:00:00/e0 Emask 0x10 (ATA bus error)
Dec 10 07:54:03 VOID kernel: ata4.00: status: { DRDY }
Dec 10 07:54:03 VOID kernel: ata4: hard resetting link
Dec 10 07:54:09 VOID kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 10 07:54:09 VOID kernel: ata4.00: configured for UDMA/133
Dec 10 07:54:09 VOID kernel: ata4: EH complete
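
As a side note on reading the `cmd` lines: as far as I understand the libata format, the fields are command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. The sketch below decodes the sector count and starting LBA under that assumption, plus an assumed 512-byte logical sector and a non-zero sector count. `decode_taskfile` is a hypothetical helper name.

```python
import re

HEX = r"([0-9a-f]{2})"
# cmd CC/FF:NN:LL:MM:HH/ff:nn:ll:mm:hh/DD  (assumed libata field order)
TASKFILE_RE = re.compile(
    r"cmd " + HEX + "/" + ":".join([HEX] * 5) + "/" + ":".join([HEX] * 5) + "/" + HEX
)
FIELDS = (
    "command", "feature", "nsect", "lbal", "lbam", "lbah",
    "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah", "device",
)


def decode_taskfile(line, sector_size=512):
    """Decode a 48-bit taskfile from a 'failed command' cmd line, or return None."""
    match = TASKFILE_RE.search(line)
    if match is None:
        return None
    tf = {name: int(value, 16) for name, value in zip(FIELDS, match.groups())}
    # The high-order ("hob") bytes extend the count and LBA for 48-bit commands.
    sectors = (tf["hob_nsect"] << 8) | tf["nsect"]
    lba = (
        (tf["hob_lbah"] << 40) | (tf["hob_lbam"] << 32) | (tf["hob_lbal"] << 24)
        | (tf["lbah"] << 16) | (tf["lbam"] << 8) | tf["lbal"]
    )
    return {
        "command": tf["command"],
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
    }


# First ata4 failure from the excerpt above.
line = "Dec 10 07:51:54 VOID kernel: ata4.00: cmd 25/00:78:28:14:cd/00:01:01:00:00/e0 tag 0 dma 192512 in"
print(decode_taskfile(line)["bytes"])  # 192512
```

The decoded byte count agrees with the `dma 192512 in` figure on the same log line, which is a decent sanity check that the field order is being read correctly.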

 

I can't be 100% sure as the cabling runs behind the case wall for this drive and I would have to pick up and flip the entire server to get back there, but I believe it is hooked up directly to my motherboard SATA controller.

void-diagnostics-20181210-0758.zip

 

 

EDIT: To illustrate what has been swapped where so far I took a pic of the server and made some colored circles. Red was swapped with red, green with green, and purple is my latest troublemaker in this post. 

IMG_20181201_091406.thumb.jpg.5546d4174c239282dcea8fb2db70d47c.jpg

Edited by weirdcrap

I'm thinking it might be time to buy a new PSU. Because of how my PSU is mounted in the system, I could never see the wattage rating, and I couldn't remember what it was or find my purchase receipt for it.

 

Finally shut the thing down and pulled it out, and it is only a 650W unit, which is on the light side for everything I have hooked up in here, especially given that I'm guessing the PSU is at least 3 years old by now: https://outervision.com/b/U6ohNP

 

Not all of my drives are 7200rpm, but I couldn't recall how many are, so I just went with all faster disks in the calculation.

 

EDIT: Finally found the email receipt, I bought it back in 2012!

 

EDIT2: Bought an 850W EVGA new in plastic on Ebay for $100: https://www.newegg.com/Product/Product.aspx?Item=9SIA85V4SC8056

 


Should future proof me plenty as I can't fit any more disks in this thing and I don't plan on attaching anything else that is power hungry. Maybe upgrading mobo and CPU at some point but that would be it.

Edited by weirdcrap

Down to 1024 and it still call traced within an hour of adjusting the values.

 

Dec 14 06:06:01 VOID kernel: mdcmd (58): set md_num_stripes 4096
Dec 14 06:06:01 VOID kernel: mdcmd (59): set md_sync_window 2048
Dec 14 06:06:01 VOID kernel: mdcmd (60): set md_sync_thresh 1024
Dec 14 06:06:01 VOID kernel: mdcmd (61): set md_write_method
Dec 14 06:06:01 VOID kernel: mdcmd (62): set spinup_group 0 0
Dec 14 06:06:01 VOID kernel: mdcmd (63): set spinup_group 1 0
Dec 14 06:06:01 VOID kernel: mdcmd (64): set spinup_group 2 0
Dec 14 06:06:01 VOID kernel: mdcmd (65): set spinup_group 3 0
Dec 14 06:06:01 VOID kernel: mdcmd (66): set spinup_group 4 0
Dec 14 06:06:01 VOID kernel: mdcmd (67): set spinup_group 5 0
Dec 14 06:06:01 VOID kernel: mdcmd (68): set spinup_group 6 0
Dec 14 06:06:01 VOID kernel: mdcmd (69): set spinup_group 7 0
Dec 14 06:06:01 VOID kernel: mdcmd (70): set spinup_group 8 0
Dec 14 06:06:01 VOID kernel: mdcmd (71): set spinup_group 9 0
Dec 14 06:06:01 VOID kernel: mdcmd (72): set spinup_group 10 0
Dec 14 06:06:01 VOID kernel: mdcmd (73): set spinup_group 11 0
Dec 14 06:06:01 VOID kernel: mdcmd (74): set spinup_group 12 0
Dec 14 06:06:01 VOID kernel: mdcmd (75): set spinup_group 13 0
Dec 14 06:06:01 VOID kernel: mdcmd (76): set spinup_group 14 0
Dec 14 06:06:01 VOID kernel: mdcmd (77): set spinup_group 15 0
Dec 14 06:06:01 VOID kernel: mdcmd (78): set spinup_group 16 0
Dec 14 06:06:01 VOID kernel: mdcmd (79): set spinup_group 17 0
Dec 14 06:06:01 VOID kernel: mdcmd (80): set spinup_group 18 0
Dec 14 06:06:01 VOID kernel: mdcmd (81): set spinup_group 19 0
Dec 14 06:06:28 VOID emhttpd: req (8): csrf_token=****************&title=System+Log&cmd=%2FwebGui%2Fscripts%2Ftail_log&arg1=syslog
Dec 14 06:06:28 VOID emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
Dec 14 06:59:00 VOID kernel: INFO: rcu_sched self-detected stall on CPU
Dec 14 06:59:00 VOID kernel: 	2-....: (60000 ticks this GP) idle=e4a/1/4611686018427387906 softirq=5969851/5969851 fqs=14532 
Dec 14 06:59:00 VOID kernel: 	 (t=60000 jiffies g=1902670 c=1902669 q=27572)
Dec 14 06:59:00 VOID kernel: NMI backtrace for cpu 2
Dec 14 06:59:00 VOID kernel: CPU: 2 PID: 8980 Comm: unraidd Not tainted 4.18.20-unRAID #1
Dec 14 06:59:00 VOID kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 R2.0, BIOS 2603 06/26/2015
Dec 14 06:59:00 VOID kernel: Call Trace:
Dec 14 06:59:00 VOID kernel: <IRQ>
Dec 14 06:59:00 VOID kernel: dump_stack+0x5d/0x79
Dec 14 06:59:00 VOID kernel: nmi_cpu_backtrace+0x71/0x83
Dec 14 06:59:00 VOID kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Dec 14 06:59:00 VOID kernel: nmi_trigger_cpumask_backtrace+0x57/0xd7
Dec 14 06:59:00 VOID kernel: rcu_dump_cpu_stacks+0x91/0xbb
Dec 14 06:59:00 VOID kernel: rcu_check_callbacks+0x23f/0x5ca
Dec 14 06:59:00 VOID kernel: ? tick_sched_handle.isra.5+0x2f/0x2f
Dec 14 06:59:00 VOID kernel: update_process_times+0x23/0x45
Dec 14 06:59:00 VOID kernel: tick_sched_timer+0x36/0x64
Dec 14 06:59:00 VOID kernel: __hrtimer_run_queues+0xb1/0x105
Dec 14 06:59:00 VOID kernel: hrtimer_interrupt+0xf4/0x20d
Dec 14 06:59:00 VOID kernel: smp_apic_timer_interrupt+0x79/0x91
Dec 14 06:59:00 VOID kernel: apic_timer_interrupt+0xf/0x20
Dec 14 06:59:00 VOID kernel: </IRQ>
Dec 14 06:59:00 VOID kernel: RIP: 0010:xor_avx_5+0x231/0x352
Dec 14 06:59:00 VOID kernel: Code: c4 c1 7d 6f 92 40 01 00 00 c4 c1 6c 57 93 40 01 00 00 c5 ec 57 93 40 01 00 00 c5 ec 57 95 40 01 00 00 c5 ec 57 90 40 01 00 00 <c5> fd 7f 90 40 01 00 00 c4 c1 7d 6f 9a 60 01 00 00 c4 c1 64 57 9b 
Dec 14 06:59:00 VOID kernel: RSP: 0018:ffffc900021d3c68 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff13
Dec 14 06:59:00 VOID kernel: RAX: ffff88040f61ca00 RBX: ffff88040f606a00 RCX: ffff88040f606000
Dec 14 06:59:00 VOID kernel: RDX: 0000000000000000 RSI: ffff88040f61c000 RDI: 0000000000001000
Dec 14 06:59:00 VOID kernel: RBP: ffff88040f605a00 R08: ffff88040f607000 R09: ffff88040f610000
Dec 14 06:59:00 VOID kernel: R10: ffff88040f610a00 R11: ffff88040f607a00 R12: 0000000000000a00
Dec 14 06:59:00 VOID kernel: R13: ffff88040f61c000 R14: ffff88040f605000 R15: ffff88040f606000
Dec 14 06:59:00 VOID kernel: ? xor_avx_5+0x2d/0x352
Dec 14 06:59:00 VOID kernel: check_parity+0x118/0x349 [md_mod]
Dec 14 06:59:00 VOID kernel: handle_stripe+0xe8a/0x1226 [md_mod]
Dec 14 06:59:00 VOID kernel: unraidd+0xbc/0x123 [md_mod]
Dec 14 06:59:00 VOID kernel: ? md_open+0x2c/0x2c [md_mod]
Dec 14 06:59:00 VOID kernel: md_thread+0xcc/0xf1 [md_mod]
Dec 14 06:59:00 VOID kernel: ? wait_woken+0x68/0x68
Dec 14 06:59:00 VOID kernel: kthread+0x10b/0x113
Dec 14 06:59:00 VOID kernel: ? kthread_flush_work_fn+0x9/0x9
Dec 14 06:59:00 VOID kernel: ret_from_fork+0x22/0x40
Dec 14 07:01:40 VOID kernel: INFO: rcu_sched self-detected stall on CPU
Dec 14 07:01:40 VOID kernel: 	3-....: (59999 ticks this GP) idle=5fa/1/4611686018427387906 softirq=5826078/5826078 fqs=14552 
Dec 14 07:01:40 VOID kernel: 	 (t=60001 jiffies g=1902684 c=1902683 q=32177)
Dec 14 07:01:40 VOID kernel: NMI backtrace for cpu 3

Down to 800 now to see if that will be stable.

 

EDIT: Already call traced at 800.

 

Dec 14 07:26:02 VOID kernel: CPU: 4 PID: 8980 Comm: unraidd Not tainted 4.18.20-unRAID #1
Dec 14 07:26:02 VOID kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 R2.0, BIOS 2603 06/26/2015
Dec 14 07:26:02 VOID kernel: Call Trace:
Dec 14 07:26:02 VOID kernel: <IRQ>
Dec 14 07:26:02 VOID kernel: dump_stack+0x5d/0x79
Dec 14 07:26:02 VOID kernel: nmi_cpu_backtrace+0x71/0x83
Dec 14 07:26:02 VOID kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Dec 14 07:26:02 VOID kernel: nmi_trigger_cpumask_backtrace+0x57/0xd7
Dec 14 07:26:02 VOID kernel: rcu_dump_cpu_stacks+0x91/0xbb
Dec 14 07:26:02 VOID kernel: rcu_check_callbacks+0x23f/0x5ca
Dec 14 07:26:02 VOID kernel: ? tick_sched_handle.isra.5+0x2f/0x2f
Dec 14 07:26:02 VOID kernel: update_process_times+0x23/0x45
Dec 14 07:26:02 VOID kernel: tick_sched_timer+0x36/0x64
Dec 14 07:26:02 VOID kernel: __hrtimer_run_queues+0xb1/0x105
Dec 14 07:26:02 VOID kernel: hrtimer_interrupt+0xf4/0x20d
Dec 14 07:26:02 VOID kernel: smp_apic_timer_interrupt+0x79/0x91
Dec 14 07:26:02 VOID kernel: apic_timer_interrupt+0xf/0x20
Dec 14 07:26:02 VOID kernel: </IRQ>
Dec 14 07:26:02 VOID kernel: RIP: 0010:xor_avx_5+0x28d/0x352
Dec 14 07:26:02 VOID kernel: Code: c5 fd 7f 98 60 01 00 00 c4 c1 7d 6f 82 80 01 00 00 c4 c1 7c 57 83 80 01 00 00 c5 fc 57 83 80 01 00 00 c5 fc 57 85 80 01 00 00 <c5> fc 57 80 80 01 00 00 c5 fd 7f 80 80 01 00 00 c4 c1 7d 6f 8a a0 
Dec 14 07:26:02 VOID kernel: RSP: 0018:ffffc900021d3c68 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff13
Dec 14 07:26:02 VOID kernel: RAX: ffff88040f0f1600 RBX: ffff88040f0e3600 RCX: ffff88040f0e3000
Dec 14 07:26:02 VOID kernel: RDX: 0000000000000000 RSI: ffff88040f0f1000 RDI: 0000000000001000
Dec 14 07:26:02 VOID kernel: RBP: ffff88040f0e2600 R08: ffff88040f0e4000 R09: ffff88040f0e5000
Dec 14 07:26:02 VOID kernel: R10: ffff88040f0e5600 R11: ffff88040f0e4600 R12: 0000000000000600
Dec 14 07:26:02 VOID kernel: R13: ffff88040f0f1000 R14: ffff88040f0e2000 R15: ffff88040f0e3000
Dec 14 07:26:02 VOID kernel: ? xor_avx_5+0x2d/0x352
Dec 14 07:26:02 VOID kernel: check_parity+0x118/0x349 [md_mod]
Dec 14 07:26:02 VOID kernel: handle_stripe+0xe8a/0x1226 [md_mod]
Dec 14 07:26:02 VOID kernel: unraidd+0xbc/0x123 [md_mod]
Dec 14 07:26:02 VOID kernel: ? md_open+0x2c/0x2c [md_mod]
Dec 14 07:26:02 VOID kernel: md_thread+0xcc/0xf1 [md_mod]
Dec 14 07:26:02 VOID kernel: ? wait_woken+0x68/0x68
Dec 14 07:26:02 VOID kernel: kthread+0x10b/0x113
Dec 14 07:26:02 VOID kernel: ? kthread_flush_work_fn+0x9/0x9
Dec 14 07:26:02 VOID kernel: ret_from_fork+0x22/0x40

EDIT2: I'm just going to take it all the way back to the default 192.

 

EDIT3: After further tweaking, I got no more call traces with a sync_thresh value of 256 or lower. Running a second check now to ensure my results are consistent. It doesn't seem to drastically affect my parity check speed either, which is always nice.
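
If you are stepping the value down like this, it can help to tally the stalls per setting instead of eyeballing the log. Below is a minimal sketch that walks the syslog in order, remembers the last `md_sync_thresh` value set via `mdcmd`, and counts the `rcu_sched self-detected stall` lines seen while it was in effect. The patterns are modeled on the lines in this thread, and `stalls_per_thresh` is a hypothetical name.

```python
import re

THRESH_RE = re.compile(r"mdcmd \(\d+\): set md_sync_thresh (\d+)")
STALL_MARKER = "rcu_sched self-detected stall"


def stalls_per_thresh(lines):
    """Map each md_sync_thresh value to the number of RCU stalls logged under it.

    Stalls logged before any 'set md_sync_thresh' line are counted under None.
    """
    current = None
    counts = {}
    for line in lines:
        match = THRESH_RE.search(line)
        if match:
            current = int(match.group(1))
            counts.setdefault(current, 0)
        elif STALL_MARKER in line:
            counts[current] = counts.get(current, 0) + 1
    return counts


# Sample lines copied from the Dec 14 excerpt above.
sample = [
    "Dec 14 06:06:01 VOID kernel: mdcmd (60): set md_sync_thresh 1024",
    "Dec 14 06:59:00 VOID kernel: INFO: rcu_sched self-detected stall on CPU",
    "Dec 14 07:01:40 VOID kernel: INFO: rcu_sched self-detected stall on CPU",
]
print(stalls_per_thresh(sample))  # {1024: 2}
```

A value that shows up with a count of 0 after a full parity check is a reasonable candidate to keep, though one clean run is not proof, hence the second check.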

Edited by weirdcrap

New call trace that is different from the others. This is the only one I have gotten so far in the last two parity checks.

 

Dec 14 16:53:15 VOID emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
Dec 14 18:54:37 VOID kernel: ------------[ cut here ]------------
Dec 14 18:54:37 VOID kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Dec 14 18:54:37 VOID kernel: WARNING: CPU: 4 PID: 31 at net/sched/sch_generic.c:461 dev_watchdog+0x150/0x1a8
Dec 14 18:54:37 VOID kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT ebtable_filter ebtables ip6table_filter ip6_tables vhost_net tun vhost tap ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod it87 hwmon_vid bonding edac_mce_amd kvm_amd ccp kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd r8169 mpt3sas fam15h_power wmi_bmof ahci mxm_wmi glue_helper i2c_piix4 k10temp i2c_core mii wmi libahci raid_class pcc_cpufreq scsi_transport_sas button acpi_cpufreq
Dec 14 18:54:37 VOID kernel: CPU: 4 PID: 31 Comm: ksoftirqd/4 Not tainted 4.18.20-unRAID #1
Dec 14 18:54:37 VOID kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 R2.0, BIOS 2603 06/26/2015
Dec 14 18:54:37 VOID kernel: RIP: 0010:dev_watchdog+0x150/0x1a8
Dec 14 18:54:37 VOID kernel: Code: 15 fd 97 00 00 75 36 4c 89 ef c6 05 09 fd 97 00 01 e8 93 c5 fd ff 89 e9 4c 89 ee 48 c7 c7 ee 0f d9 81 48 89 c2 e8 53 c0 b2 ff <0f> 0b eb 0f ff c5 48 81 c2 40 01 00 00 39 cd 75 98 eb 13 48 8b 83 
Dec 14 18:54:37 VOID kernel: RSP: 0018:ffffc900019cbdb0 EFLAGS: 00010282
Dec 14 18:54:37 VOID kernel: RAX: 0000000000000000 RBX: ffff88044bb363b0 RCX: 0000000000000007
Dec 14 18:54:37 VOID kernel: RDX: 0000000000000000 RSI: ffff88045ed16470 RDI: ffff88045ed16470
Dec 14 18:54:37 VOID kernel: RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000020400
Dec 14 18:54:37 VOID kernel: R10: 0000000000000987 R11: 000000000000b6a8 R12: ffff88044bb3639c
Dec 14 18:54:37 VOID kernel: R13: ffff88044bb36000 R14: ffff8804482e5c80 R15: 0000000000000004
Dec 14 18:54:37 VOID kernel: FS:  0000000000000000(0000) GS:ffff88045ed00000(0000) knlGS:0000000000000000
Dec 14 18:54:37 VOID kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 14 18:54:37 VOID kernel: CR2: 0000000001af7544 CR3: 0000000447f76000 CR4: 00000000000406e0
Dec 14 18:54:37 VOID kernel: Call Trace:
Dec 14 18:54:37 VOID kernel: call_timer_fn+0x18/0x7b
Dec 14 18:54:37 VOID kernel: ? qdisc_reset+0xc0/0xc0
Dec 14 18:54:37 VOID kernel: expire_timers+0x7f/0x8e
Dec 14 18:54:37 VOID kernel: run_timer_softirq+0x72/0x120
Dec 14 18:54:37 VOID kernel: ? __switch_to_asm+0x34/0x70
Dec 14 18:54:37 VOID kernel: ? __switch_to_asm+0x40/0x70
Dec 14 18:54:37 VOID kernel: ? __switch_to+0x1fe/0x30d
Dec 14 18:54:37 VOID kernel: ? __switch_to_asm+0x40/0x70
Dec 14 18:54:37 VOID kernel: __do_softirq+0xce/0x1e2
Dec 14 18:54:37 VOID kernel: ? smpboot_park_thread+0x25/0x25
Dec 14 18:54:37 VOID kernel: run_ksoftirqd+0x19/0x2d
Dec 14 18:54:37 VOID kernel: smpboot_thread_fn+0x134/0x149
Dec 14 18:54:37 VOID kernel: kthread+0x10b/0x113
Dec 14 18:54:37 VOID kernel: ? kthread_flush_work_fn+0x9/0x9
Dec 14 18:54:37 VOID kernel: ret_from_fork+0x22/0x40
Dec 14 18:54:37 VOID kernel: ---[ end trace 3af301e239cd4c04 ]---
Dec 14 18:54:37 VOID kernel: r8169 0000:02:00.0 eth0: link up
Dec 14 21:25:12 VOID kernel: mdcmd (222): spindown 13
Dec 14 23:00:01 VOID Plugin Auto Update: Checking for available plugin updates
Dec 14 23:00:06 VOID Plugin Auto Update: Community Applications Plugin Auto Update finished
Dec 15 00:02:43 VOID crond[1836]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Dec 15 00:10:39 VOID kernel: mdcmd (223): spindown 8
Dec 15 00:10:40 VOID kernel: mdcmd (224): spindown 10
Dec 15 00:10:41 VOID kernel: mdcmd (225): spindown 12
Dec 15 00:10:47 VOID kernel: mdcmd (226): spindown 9
Dec 15 00:10:49 VOID kernel: mdcmd (227): spindown 7
Dec 15 00:11:00 VOID kernel: mdcmd (228): spindown 15
Dec 15 02:00:02 VOID Docker Auto Update: Community Applications Docker Autoupdate running
Dec 15 02:00:02 VOID Docker Auto Update: Checking for available updates
Dec 15 02:00:09 VOID Docker Auto Upda

On 12/15/2018 at 4:54 AM, johnnie.black said:

That one is NIC related, and quite common and usually not a big deal.

Sweet, I thought it looked NIC related and am glad it is nothing to be concerned about.

 

A sync_thresh of 256 and the new PSU seem to have fixed all my problems.

Going to start swapping disks back into their original locations and see if they continue to behave themselves.

 

EDIT:

 

Final result: it appears to have all been down to an underpowered PSU. After putting the new PSU in and swapping all disks and cables back, I have had zero issues.

 

Thanks for all your help Johnnie.

Edited by weirdcrap