March 3, 20188 yr In RC1 & RC2, if all disk spin down, then shutdown/reboot won't work, GUI gone but telnet still function and /dev/md2 md3 .... still in mount state. Only hard reset can reboot and array will auto parity check when start again. diagnostics not work too ....... Filesystem 1K-blocks Used Available Use% Mounted on rootfs 16358820 583740 15775080 4% / tmpfs 32768 220 32548 1% /run devtmpfs 16396408 0 16396408 0% /dev tmpfs 16440244 0 16440244 0% /dev/shm cgroup_root 8192 0 8192 0% /sys/fs/cgroup tmpfs 131072 756 130316 1% /var/log /dev/sda1 7800304 554656 7245648 8% /boot /dev/loop0 7680 7680 0 100% /lib/modules /dev/loop1 4608 4608 0 100% /lib/firmware /dev/md2 5860522532 5760145344 100117472 99% /mnt/disk2 /dev/md3 5860522532 5764139424 96021264 99% /mnt/disk3 /dev/md4 5860522532 5769269168 90980816 99% /mnt/disk4 /dev/md5 5860522532 5467203020 392547092 94% /mnt/disk5 /dev/md6 5860522532 5762692312 97055912 99% /mnt/disk6 /dev/md7 5860522532 971330608 4888639312 17% /mnt/disk7 /dev/nvme0n1p1 500107576 82069920 417126992 17% /mnt/cacheroot@X370:~# diagnostics Starting diagnostics collection... Edited March 3, 20188 yr by Benson
March 3, 20188 yr Author Try to get diagnostics , but it just hang in collection ... BTW, dmesg woulg help ? It seems md2 md3 ...... haven't spin up [ 73.970071] mdcmd (60): check correct [ 73.970083] md: recovery thread: check P Q ... [ 73.973566] md: using 15360k window, over a total of 5860522532 blocks. [ 112.950447] mdcmd (61): nocheck [ 113.001749] md: md_do_sync: got signal, exit... [ 113.039032] md: recovery thread: completion status: -4 [ 259.021840] mdcmd (62): spindown 0 [ 259.892214] mdcmd (63): spindown 1 [ 260.439064] mdcmd (64): spindown 2 [ 261.259056] mdcmd (65): spindown 3 [ 262.078673] mdcmd (66): spindown 4 [ 262.908611] mdcmd (67): spindown 5 [ 263.743781] mdcmd (68): spindown 6 [ 264.670593] mdcmd (69): spindown 7 [ 265.644618] mdcmd (70): spindown 29 [ 290.608096] mdcmd (71): nocheck [ 290.608099] md: nocheck_array: check not active [ 290.608268] mdcmd (72): spinup 0 [ 290.608286] mdcmd (73): spinup 1 [ 290.608301] mdcmd (74): spinup 2 [ 290.608317] mdcmd (75): spinup 3 [ 290.608331] mdcmd (76): spinup 4 [ 290.608340] mdcmd (77): spinup 5 [ 290.608351] mdcmd (78): spinup 6 [ 290.608362] mdcmd (79): spinup 7 [ 290.608386] mdcmd (80): spinup 29 [ 304.913977] nfsd: last server has exited, flushing export cache
March 3, 20188 yr Author Attach, thanks Will try safe mode have problem or not syslog Edited January 19, 20206 yr by Benson
March 3, 20188 yr Author Try in safe mode, same problem found In safe mode spin down OK spin up OK spin down OK reboot ...... same problem Mar 3 18:36:00 X370 emhttpd: shcmd (149): /etc/rc.d/rc.samba stop Mar 3 18:36:00 X370 emhttpd: shcmd (150): rm -f /etc/avahi/services/smb.service Mar 3 18:36:00 X370 avahi-daemon[2088]: Files changed, reloading. Mar 3 18:36:00 X370 avahi-daemon[2088]: Service group file /services/smb.service vanished, removing services. Mar 3 18:36:00 X370 emhttpd: shcmd (152): /etc/rc.d/rc.nfsd stop Mar 3 18:36:00 X370 rpc.mountd[2073]: Caught signal 15, un-registering and exiting. Mar 3 18:36:01 X370 emhttpd: Stopping mover... Mar 3 18:36:01 X370 emhttpd: shcmd (154): /usr/local/sbin/mover stop Mar 3 18:36:01 X370 root: mover: not running Mar 3 18:36:01 X370 emhttpd: Sync filesystems... Mar 3 18:36:01 X370 emhttpd: shcmd (155): sync Mar 3 18:36:01 X370 kernel: nfsd: last server has exited, flushing export cache Mar 3 18:36:02 X370 emhttpd: shcmd (156): umount /mnt/user Mar 3 18:36:02 X370 emhttpd: shcmd (157): rmdir /mnt/user Mar 3 18:36:02 X370 emhttpd: shcmd (158): rm -f /boot/config/plugins/dynamix/mover.cron Mar 3 18:36:02 X370 emhttpd: shcmd (159): /usr/local/sbin/update_cron Mar 3 18:36:02 X370 emhttpd: Unmounting disks... Mar 3 18:36:02 X370 emhttpd: shcmd (160): umount /mnt/disk1 Mar 3 18:36:02 X370 kernel: ------------[ cut here ]------------ Mar 3 18:36:02 X370 kernel: WARNING: CPU: 2 PID: 0 at kernel/rcu/tree.c:2725 rcu_process_callbacks+0x320/0x36b Mar 3 18:36:02 X370 kernel: Modules linked in: nfsd lockd grace sunrpc md_mod igb i2c_algo_bit ixgbe ptp pps_core mdio edac_mce_amd kvm mpt3sas crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel ahci libahci aes_x86_64 raid_class crypto_simd ccp scsi_transport_sas i2c_piix4 glue_helper mxm_wmi i2c_core wmi_bmof nvme cryptd nvme_core wmi button acpi_cpufreq [last unloaded: pps_core] Mar 3 18:36:02 X370 kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.23-unRAID #1 Mar 3 18:36:02 X370 kernel: Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 3803 01/22/2018 Mar 3 18:36:02 X370 kernel: task: ffff8807fb652a00 task.stack: ffffc900031b4000 Mar 3 18:36:02 X370 kernel: RIP: 0010:rcu_process_callbacks+0x320/0x36b Mar 3 18:36:02 X370 kernel: RSP: 0018:ffff88081ec83f18 EFLAGS: 00010002 Mar 3 18:36:02 X370 kernel: RAX: ffffffffffffd800 RBX: ffff88081eca14c0 RCX: 000000000066c601 Mar 3 18:36:02 X370 kernel: RDX: 0000000000000003 RSI: ffff88081ec83f20 RDI: ffff88081eca14f8 Mar 3 18:36:02 X370 kernel: RBP: ffffffff81c399c0 R08: 0000000000024080 R09: ffffffff8108c8f5 Mar 3 18:36:02 X370 kernel: R10: ffffea001e689000 R11: 0000000000000020 R12: ffff88081eca14f8 Mar 3 18:36:02 X370 kernel: R13: 7fffffffffffffff R14: 0000000000000246 R15: fffffffffffffffc Mar 3 18:36:02 X370 kernel: FS: 0000000000000000(0000) GS:ffff88081ec80000(0000) knlGS:0000000000000000 Mar 3 18:36:02 X370 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 3 18:36:02 X370 kernel: CR2: 000000000072d688 CR3: 00000007f8e04000 CR4: 00000000003406e0 Mar 3 18:36:02 X370 kernel: Call Trace: Mar 3 18:36:02 X370 kernel: <IRQ> Mar 3 18:36:02 X370 kernel: ? hrtimer_forward+0x74/0x7c Mar 3 18:36:02 X370 kernel: __do_softirq+0xcd/0x1c2 Mar 3 18:36:02 X370 kernel: irq_exit+0x4f/0x8e Mar 3 18:36:02 X370 kernel: smp_apic_timer_interrupt+0x7a/0x85 Mar 3 18:36:02 X370 kernel: apic_timer_interrupt+0x7d/0x90 Mar 3 18:36:02 X370 kernel: </IRQ> Mar 3 18:36:02 X370 kernel: RIP: 0010:native_safe_halt+0x2/0x3 Mar 3 18:36:02 X370 kernel: RSP: 0018:ffffc900031b7f20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 Mar 3 18:36:02 X370 kernel: RAX: ffffffff815dd668 RBX: 0000000000000000 RCX: 0100000000000000 Mar 3 18:36:02 X370 kernel: RDX: 0000000000016e32 RSI: 0000000000000000 RDI: 0000000000000000 Mar 3 18:36:02 X370 kernel: RBP: 0000000000000000 R08: 0000008621b25fdc R09: 000000000000ff00 Mar 3 18:36:02 X370 kernel: R10: ffffc900031b7eb0 R11: ffff88081efc0000 R12: 0000000000000000 Mar 3 18:36:02 X370 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Mar 3 18:36:02 X370 kernel: ? ldsem_down_write+0x1b4/0x1b4 Mar 3 18:36:02 X370 kernel: default_idle+0x5/0x8 Mar 3 18:36:02 X370 kernel: do_idle+0xbe/0x179 Mar 3 18:36:02 X370 kernel: cpu_startup_entry+0x18/0x1a Mar 3 18:36:02 X370 kernel: secondary_startup_64+0xa5/0xb0 Mar 3 18:36:02 X370 kernel: Code: a8 00 00 00 eb 13 48 2b 05 eb f3 ba 00 48 39 c2 7d 07 48 89 93 90 00 00 00 48 83 7b 38 00 0f 94 c1 48 85 d2 0f 94 c0 38 c1 74 02 <0f> 0b 4c 89 f7 57 9d 0f 1f 44 00 00 4c 89 e7 e8 1f 0e 00 00 84 Mar 3 18:36:02 X370 kernel: ---[ end trace 6df911811ecdaf58 ]--- Mar 3 18:38:49 X370 sshd[3099]: Accepted password for root from 192.168.9.181 port 64862 ssh2 Edited March 3, 20188 yr by Benson
March 3, 20188 yr Author Actually say, this not relate shutdown/reboot. Once disk spin down, it can't stop array (unmount failed) .... Will return back some hrs later, may try more test.
March 3, 20188 yr 9 minutes ago, Benson said: Actually say, this not relate shutdown/reboot. Once disk spin down, it can't stop array (unmount failed) .... Will return back some hrs later, may try more test. Do you have a CLI session or something else open which may prevent the unmounting?
March 3, 20188 yr I wonder if this is related to a new call trace I was getting, which would knock network shares offline and prohibit shutdown: Mar 2 19:34:54 Brahms1 kernel: md: sync done. time=42513sec Mar 2 19:34:54 Brahms1 kernel: md: recovery thread: completion status: 0 Mar 2 19:49:55 Brahms1 kernel: mdcmd (50): spindown 0 Mar 2 19:49:56 Brahms1 kernel: mdcmd (51): spindown 6 Mar 2 19:56:10 Brahms1 emhttpd: req (3): csrf_token=****************&title=System+Log&cmd=%2FwebGui%2Fscripts%2Ftail_log&arg1=syslog Mar 2 19:56:10 Brahms1 emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog Mar 2 20:00:16 Brahms1 kernel: ------------[ cut here ]------------ Mar 2 20:00:16 Brahms1 kernel: WARNING: CPU: 5 PID: 0 at kernel/rcu/tree.c:2725 rcu_process_callbacks+0x320/0x36b Mar 2 20:00:16 Brahms1 kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables vhost_net tun vhost tap veth xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod ipmi_devintf bonding tg3 ipmi_ssif intel_powerclamp coretemp kvm_intel kvm mpt3sas crc32c_intel igb intel_cstate raid_class intel_uncore scsi_transport_sas ptp pps_core i2c_i801 ata_piix i2c_algo_bit i2c_core button ipmi_si acpi_cpufreq [last unloaded: tg3] Mar 2 20:00:16 Brahms1 kernel: CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.14.23-unRAID #1 Mar 2 20:00:16 Brahms1 kernel: Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26 07/01/2013 Mar 2 20:00:16 Brahms1 kernel: task: ffff88013ab6bfc0 task.stack: ffffc900006b4000 Mar 2 20:00:16 Brahms1 kernel: RIP: 0010:rcu_process_callbacks+0x320/0x36b Mar 2 20:00:16 Brahms1 kernel: RSP: 0018:ffff88013fd43f18 EFLAGS: 00010002 Mar 2 20:00:16 Brahms1 kernel: RAX: ffffffffffffd800 RBX: ffff88013fd614c0 RCX: 00000000313f7a01 Mar 2 20:00:16 Brahms1 kernel: RDX: 0000000000000001 RSI: ffff88013fd43f20 RDI: ffff88013fd614f8 Mar 2 20:00:16 Brahms1 kernel: RBP: ffffffff81c399c0 R08: 0000000000024080 R09: ffffffff8108c8f5 Mar 2 20:00:16 Brahms1 kernel: R10: ffffea000278e600 R11: 0000000000000020 R12: ffff88013fd614f8 Mar 2 20:00:16 Brahms1 kernel: R13: 7fffffffffffffff R14: 0000000000000246 R15: fffffffffffffffe Mar 2 20:00:16 Brahms1 kernel: FS: 0000000000000000(0000) GS:ffff88013fd40000(0000) knlGS:0000000000000000 Mar 2 20:00:16 Brahms1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 2 20:00:16 Brahms1 kernel: CR2: 0000149d89457000 CR3: 0000000001c0a000 CR4: 00000000000026e0 Mar 2 20:00:16 Brahms1 kernel: Call Trace: Mar 2 20:00:16 Brahms1 kernel: <IRQ> Mar 2 20:00:16 Brahms1 kernel: ? hrtimer_forward+0x74/0x7c Mar 2 20:00:16 Brahms1 kernel: __do_softirq+0xcd/0x1c2 Mar 2 20:00:16 Brahms1 kernel: irq_exit+0x4f/0x8e Mar 2 20:00:16 Brahms1 kernel: smp_apic_timer_interrupt+0x7a/0x85 Mar 2 20:00:16 Brahms1 kernel: apic_timer_interrupt+0x7d/0x90 Mar 2 20:00:16 Brahms1 kernel: </IRQ> Mar 2 20:00:16 Brahms1 kernel: RIP: 0010:cpuidle_enter_state+0xe0/0x135 Mar 2 20:00:16 Brahms1 kernel: RSP: 0018:ffffc900006b7ef8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 Mar 2 20:00:16 Brahms1 kernel: RAX: ffff88013fd60940 RBX: 0000000000000000 RCX: 000000000000001f Mar 2 20:00:16 Brahms1 kernel: RDX: 0000283bad811e98 RSI: 0000000000020140 RDI: 0000000000000000 Mar 2 20:00:16 Brahms1 kernel: RBP: ffff88013fd68800 R08: 00007081f2485c1f R09: 0000000000000001 Mar 2 20:00:16 Brahms1 kernel: R10: ffffc900006b7ed8 R11: 00000000002f9418 R12: 0000000000000003 Mar 2 20:00:16 Brahms1 kernel: R13: 0000283bad811e98 R14: ffffffff81c591f8 R15: 0000283bad71f531 Mar 2 20:00:16 Brahms1 kernel: ? cpuidle_enter_state+0xbb/0x135 Mar 2 20:00:16 Brahms1 kernel: do_idle+0x11a/0x179 Mar 2 20:00:16 Brahms1 kernel: cpu_startup_entry+0x18/0x1a Mar 2 20:00:16 Brahms1 kernel: secondary_startup_64+0xa5/0xb0 Mar 2 20:00:16 Brahms1 kernel: Code: a8 00 00 00 eb 13 48 2b 05 eb f3 ba 00 48 39 c2 7d 07 48 89 93 90 00 00 00 48 83 7b 38 00 0f 94 c1 48 85 d2 0f 94 c0 38 c1 74 02 <0f> 0b 4c 89 f7 57 9d 66 66 90 66 90 4c 89 e7 e8 1f 0e 00 00 84 Mar 2 20:00:16 Brahms1 kernel: ---[ end trace e13abcfd14c19384 ]--- Mar 2 21:00:02 Brahms1 root: /etc/libvirt: 920.5 MiB (965189632 bytes) trimmed Mar 2 21:00:02 Brahms1 root: /var/lib/docker: 16.5 GiB (17667182592 bytes) trimmed Mar 2 21:00:02 Brahms1 root: /mnt/cache: 377.7 GiB (405560147968 bytes) trimmed Mar 3 00:00:02 Brahms1 Plugin Auto Update: Checking for available plugin updates Mar 3 00:00:05 Brahms1 Plugin Auto Update: ca.update.applications.plg version 2018.03.02 does not meet age requirements to update Mar 3 00:00:05 Brahms1 Plugin Auto Update: ca.backup2.plg version 2018.03.02 does not meet age requirements to update Mar 3 00:00:05 Brahms1 Plugin Auto Update: community.applications.plg version 2018.03.02 does not meet age requirements to update Mar 3 00:00:06 Brahms1 Plugin Auto Update: Community Applications Plugin Auto Update finished
March 3, 20188 yr 3 hours ago, 1812 said: I wonder if this is related to a new call trace I was getting, which would knock network shares offline and prohibit shutdown: Does the same thing happen when you start the system in safe mode?
March 3, 20188 yr Author 13 hours ago, bonienl said: Do you have a CLI session or something else open which may prevent the unmounting? Sure haven't Just return back and try some test (reset many many time) , below all can't fix the problem - Direct I/O, on / off - BIOS CSM, on / off - UEFI boot mode, on / off - upgrade to RC3 Some more detail during troubleshoot Problem : Once spin down all or most drive, even spin up before stop array, the array will sit on waiting umount. Telnet or GUI still function. This problem would not happen in "maintenance mode" Additional info. In Telnet, if I type "umount /mnt/diskX" , then it will hang the session, but disk actually be unmount Thanks Giveup first, going to sleep !!!!!!!! Edited March 4, 20188 yr by Benson
March 3, 20188 yr Can't reproduce this issue. Try this: Start array Spindown Spinup Stop array Does that sequence fail?
March 3, 20188 yr Perhaps something to do with his nvme device? (form syslog) Mar 3 18:09:03 X370 kernel: mdcmd (108): spinup 0 Mar 3 18:09:03 X370 kernel: mdcmd (109): spinup 1 Mar 3 18:09:03 X370 kernel: mdcmd (110): spinup 2 Mar 3 18:09:03 X370 kernel: mdcmd (111): spinup 3 Mar 3 18:09:03 X370 kernel: mdcmd (112): spinup 4 Mar 3 18:09:03 X370 kernel: mdcmd (113): spinup 5 Mar 3 18:09:03 X370 kernel: mdcmd (114): spinup 6 Mar 3 18:09:03 X370 kernel: mdcmd (115): spinup 7 Mar 3 18:09:03 X370 kernel: mdcmd (116): spinup 29 Mar 3 18:09:03 X370 root: HDIO_DRIVE_CMD(setidle) failed: Inappropriate ioctl for device Mar 3 18:09:03 X370 root: Mar 3 18:09:03 X370 root: /dev/nvme0n1: Mar 3 18:09:03 X370 root: setting standby to 0 (off) Mar 3 18:09:03 X370 emhttpd: shcmd (300): exit status: 25
March 3, 20188 yr Just now, bonienl said: Perhaps something to do with his nvme device? (form syslog) Mar 3 18:09:03 X370 kernel: mdcmd (108): spinup 0 Mar 3 18:09:03 X370 kernel: mdcmd (109): spinup 1 Mar 3 18:09:03 X370 kernel: mdcmd (110): spinup 2 Mar 3 18:09:03 X370 kernel: mdcmd (111): spinup 3 Mar 3 18:09:03 X370 kernel: mdcmd (112): spinup 4 Mar 3 18:09:03 X370 kernel: mdcmd (113): spinup 5 Mar 3 18:09:03 X370 kernel: mdcmd (114): spinup 6 Mar 3 18:09:03 X370 kernel: mdcmd (115): spinup 7 Mar 3 18:09:03 X370 kernel: mdcmd (116): spinup 29 Mar 3 18:09:03 X370 root: HDIO_DRIVE_CMD(setidle) failed: Inappropriate ioctl for device Mar 3 18:09:03 X370 root: Mar 3 18:09:03 X370 root: /dev/nvme0n1: Mar 3 18:09:03 X370 root: setting standby to 0 (off) Mar 3 18:09:03 X370 emhttpd: shcmd (300): exit status: 25 Don't think so. I think it's related to the rcu issue. The disk1 device never unmounts.
March 4, 20188 yr Author 6 hours ago, limetech said: Can't reproduce this issue. Try this: Start array Spindown Spinup Stop array Does that sequence fail? Yes, 100% eachtime, 3rd step Spinup won't make things different. i.e. Start array Spindown Stop array ..... problem happen This fault 1st happen in normal automatic disk spin-down, and make me to troubleshoot. So if I want to use 6.5 then I must set all disk never spin-down. Just wonder no one ( ryzen only ? ) report such critical problem. Haven't try RC4 Edited March 4, 20188 yr by Benson
March 4, 20188 yr 8 hours ago, bonienl said: Does the same thing happen when you start the system in safe mode? don't know. server is running normally right now on rc3, but here is another new to me message in the logs regarding hrtimer: Mar 3 08:10:58 Brahms1 kernel: mdcmd (52): spindown 1 Mar 3 08:11:27 Brahms1 kernel: mdcmd (53): spindown 2 Mar 3 08:11:29 Brahms1 kernel: mdcmd (54): spindown 3 Mar 3 08:11:30 Brahms1 kernel: mdcmd (55): spindown 4 Mar 3 08:11:30 Brahms1 kernel: mdcmd (56): spindown 5 Mar 3 08:11:59 Brahms1 kernel: mdcmd (57): spindown 6 Mar 3 08:12:01 Brahms1 kernel: mdcmd (58): spindown 0 Mar 3 08:12:01 Brahms1 kernel: mdcmd (59): spindown 7 Mar 3 10:35:53 Brahms1 kernel: hrtimer: interrupt took 44531 ns Mar 3 19:36:09 Brahms1 kernel: mdcmd (60): spindown 7
Archived
This topic is now archived and is closed to further replies.