October 12, 2015
Now that I've made sure I have no hardware issues... I was deleting some files from my TV shared folder when the operation stopped and Windows Explorer crashed. Now my shfs is running at 100% and the server load is slowly rising. The load will continue rising until the machine becomes completely unresponsive. I am also unable to reboot the server; it just stalls and will not reboot until the reset/power button is physically pressed.

Unraid 6.1.3
BIOSTAR NM70I-1037U, 1.8 GHz dual-core Celeron
4 GB DDR3-1600
1x SYBA SI-PEX40064
1x 3 TB cache
1x 2 TB data
2x 1 TB data
3x 500 GB data
1x 500 GB cache

Dockers: Sabnzbd, Sickbeard, Couchpotato, Apache, Crashplan, HTPC-Manager, Transmission

Attachment: infosphere-diagnostics-20151012-1900.zip
October 13, 2015 (Author)
Restarted the server and I was able to delete the same files that caused the last lockup. It seems that something causes the array to become unwritable, which causes processes to hang and eventually lock up the system. Any ideas? This is HUGELY inconvenient.
October 13, 2015 (Author)
I can't seem to find anything wrong with the config, and I don't see any errors. I checked all the disks using the GUI tools and all of them passed. None of the disks throw SMART errors. MEMTEST ran for 24 hours with 0 errors. I'm running out of things to try.
October 13, 2015
I would suggest also running a file system check on each of the data/cache disks to make sure the problem is not due to some file system corruption.
October 13, 2015 (Author)
Quote: "I would suggest also running a file system check on each of the data/cache disks to make sure the problem is not due to some file system corruption."
This has been done. Several times, in fact.
October 14, 2015 (Author)
Any other troubleshooting ideas? Is there anything in the logs that looks funny? Because I'm not seeing anything.
October 24, 2015 (Author)
Did it again... after 11 days this time.
Attachment: infosphere-diagnostics-20151024-1838.zip
October 24, 2015 (Author)
root@InfoSphere:/usr/local/emhttp# lsof -p 21149
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
shfs    21149 root  cwd    DIR    0,2      380    2 /
shfs    21149 root  rtd    DIR    0,2      380    2 /
shfs    21149 root  txt    REG    0,2    83216 4568 /usr/local/sbin/shfs
shfs    21149 root  mem    REG    0,2   171470 6323 /lib64/ld-2.17.so
shfs    21149 root  mem    REG    0,2   134893 6252 /lib64/libpthread-2.17.so
shfs    21149 root  mem    REG    0,2    48928 6341 /lib64/libcrypt-2.17.so
shfs    21149 root  mem    REG    0,2  1944360 6269 /lib64/libcrypto.so.1.0.0
shfs    21149 root  mem    REG    0,2   242840 6305 /lib64/libfuse.so.2.9.3
shfs    21149 root  mem    REG    0,2  2102965 6246 /lib64/libc-2.17.so
shfs    21149 root  mem    REG    0,2    18988 6296 /lib64/libdl-2.17.so
shfs    21149 root    0u   CHR    1,3      0t0 1029 /dev/null
shfs    21149 root    1u   CHR    1,3      0t0 1029 /dev/null
shfs    21149 root    2u   CHR    1,3      0t0 1029 /dev/null
shfs    21149 root    3u   CHR 10,229      0t0 5361 /dev/fuse
shfs    21149 root    6r   REG    0,2      984 4588 /usr/local/emhttp/update.htm
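To make a listing like the one above easier to scan, a minimal Python sketch along these lines could pick out the regular files a process holds open through numbered descriptors. The helper names (`parse_lsof`, `open_regular_files`) and the `SAMPLE` excerpt are illustrative assumptions, not part of any Unraid tooling, and the whitespace split assumes every column is populated, which lsof does not guarantee for every file type.

```python
# Sketch only: parses the default column layout of `lsof -p PID` output.
# SAMPLE is a shortened excerpt of the listing in this post.
SAMPLE = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
shfs 21149 root cwd DIR 0,2 380 2 /
shfs 21149 root txt REG 0,2 83216 4568 /usr/local/sbin/shfs
shfs 21149 root 0u CHR 1,3 0t0 1029 /dev/null
shfs 21149 root 3u CHR 10,229 0t0 5361 /dev/fuse
shfs 21149 root 6r REG 0,2 984 4588 /usr/local/emhttp/update.htm
"""


def parse_lsof(text):
    """Turn lsof output into a list of dicts keyed by the header names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # NAME is the last column and may contain spaces, so cap the split.
        parts = ln.split(None, len(header) - 1)
        if len(parts) == len(header):  # skip rows with blank columns
            rows.append(dict(zip(header, parts)))
    return rows


def open_regular_files(rows):
    """Keep rows whose FD starts with a digit (a real descriptor) and whose TYPE is REG."""
    return [r for r in rows if r["FD"][:1].isdigit() and r["TYPE"] == "REG"]


if __name__ == "__main__":
    for row in open_regular_files(parse_lsof(SAMPLE)):
        print(row["FD"], row["NAME"])
```

Entries such as `cwd`, `txt`, and `mem` are not numbered descriptors, so the filter leaves them out; in the excerpt only the `6r` entry remains.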
October 28, 2015 (Author)
Think I'm going to throw in the towel. I'm going to roll this box back to version 5. The goodies are nice, but I need a reliable NAS more.
October 28, 2015
You may need to run in safe mode for a while to see what might be causing the situation. I saw this in the diagnostic syslog.

Oct 18 18:41:50 InfoSphere kernel: device vnet0 entered promiscuous mode
Oct 18 18:41:50 InfoSphere kernel: VM: port 2(vnet0) entered listening state
Oct 18 18:41:50 InfoSphere kernel: VM: port 2(vnet0) entered listening state
Oct 18 18:41:54 InfoSphere kernel: kvm: zapping shadow pages for mmio generation wraparound
Oct 18 18:42:05 InfoSphere kernel: VM: port 2(vnet0) entered learning state
Oct 18 18:42:09 InfoSphere kernel: ------------[ cut here ]------------
Oct 18 18:42:09 InfoSphere kernel: WARNING: CPU: 0 PID: 8344 at arch/x86/kernel/cpu/perf_event_intel_ds.c:315 reserve_ds_buffers+0x10e/0x347()
Oct 18 18:42:09 InfoSphere kernel: alloc_bts_buffer: BTS buffer allocation failure
Oct 18 18:42:09 InfoSphere kernel: Modules linked in: kvm_intel kvm vhost_net vhost macvtap macvlan md_mod xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables tun xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat hid_logitech_hidpp i2c_i801 r8169 mii ahci hid_logitech_dj libahci [last unloaded: md_mod]
Oct 18 18:42:09 InfoSphere kernel: CPU: 0 PID: 8344 Comm: qemu-system-x86 Not tainted 4.1.7-unRAID #3
Oct 18 18:42:09 InfoSphere kernel: Hardware name: BIOSTAR Group NM70I-1037U/NM70I-1037U, BIOS 4.6.5 06/05/2013
Oct 18 18:42:09 InfoSphere kernel: 0000000000000009 ffff880008b9f858 ffffffff815eff9a 0000000000000000
Oct 18 18:42:09 InfoSphere kernel: ffff880008b9f8a8 ffff880008b9f898 ffffffff810477cb ffff880008b9f888
Oct 18 18:42:09 InfoSphere kernel: ffffffff8101fe63 0000000000000000 0000000000000000 0000000000010e10
Oct 18 18:42:09 InfoSphere kernel: Call Trace:
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff815eff9a>] dump_stack+0x4c/0x6e
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff810477cb>] warn_slowpath_common+0x97/0xb1
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff8101fe63>] ? reserve_ds_buffers+0x10e/0x347
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff81047826>] warn_slowpath_fmt+0x41/0x43
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff8101fe63>] reserve_ds_buffers+0x10e/0x347
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff8101ac34>] x86_reserve_hardware+0x141/0x153
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff8101ac8a>] x86_pmu_event_init+0x44/0x240
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff810a7ad4>] perf_try_init_event+0x42/0x74
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff810ad190>] perf_init_event+0x9d/0xd4
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff810ad54c>] perf_event_alloc+0x385/0x4f7
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa037c523>] ? stop_counter+0x2f/0x2f [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff810ad6ec>] perf_event_create_kernel_counter+0x2e/0x12c
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa037c63e>] reprogram_counter+0xc0/0x109 [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa037c709>] reprogram_fixed_counter+0x82/0x8d [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa037c8f1>] reprogram_idx+0x4a/0x4f [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa037cc53>] kvm_pmu_set_msr+0x16a/0x29b [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa036102a>] kvm_set_msr_common+0xa7d/0xd44 [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03aa161>] ? vmx_set_rflags+0x34/0x36 [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa035f916>] ? __kvm_set_rflags+0x45/0x4e [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03b320e>] vmx_set_msr+0x1b2/0x189e [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa035cd43>] kvm_set_msr+0x61/0x63 [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03ac138>] handle_wrmsr+0x3b/0x64 [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03b1418>] vmx_handle_exit+0x84c/0x8ec [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff81081ea1>] ? rcu_note_context_switch+0x14a/0x167
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03a92dc>] ? vmx_invpcid_supported+0x1b/0x1b [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03a92dc>] ? vmx_invpcid_supported+0x1b/0x1b [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa0366a9e>] kvm_arch_vcpu_ioctl_run+0xcfc/0xeb0 [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03ab773>] ? __vmx_load_host_state.part.53+0x125/0x12c [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa036161a>] ? kvm_arch_vcpu_load+0x139/0x143 [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa0358fd0>] kvm_vcpu_ioctl+0x169/0x48f [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff8110c046>] do_vfs_ioctl+0x367/0x421
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff81113d33>] ? __fget+0x6c/0x78
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff8110c139>] SyS_ioctl+0x39/0x64
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff815f562e>] system_call_fastpath+0x12/0x71
Oct 18 18:42:09 InfoSphere kernel: ---[ end trace 550a31b84df716cb ]---
Oct 18 18:42:20 InfoSphere kernel: VM: topology change detected, propagating
Oct 18 18:42:20 InfoSphere kernel: VM: port 2(vnet0) entered forwarding state
Oct 18 18:50:52 InfoSphere kernel: mdcmd (244): spindown 1

Another point worth noting is that your filesystems are using reiserfs. If you've ever used any of the suspect betas that had the reiserfs corruption, it could be manifesting as another problem. Over the past year or so we've seen people with strange issues when populating reiserfs filesystems after using the suspect betas. I'm not saying this is the problem, but it's hard to detect when it is. I did not see any reiserfs-specific issues in the logs, but I also don't know your history of beta usage.
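For anyone combing through a long syslog for blocks like the one quoted above, here is a minimal Python sketch that collects kernel warning blocks (from the "cut here" marker to the matching "end trace" line) and any lines carrying a "REISERFS error" or "REISERFS warning" prefix. The function names and the `SAMPLE` excerpt are illustrative assumptions; the marker strings are taken from the log above, and the reiserfs pattern is my assumption about how such messages are prefixed.

```python
import re

# Markers copied from the quoted syslog; REISER is an assumed message prefix.
CUT = "------------[ cut here ]------------"
END = re.compile(r"---\[ end trace [0-9a-f]+ \]---")
REISER = re.compile(r"REISERFS (error|warning)", re.IGNORECASE)

# Shortened excerpt of the log in this post, for illustration only.
SAMPLE = [
    "Oct 18 18:42:05 InfoSphere kernel: VM: port 2(vnet0) entered learning state",
    "Oct 18 18:42:09 InfoSphere kernel: ------------[ cut here ]------------",
    "Oct 18 18:42:09 InfoSphere kernel: WARNING: CPU: 0 PID: 8344 at "
    "arch/x86/kernel/cpu/perf_event_intel_ds.c:315 reserve_ds_buffers+0x10e/0x347()",
    "Oct 18 18:42:09 InfoSphere kernel: alloc_bts_buffer: BTS buffer allocation failure",
    "Oct 18 18:42:09 InfoSphere kernel: ---[ end trace 550a31b84df716cb ]---",
    "Oct 18 18:50:52 InfoSphere kernel: mdcmd (244): spindown 1",
]


def scan_syslog(lines):
    """Return (trace_blocks, reiserfs_lines) found in an iterable of log lines."""
    traces, reiser, current = [], [], None
    for line in lines:
        line = line.rstrip("\n")
        if CUT in line:
            current = [line]          # start a new warning block
        elif current is not None:
            current.append(line)
            if END.search(line):      # block is complete
                traces.append(current)
                current = None
        if REISER.search(line):
            reiser.append(line)
    return traces, reiser


def summarize(trace):
    """Return the WARNING/BUG headline of a block, without the syslog prefix."""
    for line in trace:
        if "WARNING:" in line or "BUG:" in line:
            return line.split("kernel: ", 1)[-1]
    return trace[0]


if __name__ == "__main__":
    found, fs_lines = scan_syslog(SAMPLE)
    for block in found:
        print(summarize(block))
    print(len(fs_lines), "reiserfs lines")
```

To run it against a real log, pass an open file object to `scan_syslog` in place of `SAMPLE`; the location of the syslog inside a diagnostics zip is left for the reader to fill in.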
October 28, 2015 (Author)
I never used any of the betas. I suspect it's something funky with the NM70 chipset and the Linux kernel. I've had issues in the past with OpenELEC, but this is a different issue. Since nothing shows up in the logs and I seem to be the only person using this hardware, it looks like a big hill to climb.
November 11, 2015 (Author)
OK... I've switched all my data drives from ReiserFS to XFS. This MAY have solved some things. Not 100% sure yet, as only time will tell. I did see an odd Reiser error on my terminal when I had a crash: "REISERFS error (device MD2): vs-4010 is_reusable block number is out of range", but running the scan in Maintenance mode found nothing, so I moved all my data disk by disk and changed to XFS. Ran a parity check and now everything is humming along.
November 20, 2015 (Author)
It seems that converting my data disks to XFS has solved the issues. I'm on my 14th day up without issue or interruption. Does anyone have advice on how to swap the cache drive from Reiser without messing up my dockers?
November 20, 2015
Quote: "It seems that converting my data disks to XFS has solved the issues. I'm on my 14th day up without issue or interruption."
A number of people have reported similar situations.
November 20, 2015 (Author)
I have noticed it seems to be a very similar situation. Would love to know why, though.