Found The Trigger for a crash

October 12, 2015

Now that I've made sure that I have no hardware issues...

I was deleting some files from my TV shared folder when the operation stopped and Windows Explorer crashed. Now shfs is running at 100% CPU and the server load is slowly rising. The load will continue rising until the machine becomes completely unresponsive. I am also unable to reboot the server; it just stalls and will not reboot until the reset/power button is physically pressed.

Unraid 6.1.3

BIOSTAR NM70I-1037U, 1.8 GHz dual-core Celeron

4 GB DDR3-1600

1x SYBA SI-PEX40064

1x 3 TB cache

1x 2 TB data

2x 1 TB data

3x 500 GB data

1x 500 GB cache

Dockers:

Sabnzbd

Sickbeard

Couchpotato

Apache

Crashplan

HTPC-Manager

Transmission

infosphere-diagnostics-20151012-1900.zip

October 13, 2015

Author

I restarted the server and was able to delete the same files that caused the last lockup.

It seems that something causes the array to become unwritable, which causes processes to hang and eventually lock up the system.

Any ideas? This is HUGELY inconvenient.

October 13, 2015

Author

I can't seem to find anything wrong with the config or any errors.

I checked all the disks using the GUI tools and all of them passed.

None of the disks throw SMART errors.

MEMTEST ran for 24 hours with 0 errors.

I'm running out of things to try.

October 13, 2015

I would suggest also running a file system check on each of the data/cache disks to make sure the problem is not due to some file system corruption.

October 13, 2015

Author

"I would suggest also running a file system check on each of the data/cache disks to make sure the problem is not due to some file system corruption."

This has been done. Several times, in fact.

October 14, 2015

Author

Any other troubleshooting?

Is there anything in the logs that looks funny? Because I'm not seeing anything.

October 16, 2015

Select SAFE-mode from the boot menu.

October 24, 2015

Author

Did it again... after 11 days this time.

infosphere-diagnostics-20151024-1838.zip

October 24, 2015

Author

root@InfoSphere:/usr/local/emhttp# lsof -p 21149
COMMAND   PID USER   FD  TYPE DEVICE SIZE/OFF NODE NAME
shfs    21149 root  cwd   DIR    0,2      380    2 /
shfs    21149 root  rtd   DIR    0,2      380    2 /
shfs    21149 root  txt   REG    0,2    83216 4568 /usr/local/sbin/shfs
shfs    21149 root  mem   REG    0,2   171470 6323 /lib64/ld-2.17.so
shfs    21149 root  mem   REG    0,2   134893 6252 /lib64/libpthread-2.17.so
shfs    21149 root  mem   REG    0,2    48928 6341 /lib64/libcrypt-2.17.so
shfs    21149 root  mem   REG    0,2  1944360 6269 /lib64/libcrypto.so.1.0.0
shfs    21149 root  mem   REG    0,2   242840 6305 /lib64/libfuse.so.2.9.3
shfs    21149 root  mem   REG    0,2  2102965 6246 /lib64/libc-2.17.so
shfs    21149 root  mem   REG    0,2    18988 6296 /lib64/libdl-2.17.so
shfs    21149 root   0u   CHR    1,3      0t0 1029 /dev/null
shfs    21149 root   1u   CHR    1,3      0t0 1029 /dev/null
shfs    21149 root   2u   CHR    1,3      0t0 1029 /dev/null
shfs    21149 root   3u   CHR 10,229      0t0 5361 /dev/fuse
shfs    21149 root   6r   REG    0,2      984 4588 /usr/local/emhttp/update.htm

October 28, 2015

Author

Think I'm going to throw in the towel. I'm going to roll this box back to version 5. The goodies are nice, but I need a reliable NAS more.

October 28, 2015

You may need to run in safe mode for a while to see what might be causing the situation.

I saw this in the diagnostic syslog.

Oct 18 18:41:50 InfoSphere kernel: device vnet0 entered promiscuous mode
Oct 18 18:41:50 InfoSphere kernel: VM: port 2(vnet0) entered listening state
Oct 18 18:41:50 InfoSphere kernel: VM: port 2(vnet0) entered listening state
Oct 18 18:41:54 InfoSphere kernel: kvm: zapping shadow pages for mmio generation wraparound
Oct 18 18:42:05 InfoSphere kernel: VM: port 2(vnet0) entered learning state
Oct 18 18:42:09 InfoSphere kernel: ------------[ cut here ]------------
Oct 18 18:42:09 InfoSphere kernel: WARNING: CPU: 0 PID: 8344 at arch/x86/kernel/cpu/perf_event_intel_ds.c:315 reserve_ds_buffers+0x10e/0x347()
Oct 18 18:42:09 InfoSphere kernel: alloc_bts_buffer: BTS buffer allocation failure
Oct 18 18:42:09 InfoSphere kernel: Modules linked in: kvm_intel kvm vhost_net vhost macvtap macvlan md_mod xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables tun xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat hid_logitech_hidpp i2c_i801 r8169 mii ahci hid_logitech_dj libahci [last unloaded: md_mod]
Oct 18 18:42:09 InfoSphere kernel: CPU: 0 PID: 8344 Comm: qemu-system-x86 Not tainted 4.1.7-unRAID #3
Oct 18 18:42:09 InfoSphere kernel: Hardware name: BIOSTAR Group NM70I-1037U/NM70I-1037U, BIOS 4.6.5 06/05/2013
Oct 18 18:42:09 InfoSphere kernel: 0000000000000009 ffff880008b9f858 ffffffff815eff9a 0000000000000000
Oct 18 18:42:09 InfoSphere kernel: ffff880008b9f8a8 ffff880008b9f898 ffffffff810477cb ffff880008b9f888
Oct 18 18:42:09 InfoSphere kernel: ffffffff8101fe63 0000000000000000 0000000000000000 0000000000010e10
Oct 18 18:42:09 InfoSphere kernel: Call Trace:
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff815eff9a>] dump_stack+0x4c/0x6e
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff810477cb>] warn_slowpath_common+0x97/0xb1
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff8101fe63>] ? reserve_ds_buffers+0x10e/0x347
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff81047826>] warn_slowpath_fmt+0x41/0x43
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff8101fe63>] reserve_ds_buffers+0x10e/0x347
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff8101ac34>] x86_reserve_hardware+0x141/0x153
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff8101ac8a>] x86_pmu_event_init+0x44/0x240
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff810a7ad4>] perf_try_init_event+0x42/0x74
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff810ad190>] perf_init_event+0x9d/0xd4
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff810ad54c>] perf_event_alloc+0x385/0x4f7
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa037c523>] ? stop_counter+0x2f/0x2f [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff810ad6ec>] perf_event_create_kernel_counter+0x2e/0x12c
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa037c63e>] reprogram_counter+0xc0/0x109 [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa037c709>] reprogram_fixed_counter+0x82/0x8d [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa037c8f1>] reprogram_idx+0x4a/0x4f [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa037cc53>] kvm_pmu_set_msr+0x16a/0x29b [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa036102a>] kvm_set_msr_common+0xa7d/0xd44 [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03aa161>] ? vmx_set_rflags+0x34/0x36 [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa035f916>] ? __kvm_set_rflags+0x45/0x4e [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03b320e>] vmx_set_msr+0x1b2/0x189e [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa035cd43>] kvm_set_msr+0x61/0x63 [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03ac138>] handle_wrmsr+0x3b/0x64 [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03b1418>] vmx_handle_exit+0x84c/0x8ec [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff81081ea1>] ? rcu_note_context_switch+0x14a/0x167
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03a92dc>] ? vmx_invpcid_supported+0x1b/0x1b [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03a92dc>] ? vmx_invpcid_supported+0x1b/0x1b [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa0366a9e>] kvm_arch_vcpu_ioctl_run+0xcfc/0xeb0 [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa03ab773>] ? __vmx_load_host_state.part.53+0x125/0x12c [kvm_intel]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa036161a>] ? kvm_arch_vcpu_load+0x139/0x143 [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffffa0358fd0>] kvm_vcpu_ioctl+0x169/0x48f [kvm]
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff8110c046>] do_vfs_ioctl+0x367/0x421
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff81113d33>] ? __fget+0x6c/0x78
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff8110c139>] SyS_ioctl+0x39/0x64
Oct 18 18:42:09 InfoSphere kernel: [<ffffffff815f562e>] system_call_fastpath+0x12/0x71
Oct 18 18:42:09 InfoSphere kernel: ---[ end trace 550a31b84df716cb ]---
Oct 18 18:42:20 InfoSphere kernel: VM: topology change detected, propagating
Oct 18 18:42:20 InfoSphere kernel: VM: port 2(vnet0) entered forwarding state
Oct 18 18:50:52 InfoSphere kernel: mdcmd (244): spindown 1
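
Traces like the one above are easy to miss in a long syslog. As a rough aid, here is a minimal Python sketch that pulls every block between a "cut here" marker and the matching "end trace" line, then reports the WARNING line and the process named in the "Comm:" field. The function names and the command-line usage are my own for illustration, and it assumes the syslog uses the same marker lines as the excerpt above.

```python
import re
import sys

# Marker lines as they appear in the excerpt above.
CUT_HERE = "------------[ cut here ]------------"
END_TRACE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")
COMM = re.compile(r"Comm: (\S+)")


def extract_kernel_warnings(lines):
    """Return a list of blocks; each block is the list of lines
    from a 'cut here' marker through the next 'end trace' line."""
    blocks = []
    current = None
    for line in lines:
        if CUT_HERE in line:
            current = [line]          # start a new block
        elif current is not None:
            current.append(line)
            if END_TRACE.search(line):
                blocks.append(current)
                current = None        # block is complete
    return blocks


def summarize(block):
    """Return (warning_line, process_name) for one block.
    Either value is None if the block does not contain it."""
    warning = next((l for l in block if "WARNING:" in l), None)
    comm = None
    for line in block:
        match = COMM.search(line)
        if match:
            comm = match.group(1)
            break
    return warning, comm


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_traces.py path/to/syslog.txt
    with open(sys.argv[1], errors="replace") as handle:
        for block in extract_kernel_warnings(handle):
            warning, comm = summarize(block)
            print(comm, "|", (warning or "").strip())
```

Run against the excerpt above, the "Comm:" field points at qemu-system-x86, which suggests that this particular warning comes from a VM starting up rather than from shfs.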

Another point worth noting is that the filesystems are using reiserfs.

If you've ever used any of the suspect betas that have the reiserfs corruption issue, then it could be manifesting as another problem.

Over the past year or so we've seen people with strange issues when populating reiserfs filesystems after utilizing the suspect betas.

I'm not saying this is the problem, but it's hard to detect when it is.

I did not see any reiserfs-specific issues in the logs, but I also don't know the history of any beta usage.

October 28, 2015

Author

I never used any of the betas.

I suspect it's something funky with the NM70 chipset and the Linux kernel. I've had issues in the past with OpenELEC, but this is a different issue.

But since nothing shows up in the logs and I seem to be the only person using this hardware, it looks like a big hill to climb.

November 11, 2015

Author

Ok...

I've switched all my data drives from ReiserFS to XFS.

This MAY have solved some things. Not 100% sure yet as only time will tell.

I did see an odd ReiserFS error on my terminal when I had a crash:

"REISERFS error (device MD2): vs-4010 is_reusable block number is out of range"

However, running the scan in Maintenance mode found nothing, so I moved all my data disk by disk and changed to XFS. I ran a parity check and now everything is humming along.
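
For anyone who wants to check their own logs for the same kind of message, here is a small Python sketch that scans log lines for ReiserFS errors and reports the device and message. It assumes the lines follow the shape of the error quoted above, "REISERFS error (device X): message"; the function name is made up for illustration, and real logs may word things differently.

```python
import re

# Pattern modelled on the error quoted above; matching is case-insensitive
# because device names may be logged in upper or lower case.
REISERFS_LINE = re.compile(
    r"REISERFS (?:error|warning) \(device (?P<device>[^)]+)\): (?P<message>.*)",
    re.IGNORECASE,
)


def find_reiserfs_errors(lines):
    """Return a list of (device, message) tuples, one per matching line."""
    hits = []
    for line in lines:
        match = REISERFS_LINE.search(line)
        if match:
            hits.append((match.group("device"), match.group("message").strip()))
    return hits
```

Feeding it the line quoted above yields one hit for device MD2. An empty result only means nothing matched this pattern, not that the filesystem is clean.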

November 20, 2015

Author

It seems that converting my data disks to XFS has solved the issues. I'm on my 14th day up without issue or interruption.

Does anyone have advice on how to switch the cache drive from ReiserFS without messing up my Dockers?

November 20, 2015

"It seems that converting my data disks to XFS has solved the issues. I'm on my 14th day up without issue or interruption."

A number of people have reported similar situations.

November 20, 2015

Author

I have noticed it seems to be a very similar situation.

Would love to know why, though.
