April 2, 2017

I've been hitting some hangs with the recent unRAID versions, currently 6.3.2, so only one release behind. Basically ReiserFS crashes and/or hangs indefinitely. As a result smbd processes start to multiply, all blocking, e.g.

nobody 63826 0.0 0.1 302444 13944 ? D 08:50 0:00 /usr/sbin/smbd -D
nobody 64081 0.0 0.1 302868 15872 ? D 08:51 0:00 /usr/sbin/smbd -D
nobody 64124 0.0 0.1 303140 14464 ? D 11:31 0:00 /usr/sbin/smbd -D
nobody 64172 0.0 0.1 303088 14092 ? D 08:52 0:00 /usr/sbin/smbd -D
nobody 64341 0.0 0.1 302640 15028 ? D 08:52 0:00 /usr/sbin/smbd -D
nobody 64591 0.0 0.1 302492 14388 ? D 08:53 0:00 /usr/sbin/smbd -D
nobody 64671 0.0 0.1 303792 14060 ? S Apr01 0:00 /usr/sbin/smbd -D
nobody 64689 0.0 0.1 303088 14064 ? D 08:54 0:00 /usr/sbin/smbd -D

There are currently 371 of these, and accordingly my load average is about 375 and it just keeps going up and up.

The system is fine and responsive for most things except, currently, my /mnt/disk3 (ReiserFS). Any attempt to access it hangs whatever task accesses it, so /mnt/user and /mnt/user0 hang. Other disks are fine.

I cannot kill any of the blocked tasks, and ultimately any process that is accessing or tries to access anything related to /mnt/disk3 hangs indefinitely. There is no way to recover that I can find short of a hard reset: it simply can't be shut down. I can't stop the array, unmount, etc. either. It's not always the same disk.

I've added 2 drives recently and put them in as XFS, but have 3 old ReiserFS disks that have been in use for quite a while (from at least unRAID 4.x days), running on an older MicroServer.

Current config:

HP Microserver Gen 8 with 10GB ECC RAM
parity - 4TB
disk 1 - 4TB ReiserFS
disk 2 - 4TB ReiserFS
disk 3 - 4TB ReiserFS
cache - Samsung 830 256GB SSD btrfs (on CD ROM SATA port)
Marvell 88SE9230 PCIe eSATA controller
External enclosure (port multiplier enclosure):
disk 4 - 4TB XFS
disk 5 - 4TB XFS
Two other unused/old disks

Note the problem had occurred previously, before the PCIe eSATA controller and external box were added, so I'm not entirely sure they could be the problem. Aside from this, the hardware has been very reliable and the problem has only started in the past few months, i.e. it seems semi-related to recent unRAID distribution versions.

I've had a number of ReiserFS kernel panics, e.g. this one was shortly after a hard reset due to the same problem:

Mar 30 17:15:54 Mars kernel: REISERFS warning (device md1): journal-1409 journal_mark_dirty: returning because j_wcount was 0
Mar 30 17:15:54 Mars kernel: general protection fault: 0000 [#1] PREEMPT SMP
Mar 30 17:15:54 Mars kernel: Modules linked in: veth xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod tg3 ptp pps_core x86_pkg_temp_thermal coretemp kvm_intel kvm ahci libahci ipmi_si pcc_cpufreq acpi_cpufreq [last unloaded: pps_core]
Mar 30 17:15:54 Mars kernel: CPU: 0 PID: 28248 Comm: smbd Not tainted 4.9.10-unRAID #1
Mar 30 17:15:54 Mars kernel: Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015
Mar 30 17:15:54 Mars kernel: task: ffff88022dc48000 task.stack: ffffc90002c5c000
Mar 30 17:15:54 Mars kernel: RIP: 0010:[<ffffffff8107dc6f>] [<ffffffff8107dc6f>] native_queued_spin_lock_slowpath+0x12d/0x17e
Mar 30 17:15:54 Mars kernel: RSP: 0018:ffffc90002c5fab0 EFLAGS: 00010286
Mar 30 17:15:54 Mars kernel: RAX: 000000000000135f RBX: ffff88010c78d1a0 RCX: fff805b7ffb37055
Mar 30 17:15:54 Mars kernel: RDX: ffff880280218580 RSI: 0000000000040000 RDI: ffff88010c78d1a0
Mar 30 17:15:54 Mars kernel: RBP: ffffc90002c5fab0 R08: 0000000000000001 R09: 0000000000000001
Mar 30 17:15:54 Mars kernel: R10: ffffc90002c5f9c0 R11: 0000000000000000 R12: ffff88010c78d120
Mar 30 17:15:54 Mars kernel: R13: ffff88010c78d120 R14: ffff880261948000 R15: 0000000000001000
Mar 30 17:15:54 Mars kernel: FS: 00002af121cc1e40(0000) GS:ffff880280200000(0000) knlGS:0000000000000000
Mar 30 17:15:54 Mars kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 30 17:15:54 Mars kernel: CR2: 00002b68bc908570 CR3: 00000000b66d7000 CR4: 00000000001406f0
Mar 30 17:15:54 Mars kernel: Stack:
Mar 30 17:15:54 Mars kernel: ffffc90002c5fac0 ffffffff8167ce20 ffffc90002c5fae8 ffffffff811252e6
Mar 30 17:15:54 Mars kernel: 0000000000010000 ffff88027b46f800 0000000000000000 ffffc90002c5fb48
Mar 30 17:15:54 Mars kernel: ffffffff8117c75e 224cf80c00000001 ffffc90003991000 ffffc90002c5fc88
Mar 30 17:15:54 Mars kernel: Call Trace:
Mar 30 17:15:54 Mars kernel: [<ffffffff8167ce20>] _raw_spin_lock+0x21/0x25
Mar 30 17:15:54 Mars kernel: [<ffffffff811252e6>] inode_sub_bytes+0x1e/0x38
Mar 30 17:15:54 Mars kernel: [<ffffffff8117c75e>] _reiserfs_free_block+0x161/0x17b
Mar 30 17:15:54 Mars kernel: [<ffffffff8117c862>] __discard_prealloc+0x52/0xb1
Mar 30 17:15:54 Mars kernel: [<ffffffff8117c929>] reiserfs_discard_all_prealloc+0x48/0x51
Mar 30 17:15:54 Mars kernel: [<ffffffff81198f66>] do_journal_end+0x3e5/0xc54
Mar 30 17:15:54 Mars kernel: [<ffffffff81199d29>] journal_end+0xad/0xb0
Mar 30 17:15:54 Mars kernel: [<ffffffff81181327>] reiserfs_create+0x15d/0x17b
Mar 30 17:15:54 Mars kernel: [<ffffffff8119c61c>] ? reiserfs_permission+0xf/0x14
Mar 30 17:15:54 Mars kernel: [<ffffffff8112dcf3>] path_openat+0x7c5/0xca8
Mar 30 17:15:54 Mars kernel: [<ffffffff8113f301>] ? __vfs_getxattr+0x2/0x6e
Mar 30 17:15:54 Mars kernel: [<ffffffff8112e21e>] do_filp_open+0x48/0x9e
Mar 30 17:15:54 Mars kernel: [<ffffffff8110b33f>] ? kmem_cache_alloc+0xe8/0xf6
Mar 30 17:15:54 Mars kernel: [<ffffffff811207d1>] do_sys_open+0x137/0x1c6
Mar 30 17:15:54 Mars kernel: [<ffffffff811207d1>] ? do_sys_open+0x137/0x1c6
Mar 30 17:15:54 Mars kernel: [<ffffffff8113f55e>] ? path_getxattr+0x5c/0x7f
Mar 30 17:15:54 Mars kernel: [<ffffffff81120879>] SyS_open+0x19/0x1b
Mar 30 17:15:54 Mars kernel: [<ffffffff8167d2b7>] entry_SYSCALL_64_fastpath+0x1a/0xa9
Mar 30 17:15:54 Mars kernel: Code: e8 10 66 87 47 02 c1 e0 10 74 6b 48 89 c1 c1 e8 12 48 c1 e9 0c ff c8 83 e1 30 48 98 48 81 c1 80 85 01 00 48 03 0c c5 60 62 9b 81 <48> 89 11 8b 42 08 85 c0 75 04 f3 90 eb f5 48 8b 0a 48 85 c9 74
Mar 30 17:15:54 Mars kernel: RIP [<ffffffff8107dc6f>] native_queued_spin_lock_slowpath+0x12d/0x17e
Mar 30 17:15:54 Mars kernel: RSP <ffffc90002c5fab0>
Mar 30 17:15:54 Mars kernel: ---[ end trace c4673157ae974a54 ]---
Mar 30 17:15:54 Mars kernel: note: smbd[28248] exited with preempt_count 1

In this case, reiserfsck of /dev/md1 returned no problems when run after the kernel panic and the subsequent hard reset.

Usually a kernel panic occurs and then the problem starts, but not always. Currently all devices are accessible except for /dev/md3, i.e. /mnt/disk3, but this time there was no kernel panic, although it does appear to have started midway through the mover task running last night (see thread below).

I'm trying to get away from ReiserFS as it seems to have fallen out of favour and seems related to the problem, but does anyone have any ideas?

Seems similar to: http://lime-technology.com/oldforum/index.php?topic=42875

Edited April 2, 2017 by Shonky
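
Editor's note: on Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, which would explain why a load of about 375 tracks the 371 blocked smbd processes. As a rough sketch only, the following Python counts D-state processes per command from text in the `ps aux` column layout shown in the post. The function name `count_blocked` and the `SAMPLE` constant are illustrative assumptions, not part of unRAID or Samba.

```python
from collections import Counter

def count_blocked(ps_aux_text):
    """Count processes whose STAT column starts with 'D', grouped by command.

    Assumes the `ps aux` layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    blocked = Counter()
    for line in ps_aux_text.splitlines():
        # Split into at most 11 fields so the command keeps its arguments.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip blank lines, short lines, and the header row
        stat, command = fields[7], fields[10]
        if stat.startswith("D"):
            blocked[command] += 1
    return blocked

# Three of the lines quoted in the post, used here as sample input.
SAMPLE = """\
nobody 63826 0.0 0.1 302444 13944 ? D 08:50 0:00 /usr/sbin/smbd -D
nobody 64671 0.0 0.1 303792 14060 ? S Apr01 0:00 /usr/sbin/smbd -D
nobody 64689 0.0 0.1 303088 14064 ? D 08:54 0:00 /usr/sbin/smbd -D
"""

if __name__ == "__main__":
    for command, n in count_blocked(SAMPLE).most_common():
        print(n, command)
```

On a live system the same function could be fed the captured output of `ps aux`; a steadily growing count for one command is a hint that its processes are stuck on I/O rather than busy.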
April 2, 2017 Author

Also just noticed shfs has gone 100% CPU too. Don't think that's happened before.

top - 15:32:31 up 2 days, 6:51, 5 users, load average: 417.03, 417.49, 417.84
Tasks: 662 total, 2 running, 659 sleeping, 0 stopped, 1 zombie
%Cpu(s): 2.5 us, 50.9 sy, 0.0 ni, 46.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 10199176 total, 2051320 free, 1120044 used, 7027812 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 7747956 avail Mem

 PID USER PR NI   VIRT   RES  SHR S  %CPU %MEM     TIME+ COMMAND
2867 root 20  0 221096  1536  780 S 100.0  0.0 669:22.41 shfs
3584 root 20  0  74808 16980 6928 S   5.3  0.2   0:12.71 iotop
4722 root 20  0  16980  3492 2348 R   0.7  0.0   0:00.30 top
3330 root 20  0      0     0    0 S   0.3  0.0   0:00.39 kworker/0:2
   1 root 20  0   4360   748  684 S   0.0  0.0   0:11.31 init
   2 root 20  0      0     0    0 S   0.0  0.0   0:00.04 kthreadd
April 2, 2017 Author

No, not yet. I would have expected reiserfs support to be pretty solid by now though, and older versions didn't have this issue. Is work still being done on reiserfs?
April 2, 2017

The issue is that there is very little development on ReiserFS itself, and it doesn't keep up with the latest Linux kernels. It looks like it is on its way out (my personal view). My advice would be to convert your data disks to XFS. unRAID v6.3.3 did revert some of the latest RFS modifications, which had broken some of the tooling.
April 2, 2017 Author

OK, well I've upgraded to 6.3.3 then. I think it happened before 6.3.2 as well, but I didn't take note at the time. In the meantime I've finished moving everything off two of the reiserfs disks and converted them to XFS. Only md3/disk3 remains to be converted.
April 2, 2017 Community Expert

There are problems with reiserfsprogs, i.e., reiserfsck, included with all v6.3 releases before v6.3.3. There is also another common issue that affects some users with at least one reiserfs disk, with symptoms like shfs using 100% CPU, unRAID hanging after a few days, etc. If you experience these, it's recommended to convert all remaining disks to XFS (IMO you should convert anyway, because it's a filesystem on its way out with terrible performance in some situations).
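
Editor's note: when converting disks one at a time, it can help to check which mounts are still reiserfs. A minimal sketch in Python, assuming the standard `/proc/mounts` layout (device, mount point, filesystem type, options, dump, pass). The helper name `reiserfs_mounts` and the mount options in `SAMPLE` are hypothetical; only the device and mount point names come from the thread.

```python
def reiserfs_mounts(mounts_text):
    """Return (device, mountpoint) pairs whose filesystem type is reiserfs.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "reiserfs":
            found.append((fields[0], fields[1]))
    return found

# Illustrative input only: the mount options are assumed, not from the thread.
SAMPLE = """\
/dev/md3 /mnt/disk3 reiserfs rw,noatime 0 0
/dev/md4 /mnt/disk4 xfs rw,noatime 0 0
"""

if __name__ == "__main__":
    for device, mountpoint in reiserfs_mounts(SAMPLE):
        print(device, mountpoint)
```

On a running server the text could be read from `/proc/mounts` instead of the sample string; an empty result would suggest no reiserfs disks remain mounted.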
This topic is now archived and is closed to further replies.