ReiserFS Hangs v6.3.x

April 2, 2017

I've been hitting hangs with recent unRAID versions. I'm currently on 6.3.2, so only one release behind.

Basically, ReiserFS crashes and/or hangs indefinitely. As a result, smbd processes start to multiply, all blocked in uninterruptible sleep (state D), e.g.:

nobody   63826 0.0 0.1 302444 13944 ?        D    08:50   0:00 /usr/sbin/smbd -D
nobody   64081 0.0 0.1 302868 15872 ?        D    08:51   0:00 /usr/sbin/smbd -D
nobody   64124 0.0 0.1 303140 14464 ?        D    11:31   0:00 /usr/sbin/smbd -D
nobody   64172 0.0 0.1 303088 14092 ?        D    08:52   0:00 /usr/sbin/smbd -D
nobody   64341 0.0 0.1 302640 15028 ?        D    08:52   0:00 /usr/sbin/smbd -D
nobody   64591 0.0 0.1 302492 14388 ?        D    08:53   0:00 /usr/sbin/smbd -D
nobody   64671 0.0 0.1 303792 14060 ?        S    Apr01   0:00 /usr/sbin/smbd -D
nobody   64689 0.0 0.1 303088 14064 ?        D    08:54   0:00 /usr/sbin/smbd -D

There are currently 371 of these, and accordingly my load average is about 375 and keeps climbing. The system is fine and responsive for most things; the exception right now is /mnt/disk3 (ReiserFS). Any task that touches it hangs, so /mnt/user and /mnt/user0 hang as well. Other disks are fine. I cannot kill any of the blocked tasks, and any process that is accessing or tries to access anything on /mnt/disk3 hangs indefinitely. I can't find any way to recover short of a hard reset: the system simply can't be shut down, and I can't stop the array or unmount the disk either. It's not always the same disk.
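For anyone wanting to check whether a climbing load average is explained by blocked processes, here is a rough Python sketch. Linux counts tasks in uninterruptible sleep (state D) towards the load average, which is why roughly 371 stuck smbd processes give a load of about 375. The function name `count_blocked` and the sample variable are made up for illustration, and the parsing assumes the usual 11-column `ps aux` layout shown above.

```python
from collections import Counter

# Sample copied from the ps output above (hypothetical variable name).
SAMPLE_PS = """
nobody   63826 0.0 0.1 302444 13944 ?        D    08:50   0:00 /usr/sbin/smbd -D
nobody   64081 0.0 0.1 302868 15872 ?        D    08:51   0:00 /usr/sbin/smbd -D
nobody   64124 0.0 0.1 303140 14464 ?        D    11:31   0:00 /usr/sbin/smbd -D
nobody   64172 0.0 0.1 303088 14092 ?        D    08:52   0:00 /usr/sbin/smbd -D
nobody   64341 0.0 0.1 302640 15028 ?        D    08:52   0:00 /usr/sbin/smbd -D
nobody   64591 0.0 0.1 302492 14388 ?        D    08:53   0:00 /usr/sbin/smbd -D
nobody   64671 0.0 0.1 303792 14060 ?        S    Apr01   0:00 /usr/sbin/smbd -D
nobody   64689 0.0 0.1 303088 14064 ?        D    08:54   0:00 /usr/sbin/smbd -D
"""


def count_blocked(ps_output: str) -> Counter:
    """Count processes whose STAT column starts with 'D', grouped by command."""
    blocked = Counter()
    for line in ps_output.splitlines():
        # Assumed column order: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip blank lines and the header row
        stat, command = fields[7], fields[10]
        if stat.startswith("D"):
            blocked[command] += 1
    return blocked


blocked = count_blocked(SAMPLE_PS)
print(dict(blocked))  # {'/usr/sbin/smbd -D': 7}
```

On a live system the same function could be fed the text of `ps aux` instead of the pasted sample.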

I recently added two drives formatted as XFS, but I still have three old ReiserFS disks that have been in use for quite a while (since at least the unRAID 4.x days) running on an older MicroServer.

Current config
HP Microserver Gen 8 with 10GB ECC RAM
parity - 4TB
disk 1 - 4TB ReiserFS
disk 2 - 4TB ReiserFS
disk 3 - 4TB ReiserFS
cache - Samsung 830 256GB SSD btrfs (on CD ROM SATA port)
Marvell 88SE9230 PCIe eSATA controller

External enclosure
Port multiplier enclosure
disk 4 - 4TB XFS
disk 5 - 4TB XFS
Two other unused/old disks

Note that the problem had already occurred before I added the PCIe eSATA controller and external enclosure, so I doubt they are the cause. Aside from this, the hardware has been very reliable, and the problem only started in the past few months, i.e. it seems at least partly related to recent unRAID releases.

I've had a number of ReiserFS kernel panics, e.g. this one shortly after a hard reset forced by the same problem:

Mar 30 17:15:54 Mars kernel: REISERFS warning (device md1): journal-1409 journal_mark_dirty: returning because j_wcount was 0
Mar 30 17:15:54 Mars kernel: general protection fault: 0000 [#1] PREEMPT SMP
Mar 30 17:15:54 Mars kernel: Modules linked in: veth xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod tg3 ptp pps_core x86_pkg_temp_thermal coretemp kvm_intel kvm ahci libahci ipmi_si pcc_cpufreq acpi_cpufreq [last unloaded: pps_core]
Mar 30 17:15:54 Mars kernel: CPU: 0 PID: 28248 Comm: smbd Not tainted 4.9.10-unRAID #1
Mar 30 17:15:54 Mars kernel: Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015
Mar 30 17:15:54 Mars kernel: task: ffff88022dc48000 task.stack: ffffc90002c5c000
Mar 30 17:15:54 Mars kernel: RIP: 0010:[<ffffffff8107dc6f>] [<ffffffff8107dc6f>] native_queued_spin_lock_slowpath+0x12d/0x17e
Mar 30 17:15:54 Mars kernel: RSP: 0018:ffffc90002c5fab0 EFLAGS: 00010286
Mar 30 17:15:54 Mars kernel: RAX: 000000000000135f RBX: ffff88010c78d1a0 RCX: fff805b7ffb37055
Mar 30 17:15:54 Mars kernel: RDX: ffff880280218580 RSI: 0000000000040000 RDI: ffff88010c78d1a0
Mar 30 17:15:54 Mars kernel: RBP: ffffc90002c5fab0 R08: 0000000000000001 R09: 0000000000000001
Mar 30 17:15:54 Mars kernel: R10: ffffc90002c5f9c0 R11: 0000000000000000 R12: ffff88010c78d120
Mar 30 17:15:54 Mars kernel: R13: ffff88010c78d120 R14: ffff880261948000 R15: 0000000000001000
Mar 30 17:15:54 Mars kernel: FS: 00002af121cc1e40(0000) GS:ffff880280200000(0000) knlGS:0000000000000000
Mar 30 17:15:54 Mars kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 30 17:15:54 Mars kernel: CR2: 00002b68bc908570 CR3: 00000000b66d7000 CR4: 00000000001406f0
Mar 30 17:15:54 Mars kernel: Stack:
Mar 30 17:15:54 Mars kernel: ffffc90002c5fac0 ffffffff8167ce20 ffffc90002c5fae8 ffffffff811252e6
Mar 30 17:15:54 Mars kernel: 0000000000010000 ffff88027b46f800 0000000000000000 ffffc90002c5fb48
Mar 30 17:15:54 Mars kernel: ffffffff8117c75e 224cf80c00000001 ffffc90003991000 ffffc90002c5fc88
Mar 30 17:15:54 Mars kernel: Call Trace:
Mar 30 17:15:54 Mars kernel: [<ffffffff8167ce20>] _raw_spin_lock+0x21/0x25
Mar 30 17:15:54 Mars kernel: [<ffffffff811252e6>] inode_sub_bytes+0x1e/0x38
Mar 30 17:15:54 Mars kernel: [<ffffffff8117c75e>] _reiserfs_free_block+0x161/0x17b
Mar 30 17:15:54 Mars kernel: [<ffffffff8117c862>] __discard_prealloc+0x52/0xb1
Mar 30 17:15:54 Mars kernel: [<ffffffff8117c929>] reiserfs_discard_all_prealloc+0x48/0x51
Mar 30 17:15:54 Mars kernel: [<ffffffff81198f66>] do_journal_end+0x3e5/0xc54
Mar 30 17:15:54 Mars kernel: [<ffffffff81199d29>] journal_end+0xad/0xb0
Mar 30 17:15:54 Mars kernel: [<ffffffff81181327>] reiserfs_create+0x15d/0x17b
Mar 30 17:15:54 Mars kernel: [<ffffffff8119c61c>] ? reiserfs_permission+0xf/0x14
Mar 30 17:15:54 Mars kernel: [<ffffffff8112dcf3>] path_openat+0x7c5/0xca8
Mar 30 17:15:54 Mars kernel: [<ffffffff8113f301>] ? __vfs_getxattr+0x2/0x6e
Mar 30 17:15:54 Mars kernel: [<ffffffff8112e21e>] do_filp_open+0x48/0x9e
Mar 30 17:15:54 Mars kernel: [<ffffffff8110b33f>] ? kmem_cache_alloc+0xe8/0xf6
Mar 30 17:15:54 Mars kernel: [<ffffffff811207d1>] do_sys_open+0x137/0x1c6
Mar 30 17:15:54 Mars kernel: [<ffffffff811207d1>] ? do_sys_open+0x137/0x1c6
Mar 30 17:15:54 Mars kernel: [<ffffffff8113f55e>] ? path_getxattr+0x5c/0x7f
Mar 30 17:15:54 Mars kernel: [<ffffffff81120879>] SyS_open+0x19/0x1b
Mar 30 17:15:54 Mars kernel: [<ffffffff8167d2b7>] entry_SYSCALL_64_fastpath+0x1a/0xa9
Mar 30 17:15:54 Mars kernel: Code: e8 10 66 87 47 02 c1 e0 10 74 6b 48 89 c1 c1 e8 12 48 c1 e9 0c ff c8 83 e1 30 48 98 48 81 c1 80 85 01 00 48 03 0c c5 60 62 9b 81 <48> 89 11 8b 42 08 85 c0 75 04 f3 90 eb f5 48 8b 0a 48 85 c9 74
Mar 30 17:15:54 Mars kernel: RIP [<ffffffff8107dc6f>] native_queued_spin_lock_slowpath+0x12d/0x17e
Mar 30 17:15:54 Mars kernel: RSP <ffffc90002c5fab0>
Mar 30 17:15:54 Mars kernel: ---[ end trace c4673157ae974a54 ]---
Mar 30 17:15:54 Mars kernel: note: smbd[28248] exited with preempt_count 1
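To make it easier to see which subsystem a trace like the one above passes through, here is a rough Python sketch that pulls the function names out of the "Call Trace:" section of a syslog excerpt. The names `trace_functions`, `FRAME_RE` and `SAMPLE_LOG` are made up for illustration, the regex assumes the 4.9-era `[<address>] symbol+0xoff/0xsize` frame format seen in this log, and frames marked with `?` (entries the kernel does not consider reliable) are skipped.

```python
import re

# Matches one stack frame, e.g. "[<ffffffff8117c75e>] _reiserfs_free_block+0x161/0x17b".
# Group 1 is set when the frame is marked "?", group 2 is the symbol name.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# A few lines copied from the trace above (hypothetical variable name).
SAMPLE_LOG = """\
Mar 30 17:15:54 Mars kernel: Call Trace:
Mar 30 17:15:54 Mars kernel: [<ffffffff8167ce20>] _raw_spin_lock+0x21/0x25
Mar 30 17:15:54 Mars kernel: [<ffffffff811252e6>] inode_sub_bytes+0x1e/0x38
Mar 30 17:15:54 Mars kernel: [<ffffffff8117c75e>] _reiserfs_free_block+0x161/0x17b
Mar 30 17:15:54 Mars kernel: [<ffffffff81181327>] reiserfs_create+0x15d/0x17b
Mar 30 17:15:54 Mars kernel: [<ffffffff8119c61c>] ? reiserfs_permission+0xf/0x14
Mar 30 17:15:54 Mars kernel: [<ffffffff8112dcf3>] path_openat+0x7c5/0xca8
Mar 30 17:15:54 Mars kernel: Code: e8 10 66 87 47 02
"""


def trace_functions(log_text: str) -> list:
    """Return the reliable frames of the first call trace, innermost first."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # first non-frame line ends the trace
        if match.group(1) is None:  # skip frames marked with "?"
            frames.append(match.group(2))
    return frames


frames = trace_functions(SAMPLE_LOG)
reiserfs_frames = [name for name in frames if "reiserfs" in name]
print(reiserfs_frames)  # ['_reiserfs_free_block', 'reiserfs_create']
```

Run against the full trace, this shows the fault sitting under the ReiserFS create and journal code, reached from an smbd file open.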

In this case, reiserfsck on /dev/md1 reported no problems when run after the hard reset that followed the kernel panic.

Usually a kernel panic occurs and then the hangs start, but not always. Currently all devices are accessible except /dev/md3 (i.e. /mnt/disk3), and this time there was no kernel panic, although the hang does appear to have started midway through the mover task running last night (see thread below).

I'm trying to move away from ReiserFS, as it seems to have fallen out of favour and looks related to the problem, but does anyone have any ideas?

Edited April 2, 2017 by Shonky

April 2, 2017

Author

Also just noticed shfs has gone to 100% CPU too. I don't think that's happened before.

top - 15:32:31 up 2 days,  6:51,  5 users,  load average: 417.03, 417.49, 417.84
Tasks: 662 total,   2 running, 659 sleeping,   0 stopped,   1 zombie
%Cpu(s):  2.5 us, 50.9 sy,  0.0 ni, 46.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 10199176 total,  2051320 free,  1120044 used,  7027812 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  7747956 avail Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2867 root      20   0  221096   1536    780 S 100.0  0.0 669:22.41 shfs
 3584 root      20   0   74808  16980   6928 S   5.3  0.2   0:12.71 iotop
 4722 root      20   0   16980   3492   2348 R   0.7  0.0   0:00.30 top
 3330 root      20   0       0      0      0 S   0.3  0.0   0:00.39 kworker/0:2
    1 root      20   0    4360    748    684 S   0.0  0.0   0:11.31 init
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.04 kthreadd

April 2, 2017

Have you tried unRAID version 6.3.3?

April 2, 2017

Author

No, not yet. I would have expected ReiserFS support to be pretty solid by now though, and older versions didn't have this issue. Is work still going on with ReiserFS?

April 2, 2017

The issue is that there is very little development on ReiserFS itself, and it isn't keeping up with the latest Linux kernels. It looks like it is on its way out (my personal view).

My advice would be to convert your data disks to XFS.

unRAID v6.3.3 did revert some of the latest RFS modifications, which had broken some of the tooling.

April 2, 2017

Author

OK, well, I've upgraded to 6.3.3 then. I think it happened before 6.3.2 as well, but I didn't take note at the time.

In the meantime, though, I've finished moving everything off two of the ReiserFS disks and converted them to XFS. Only md3/disk3 remains to be converted.

April 2, 2017

Community Expert

There are problems with reiserfsprogs, i.e., reiserfsck, included with all v6.3 releases before v6.3.3.

There is also another common issue that affects some users with at least one ReiserFS disk, with symptoms like shfs using 100% CPU, unRAID hanging after a few days, etc. If you experience these, it's recommended to convert all remaining disks to XFS (IMO you should convert anyway, because it's a filesystem on its way out with terrible performance in some situations).
