Frequent crashes since 6.12.0 appear to have corrupted cache & parity

enmesh-parisian-latest · July 17, 2023

Hey, I've been having issues since 6.12.0 (now on 6.12.3). The system was regularly crashing, which I posted about here. While I was attempting to apply a recommended fix, it became clear that the docker image was corrupted. That led me to realise the problem was bigger: the cache drive partition was corrupted (it was only operating in read-only mode). I cleared and reformatted the cache drives and then began transferring my data back, at which point I noticed the parity drive was no longer readable.

I couldn't generate a SMART report or perform any checks on the parity drive, so I shut down, checked cables and connections, and rebooted. After that, the parity drive was no longer visible in the UI, so as an experiment I swapped the parity drive's bay with another drive's. Now the parity drive is back and can generate SMART reports, but it needs to be formatted and parity rebuilt.

I'm now rebuilding the parity, but I get the feeling I might be missing some bigger issue. I've attached diagnostics and a SMART report for the parity drive. Is there anything here I should be worried about?

As a small side note, I noticed that FCP is reporting "Write Cache is disabled" on the parity and drives 1-22. However, I have 23 disks in my array (plus the parity), so it seems odd that one disk would not report the same "Write Cache is disabled" warning...

tobor-server-diagnostics-20230717-1503.zip WD161KFGX-68AFSM_2VGD275B_3500605ba011718e8-20230717-1500.txt

Edited July 17, 2023 by enmesh-parisian-latest
grammar

JorgeB · July 17, 2023

2 hours ago, enmesh-parisian-latest said:

but needs to be formatted

Parity is never formatted. What is the current issue? Logs look normal and cache is mounting.

enmesh-parisian-latest · July 17, 2023

3 hours ago, JorgeB said:

Parity is never formatted. What is the current issue? Logs look normal and cache is mounting.

It's true that everything appears to be working now, but for my cache drives and parity to fail within two days of each other, I feel like something bigger is going on. I'm only addressing the symptoms and haven't found the cause. I'm hoping the diagnostics and logs can help identify it. Attached is the system log; however, it's missing the period when my parity failed.

syslog-10.0.0.200.log

JorgeB · July 17, 2023

If there are any more issues, grab and post diags before rebooting.

enmesh-parisian-latest · July 19, 2023

On 7/18/2023 at 12:25 AM, JorgeB said:

If there are any more issues, grab and post diags before rebooting.

I'm still rebuilding parity, but I noticed some kernel errors in the system log last night:

Jul 19 02:38:07 tobor-server kernel: PMS LoudnessCmd[31931]: segfault at 0 ip 000014da6a0d7060 sp 000014da658460d8 error 4 in libswresample.so.4[14da6a0cf000+18000] likely on CPU 47 (core 13, socket 1)
Jul 19 02:38:07 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Jul 19 02:38:08 tobor-server kernel: PMS LoudnessCmd[32119]: segfault at 0 ip 0000150d92c2f060 sp 0000150d8e5b80d8 error 4 in libswresample.so.4[150d92c27000+18000] likely on CPU 23 (core 13, socket 1)
Jul 19 02:38:08 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Jul 19 02:38:08 tobor-server kernel: PMS LoudnessCmd[32151]: segfault at 0 ip 00001498864b8900 sp 0000149881cd00d8 error 4 in libswresample.so.4[1498864b0000+18000] likely on CPU 16 (core 4, socket 1)
Jul 19 02:38:08 tobor-server kernel: Code: cc cc cc cc cc cc cc cc cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 7c 66 2e 0f 1f 84 00 00 00 00 00 <f3> 0f 10 06 f3 0f 5a c0 f2 0f 11 07 f3 0f 10 04 06 48 01 c6 f3 0f
Jul 19 02:38:40 tobor-server kernel: PMS LoudnessCmd[32179]: segfault at 0 ip 000014ae7be78060 sp 000014ae779440d8 error 4 in libswresample.so.4[14ae7be70000+18000] likely on CPU 11 (core 13, socket 0)
Jul 19 02:38:40 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Jul 19 02:39:22 tobor-server kernel: PMS LoudnessCmd[34204]: segfault at 0 ip 000014b820278060 sp 000014b81bf970d8 error 4 in libswresample.so.4[14b820270000+18000] likely on CPU 47 (core 13, socket 1)
Jul 19 02:39:22 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Jul 19 02:39:23 tobor-server kernel: PMS LoudnessCmd[36896]: segfault at 0 ip 000014e50e890060 sp 000014e50a00b0d8 error 4 in libswresample.so.4[14e50e888000+18000] likely on CPU 42 (core 8, socket 1)
Jul 19 02:39:23 tobor-server kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 22 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06

Is this some clue to the original problem?

tobor-server-diagnostics-20230719-1146.zip

JorgeB · July 19, 2023

Those appear to be normal; I'm used to seeing them in many diags.

Frette · December 14, 2023

I am having a similar issue: 6.12.6 and previous versions of 6.12 are causing my Unraid server to crash. I've gone through and cleaned up what I thought might be the issue, like NVIDIA drivers etc. I've even removed hardware, thinking that might be the issue, but the server still keeps locking up and I can't even log in to the physical device.

My next option is to downgrade to a more stable version; everything worked fine with 6.11.6.

syslog-192.168.1.34.log

nrem-diagnostics-20231214-1049.zip

Edited December 14, 2023 by Frette

enmesh-parisian-latest · December 15, 2023

7 hours ago, Frette said:

I am having a similar issue: 6.12.6 and previous versions of 6.12 are causing my Unraid server to crash. I've gone through and cleaned up what I thought might be the issue, like NVIDIA drivers etc. I've even removed hardware, thinking that might be the issue, but the server still keeps locking up and I can't even log in to the physical device.

My next option is to downgrade to a more stable version; everything worked fine with 6.11.6.

syslog-192.168.1.34.log

nrem-diagnostics-20231214-1049.zip

Did you switch the docker network type to ipvlan?

Frette · December 15, 2023

Looks like it's set already; I didn't change anything.
