  • Unraid 6.12.2 hangs after one day. No ping


    Rockikone
    • Minor

    Hey,

    my Unraid server hung after one day on version 6.12.2. The Docker network type was set to ipvlan.

    After a hard reboot I switched to macvlan, and now I get a kernel error.

    On top of that, the dashboard still shows ZFS memory usage at 100%. I have two pools, each a 2x 1 TB mirror, and 8 GB of RAM is allocated to ZFS.
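    (For reference, that 8 GB figure is the ZFS ARC size cap. A minimal sketch of how the cap is usually inspected or adjusted on a ZFS-on-Linux system follows; the sysfs path is the stock OpenZFS one and the 8 GiB value is only an example, so treat the exact Unraid mechanism as an assumption.)

        # Show the current ARC ceiling in bytes (0 usually means auto, roughly half of RAM)
        cat /sys/module/zfs/parameters/zfs_arc_max

        # Example only: cap the ARC at 8 GiB until the next reboot
        echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max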

    I hope you can do something with this.

    I am going back to 6.12; there the server ran without problems!

    homeserver-diagnostics-20230701-1309.zip





    Recommended Comments

    Same here. Since 6.12.1 (same on 6.12.2, though I haven't tried 6.12.0), my Unraid server stops being accessible after a "random" amount of time (between 1 hour and 2 days).

    • Upvote 1
    Link to comment

    Adding my voice to this issue. I decided to update from 6.11.5 to 6.12.2 while I was upgrading some disks over the weekend. After the parity rebuild everything seemed fine until I turned all my Docker containers back on. At some point last night the server went unresponsive and I had to manually power cycle the box. After a reboot today, the same thing happened about 5 hours later. Unfortunately syslog didn't capture anything remotely during the crash, so I have it mirroring to the flash drive at the moment. I will dig through and post what I find when it happens again.

    Link to comment

    It's probably related to Docker macvlan. I decided to shut down the Pi-hole container that is connected via br0. No problems so far, fingers crossed :)

    Link to comment
    On 7/5/2023 at 1:55 PM, dada051 said:

    It's probably related to Docker macvlan. I decided to shut down the Pi-hole container that is connected via br0. No problems so far, fingers crossed :)


    I switched Docker to ipvlan yesterday just in case it would solve this, and it has just crashed again.
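    (For anyone comparing the two drivers outside of the Unraid GUI, this is roughly what the equivalent networks look like with the plain Docker CLI; the subnet, gateway, and parent interface are placeholders, and on Unraid the driver is normally chosen under Settings > Docker rather than created by hand.)

        # macvlan: each container gets its own MAC address on the LAN
        docker network create -d macvlan --subnet=192.168.1.0/24 --gateway=192.168.1.1 -o parent=br0 lan_macvlan

        # ipvlan: containers share the host interface's MAC address
        docker network create -d ipvlan --subnet=192.168.1.0/24 --gateway=192.168.1.1 -o parent=br0 lan_ipvlan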

    Syslog didn't capture anything from the crash. I'm not sure how I am supposed to get the full kernel panic. It just jumps from the last normal log line (sSMTP) to the first boot-up line from when I had to hard-boot it.

    Diagnostics attached.

    brandonserver-diagnostics-20230706-1626.zip
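    (One way to catch a panic that never makes it to disk is the kernel's netconsole module, which streams kernel messages over UDP to a second machine as they happen. A rough sketch, with placeholder addresses, ports, interface, and MAC:)

        # On the crashing server: send kernel messages out eth0 to 192.168.1.50:6666
        modprobe netconsole netconsole=6665@192.168.1.100/eth0,6666@192.168.1.50/aa:bb:cc:dd:ee:ff

        # On the second machine: capture whatever arrives
        nc -u -l 6666 | tee netconsole.log    # or "nc -u -l -p 6666" with traditional netcat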

    Link to comment
    6 hours ago, ch3vr0n5 said:

    Syslog didn't capture anything from the crash. I'm not sure how I am supposed to get the full kernel panic.

    Did you

    On 7/1/2023 at 1:31 PM, JorgeB said:

    Enable the syslog server and post that after a crash.

     

    Link to comment

    So, I've been going at this for 3 weeks now and my poor disks are on something like their 10th parity check (I've honestly lost count). After doing everything I could gather from the forums, checking the hardware, and switching from macvlan to ipvlan, the issue still persists. I've moved to new hardware (an HP Z820 to a Supermicro X9DRi-F) and replaced the flash drive. The new flash drive started on 6.12.0 and I upgraded to .1 and then .2 to see if that would resolve the issue, but to no avail.

     

    The original config was 9 drives: seven 4 TB drives in the array and two 8 TB drives in a mirror. I've since replaced the two parity drives with 8 TB drives and two of the 4 TB drives with 8 TB drives (one at a time, letting each rebuild). So as it sits I've got 11 drives in a 44 TB array and the two original 8 TB drives still in the mirror. I am trying to use unBALANCE (with all Docker containers stopped) to move the data from the mirror to the array, after which I plan to add those two drives to the array. It keeps locking up at some point during the move and I am forced to hard reboot. I can still connect via IPMI, but at the physical console or the IPMI console Unraid is unresponsive.

     

    I would love to see this issue get figured out, but I am considering copying the config folder and reloading the flash drive with 6.11.5.

     

    I just enabled the syslog server and mirror-to-flash, and now I'm going to start the copy again. Yesterday I came back to a lockup and a hard reboot, and about 45% into the parity sync it happened again. I've got the diagnostics from one of those instances, but there's nothing useful in them; I dug through each file and found nothing while I was waiting for someone to chime in on another topic I created, and then found this one to tag along in. If and when it happens again I'll post the diagnostics. The only Docker container I have running is the Twingate connector, which is really useful here because I get a notification when it goes offline.
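    (A quick way to confirm the mirror is actually writing before the next lockup is to log a test entry and look for it on the flash drive; the path below is a guess at the usual mirror location and may differ.)

        # Write a test entry into syslog
        logger "syslog mirror test"

        # Check that it shows up in the copy on the flash drive
        grep "syslog mirror test" /boot/logs/syslog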

    Link to comment
    8 hours ago, ChatNoir said:

    Did you

     

    Yes. I have the syslog server enabled and pointed at itself, with a copy kept on an array share. I also have it mirrored to the flash drive to try to diagnose this issue. Neither copy captured the kernel panic. As I mentioned, it goes from the last logged line of the SMTP service (sending me an email warning about a drive temperature during the in-progress parity check) straight to the boot-up line from when I had to hard-boot it. The only piece of the trace I have is from the monitor I keep hooked up to it just in case. If it crashes again and I can't capture a trace, I will likely just roll back to 6.11.5.
     

    PXL_20230706_211553792.jpg
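    (Pointing the syslog server at the box itself has the drawback that a hard lockup can take the last lines down with it; a receiver on a second machine keeps whatever made it onto the wire. A minimal rsyslog sketch for that second machine, with a placeholder source IP and log path:)

        # /etc/rsyslog.d/10-unraid-remote.conf on the receiving machine
        module(load="imudp")
        input(type="imudp" port="514")

        # Keep everything from the Unraid box in its own file
        if $fromhost-ip == '192.168.1.100' then /var/log/unraid-remote.log
        & stop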

    Link to comment

    Same problems here: 100% ZFS usage and nothing in the syslog server about the crash. This started happening a couple of days after I converted my NVMe cache, which stores my VMs and Docker containers, to ZFS. I'm dedicating 64 GB of my 128 GB of RAM to ZFS, but total RAM usage was around 80% when it crashed.

    Edited by joggs
    Link to comment
    11 hours ago, joggs said:

    100% zfs usage

    This part is normal, it just means the ARC is being fully used, as it's supposed to be.
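    (For anyone who wants to confirm that on their own box, the live ARC counters are exposed under /proc on ZFS-on-Linux; a quick check, assuming that interface is present on Unraid:)

        # Current ARC size versus its configured ceiling, in bytes
        awk '$1 == "size" || $1 == "c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats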

    Link to comment
    On 7/7/2023 at 9:10 AM, ch3vr0n5 said:

    Yes. I have the syslog server enabled and pointed at itself, with a copy kept on an array share. I also have it mirrored to the flash drive to try to diagnose this issue. Neither copy captured the kernel panic. As I mentioned, it goes from the last logged line of the SMTP service (sending me an email warning about a drive temperature during the in-progress parity check) straight to the boot-up line from when I had to hard-boot it. The only piece of the trace I have is from the monitor I keep hooked up to it just in case. If it crashes again and I can't capture a trace, I will likely just roll back to 6.11.5.

    Well... after this last crash I let parity sync again and then started a single container, Plex. It ran for about 3 days without issue, so I started the rest of my containers, and so far it has been up for just shy of 7 days. Perhaps the macvlan-to-ipvlan switch worked but needed a reboot to stick; initially I just made the change and restarted the array/Docker service, which I thought would suffice. I will report back if anything changes.
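    (If it helps anyone double-check the same thing, the driver a network actually ended up on can be read back from Docker directly; "br0" below is just the usual Unraid network name, so adjust as needed.)

        # List networks and the driver behind each one
        docker network ls

        # Print only the driver for the br0 network
        docker network inspect br0 --format '{{.Driver}}'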

    Link to comment
    On 7/6/2023 at 11:58 PM, tjsyl said:

    So, I've been going at this for 3 weeks now and my poor disks are on something like their 10th parity check (I've honestly lost count). After doing everything I could gather from the forums, checking the hardware, and switching from macvlan to ipvlan, the issue still persists. I've moved to new hardware (an HP Z820 to a Supermicro X9DRi-F) and replaced the flash drive. The new flash drive started on 6.12.0 and I upgraded to .1 and then .2 to see if that would resolve the issue, but to no avail.

     

    The original config was 9 drives: seven 4 TB drives in the array and two 8 TB drives in a mirror. I've since replaced the two parity drives with 8 TB drives and two of the 4 TB drives with 8 TB drives (one at a time, letting each rebuild). So as it sits I've got 11 drives in a 44 TB array and the two original 8 TB drives still in the mirror. I am trying to use unBALANCE (with all Docker containers stopped) to move the data from the mirror to the array, after which I plan to add those two drives to the array. It keeps locking up at some point during the move and I am forced to hard reboot. I can still connect via IPMI, but at the physical console or the IPMI console Unraid is unresponsive.

     

    I would love to see this issue get figured out, but I am considering copying the config folder and reloading the flash drive with 6.11.5.

     

    I just enabled the syslog server and mirror-to-flash, and now I'm going to start the copy again. Yesterday I came back to a lockup and a hard reboot, and about 45% into the parity sync it happened again. I've got the diagnostics from one of those instances, but there's nothing useful in them; I dug through each file and found nothing while I was waiting for someone to chime in on another topic I created, and then found this one to tag along in. If and when it happens again I'll post the diagnostics. The only Docker container I have running is the Twingate connector, which is really useful here because I get a notification when it goes offline.

    Well, it turns out one of the 16 GB RAM sticks was indeed bad. $14.13 later and it's been solid for well over 24 hours: one parity sync done and 2.2 of 5.5 TB moved so far.
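    (For anyone chasing a similar lockup, besides a full Memtest pass from the boot menu, the kernel will sometimes log memory errors before the hang on hardware that reports them; a quick thing to grep for:)

        # Look for machine-check or EDAC memory error reports in the kernel log
        dmesg | grep -iE 'mce|edac|hardware error'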

    Link to comment




