  • Unraid 6.12.2 hangs after one day. No ping


    Rockikone
    • Minor

    Hey,

    my Unraid server hung after one day on version 6.12.2. The Docker network type was set to ipvlan.

    After a hard reboot I switched to macvlan, and now I get a kernel error.

    On top of that, the dashboard still shows ZFS memory usage at 100%. I have two pools, each a 2x 1 TB mirror, and 8 GB of RAM is allocated to ZFS.
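    (For reference, that 8 GB figure is the ZFS ARC size cap. A minimal sketch of how the cap is usually inspected or adjusted on a ZFS-on-Linux system follows; the sysfs path is the stock OpenZFS one and the 8 GiB value is only an example, so treat the exact Unraid mechanism as an assumption.)

        # Show the current ARC ceiling in bytes (0 usually means auto, roughly half of RAM)
        cat /sys/module/zfs/parameters/zfs_arc_max

        # Example only: cap the ARC at 8 GiB until the next reboot
        echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max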

    I hope you can do something with this.

    I am going back to 6.12; there the server ran without problems!

    homeserver-diagnostics-20230701-1309.zip





    Recommended Comments

    Same here. Since 6.12.1 (same on 6.12.2, though I haven't tried 6.12.0), my Unraid server stops being accessible after a "random" amount of time (between 1 hour and 2 days).

    • Upvote 1
    Link to comment

    Adding my voice to this issue. I decided to update from 6.11.5 to 6.12.2 while I was upgrading some disks over the weekend. After the parity rebuild everything seemed fine until I turned all my Docker containers back on. At some point last night the server went unresponsive and I had to manually power cycle the box. After a reboot today, the same thing happened about 5 hours later. Unfortunately syslog didn't capture anything remotely during the crash, so I have it mirroring to the flash drive at the moment. I will dig through and post what I find when it happens again.

    Link to comment

    It's probably related to Docker macvlan. I decided to shut down the Pi-hole container that is connected via br0. No problems so far, fingers crossed :)

    Link to comment
    On 7/5/2023 at 1:55 PM, dada051 said:

    It's probably related to Docker macvlan. I decided to shut down the Pi-hole container that is connected via br0. No problems so far, fingers crossed :)


    I switched Docker to ipvlan yesterday just in case it would solve this, and it has just crashed again.
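    (For anyone comparing the two drivers outside of the Unraid GUI, this is roughly what the equivalent networks look like with the plain Docker CLI; the subnet, gateway, and parent interface are placeholders, and on Unraid the driver is normally chosen under Settings > Docker rather than created by hand.)

        # macvlan: each container gets its own MAC address on the LAN
        docker network create -d macvlan --subnet=192.168.1.0/24 --gateway=192.168.1.1 -o parent=br0 lan_macvlan

        # ipvlan: containers share the host interface's MAC address
        docker network create -d ipvlan --subnet=192.168.1.0/24 --gateway=192.168.1.1 -o parent=br0 lan_ipvlan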

    Syslog didn't capture anything from the crash. I'm not sure how I am supposed to get the full kernel panic. It just jumps from the last normal log line (sSMTP) to the first boot-up line from when I had to hard-boot it.

    Diagnostics attached.

    brandonserver-diagnostics-20230706-1626.zip
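    (One way to catch a panic that never makes it to disk is the kernel's netconsole module, which streams kernel messages over UDP to a second machine as they happen. A rough sketch, with placeholder addresses, ports, interface, and MAC:)

        # On the crashing server: send kernel messages out eth0 to 192.168.1.50:6666
        modprobe netconsole netconsole=6665@192.168.1.100/eth0,6666@192.168.1.50/aa:bb:cc:dd:ee:ff

        # On the second machine: capture whatever arrives
        nc -u -l 6666 | tee netconsole.log    # or "nc -u -l -p 6666" with traditional netcat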

    Link to comment
    6 hours ago, ch3vr0n5 said:

    Syslog didn't capture anything from the crash. I'm not sure how I am supposed to get the full kernel panic.

    Did you

    On 7/1/2023 at 1:31 PM, JorgeB said:

    Enable the syslog server and post that after a crash.

     

    Link to comment

    So, I've been going at this for 3 weeks now and my poor disks are on something like their 10th parity check (I've honestly lost count). After doing everything I could gather from the forums, checking the hardware, and switching from macvlan to ipvlan, the issue still persists. I've moved to new hardware (an HP Z820 to a Supermicro X9DRi-F) and replaced the flash drive. The new flash drive started on 6.12.0 and I upgraded to .1 and then .2 to see if that would resolve the issue, but to no avail.

     

    The original config was 9 drives: seven 4 TB drives in the array and two 8 TB drives in a mirror. I've since replaced the two parity drives with 8 TB drives and two of the 4 TB drives with 8 TB drives (one at a time, letting each rebuild). So as it sits I've got 11 drives in a 44 TB array and the two original 8 TB drives still in the mirror. I am trying to use unBALANCE (with all Docker containers stopped) to move the data from the mirror to the array, after which I plan to add those two drives to the array. It keeps locking up at some point during the move and I am forced to hard reboot. I can still connect via IPMI, but at the physical console or the IPMI console Unraid is unresponsive.

     

    I would love to see this issue get figured out, but I am considering copying the config folder and reloading the flash drive with 6.11.5.

     

    I just enabled the syslog server and mirror-to-flash, and now I'm going to start the copy again. Yesterday I came back to a lockup and a hard reboot, and about 45% into the parity sync it happened again. I've got the diagnostics from one of those instances, but there's nothing useful in them; I dug through each file and found nothing while I was waiting for someone to chime in on another topic I created, and then found this one to tag along in. If and when it happens again I'll post the diagnostics. The only Docker container I have running is the Twingate connector, which is really useful here because I get a notification when it goes offline.
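    (A quick way to confirm the mirror is actually writing before the next lockup is to log a test entry and look for it on the flash drive; the path below is a guess at the usual mirror location and may differ.)

        # Write a test entry into syslog
        logger "syslog mirror test"

        # Check that it shows up in the copy on the flash drive
        grep "syslog mirror test" /boot/logs/syslog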

    Link to comment
    8 hours ago, ChatNoir said:

    Did you

     

    Yes. I have the syslog server enabled and pointed at itself, with a copy kept on an array share. I also have it mirrored to the flash drive to try to diagnose this issue. Neither copy captured the kernel panic. As I mentioned, it goes from the last logged line of the SMTP service (sending me an email warning about a drive temperature during the in-progress parity check) straight to the boot-up line from when I had to hard-boot it. The only piece of the trace I have is from the monitor I keep hooked up to it just in case. If it crashes again and I can't capture a trace, I will likely just roll back to 6.11.5.
     

    PXL_20230706_211553792.jpg
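    (Pointing the syslog server at the box itself has the drawback that a hard lockup can take the last lines down with it; a receiver on a second machine keeps whatever made it onto the wire. A minimal rsyslog sketch for that second machine, with a placeholder source IP and log path:)

        # /etc/rsyslog.d/10-unraid-remote.conf on the receiving machine
        module(load="imudp")
        input(type="imudp" port="514")

        # Keep everything from the Unraid box in its own file
        if $fromhost-ip == '192.168.1.100' then /var/log/unraid-remote.log
        & stop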

    Link to comment

    Same problems here: 100% ZFS usage and nothing in the syslog server about the crash. This started happening a couple of days after I converted my NVMe cache, which stores my VMs and Docker containers, to ZFS. I'm dedicating 64 GB of my 128 GB of RAM to ZFS, but total RAM usage was around 80% when it crashed.

    Edited by joggs
    Link to comment
    11 hours ago, joggs said:

    100% zfs usage

    This part is normal, it just means the ARC is being fully used, as it's supposed to be.
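    (For anyone who wants to confirm that on their own box, the live ARC counters are exposed under /proc on ZFS-on-Linux; a quick check, assuming that interface is present on Unraid:)

        # Current ARC size versus its configured ceiling, in bytes
        awk '$1 == "size" || $1 == "c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats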

    Link to comment
    On 7/7/2023 at 9:10 AM, ch3vr0n5 said:

    Yes. I have the syslog server enabled and pointed at itself, with a copy kept on an array share. I also have it mirrored to the flash drive to try to diagnose this issue. Neither copy captured the kernel panic. As I mentioned, it goes from the last logged line of the SMTP service (sending me an email warning about a drive temperature during the in-progress parity check) straight to the boot-up line from when I had to hard-boot it. The only piece of the trace I have is from the monitor I keep hooked up to it just in case. If it crashes again and I can't capture a trace, I will likely just roll back to 6.11.5.

    Well... after this last crash I let parity sync again and then started a single container, Plex. It ran for about 3 days without issue, so I started the rest of my containers, and so far it has been up for just shy of 7 days. Perhaps the macvlan-to-ipvlan switch worked but needed a reboot to stick; initially I just made the change and restarted the array/Docker service, which I thought would suffice. I will report back if anything changes.
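    (If it helps anyone double-check the same thing, the driver a network actually ended up on can be read back from Docker directly; "br0" below is just the usual Unraid network name, so adjust as needed.)

        # List networks and the driver behind each one
        docker network ls

        # Print only the driver for the br0 network
        docker network inspect br0 --format '{{.Driver}}'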

    Link to comment
    On 7/6/2023 at 11:58 PM, tjsyl said:

    So, I've been going at this for 3 weeks now and my poor disks are on something like their 10th parity check (I've honestly lost count). After doing everything I could gather from the forums, checking the hardware, and switching from macvlan to ipvlan, the issue still persists. I've moved to new hardware (an HP Z820 to a Supermicro X9DRi-F) and replaced the flash drive. The new flash drive started on 6.12.0 and I upgraded to .1 and then .2 to see if that would resolve the issue, but to no avail.

     

    The original config was 9 drives: seven 4 TB drives in the array and two 8 TB drives in a mirror. I've since replaced the two parity drives with 8 TB drives and two of the 4 TB drives with 8 TB drives (one at a time, letting each rebuild). So as it sits I've got 11 drives in a 44 TB array and the two original 8 TB drives still in the mirror. I am trying to use unBALANCE (with all Docker containers stopped) to move the data from the mirror to the array, after which I plan to add those two drives to the array. It keeps locking up at some point during the move and I am forced to hard reboot. I can still connect via IPMI, but at the physical console or the IPMI console Unraid is unresponsive.

     

    I would love to see this issue get figured out, but I am considering copying the config folder and reloading the flash drive with 6.11.5.

     

    I just enabled the syslog server and mirror-to-flash, and now I'm going to start the copy again. Yesterday I came back to a lockup and a hard reboot, and about 45% into the parity sync it happened again. I've got the diagnostics from one of those instances, but there's nothing useful in them; I dug through each file and found nothing while I was waiting for someone to chime in on another topic I created, and then found this one to tag along in. If and when it happens again I'll post the diagnostics. The only Docker container I have running is the Twingate connector, which is really useful here because I get a notification when it goes offline.

    Well, it turns out one of the 16 GB RAM sticks was indeed bad. $14.13 later and it's been solid for well over 24 hours: one parity sync done and 2.2 of 5.5 TB moved so far.
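    (For anyone chasing a similar lockup, besides a full Memtest pass from the boot menu, the kernel will sometimes log memory errors before the hang on hardware that reports them; a quick thing to grep for:)

        # Look for machine-check or EDAC memory error reports in the kernel log
        dmesg | grep -iE 'mce|edac|hardware error'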

    Link to comment




