• Parity Sync Errors after every reboot?


    kubed_zero
    • Minor

    I tried searching around but couldn't find much on this. As long as I don't reboot Unraid, it won't have parity errors. Most recently, I ran the parity check three times in a row without rebooting, and it reported 0 sync errors each time. I then rebooted (safely, stopping the array first) and started another parity check, and now there are errors. When errors do pop up on this particular Unraid box, the count is consistently 1025, which is suspicious. Looking at the parity check history, there are only ever errors after a reboot.

     

    Parity History, showing the last three runs had zero errors:

    [Screenshot: STORAGE-UNRAID Main page, 2023-09-14 23:13:37]

    I then immediately rebooted and started another parity check, taking this screenshot:

    [Screenshot: STORAGE-UNRAID Main page, 2023-09-14 23:13:51]

     

     

    This is a Supermicro motherboard with ECC RAM. Unraid is running as an ESXi 6.7 VM, with the Intel SATA controller passed through to the VM.

    I have a second Unraid server with the same setup (albeit newer hardware and newer ESXi), and it does something similar: 0 errors on repeated Parity Check operations, but the second I reboot and try to run a parity check, it'll start finding errors. 

    Both Unraid systems have been running as VMs for a few years, and did not always have this issue.

     

    [Screenshot: Poorbox Main page, 2023-09-14 23:44:40]

    I've tried running New Permissions just in case something wacky happened to some of the files, but that did not help.

     

    Diagnostics from both systems attached. The parity check errors can be seen in one of them. I can grab new/better diagnostics later if need be. 

     

    I'm looking for help in troubleshooting next steps, as this leaves me less confident in restoring valid data should a drive fail. 

    I did find this blog post, https://blog.insanegenius.com/2020/01/10/unraid-repeat-parity-errors-on-reboot/, which shows the same kind of errors I'm seeing ("Jan 3 10:03:07 Server-2 kernel: md: recovery thread: P corrected, sector=1962934168"), but I don't think it's relevant in this case, as I'm just using the SATA ports directly on the motherboards, without any LSI or SAS/HBA cards.
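One way to narrow this down might be to check whether the corrected sectors are the same after every reboot. Below is a minimal Python sketch for that, assuming the syslog lines look exactly like the one quoted above. The function names (`corrected_sectors`, `compare_runs`) are made up for illustration, and the second sample log line in the demo is invented, not taken from a real log.

```python
import re

# Matches only the line format quoted in the post above;
# any other wording in the syslog is ignored (assumption).
SECTOR_RE = re.compile(r"md: recovery thread: P corrected, sector=(\d+)")


def corrected_sectors(log_text):
    """Return the set of sector numbers reported as corrected in a syslog dump."""
    return {int(m.group(1)) for m in SECTOR_RE.finditer(log_text)}


def compare_runs(first_log, second_log):
    """Split the sectors from two parity checks into shared and run-specific lists."""
    first = corrected_sectors(first_log)
    second = corrected_sectors(second_log)
    return {
        "both": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


if __name__ == "__main__":
    # First line is the one quoted in the post; the second is a made-up example.
    run_a = (
        "Jan 3 10:03:07 Server-2 kernel: md: recovery thread: "
        "P corrected, sector=1962934168\n"
    )
    run_b = run_a + (
        "Jan 3 10:03:08 Server-2 kernel: md: recovery thread: "
        "P corrected, sector=1962934176\n"
    )
    print(compare_runs(run_a, run_b))
```

If the "both" list covers nearly all of the errors after each reboot, that would point at the same region of the parity disk being touched every time rather than at random corruption.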

    diagnostics.zip





    6 hours ago, JorgeB said:

    Does the same thing happen if you run Unraid baremetal?

    I can't say for certain. This is not a scenario I can test, because another VM on the same machine is providing my network access to Unraid.

    I want to emphasize though that in both of these cases, Unraid ran fine as a VM for 3-5 years, and it's only in the past few months that I've been seeing this. 


    Virtualizing Unraid is not officially supported, and AFAIK no other users have reported a similar issue, so it's unlikely that LT can replicate this. Without confirming whether it still happens on bare metal, I'm not sure much can be done. But that's just my opinion; I can't speak for LT.


