• BTRFS error


    henrik38
    • Solved

    Hey,

    My SSD pool is somewhat broken and set itself to read only; please help. 😅
     

I changed my RAM today, and the server worked fine afterward. However, I then switched back to the old RAM, so it is now in the same configuration it had been running without any issues for days. The only other thing I did today besides the RAM change was installing a Docker container.

     

    Henrik

    unraidserver-diagnostics-20240130-1608.zip





The pool filesystem is corrupt; I suggest backing up and reformatting.

     

    If you have been changing the RAM it would also be a good idea to run memtest.

     

    P.S. this is not a bug but a general support issue, next time please use the general support forum instead.


    Oh, sorry, I didn't know about the other forum; I'll note it down for future requests.
     

    Isn't there a way to just delete the corrupted file or fix it somehow else? Copying the appdata stuff takes ages.
     

    I thought having a redundant cache would protect me against drive or filesystem failures.

    Thx for your help!


A redundant pool helps with a failed device, not with filesystem corruption; for the current errors I would recommend copying everything and reformatting.
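A toy sketch of why that is (this is not how btrfs works internally, just the idea): a mirror duplicates whatever is written, including bad bytes, so redundancy only helps when a whole device is lost.

```python
# Toy illustration (not real btrfs): a RAID1-style pool mirrors every
# write to both devices. If a device dies, the mirror still has the data;
# but if corrupted bytes are *written* (bad RAM, crash mid-write), both
# copies receive the same bad data, so redundancy cannot help.

class MirroredPool:
    def __init__(self):
        self.dev_a = {}   # block number -> bytes
        self.dev_b = {}

    def write(self, block, data):
        # RAID1 duplicates the write verbatim to both members.
        self.dev_a[block] = data
        self.dev_b[block] = data

    def read(self, block):
        # Prefer device A; fall back to B if A has "lost" the block.
        return self.dev_a.get(block, self.dev_b.get(block))

pool = MirroredPool()
pool.write(0, b"good data")

# Case 1: device failure -- redundancy saves you.
del pool.dev_a[0]
print(pool.read(0))  # b'good data' from the surviving mirror

# Case 2: corruption at write time -- both mirrors store the bad bytes.
pool.write(1, b"g@rb4ge")     # e.g. flipped bits from faulty RAM
print(pool.read(1))  # b'g@rb4ge' -- both copies are equally wrong
```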

     

     


    Yeah, I'm attempting to transfer the files to my PC, but it appears that many files are corrupted. I'm unable to copy or open them, although I can open the same file from a backup made yesterday.

     

I have a backup for the crucial files, but not for my AppData folder, which contains somewhat important data like game save files and all my configs.

     

Can't I simply run this '--repair' command and have the filesystem repair itself? After all, it seems there's only one bad "sector" or whatever it's called.

    2 minutes ago, henrik38 said:

    Can't I simply run this '--repair' command and have the filesystem repair itself?

    You can try as a last resort, but note that it may do more harm than good.
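If you do try it, the usual advice is to gather information read-only first and only reach for --repair with backups already taken. A rough sketch (DEV and the /mnt/cache mount point are placeholders for your actual pool device and mount point; btrfs check needs the filesystem unmounted, i.e. the array/pool stopped):

```shell
DEV=""   # placeholder: set to your pool device, e.g. /dev/sdX1

if [ -b "$DEV" ]; then
    # 1. Read-only check first: reports problems without modifying
    #    the filesystem (run with the pool unmounted / array stopped).
    btrfs check --readonly "$DEV"

    # 2. Inspect per-device error counters on the mounted pool.
    btrfs device stats /mnt/cache

    # 3. Only as a last resort, after backing up whatever is readable:
    # btrfs check --repair "$DEV"
else
    msg="set DEV to your pool device before running"
    echo "$msg"
fi
```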


    Is there no other option?
     

    Either I don't have the files, or I use this command without understanding what it does, which could potentially cause more harm.
     

    Neither option seems appealing.


    For everyone seeing this in the future: Accessing the cache pool as a disk share eliminated 95% of the "corrupted" files. I simply copied them to my PC, formatted the pool, and then copied them back onto the server. Everything seems to work now.

    -----
     

    First of all, thank you for your help. I'm not sure if you moderators get paid or do this voluntarily, but either way, I really appreciate the support provided through this forum.
     

    A few weeks ago, you assisted me swiftly and effectively with a problem. However, I must admit that this time I was quite disappointed.


    When I reach out in this forum, I have a problem and expect to be communicating with experts who can guide me or offer direct assistance. Consequently, I trust what you say nearly 100%.


    For this reason, I find it really unfortunate that your only solution was to suggest copying the files, especially since I told you that many files were corrupted and you didn’t offer any alternative ideas, like using the disk share instead of the appdata share.


    If I hadn't tested it myself, I might have lost many files or caused further damage with the repair command. The idea of just copying everything is something I could have come up with on my own. Receiving advice on how to salvage the damaged files would have been much more helpful.
     

Especially when it comes to data loss, I can't comprehend why the responses are extremely short, sporadic, and lacking in substantial solutions, as well as dos and don'ts to prevent further damage.
     

    Moreover, I don't appreciate that you simply marked the ticket as "closed." Whether it's in the right category or not is irrelevant to me at that moment. You're more than welcome to move it to the correct category or notify me about it! But closing the topic when it's obviously not resolved is, to put it mildly, really unacceptable.

     

    Despite this, I'm very grateful for the help from you and the forum in general.


    Moderators are just unpaid volunteers, not much different from your other fellow unraid users who try to help people on this forum, except we also volunteered for the extra work of moderation.

     

    Sometimes we don't see your post immediately because we are in different time zones, and life gets in the way.

     

I hadn't seen any of this until just now. If you had requested additional explanation, and waited for it, you probably would have gotten it. Maybe sometimes you have to ask again.

     

    We read a lot of posts and might be busy with someone else with even bigger problems. I know I have been lately.

     

    Glad you got it sorted out.


    @henrik38

     

    You should consider the appdata backup plugin.

     

    Also

    7 hours ago, JorgeB said:

    a good idea to run memtest.

    did you?

    Quote

    Moderators are just unpaid volunteers, not much different from your other fellow unraid users who try to help people on this forum, except we also volunteered for the extra work of moderation.

    Oh, interesting. I had assumed that the moderators were part of the official Unraid support team and worked for Unraid.
     

    I'm not angry or anything! The solution was ultimately the right one, and everything important is working again. I was just a bit frustrated and wanted to give feedback. Additionally, it's quite different since you are doing it voluntarily.
     

    As I said, I am very grateful for the forum. Without it, I would have encountered problems more often.🤣
     


     

    Quote

    did you?

Yeah, I will let it run overnight; 64GB will probably take a few hours from what I've read online. I've never used MemTest before.


     

    Quote

    You should consider the appdata backup plugin.

     

Will look into it. I'm currently using LuckyBackup, but backing up all 50k+ small files from Plex, for example, doesn't work very well.




     

I have a few questions. You don't have to answer them directly; feel free to send a link to an explanation or something similar. I'm just not quite sure what to search for to get the answers from Google, haha:
     

    - What "exactly" was damaged? Was it a single file, a bit that was different from what it should have been?
     

- Why did I have a significantly higher number of corrupted files when I accessed them via "/mnt/user/appdata", but far fewer when I used the disk share directly?
     

    - How can such an error occur? What can I do differently or better to prevent this problem from happening again?
     

    - I've now encountered a problem with the btrfs filesystem for the second time. Is there a better alternative that also supports RAID 1?
     





    Maybe a bug report, I don't know:
When I started the array today, it went straight into a parity check while the issue with the Btrfs cache pool (probably) still existed.
     

    When I then accessed the interface, I could see that the CPU usage was pinned at 100%. After a few seconds, the server completely crashed. In the log, I could see that there was a "CPU Stall." The server could only be shut down by unplugging it.
     

I was only able to break out of this loop by spamming the cancel button for the parity check at array start.




    Sorry for the wall of text.
    And again, thank you, I find it really remarkable that you do this voluntarily!


    The filesystem can be thought of as data about the data, or "metadata". It is not really the contents of the files, but instead, data about how the folders and files are organized and represented as bytes on the disks.

     

    When files are written, this metadata is also updated so it can be used to retrieve the files. If something causes disk writes to fail, such as a bad connection or bad disk, not only can the file be wrong, but the metadata can be wrong. Wrong metadata is usually what is meant by corruption. It can cause problems retrieving existing files, and problems writing new files. Often the filesystem will be made readonly to prevent further damage.
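A toy illustration of the idea (nothing like real btrfs on-disk structures, just the concept): if the metadata that says where a file's bytes live goes wrong, the file becomes unreadable even though its bytes are still on disk.

```python
# Toy model of "metadata vs data": data_blob holds the raw file
# contents, while the index (metadata) records where each file's
# bytes live. Corrupting only the index makes a file unretrievable
# even though its data is intact.

data_blob = b"hello worldgame.sav contents"
index = {
    "greeting.txt": (0, 11),    # (offset, length) into data_blob
    "game.sav":     (11, 17),
}

def read_file(name):
    off, length = index[name]
    return data_blob[off:off + length]

print(read_file("greeting.txt"))  # b'hello world'

# Corrupt only the metadata: the file's bytes are still in data_blob,
# but the "filesystem" can no longer locate them correctly.
index["greeting.txt"] = (7, 11)
print(read_file("greeting.txt"))  # wrong bytes come back
```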

     

Your diagnostics were taken without the array started, so I couldn't see anything about where your appdata was. Possibly some of it wasn't on the "disk share" you looked at. You can see how much of each disk/pool is used by each user share by clicking Compute... for the share on the User Shares page.

     

    An automatic parity check when you boot is due to "unclean shutdown". Here is a link to a post which explains this in more detail:

    https://forums.unraid.net/topic/86385-docker-containers-have-the-be-off-when-you-power-off-the-array/?do=findComment&comment=801379

    and another link to the sticky thread in General Support discussing things to do if you are having problems shutting down cleanly:

    https://forums.unraid.net/topic/69868-dealing-with-unclean-shutdowns/

     


    btrfs seems to be especially prone to break if you fill it up too much. In general, whatever the filesystem, you should always keep some free space on each of your disks/pools.

     

    You have no Minimum Free set for either of your pools. If a pool has less than Minimum Free, it will overflow to the array for shares that have Secondary storage.
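Roughly, the allocation decision works like this (a simplified sketch of the behavior described above, not Unraid's actual code):

```python
# Sketch of the "Minimum Free" overflow rule (assumed, simplified):
# if a pool's free space drops below the share's minimum-free setting,
# new files land on secondary storage (the array) instead.

def choose_target(pool_free, minimum_free, has_secondary):
    """Pick where a new file should be written."""
    if pool_free >= minimum_free:
        return "pool"
    if has_secondary:
        return "array"            # overflow to secondary storage
    return "pool (may fill up!)"  # no fallback configured

print(choose_target(pool_free=50, minimum_free=10, has_secondary=True))
print(choose_target(pool_free=5,  minimum_free=10, has_secondary=True))
print(choose_target(pool_free=5,  minimum_free=10, has_secondary=False))
```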

    9 hours ago, henrik38 said:

    especially since I told you that many files were corrupted and you didn’t offer any alternative ideas

Only now saw this; you need to have some patience, as we are not all in the same time zone. Glad you've managed to recover most of your data.

    8 hours ago, henrik38 said:

    I've now encountered a problem with the btrfs filesystem for the second time. Is there a better alternative that also supports RAID 1?

That can be the result of an underlying hardware issue. If memtest doesn't find anything, I suggest trying ZFS; if you later also have issues with ZFS, that suggests some hardware problem exists.

    12 hours ago, henrik38 said:

    Accessing the cache pool as a disk share eliminated 95% of the "corrupted" files

BTW: With the Unraid 6.12.x releases, you can achieve the same results on a user share as with a disk share if the share in question is all on one device/pool and you have enabled the Exclusive share option under Settings->Global Share Settings.


    Thanks for all the answers.

     

    The new RAM sticks passed MemTest with three passes and have been working fine since then.

     

    I've also installed the AppData backup plugin, and it works really great.

     

     

    Quote

That can be the result of an underlying hardware issue. If memtest doesn't find anything, I suggest trying ZFS; if you later also have issues with ZFS, that suggests some hardware problem exists.

    I set it up again yesterday with Btrfs, but if I encounter another problem, I will try ZFS.

    Now that I have a backup of everything, it's much less of a problem if something like this happens again.


     

    Quote

    btrfs seems to be especially prone to break if you fill it up too much. In general, whatever the filesystem, you should always keep some free space on each of your disks/pools.

The only thing that was full at some point was the RAM; it was at 100% because of one Docker container, and the whole server crashed. Maybe that caused it, I don't know.

    Apart from that, I have set a minimum free space as well.


    @itimpi Thanks, that's good to know!




