• Unable to format new drives, system locks up (RC5)


    je82
    • Solved Urgent

    Hello,

    Ever since I updated to RC5, my system has been unable to format new drives. Here's the log from after the "preclear finished" email was sent:

     

    The failure is reproducible on my system in RC5: it happens every time a preclear finishes and I click Format. I had no trouble on the current stable build.

     

    If you need more data, please let me know. I figured I'd drop the report here in case no one had seen the issue. Any idea if this is patched in RC6?

     

    Thanks.

    syslog.txt







    Done, sorry about that.

     

    I upgraded from RC5 to RC6 and formatting appears to work, but something is still very strange. Here's a quick comparison of two disks of the same type and model (8 TB):

     

    image.png

     

    Disk 9 was formatted as XFS Encrypted on 6.7.2 stable, while Disk 10 was formatted as XFS Encrypted on 6.8.0-RC6. Check the usage value... is something wrong with formatting XFS Encrypted on the RC?

     

    I've tried formatting it multiple times and it always comes up as 55.8 GB used rather than the usual 8 GB.

    Edited by je82
    Link to comment

    It appears there's something wrong with the RC releases when formatting a new drive as XFS encrypted in 6.8.0-RC6.

     

    See this thread for more information. I ended up downgrading to 6.7.2 stable and now formatting works properly.

     

     

     

    Link to comment

    It's funny that you noticed this behavior, as I had the same issue happen to me last night. I had just received 2 new drives, threw them into the server, and started a preclear on both. After the preclear finished, I went to format the two drives and all hell broke loose.

     

    I first noticed that the format operation was taking too long, but I walked away, came back maybe 15 minutes later, and saw it was still formatting (according to the web GUI). I tried refreshing the page and the server wasn't responding. I then noticed that some open shares on another computer had disconnected. I checked the console (it runs semi-headless) and saw a massive amount of text flying across the screen, too much and too fast to read any of it.

     

    I rebooted the server and have since formatted the drives, and they are now part of the pool... but I remember the "used" capacity after formatting being more than usual as well (~4 GB for a 4 TB drive if I remember right; I don't remember what these 2 show now).

     

    I'm running RC5 right now but have no diagnostics [yet], as I thought it was possibly a hardware failure (lost some sleep over this) and didn't think to grab anything. I just wanted to throw my hat in and say it's not just you.

    Link to comment
    2 minutes ago, bonienl said:

    With or without encryption?

    Sorry forgot to mention that - without encryption.

     

    I'm not hurting for storage space right now, so I can remove those 2 drives and start the process over with logging to get some diagnostics. I'm at work now, so it'll have to wait until this evening.

    Link to comment
    8 hours ago, je82 said:

    I've tried formatting it multiple times and it always comes up as 55.8 GB used rather than the usual 8 GB.

    The increase in metadata size is probably due to several enhancements done with xfs over the past year.  I don't think this is out of the ordinary.

     

    Re: your first post: the syslog shows a crash, but I need more context, which is provided by attaching diagnostics.zip from the Tools/Diagnostics page.

    Link to comment
    5 hours ago, limetech said:

    The increase in metadata size is probably due to several enhancements done with xfs over the past year.  I don't think this is out of the ordinary.

     

    Re: your first post: the syslog shows a crash, but I need more context, which is provided by attaching diagnostics.zip from the Tools/Diagnostics page.

    When it crashed I could not generate diagnostics, but this problem seems easy to reproduce on RC5/6, as it happens to other people as well (see the other person in this thread who wrote that they had the same issue).

     

    My diagnostics and more details are available here:

     

    I cannot diagnose the problem anymore. I rolled back to the stable version, as the RC was far too shaky for me at this point.

    Link to comment

    I can enable logging to flash and *try* adding the drive to the pool again this evening (preclearing and formatting) to see what happens but I'm hesitant since I just got my parity rebuilt. I'm actually leaning towards finding an old 1TB drive to test with so I'm not waiting to preclear an entire 4TB drive just for testing.  Any thoughts?

    Link to comment
    2 hours ago, civic95man said:

    I can enable logging to flash and *try* adding the drive to the pool again this evening (preclearing and formatting) to see what happens but I'm hesitant since I just got my parity rebuilt. I'm actually leaning towards finding an old 1TB drive to test with so I'm not waiting to preclear an entire 4TB drive just for testing.  Any thoughts?

    My system has only crashed when doing the preclear before the format. If you only format (an already precleared drive), the system does not crash, but you'll see that the XFS partition takes up a lot more space than usual, at least with XFS encrypted.

     

    If you check your logs, you'll see the drive is not actually encrypted, even though the GUI tells you it is XFS encrypted. I have not tested what happens if you put data on this incorrectly formatted partition, though.

     

    If you downgrade to the stable build, the RC-formatted drives show up as an unknown filesystem and need to be formatted again before you can use them. I wouldn't want to do these experiments on a production build. I've already had my 100 TB array crash twice, and parity checks take me 20 hours. I don't want to do that again, so I'm running stable from now on!

    Edited by je82
    Link to comment
    8 minutes ago, je82 said:

    but you'll see that the XFS partition takes up a lot more space than usual, at least with XFS encrypted.

    This is kernel related, it also takes much more space when unencrypted, e.g. on a 500GB drive it was around 500MB on older kernel, now it's 3.7GB.
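
    To put the numbers reported in this thread side by side, here is a rough sketch that converts the "used after format" figures into a percentage of drive capacity. It assumes decimal units (1 TB = 1000 GB) and that the GUI values are in GB; `overhead_percent` and the labels are illustrative names only, not anything from Unraid or XFS.

```python
def overhead_percent(used_gb: float, capacity_gb: float) -> float:
    """Return space used on a freshly formatted disk as a percentage of capacity."""
    return used_gb / capacity_gb * 100


# (label, capacity in GB, used after format in GB), figures as quoted in this thread.
# Assumption: decimal units, so an 8 TB drive is treated as 8000 GB.
reports = [
    ("500 GB drive, older kernel", 500, 0.5),
    ("500 GB drive, newer kernel", 500, 3.7),
    ("8 TB drive, 6.7.2", 8000, 8.0),
    ("8 TB drive, 6.8.0-RC6", 8000, 55.8),
]

for label, capacity_gb, used_gb in reports:
    pct = overhead_percent(used_gb, capacity_gb)
    print(f"{label}: {pct:.2f}% of capacity")
```

    Under those assumptions both drives go from roughly 0.1% to roughly 0.7% of capacity, so the two reports look consistent with each other rather than with a one-off formatting fault.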

    Link to comment
    Just now, johnnie.black said:

    This is kernel related, it also takes much more space when unencrypted, e.g. on a 500GB drive it was around 500MB on older kernel, now it's 3.7GB.

    So this is actually working as intended? Any idea what new features the massive overhead brings?

    Link to comment
    Just now, je82 said:

    any idea what new features the massive overhead brings?

    No, I don't follow xfs development, but I know they are working on a lot of new features, like scrubs, snapshots, etc, and this extra overhead might be preparation for those and more.

    Link to comment
    15 minutes ago, je82 said:

    If you downgrade to the stable build, the RC-formatted drives show up as an unknown filesystem and need to be formatted again before you can use them

    Thinking more about this: it could well be that the older version of XFS in Unraid 6.7 doesn't understand all the new features introduced in the latest version of XFS in Unraid 6.8. In other words, no backward compatibility.

     

    Link to comment
    3 minutes ago, bonienl said:

    Thinking more about this: it could well be that the older version of XFS in Unraid 6.7 doesn't understand all the new features introduced in the latest version of XFS in Unraid 6.8. In other words, no backward compatibility.

     

    Yep, I realize this too, so the only bug is probably the crash that comes right after preclearing and then formatting.

     

    If anyone has a test rig, they could try recreating the issue by doing that. I've already disassembled my test build, but maybe I can put it together this weekend and see if I can log the problem.

    Link to comment
    2 minutes ago, je82 said:

    so the only bug is probably the crash that comes right after preclearing and then formatting.

    Is this happening to you when you preclear a disk with the plugin and then format as XFS encrypted?

    Link to comment
    Just now, johnnie.black said:

    Is this happening to you when you preclear a disk with the plugin and then format as XFS encrypted?

    Don't know. The two times it happened to me, the sequence was:

    1. I have an already working array, with 2x btrfs cache drives and 2x parity drives.

    2. I add a totally new drive.

    3. Start the array; preclear starts.

    4. Click Format; that's when it starts to crash. The web GUI responds at first, but the longer it goes on, the fewer things respond, and it seems to be stuck formatting. The format never completes; it just says "Formatting...".

    5. You eventually try a graceful shutdown, and it won't work.

     

    I've only let it run for about 20 minutes during the format process, though. Maybe something comes of it if you let it run longer.

    Link to comment
    1 minute ago, je82 said:

    2. I add a totally new drive.

    3. Start the array; preclear starts.

    That's a "clear"; a "preclear" is when the clearing is done before adding the disk to the array, e.g. using the preclear plugin.

     

    I'll run a test when I can.

    Link to comment
    14 minutes ago, bonienl said:

    Thinking more about this. It could well be that the older version of XFS in Unraid 6.7 doesn't understand all new features introduced in the latest version of XFS in Unraid 6.8. In other words no backward compatiblity.

    I just tested, and Unraid v6.7.2 mounted an XFS drive formatted with v6.8, though it still uses the extra overhead. I would be surprised if it wasn't possible to mount with an earlier kernel, at least on recent releases.

    Link to comment
    3 minutes ago, bonienl said:

    Not sure if encryption plays a role here. Possible to test?

     

    The other guy who posted in this thread that it happened to him used XFS without encryption, as far as I know, so probably not.

     

    On 11/18/2019 at 6:38 PM, civic95man said:

    Sorry forgot to mention that - without encryption.

     

    Link to comment
    11 minutes ago, bonienl said:

    Not sure if encryption plays a role here. Possible to test?

    Yes, with encryption it doesn't mount, because it's not detecting the disk is encrypted.

    Link to comment
    Just now, johnnie.black said:

    Yes, with encryption it doesn't mount, because it's not detecting the disk is encrypted.

    No, that's not the problem here. The reason it's not detecting the disk as encrypted is that 6.7.2 with the old kernel doesn't understand the modern XFS format. Please forget anything said about the disk usage being too big; this issue is essentially only about the hard crash that occurs on RC builds after a preclear has finished and you click Format.

    Link to comment
    1 minute ago, je82 said:

    No, that's not the problem here. The reason it's not detecting the disk as encrypted is that 6.7.2 with the old kernel doesn't understand the modern XFS format.

    Like I said:

     

    18 minutes ago, johnnie.black said:

    I just tested, and Unraid v6.7.2 mounted an XFS drive formatted with v6.8, though it still uses the extra overhead. I would be surprised if it wasn't possible to mount with an earlier kernel, at least on recent releases.

     

    Link to comment





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.