Third time my disk became unmountable, and 3rd time it was "fixed". Should I replace it?


12 posts in this topic

This is the 3rd time in 6 months that my disk #3 (out of 5) has become "unmountable".  Each time, I go into maintenance mode and run a check/repair with option -L.  Here is what it showed after this last -L repair.  Do I need to replace the drive?  This latest instance of becoming unmountable happened after a power cycle.  I do not run the -n check anymore, but I can next time, if you think it is needed to know what is going on.

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Metadata CRC error detected at 0x45d399, xfs_dir3_leafn block 0x161508f48/0x1000
bad directory leaf magic # 0x6f93 for directory inode 6442451074 block 8388609
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 3
        - agno = 2
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
Metadata corruption detected at 0x45553e, xfs_da3_node block 0x161508f48/0x1000
unknown magic number 0x6f93 for block 8388609 in directory inode 6442451074
rebuilding directory inode 6442451074
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (4:65268) is ahead of log (1:2).
Format log to cycle 7.
done
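For context, the GUI check maps to roughly these commands. The device name is an assumption here: Unraid exposes array disk 3 as /dev/md3, and repairs must target the md device so parity stays in sync. This is a sketch, not something to run blindly:

```shell
# Rough sketch of what the "Check Filesystem Status" GUI runs;
# /dev/md3 is an assumed device name for array disk 3.
xfs_repair -n /dev/md3   # read-only check: report problems, change nothing
xfs_repair /dev/md3      # normal repair; refuses to run if the log is dirty
xfs_repair -L /dev/md3   # last resort: zero the dirty log, then repair
```

The -L option discards any metadata updates still sitting in the log, which is why it is only used when a normal repair refuses to run.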

Thanks for your thoughts.

 

6 hours ago, xrqp said:

Do I need to replace the drive?

 

You've only provided information about the file system, not the actual drive, so it isn't possible to say. Your diagnostics might shed some light: go to Tools -> Diagnostics and post the zip file.

 

Power problems could cause file system corruption so you might want to check the power cable to Disk 3 and replace/remove any splitters.

  • 2 weeks later...
Posted (edited)

Problems again.  It does not say "not mounted" this time.  Here is what "Fix Common Problems" showed:

[screenshot: Fix Common Problems warning]

But it is not full or read only.  It does say "disk 3 not defined/installed in the array", which sounds similar to unmountable.

 

Here is my main screen:

[screenshot: Main screen]

It looks normal.  When I get the unmountable error, it does not look normal like this.

 

Here is the diagnostic zip before repair.

tower-diagnostics-20210503-0046.zip

Edited by xrqp

Filesystem is corrupt again.

On 4/19/2021 at 11:07 AM, JorgeB said:

If it keeps happening to the same filesystem only, it might be worth backing up and then re-creating that filesystem.

 

17 hours ago, JorgeB said:

re-creating that filesystem.

Do you mean the disk repair that is done with -L in "Check file system"?

 

Did you see anything in the diagnostic.zip?

Posted (edited)

I am starting to think it is normal and common to get these mild warnings (yellow, not red).  Now I am getting orange warnings for the unassigned disk I just started using.  I got similar messages for other disks.  Bad orange, good green, repeat, repeat.

[screenshot: orange warning notifications]

 

I tried improving the power cables to the disks.  Next I may try to use more motherboard SATA connections and fewer of the SAS-to-SATA connections.  I am using the approved "Dell H310 6Gbps SAS HBA w/ LSI 9211-8i P20 IT Mode for unRAID", which has 8 SAS ports.

I got cheap SAS cables on eBay ($10 each): 2x Mini SAS (SFF-8087) to 4x SATA.  Do you think the SAS cables are often a problem?

I will keep testing things, including replacing a disk or two, until I get fewer orange messages.

 

Lately, I have had problems with the array not showing files that exist until I reboot; then they show up again, but sometimes only for a few minutes.

Edited by xrqp
4 hours ago, xrqp said:

Do you mean the disk repair that is done with -L in "Check file system"?

No, I meant reformatting the disk.

 

 

 

 


I was going to replace disk 3, but before I could, my disk 1 was disabled by Unraid.  So I ran a file system check with -L, but that changed nothing.  Then I ran a file system check with -n and got this:


Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 7
        - agno = 0
        - agno = 4
        - agno = 6
        - agno = 5
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

 

Next I ran a file system check with no switches (no -n and no -L) and got:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 1
        - agno = 5
        - agno = 7
        - agno = 2
        - agno = 6
        - agno = 4
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

 

Then I stopped the array and restarted it in normal mode.  Disk 1 is still disabled and emulated.  A reboot did not help either.  I will try to replace disk 1 now.

 

 

1 hour ago, xrqp said:

I was going to replace disk 3, but before I could, my disk 1 was disabled by Unraid.  So I ran a file system check with -L, but that changed nothing.

Running a file system repair does not re-enable a disabled disk.  Since the disk is disabled, Unraid will be 'emulating' it, so the repair would have run against the emulated drive rather than the physical drive.  A repair is most commonly used to clear an 'unmountable' state resulting from file system corruption.

 

The standard way to clear a disabled state is to rebuild the contents of the 'emulated' drive onto a physical drive.  Before doing so, you should check that the contents of the emulated drive are what you expect, as what you see on the emulated drive is what you will see on the rebuilt one.
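A quick way to spot-check the emulated drive is from the console. This is a sketch; /mnt/disk1 is an assumption (Unraid mounts array disk 1 there while emulating it):

```shell
# Sketch: spot-check an emulated disk's contents before rebuilding.
# The mount point is an assumed Unraid path, e.g. /mnt/disk1.
check_disk() {
    ls -la "$1"     # top-level contents
    du -sh "$1"     # total size used
}

# Example: check_disk /mnt/disk1
```

If the listing and total size look right, the rebuild onto a physical drive can proceed.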

