Disk errors


dzyuba86
Solved by dzyuba86

4 hours ago, dzyuba86 said:

no smart errors on dashboard

Spin up disk6 and look again.

Serial Number:    WD-WCC4N0PYELJN
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   193   193   140    -    207
No self-tests have been logged.

Run an extended SMART self-test on disk6.
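
For anyone reading the attribute row above, here is a minimal Python sketch of how its columns can be interpreted. It assumes the column order shown in the header (ID#, ATTRIBUTE_NAME, FLAGS, VALUE, WORST, THRESH, FAIL, RAW_VALUE) and the usual SMART convention that an attribute counts as failed once its normalized VALUE drops to or below THRESH. The names `SmartAttribute`, `parse_attribute_row`, and `has_failed` are made up for illustration; they are not part of smartctl or Unraid.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    flags: str
    value: int   # normalized value reported by the drive
    worst: int   # lowest normalized value seen so far
    thresh: int  # vendor threshold for the normalized value
    fail: str    # "-" when the attribute has not failed
    raw: str     # vendor-specific raw value, kept as text


def parse_attribute_row(row: str) -> SmartAttribute:
    """Split one whitespace-separated attribute row into named fields."""
    fields = row.strip().split(None, 7)
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {row!r}")
    attr_id, name, flags, value, worst, thresh, fail, raw = fields
    return SmartAttribute(int(attr_id), name, flags,
                          int(value), int(worst), int(thresh), fail, raw)


def has_failed(attr: SmartAttribute) -> bool:
    """Assumed convention: failed when normalized value <= threshold."""
    return attr.value <= attr.thresh


row = "  5 Reallocated_Sector_Ct   PO--CK   193   193   140    -    207"
attr = parse_attribute_row(row)
print(attr.name, "raw:", attr.raw,
      "margin:", attr.value - attr.thresh,
      "failed:", has_failed(attr))
```

Read this way, the disk6 row has not crossed its threshold yet, but the raw count of 207 reallocated sectors is what prompts the extended self-test.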

 

On 12/1/2022 at 9:02 AM, trurl said:

Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?

Link to comment
12 minutes ago, trurl said:

Spin up disk6 and look again.

Serial Number:    WD-WCC4N0PYELJN
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   193   193   140    -    207
No self-tests have been logged.

Run an extended SMART self-test on disk6.

I'll check in a few hours, once the parity check finishes. I check my dashboard 5+ times a week but don't have email alerts set up. I might need to set something up at this point. I also picked up 2 new drives to replace the RMA 10TB drive and the 8TB that's having issues.

Link to comment

Unraid claims the data rebuild was complete, but disk 7 is still showing the same error on the dashboard. I pulled a diagnostic log. I'm going to shut down and rebalance the power load across the cables to see if that helps, and I can put the new 10TB drive in at the same time. It looks like the 3TB drive has too many reallocated sectors, so it's in a pre-fail stage. I'll use the RMA drive, which I'll get in a few weeks, to swap that one out.

plexnas-diagnostics-20221202-1842.zip

Link to comment
2 hours ago, dzyuba86 said:

disk 7 is still showing the same error on the dashboard

There is an Unraid webUI page actually named Dashboard, but I think you may be using that word differently.

Looks like disk7 dropped and reconnected as a different device, so Unraid lost track of it again.

Disabled/emulated disk7 is also unmountable. You should repair the filesystem on the emulated disk before attempting to rebuild to the same disk. Better yet, rebuild to another disk; keeping the original will give you more options for recovery.

Link to comment
16 minutes ago, trurl said:

There is an Unraid webUI page actually named Dashboard, but I think you may be using that word differently.

Looks like disk7 dropped and reconnected as a different device, so Unraid lost track of it again.

Disabled/emulated disk7 is also unmountable. You should repair the filesystem on the emulated disk before attempting to rebuild to the same disk. Better yet, rebuild to another disk; keeping the original will give you more options for recovery.

I currently have brand new 10TB Seagate IronWolfs in disk 1 and 7, and a rebuild is running. Disk 7 is still showing the "Unmountable: unknown or no file system" message, though. Seems like my disk 7 is really messed up.

Link to comment
5 hours ago, dzyuba86 said:

Disk 7 is still showing the "Unmountable: unknown or no file system" message, though.

A rebuild will not correct this state: if an emulated disk shows as unmountable, the rebuilt disk will be unmountable too.

The correct handling of unmountable disks is covered here in the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page.
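
To illustrate why a rebuild cannot fix an unmountable emulated disk, here is a toy Python sketch. It assumes plain single-parity XOR across the data disks; the three byte strings and the helper `xor_blocks` are hypothetical stand-ins, not Unraid code. The point is only that reconstruction returns exactly the bytes parity was computed from, damaged filesystem metadata included.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical toy array: two healthy disks and one whose contents were
# already corrupt when parity was last updated.
disk_a = b"good data A"
disk_b = b"good data B"
disk_7 = b"BAD sblock!"

parity = xor_blocks([disk_a, disk_b, disk_7])

# "Rebuild" disk 7 from the surviving disks plus parity.
rebuilt_disk_7 = xor_blocks([disk_a, disk_b, parity])

print(rebuilt_disk_7 == disk_7)  # True: the corruption is rebuilt faithfully
```

That is why the filesystem repair has to happen on the emulated disk (or on the rebuilt one afterwards); the rebuild itself only copies the problem.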

Link to comment
19 hours ago, itimpi said:

A rebuild will not correct this state: if an emulated disk shows as unmountable, the rebuilt disk will be unmountable too.

The correct handling of unmountable disks is covered here in the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page.

I've run a check with the -nv options several times now. I'm not sure what else to do to make it stop showing as an unmountable disk.

Link to comment
2 hours ago, dzyuba86 said:

I've run a check with the -nv options several times now. I'm not sure what else to do to make it stop showing as an unmountable disk.

You need to run without the -n option if you want any changes to be made. The -n (no modify) option makes it a read-only check. If you post the output of a run using -nv, we can give you an idea of how well the repair would go.
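
As a rough sketch of the difference between the two runs, the Python snippet below only builds the command lines and does not execute anything. The helper `xfs_repair_command` is hypothetical, and the device path `/dev/md7` is an assumption for illustration; check the path the webUI actually uses for the disk before running anything by hand. `-n` (no modify) and `-v` (verbose) are the same options discussed above.

```python
def xfs_repair_command(device, modify=False, verbose=True):
    """Build an xfs_repair argument list; -n is dropped only when modify=True."""
    args = ["xfs_repair"]
    if not modify:
        args.append("-n")  # no-modify mode: report problems, change nothing
    if verbose:
        args.append("-v")
    args.append(device)
    return args


DEVICE = "/dev/md7"  # hypothetical device path, adjust for the real disk

print(" ".join(xfs_repair_command(DEVICE)))               # xfs_repair -n -v /dev/md7
print(" ".join(xfs_repair_command(DEVICE, modify=True)))  # xfs_repair -v /dev/md7
```

Only the second form can write to the filesystem, which is why repeated -nv runs never changed the unmountable state.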

Link to comment
8 hours ago, itimpi said:

You need to run without the -n option if you want any changes to be made - the -n (no modify) option makes it a read check only.   If you post the output of a run using nv we can give you an idea of how well the repair would go.


Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
sb_icount 64, counted 32
sb_ifree 61, counted 29
sb_fdblocks 1952984849, counted 1952984853
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 3
        - agno = 2
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
SB summary counter sanity check failed
Metadata corruption detected at 0x47a15b, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
SB summary counter sanity check failed
Metadata corruption detected at 0x47a15b, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!

fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
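
For anyone comparing logs like the one above, here is a small Python sketch that pulls out the two things worth noticing: the superblock counter mismatches and the final fatal line. The function name `summarize_repair_log` and the regular expression are my own assumptions about the line format, based only on the output pasted in this thread.

```python
import re

# Matches lines such as "sb_icount 64, counted 32".
COUNTER_RE = re.compile(r"^(sb_\w+) (\d+), counted (\d+)$")


def summarize_repair_log(log):
    """Return ({counter: (stored, counted)}, fatal_line_or_None)."""
    mismatches = {}
    fatal = None
    for line in log.splitlines():
        line = line.strip()
        match = COUNTER_RE.match(line)
        if match:
            mismatches[match.group(1)] = (int(match.group(2)), int(match.group(3)))
        elif line.startswith("fatal error"):
            fatal = line
    return mismatches, fatal


excerpt = """\
sb_icount 64, counted 32
sb_ifree 61, counted 29
sb_fdblocks 1952984849, counted 1952984853
fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
"""

mismatches, fatal = summarize_repair_log(excerpt)
for name, (stored, counted) in mismatches.items():
    print(f"{name}: superblock says {stored}, scan counted {counted}")
print("fatal:", fatal)
```

On Linux, error number 117 is EUCLEAN ("Structure needs cleaning"), which fits the "Metadata corruption detected" lines just before it.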

Link to comment

This is disk 6 repair output

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

Link to comment

That repair log started off very well, in a form that would normally indicate a repair without any data loss. However, you then got errors in Phase 7 that I have never seen before, and xfs_repair aborted. I'm not sure what this means. Maybe somebody else knows?

Just in case it is RAM related, it might be worth running memtest if you have not already done so.

Link to comment
Just now, dzyuba86 said:

This is disk 6 repair output

This is what I expect when the repair has run perfectly and there is no potential data loss.
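
A hedged sketch of that check in Python: it treats a log as a clean run only if it reaches Phase 7 and ends with "done", and flags any log containing a "fatal error" line as aborted. The function `repair_outcome` and its return strings are hypothetical, and the logic assumes output shaped like the modify-mode logs posted in this thread.

```python
import re

PHASE_RE = re.compile(r"^Phase (\d+) - ")


def repair_outcome(log):
    """Classify an xfs_repair log as completed, aborted, or incomplete."""
    lines = [line.strip() for line in log.splitlines() if line.strip()]
    last_phase = 0
    for line in lines:
        match = PHASE_RE.match(line)
        if match:
            last_phase = int(match.group(1))
    if any(line.startswith("fatal error") for line in lines):
        return f"aborted after reaching phase {last_phase}"
    if lines and lines[-1] == "done" and last_phase == 7:
        return "completed"
    return "incomplete"


clean_log = """\
Phase 1 - find and verify superblock...
Phase 7 - verify and correct link counts...
done
"""

aborted_log = """\
Phase 7 - verify and correct link counts...
SB summary counter sanity check failed

fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
"""

print(repair_outcome(clean_log))
print(repair_outcome(aborted_log))
```

By this rule the disk 6 log counts as completed, while the earlier log that ended in a fatal error does not.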

Link to comment
  • Solution
7 hours ago, itimpi said:

That repair log started off very well, in a form that would normally indicate a repair without any data loss. However, you then got errors in Phase 7 that I have never seen before, and xfs_repair aborted. I'm not sure what this means. Maybe somebody else knows?

Just in case it is RAM related, it might be worth running memtest if you have not already done so.

4 hours in now and no errors found yet. It's on pass #4, so I doubt it'll find anything. At this point I think my only option is to reformat the drive, lose the data, and have Radarr/Sonarr re-acquire what I lose on that drive. I really have no other idea what I can do now. I'm also thinking that the 8TB drive that was in that slot is still good, so I'll use it to replace disk 6, which is showing old age pre-fail errors.

Link to comment
