dzyuba86 Posted December 2, 2022 (Author)
Adding 2 more drives meant I needed a new splitter, so I'm probably overloading the port on the power supply. When I get home I'll add another modular plug and split the power load between the two to see if that helps. I believe 12 drives are currently on one modular plug.
trurl Posted December 2, 2022
4 hours ago, dzyuba86 said: no smart errors on dashboard
Spin up disk6 and look again:

Serial Number: WD-WCC4N0PYELJN
ID# ATTRIBUTE_NAME          FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK    193   193    140    -       207

No self-tests have been logged. Run an extended SMART self-test on disk6.
On 12/1/2022 at 9:02 AM, trurl said: Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?
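A minimal shell sketch of the two things trurl is asking for. The device path /dev/sdX is a placeholder (look up the real one on Unraid's Main page); the commented lines are the standard smartmontools invocations, and the live lines just show how to pull the raw reallocated-sector count out of the attribute line quoted above:

```shell
# Start the extended (long) SMART self-test, then check results later;
# /dev/sdX is a placeholder, substitute your disk's device:
#   smartctl -t long /dev/sdX
#   smartctl -a /dev/sdX        # attributes + self-test log once finished
#
# The number that matters in the attribute line above is the last field,
# the raw reallocated-sector count:
line='  5 Reallocated_Sector_Ct PO--CK 193 193 140 - 207'
raw=$(echo "$line" | awk '{print $NF}')
echo "Reallocated sectors: $raw"
```

A raw value like 207 that keeps climbing is the classic sign the drive is running out of spare sectors.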
dzyuba86 Posted December 2, 2022 (Author)
12 minutes ago, trurl said: Spin up disk6 and look again. [...] Run an extended SMART self-test on disk6.
I'll check in a few hours once the parity check finishes. I check my dashboard 5+ times a week but don't have email alerts set up; might need to set something up at this point. Also picked up 2 new drives to replace the RMA'd 10TB drive and the 8TB that's having issues.
dzyuba86 Posted December 2, 2022 (Author)
SMART test for disk 6: plexnas-smart-20221202-1537.zip
dzyuba86 Posted December 2, 2022 (Author)
It claims the data rebuild was complete, but disk 7 is still showing the same error on the dashboard; I pulled a diagnostic log. Going to shut down and rebalance the power load on the wires to see if that helps, and I can put the new 10TB drive in. Looks like the 3TB drive has too many reallocated sectors, so it's in a pre-fail state. I'll use the RMA drive I'll get in a few weeks to swap that one out.
plexnas-diagnostics-20221202-1842.zip
trurl Posted December 3, 2022
2 hours ago, dzyuba86 said: SMART test for disk 6.
That is only the short test.
trurl Posted December 3, 2022
2 hours ago, dzyuba86 said: disk 7 still showing same error on dashboard
There is an Unraid webUI page actually named Dashboard, but I think you may be using that word differently. It looks like disk7 dropped and reconnected as a different device, so Unraid lost track of it again. The disabled/emulated disk7 is also unmountable. You should repair the filesystem on the emulated disk before attempting to rebuild to the same disk. Better yet, rebuild to another disk; keeping the original will give you more options to recover.
dzyuba86 Posted December 3, 2022 (Author)
16 minutes ago, trurl said: [...] You should repair filesystem on the emulated disk before attempting to rebuild to the same disk. [...]
I currently have brand new 10TB Seagate IronWolfs in disks 1 and 7 and it's running a rebuild. Disk 7 is still showing the "Unmountable: unknown or no file system" message, though. Seems like my disk 7 is really messed up.
itimpi Posted December 3, 2022
5 hours ago, dzyuba86 said: Disk 7 is still showing unmountable: unknown or no file system message though.
A rebuild will not correct this state: if an emulated disk is showing as unmountable, then the rebuilt disk will be the same. The correct handling of unmountable disks is covered in the online documentation, accessible via the 'Manual' link at the bottom of the GUI or the DOCS link at the top of each forum page.
dzyuba86 Posted December 4, 2022 (Author)
19 hours ago, itimpi said: A rebuild will not correct this state [...]
I've run a check with the -nv parameters several times now. Not sure what else to do to make it stop showing as an unmountable disk.
itimpi Posted December 4, 2022
2 hours ago, dzyuba86 said: I've done a check with -nv parameter several times now.
You need to run without the -n option if you want any changes to be made; the -n (no modify) option makes it a read-only check. If you post the output of a run using -nv, we can give you an idea of how well the repair would go.
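Itimpi's point can be sketched as a two-step procedure. Assumptions here: the array has been started in Maintenance mode, and disk7 corresponds to /dev/md7 (on newer Unraid releases the device may be /dev/md7p1, so verify on your system before running anything):

```shell
# 1) Dry run: -n means "no modify", so this only REPORTS what would be
#    fixed. Repeated -nv runs will never clear the unmountable state.
xfs_repair -nv /dev/md7

# 2) Actual repair: drop the -n so xfs_repair can write its fixes.
xfs_repair -v /dev/md7
```

The webUI offers the same tool: with the array in Maintenance mode, the disk's settings page has a Check Filesystem Status section that runs xfs_repair with whatever options you enter (it defaults to -n).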
dzyuba86 Posted December 4, 2022 (Author)
Rebuild finished. Here's the diag log; doing the check now.
plexnas-diagnostics-20221204-0954.zip
dzyuba86 Posted December 4, 2022 (Author)
8 hours ago, itimpi said: You need to run without the -n option if you want any changes to be made [...]

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
sb_icount 64, counted 32
sb_ifree 61, counted 29
sb_fdblocks 1952984849, counted 1952984853
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 3
        - agno = 2
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
SB summary counter sanity check failed
Metadata corruption detected at 0x47a15b, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
SB summary counter sanity check failed
Metadata corruption detected at 0x47a15b, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!
fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair.
dzyuba86 Posted December 4, 2022 (Author)
Disk 6 claims to have 158 errors on the "Main" dashboard. Disk 7 shows a green light now but still says unmountable, no file system present.
dzyuba86 Posted December 4, 2022 (Author)
This is the disk 6 repair output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
itimpi Posted December 4, 2022
That repair log started off very well, in a form that would normally indicate a repair without any data loss. However, you then got errors in Phase 7 that I have never seen before, and xfs_repair aborted. Not sure what this means; maybe somebody else might know? Just in case it is RAM related, it might be worth running memtest if you have not already done so.
itimpi Posted December 4, 2022
Just now, dzyuba86 said: This is disk 6 repair output [...] Phase 7 - verify and correct link counts... done
This is what I expect when the repair has run perfectly and there is no potential data loss.
dzyuba86 Posted December 4, 2022 (Author, marked as Solution)
7 hours ago, itimpi said: [...] Just in case it is RAM related it might be worth running memtest if you have not already done so.
4 hours into memtest now and no errors found yet; it's on pass #4, so I doubt it'll find anything at this point. I think my only option is to reformat the drive, lose the data, and have radarr/sonarr re-acquire what I lose on that drive. I really have no other idea what I can do now. Also thinking that the 8TB drive that was in that slot is still good, so I'll use it to replace disk 6, which is showing old-age pre-fail errors.