Disk errors


dzyuba86
Solved by dzyuba86


I recently added two new 10TB drives, then moved two drives to a cooler location in my tower while I had the time to take the array offline and fiddle with it. Now two drives say "unmountable: wrong or no file system". One drive says "device disabled, contents emulated" (red X) and the other says "normal operation" (green dot). I am at a loss as to why this happened. I already shut down and tried new SATA cables and a different SATA port, but I still get the same error message. I have attached the diagnostic log. I'm not sure how to fix this.

plexnas-diagnostics-20221130-0919.zip

9 minutes ago, trurl said:

SMART report for both disks looks OK but no SMART tests have been run on either. Possibly you already fixed the hardware problem when you worked on the connections.

 

Check filesystem on each. Be sure to capture the output so you can post it.

 

I'm not sure whether the new diagnostics include it, but in short the check reported a bunch of errors and didn't give me any options for repairing them.

 

plexnas-diagnostics-20221130-0951.zip


The output mostly looks like this:


Phase 1 - find and verify superblock...
        - block cache size set to 686216 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 119596 tail block 119596
        - scan filesystem freespace and inode maps...
Metadata CRC error detected at 0x43d440, xfs_agf block 0x1/0x200
agf has bad CRC for ag 0
Metadata CRC error detected at 0x468740, xfs_agi block 0x2/0x200
agi has bad CRC for ag 0
bad magic # 0x0 for agf 0
bad version # 0 for agf 0
bad length 0 for agf 0, should be 268435455
bad uuid 7e8511b0-2013-0ac6-166a-e877aae91f03 for agf 0
bad magic # 0x0 for agi 0
bad version # 0 for agi 0
bad length # 0 for agi 0, should be 268435455
bad uuid 7e8511b0-2013-0ac6-166a-e877aae91f03 for agi 0
would reset bad agf for ag 0
would reset bad agi for ag 0
bad uncorrected agheader 0, skipping ag...
sb_icount 11840, counted 11008
sb_fdblocks 50924179, counted 49052402
root inode chunk not found
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
Metadata corruption detected at 0x4379a3, xfs_inode block 0x80/0x4000
Metadata corruption detected at 0x4379a3, xfs_inode block 0xa0/0x4000
bad CRC for inode 128
bad magic number 0x0 on inode 128
bad version number 0x0 on inode 128
bad next_unlinked 0x0 on inode 128
inode identifier 0 mismatch on inode 128
bad CRC for inode 129...........

bad CRC for inode 191
bad magic number 0x0 on inode 191
bad version number 0x0 on inode 191
bad next_unlinked 0x0 on inode 191
inode identifier 0 mismatch on inode 191
bad CRC for inode 128, would rewrite
bad magic number 0x0 on inode 128, would reset magic number
bad version number 0x0 on inode 128, would reset version number
bad next_unlinked 0x0 on inode 128, would reset next_unlinked
inode identifier 0 mismatch on inode 128
would clear root inode 128.........

would have cleared inode 158
bad CRC for inode 159, would rewrite
bad magic number 0x0 on inode 159, would reset magic number
bad version number 0x0 on inode 159, would reset version number
bad next_unlinked 0x0 on inode 159, would reset next_unlinked
inode identifier 0 mismatch on inode 159
would have cleared inode 159
imap claims inode 160 is present, but inode cluster is sparse, correcting imap
imap claims inode 161 is present, but inode cluster is sparse, correcting imap
imap claims inode 162 is present, but inode cluster is sparse, correcting imap
imap claims inode 163 is present, but inode cluster is sparse, correcting imap......


 

No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Wed Nov 30 06:49:18 2022

Phase        Start        End        Duration
Phase 1:    11/30 06:46:57    11/30 06:47:14    17 seconds
Phase 2:    11/30 06:47:14    11/30 06:47:15    1 second
Phase 3:    11/30 06:47:15    11/30 06:49:17    2 minutes, 2 seconds
Phase 4:    11/30 06:49:17    11/30 06:49:17
Phase 5:    Skipped
Phase 6:    11/30 06:49:17    11/30 06:49:18    1 second
Phase 7:    11/30 06:49:18    11/30 06:49:18

Total run time: 2 minutes, 21 seconds


I ran the check with the -nv options.
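For anyone who wants to keep the full check output for posting, here is a minimal Python sketch that runs the same read-only check and saves everything it prints. The helper names `build_check_command` and `run_and_capture`, the device path, and the log file name are assumptions made for illustration; only the `-n` (no modify) and `-v` (verbose) options come from the check described above.

```python
import subprocess
from pathlib import Path


def build_check_command(device: str) -> list[str]:
    # -n: report problems without modifying the filesystem; -v: verbose output.
    return ["xfs_repair", "-n", "-v", device]


def run_and_capture(command: list[str], log_path: str) -> int:
    # Merge stderr into stdout so the saved log keeps the messages in order.
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    Path(log_path).write_text(result.stdout)
    return result.returncode


# Example call (hypothetical device path and file name, substitute your own):
# run_and_capture(build_check_command("/dev/md1"), "xfs_check_disk1.txt")
```

Because the command is built with `-n`, this sketch only reports problems and never writes to the disk; the saved text file can then be attached to a post.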


The other drive seems worse. 


Phase 1 - find and verify superblock...
        - block cache size set to 701112 entries
Phase 2 - using internal log
        - zero log...
totally zeroed log
zero_log: head block 0 tail block 0
        - scan filesystem freespace and inode maps...
Metadata CRC error detected at 0x44108d, xfs_bnobt block 0x8/0x1000
btree block 0/1 is suspect, error -74
bad magic # 0 in btbno block 0/1
Metadata CRC error detected at 0x44108d, xfs_cntbt block 0x10/0x1000
btree block 0/2 is suspect, error -74
bad magic # 0 in btcnt block 0/2
Metadata CRC error detected at 0x4728bd, xfs_refcountbt block 0x28/0x1000
btree block 0/5 is suspect, error -74
bad magic # 0 in refcount btree block 0/5
bad refcountbt block count 0, saw 1
agf_freeblks 268435437, counted 0 in ag 0
agf_longest 268435431, counted 0 in ag 0
Metadata CRC error detected at 0x46fd5d, xfs_inobt block 0x18/0x1000
btree block 0/3 is suspect, error -74
bad magic # 0 in inobt block 0/3
Metadata CRC error detected at 0x46fd5d, xfs_finobt block 0x20/0x1000
btree block 0/4 is suspect, error -74
bad magic # 0 in finobt block 0/4
agi_count 64, counted 0 in ag 0
agi_freecount 61, counted 0 in ag 0
agi_freecount 61, counted 0 in ag 0 finobt
sb_icount 64, counted 0
sb_ifree 61, counted 0
sb_fdblocks 1952984849, counted 1684549412
root inode chunk not found
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
Metadata corruption detected at 0x4379a3, xfs_inode block 0x80/0x4000
Metadata corruption detected at 0x4379a3, xfs_inode block 0xa0/0x4000
bad CRC for inode 128
bad magic number 0x0 on inode 128
bad version number 0x0 on inode 128
bad next_unlinked 0x0 on inode 128...
 

bad CRC for inode 158, would rewrite
bad magic number 0x0 on inode 158, would reset magic number
bad version number 0x0 on inode 158, would reset version number
bad next_unlinked 0x0 on inode 158, would reset next_unlinked
inode identifier 0 mismatch on inode 158
would have cleared inode 158
bad CRC for inode 159, would rewrite
bad magic number 0x0 on inode 159, would reset magic number
bad version number 0x0 on inode 159, would reset version number
bad next_unlinked 0x0 on inode 159, would reset next_unlinked
inode identifier 0 mismatch on inode 159
would have cleared inode 159
would rebuild corrupt refcount btrees.
No modify flag set, skipping phase 5
Inode allocation btrees are too corrupted, skipping phases 6 and 7
Maximum metadata LSN (1:161150) is ahead of log (0:0).
Would format log to cycle 4.
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Wed Nov 30 06:50:22 2022

Phase        Start        End        Duration
Phase 1:    11/30 06:50:22    11/30 06:50:22
Phase 2:    11/30 06:50:22    11/30 06:50:22
Phase 3:    11/30 06:50:22    11/30 06:50:22
Phase 4:    11/30 06:50:22    11/30 06:50:22
Phase 5:    Skipped
Phase 6:    Skipped
Phase 7:    Skipped
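
Output like the two dry runs above is long, so a rough tally can help when comparing disks. The following Python sketch counts how many distinct inodes are reported with a bad CRC and which metadata structures trigger CRC or corruption messages. The regular expressions and the function name `summarize` are assumptions modelled only on the message shapes quoted in this thread, not on any documented xfs_repair output format.

```python
import re
from collections import Counter

# Patterns modelled on the messages quoted in this thread (an assumption).
INODE_RE = re.compile(r"^bad CRC for inode (\d+)")
METADATA_RE = re.compile(
    r"^Metadata (?:CRC error|corruption) detected at \S+ (xfs_\w+) block"
)


def summarize(lines):
    """Return (set of inode numbers with a bad CRC, Counter of structure names)."""
    inodes = set()
    structures = Counter()
    for line in lines:
        line = line.strip()
        match = INODE_RE.match(line)
        if match:
            # The same inode is reported in more than one phase; the set dedups it.
            inodes.add(int(match.group(1)))
            continue
        match = METADATA_RE.match(line)
        if match:
            structures[match.group(1)] += 1
    return inodes, structures


sample = [
    "Metadata CRC error detected at 0x43d440, xfs_agf block 0x1/0x200",
    "Metadata corruption detected at 0x4379a3, xfs_inode block 0x80/0x4000",
    "bad CRC for inode 128",
    "bad CRC for inode 128, would rewrite",
]
bad_inodes, bad_structures = summarize(sample)
print(sorted(bad_inodes), dict(bad_structures))
```

To use it on a saved log, pass the open file object to `summarize`, since it accepts any iterable of lines.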
 

1 hour ago, trurl said:

Possibly you already fixed the hardware problem when you worked on the connections.

Apparently not.

...
Nov 30 06:48:28 PLEXNAS kernel: ata18: hard resetting link
Nov 30 06:48:28 PLEXNAS kernel: ata19: found unknown device (class 0)
Nov 30 06:48:32 PLEXNAS kernel: ata19: softreset failed (1st FIS failed)
...

and lots more.
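
When there are "lots more" of these lines, counting them per ATA port can show whether the trouble is concentrated on one or two links. This is a minimal Python sketch under stated assumptions: the regular expression is based only on the three syslog lines quoted above, and the `TROUBLE` list of message fragments is illustrative rather than exhaustive. It does not map an `ataN` port to a specific disk.

```python
import re
from collections import Counter

# Matches kernel lines such as "... kernel: ata19: softreset failed (1st FIS failed)".
ATA_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")
# Message fragments treated as link trouble (an assumption, not a complete list).
TROUBLE = ("hard resetting link", "softreset failed", "found unknown device")


def count_link_events(lines):
    """Count link-trouble messages per ATA port."""
    counts = Counter()
    for line in lines:
        match = ATA_RE.search(line)
        if match and any(fragment in match.group(2) for fragment in TROUBLE):
            counts[match.group(1)] += 1
    return counts


quoted = [
    "Nov 30 06:48:28 PLEXNAS kernel: ata18: hard resetting link",
    "Nov 30 06:48:28 PLEXNAS kernel: ata19: found unknown device (class 0)",
    "Nov 30 06:48:32 PLEXNAS kernel: ata19: softreset failed (1st FIS failed)",
]
for port, total in count_link_events(quoted).most_common():
    print(port, total)
```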

 

Check connections, SATA and power, both ends, including splitters.

 

Then reboot and post new diagnostics.

3 hours ago, dzyuba86 said:

Would it make sense to pull the drive out, format it in my SATA enclosure with my laptop, and put it back in to rebuild, or no?

No, that would not make sense. Doesn't matter how it behaves in another system, and doesn't matter whether a disk is formatted, cleared, or completely full, since rebuild will completely overwrite the disk.

 

We need to see diagnostics taken when the problems occur. Those earlier diagnostics I quoted from seemed to indicate some sort of connection issue.

2 minutes ago, trurl said:

Doesn't matter how it behaves in another system

I guess if it works in another system, that would at least tell you something about the disk, but not about the problem you are having with it on Unraid. No point in formatting it though. The format done on another system can't be used on Unraid, and, as mentioned, rebuild doesn't care whether or how a drive is formatted, since it will completely overwrite whatever format is there.

 

A better idea would be to run diagnostics from the drive manufacturer, if your enclosure will support that.


I downloaded and ran the Seagate diagnostic tools. The 8TB drive passes the SMART test and my laptop can see the drive with no problem. The 10TB drive comes up as uninitialized right away, fails every SMART test, and gives a failed CRC error when I try to initialize the disk. I will check with warranty support tomorrow about a replacement; I have one more year of warranty on it. I still need to figure out what to do about the 8TB and why it isn't working.

10 hours ago, trurl said:

Try to use it again in Unraid and post diagnostics

 

 

It's currently running a rebuild on the 8TB. The 10TB is shot and I'm sending it out for an RMA. I'll pick one up at the local tech shop to rebuild onto, since it'll take 1-2 weeks to get the warranty replacement. One of my 3TB drives is throwing pre-fail errors so I'll swap it with a 10TB. If rebuild fails I'll pull a diagnostic log and post.

All I did was deselect the disk from the slot then reassign it and it started the rebuild. I'm not sure whether the SMART test reset something on the drive or whether deselecting the disk, spinning it up, and selecting it again for rebuild did the trick.

1 hour ago, dzyuba86 said:

All I did was deselect the disk from the slot then reassign it and it started the rebuild.

If the array is started without a disk and the disk is then reassigned, Unraid treats it as a replacement and rebuilds it.

 

1 hour ago, dzyuba86 said:

One of my 3TB drives is throwing pre-fail errors

Which disk was that? It could compromise the rebuild, and it might be a reason the emulated disk is unmountable.

 

Parity by itself can recover nothing. Parity just allows the contents of a missing disk to be calculated from the contents of all other disks.

 

https://wiki.unraid.net/Manual/Overview#Parity-Protected_Array
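
To make the point about parity concrete, here is a toy Python sketch. It assumes single parity is a bitwise XOR across the data disks, which is the usual scheme for single-parity arrays; the two-byte "disks" and the function names `xor_blocks` and `rebuild_missing` are made up for illustration and are not how any real array stores data.

```python
from functools import reduce


def xor_blocks(blocks):
    # XOR equal-length byte strings together, position by position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_data, parity):
    # The missing disk is the XOR of parity with every surviving data disk.
    return xor_blocks(list(surviving_data) + [parity])


disk1, disk2, disk3 = b"\x0f\xf0", b"\x33\x33", b"\xaa\x55"
parity = xor_blocks([disk1, disk2, disk3])

# With disk2 missing, it can be recomputed from the other disks plus parity.
print(rebuild_missing([disk1, disk3], parity) == disk2)

# If another disk returns wrong data during the rebuild, the result is wrong too.
print(rebuild_missing([disk1, b"\x00\x00"], parity) == disk2)
```

The second check is why a second unhealthy disk matters: the calculation needs correct data from every other disk, so errors on any of them end up in the rebuilt contents.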

 

1 hour ago, dzyuba86 said:

1-2 weeks to get the warranty replacement

Since you have multiple disks with problems I suggest you not wait on that replacement and get another disk immediately to get your array stable again.

 

1 hour ago, dzyuba86 said:

If rebuild fails I'll pull a diagnostic log

Might be worth seeing diagnostics even if it succeeds since you have multiple issues.

 

Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?

 

Do you have backups of anything important and irreplaceable?

27 minutes ago, dzyuba86 said:

Yes. I'm using three 1-to-4 power splitters.

Molex or SATA splitters? Molex is better, handles more current. 

 

Rebuilds, parity checks require all disks simultaneously.

 

11 minutes ago, dzyuba86 said:

forgot which disk.

Do any disks show SMART warnings on the Dashboard page? Which? 

25 minutes ago, trurl said:

Molex or SATA splitters? Molex is better, handles more current. 

 

Rebuilds, parity checks require all disks simultaneously.

 

Do any disks show SMART warnings on the Dashboard page? Which? 

SATA splitters. I'm thinking of going to the computer shop to see if they have better modular plugs for my power supply. No, there are no SMART errors on the Dashboard, which is super odd. Unless the error corrected itself?
