Superblock failure


cybrey

I have a new issue with my Unraid server.

 

I was away for a while so I powered down my server. On powering it up Disk9 (sdl) was unmountable and the shares weren't available. 

 

I ran a disk check on the missing disk and got the following:

Phase 1 - find and verify superblock...
superblock read failed, offset 562633482240, size 131072, ag 9, rval -1

fatal error -- Input/output error

 

Reading around the forum it seems the suggested fix for this is to create a new config but I'd like to just double check this before I move any further.

 

Diagnostics attached below. 

 

Many thanks in advance. 

 
tower-diagnostics-20180212-1409.zip

I was wrong, I only looked at the file size. Disk9 has SMART disabled, so you need to enable it first:

 

smartctl -s on /dev/sdl

 

Then grab and post new diags.
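
To confirm whether the setting took, one option is to look at the "SMART support is:" lines that smartctl -i prints. A minimal sketch follows. The helper name smart_state is made up for illustration, and the sample input is a stand-in that follows smartctl's usual output format, not something taken from the attached diagnostics.

```shell
#!/bin/bash
# Hypothetical helper: read `smartctl -i` output on stdin and print
# "Enabled" or "Disabled" based on the "SMART support is:" line.
smart_state() {
    awk '/^SMART support is:[ \t]*Enabled/  { print "Enabled" }
         /^SMART support is:[ \t]*Disabled/ { print "Disabled" }'
}

# Illustrative sample input (not from the poster's diagnostics).
printf '%s\n' \
    'SMART support is: Available - device has SMART capability.' \
    'SMART support is: Disabled' | smart_state
```

On the server itself this would be something like smartctl -i /dev/sdl | smart_state. If it still prints Disabled, run smartctl -s on /dev/sdl again and recheck before grabbing new diags.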

 

 

Disk4 is failing, which will make the rebuild of disk9 challenging. Do you have notifications enabled?

 

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       49109
  5 Reallocated_Sector_Ct   0x0033   057   057   140    Pre-fail  Always   FAILING_NOW 1143
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       627
197 Current_Pending_Sector  0x0032   199   192   000    Old_age   Always       -       185
198 Offline_Uncorrectable   0x0030   200   198   000    Old_age   Offline      -       1
200 Multi_Zone_Error_Rate   0x0008   190   001   000    Old_age   Offline      -       2135
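
For anyone reading a table like this: an attribute counts as failed when its normalized VALUE drops to or below THRESH, which is why Reallocated_Sector_Ct (057 against a threshold of 140) shows FAILING_NOW. A rough sketch of that check, using rows copied from the report above; the function name failing_attrs is just for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read attribute rows in `smartctl -A` format on stdin
# (columns: ID NAME FLAG VALUE WORST THRESH ...) and print the names of
# attributes whose VALUE is at or below a non-zero THRESH.
failing_attrs() {
    awk '$1 ~ /^[0-9]+$/ && $6 + 0 > 0 && $4 + 0 <= $6 + 0 { print $2 }'
}

# Sample rows copied from the disk4 report.
failing_attrs <<'EOF'
  1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       49109
  5 Reallocated_Sector_Ct   0x0033   057   057   140    Pre-fail  Always   FAILING_NOW 1143
197 Current_Pending_Sector  0x0032   199   192   000    Old_age   Always       -       185
EOF
```

Attributes with a threshold of 000 never trip this check, so pending or reallocated counts like the ones above still need to be judged from the raw values.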

Disk8 needs a new SATA cable:

 

 

14 minutes ago, cybrey said:

and new diagnostics. 

SMART is still disabled.

 

10 minutes ago, cybrey said:

Just out of interest, how do you know disk 8 needs a new SATA cable?

These:

 

Feb 12 14:49:30 Tower kernel: ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x400000 action 0x6 frozen
Feb 12 14:49:30 Tower kernel: ata9.00: irq_stat 0x08000000, interface fatal error
Feb 12 14:49:30 Tower kernel: ata9: SError: { Handshk }
Feb 12 14:49:30 Tower kernel: ata9.00: failed command: WRITE DMA
Feb 12 14:49:30 Tower kernel: ata9.00: cmd ca/00:10:d0:d2:00/00:00:00:00:00/e0 tag 20 dma 8192 out
Feb 12 14:49:30 Tower kernel:         res 50/00:00:df:d2:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Feb 12 14:49:30 Tower kernel: ata9.00: status: { DRDY }
Feb 12 14:49:30 Tower kernel: ata9: hard resetting link
Feb 12 14:49:30 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 12 14:49:30 Tower kernel: ata9.00: configured for UDMA/133
Feb 12 14:49:30 Tower kernel: ata9: EH complete

plus this:

 

199 UDMA_CRC_Error_Count    0x0032   200   198   000    Old_age   Always       -       295

That makes it very likely it's a cable/connection issue, but keep monitoring that attribute; if it increases, there's a problem.
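
If you want to track that attribute by hand, a small sketch along these lines would do it: note the current raw value as a baseline and compare later readings against it. The helper names crc_count and crc_status are made up for illustration, and /dev/sdX in the comment is a placeholder since the device name for disk8 isn't given here.

```shell
#!/bin/bash
# Hypothetical helper: pull the raw UDMA_CRC_Error_Count (attribute 199)
# out of `smartctl -A` output read from stdin.
crc_count() {
    awk '$1 == 199 { print $NF }'
}

# Hypothetical helper: compare a current count against a noted baseline.
# Prints "increased" if the count went up, otherwise "unchanged".
crc_status() {
    local baseline="$1" current="$2"
    if [ "$current" -gt "$baseline" ]; then
        echo "increased"
    else
        echo "unchanged"
    fi
}

# Sample row from this thread; on a live system the input would come from
# something like: smartctl -A /dev/sdX | crc_count
current=$(echo '199 UDMA_CRC_Error_Count    0x0032   200   198   000    Old_age   Always       -       295' | crc_count)
crc_status 295 "$current"
```

The raw value never resets, so only the change relative to the baseline matters, not the absolute number.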

 


The disk looks fine. There's one CRC error, which is not a big deal but might indicate that the cable is a problem and the reason the disk got disabled.

 

The problem is that since you only have one parity disk and disk4 is failing, the rebuild will most likely result in some (or a lot of) corrupted files. You shouldn't rebuild on top of the old disk, only onto a new disk. Alternatively, if no new files have been written to that disk since it got disabled, the best way forward would be a new config without disk4 (or with a new disk in its place), then copying everything you can from the old disk4.

 

I'm a little confused as to why Disk9 is disabled yet the issues are appearing on Disk4?

 

The machine has been off since this issue appeared and I've not been able to access the shares, so no new files should have been created. 

 

Disk4 is pretty small so I'm happy to lose it completely. What are the steps for pulling disk4 out of the array and then attempting to recover data from it?

3 minutes ago, cybrey said:

I'm a little confused as to why Disk9 is disabled yet the issues are appearing on Disk4?

Two separate and unrelated issues.

 

3 minutes ago, cybrey said:

What are the steps for pulling disk4 out of the array and then attempting to recover data from it?

- Tools -> New Config -> Retain current configuration: All -> Apply
- Assign any missing disk(s)
- Unassign disk4 (you can leave slot 4 empty or assign one of the other disks to it)
- Start the array to begin the parity sync

 

Disk9 will likely mount, but if it is still unmountable, don't format it. Wait for the sync to finish and run xfs_repair at the end.

 

When done, use the Unassigned Devices (UD) plugin to mount disk4 and copy everything you can to the array.
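
For the xfs_repair step, here is a sketch of the commands involved, assuming the array is started in maintenance mode and disk9 is exposed as /dev/md9 (the usual naming on this Unraid version, but check yours). The helper xfs_repair_cmd is made up for illustration and only prints the command, so nothing runs by accident.

```shell
#!/bin/bash
# Hypothetical helper: print the xfs_repair command for an Unraid array slot.
# Assumption: array started in maintenance mode, disk N exposed as /dev/mdN.
xfs_repair_cmd() {
    local slot="$1" mode="${2:-check}"
    case "$mode" in
        check)  echo "xfs_repair -n /dev/md${slot}" ;;  # -n: no-modify mode, report only
        repair) echo "xfs_repair /dev/md${slot}" ;;     # actually writes fixes to the disk
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

xfs_repair_cmd 9 check
```

Start with the check form and only move on to the repair form if it reports problems. Running it against the md device rather than /dev/sdl is what keeps parity in sync with any changes the repair makes.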


Except for disk4, which really is failing, most of your issues appear to be cable related. I recommend you update to v6.4.1 and make sure notifications are enabled, and you'll be warned about CRC errors, which are usually a sign of a bad cable. Acknowledge any existing values, since this attribute never resets; if it increases for any disk, there's still a problem, likely the SATA cable, but it can also be the backplane, controller port or, in very rare cases, the disk itself.
