
Superblock failure


cybrey


Posted

I have a new issue with my Unraid server.

 

I was away for a while, so I powered down my server. On powering it back up, Disk9 (sdl) was unmountable and the shares weren't available.

 

I ran a disk check on the missing disk and got the following:

Phase 1 - find and verify superblock...
superblock read failed, offset 562633482240, size 131072, ag 9, rval -1

fatal error -- Input/output error
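
For readers wondering what the numbers in that message mean, here is a small worked calculation. It assumes the usual XFS layout, in which every allocation group (AG) begins with a copy of the superblock, so the failed read at "ag 9" would be the start of AG 9. Under that assumption the AG size falls out of the offset; the 4096-byte block size and 512-byte sector size are assumptions too, not values taken from the diagnostics.

```python
# Values copied from the xfs_repair message above.
offset = 562633482240   # byte offset of the failed read
ag = 9                  # allocation group being read
size = 131072           # bytes requested

# Assumption: AG n starts at n * ag_size, so the offset divides evenly by 9.
ag_size, remainder = divmod(offset, ag)
ag_size_gib = ag_size / 2**30

# Assumed sizes (typical defaults, not confirmed by the thread).
BLOCK_SIZE = 4096
SECTOR_SIZE = 512
ag_blocks = ag_size // BLOCK_SIZE
first_sector = offset // SECTOR_SIZE

print(f"AG size: {ag_size} bytes ({ag_size_gib:.2f} GiB), {ag_blocks} blocks")
print(f"Failed read starts at 512-byte sector {first_sector}, length {size // 1024} KiB")
```

The division leaves no remainder, which is consistent with the read targeting the start of AG 9 rather than some arbitrary location. The "Input/output error" means the device itself returned an error for that read, which is why the replies below look at SMART and the cabling rather than at the filesystem.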

 

Reading around the forum, it seems the suggested fix for this is to create a new config, but I'd like to double check this before I go any further.

 

Diagnostics attached below. 

 

Many thanks in advance. 

 

tower-diagnostics-20180212-1409.zip

Posted

I was wrong, I only looked at the file size. Disk9 has SMART disabled, so you need to enable it first:

 

smartctl -s on /dev/sdl

 

Then grab and post new diags.

 

 

Disk4 is failing, which will make the rebuild of disk9 challenging. Do you have notifications enabled?

 

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       49109
  5 Reallocated_Sector_Ct   0x0033   057   057   140    Pre-fail  Always   FAILING_NOW 1143
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       627
197 Current_Pending_Sector  0x0032   199   192   000    Old_age   Always       -       185
198 Offline_Uncorrectable   0x0030   200   198   000    Old_age   Offline      -       1
200 Multi_Zone_Error_Rate   0x0008   190   001   000    Old_age   Offline      -       2135
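
As a rough illustration of how a table like the one above can be read programmatically, here is a minimal sketch that flags attributes that are failing. It is not an Unraid or smartmontools feature: the function names are made up for this example, and the rule used (WHEN_FAILED is set, or VALUE has dropped to or below a non-zero THRESH) is a simplification.

```python
# Rows copied from the smartctl output quoted above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       49109
  5 Reallocated_Sector_Ct   0x0033   057   057   140    Pre-fail  Always   FAILING_NOW 1143
197 Current_Pending_Sector  0x0032   199   192   000    Old_age   Always       -       185
"""


def parse_attributes(text):
    """Parse the attribute rows of a `smartctl -A` style table into dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and anything that is not a full attribute row.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "type": parts[6],
            "when_failed": parts[8],
            "raw": " ".join(parts[9:]),
        })
    return rows


def failing_attributes(rows):
    """Names of attributes marked as failed or at/below a non-zero threshold."""
    return [
        r["name"]
        for r in rows
        if r["when_failed"] != "-" or (r["thresh"] > 0 and r["value"] <= r["thresh"])
    ]


print(failing_attributes(parse_attributes(SAMPLE)))
```

On the sample rows this only flags Reallocated_Sector_Ct (value 057 against a threshold of 140). Note that Current_Pending_Sector has a threshold of 000, so a threshold check alone would never flag its raw count of 185; raw values still need a human look.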

Disk8 needs a new SATA cable:

 

 

Posted
14 minutes ago, cybrey said:

and new diagnostics. 

SMART is still disabled.

 

10 minutes ago, cybrey said:

Just out of interest, how do you know Disk8 needs a new SATA cable?

These:

 

Feb 12 14:49:30 Tower kernel: ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x400000 action 0x6 frozen
Feb 12 14:49:30 Tower kernel: ata9.00: irq_stat 0x08000000, interface fatal error
Feb 12 14:49:30 Tower kernel: ata9: SError: { Handshk }
Feb 12 14:49:30 Tower kernel: ata9.00: failed command: WRITE DMA
Feb 12 14:49:30 Tower kernel: ata9.00: cmd ca/00:10:d0:d2:00/00:00:00:00:00/e0 tag 20 dma 8192 out
Feb 12 14:49:30 Tower kernel:         res 50/00:00:df:d2:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Feb 12 14:49:30 Tower kernel: ata9.00: status: { DRDY }
Feb 12 14:49:30 Tower kernel: ata9: hard resetting link
Feb 12 14:49:30 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 12 14:49:30 Tower kernel: ata9.00: configured for UDMA/133
Feb 12 14:49:30 Tower kernel: ata9: EH complete

plus this:

 

199 UDMA_CRC_Error_Count    0x0032   200   198   000    Old_age   Always       -       295

That makes it very likely to be a cable/connection issue, but keep monitoring that attribute: if it keeps increasing, there's still a problem.
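
If you want to scan a whole syslog for entries like the ones quoted above instead of reading it by eye, something along these lines could work. It is only a sketch: the phrase list is simply taken from this excerpt and is not exhaustive, and `count_link_errors` is a made-up name, not part of any tool.

```python
import re
from collections import Counter

# Lines copied from the syslog excerpt above.
SAMPLE_LOG = """\
Feb 12 14:49:30 Tower kernel: ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x400000 action 0x6 frozen
Feb 12 14:49:30 Tower kernel: ata9.00: irq_stat 0x08000000, interface fatal error
Feb 12 14:49:30 Tower kernel: ata9: SError: { Handshk }
Feb 12 14:49:30 Tower kernel: ata9: hard resetting link
Feb 12 14:49:30 Tower kernel: ata9: EH complete
"""

# Assumption: these phrases are enough to spot link trouble in this log.
LINK_ERROR_PHRASES = ("interface fatal error", "hard resetting link")

# Matches "kernel: ata9:" as well as "kernel: ata9.00:" and captures the port.
ATA_PORT = re.compile(r"kernel: (ata\d+)(?:\.\d+)?:")


def count_link_errors(log_text):
    """Count link-error lines per ATA port in a block of syslog text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ATA_PORT.search(line)
        if match and any(phrase in line for phrase in LINK_ERROR_PHRASES):
            counts[match.group(1)] += 1
    return counts


print(count_link_errors(SAMPLE_LOG))
```

This only tells you which ATA port is complaining. Working out which disk sits on that port still has to come from the rest of the diagnostics.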

 

Posted

Disk looks fine. There's one CRC error, which is not a big deal but might indicate that the cable is a problem and the reason why the disk got disabled.

 

The problem is that since you only have one parity disk and disk4 is failing, the rebuild will most likely result in some (or a lot of) corrupted files, and you shouldn't rebuild on top of the old disk, only onto a new disk. Alternatively, if no new files have been written to that disk since it got disabled, the best way forward would be a new config without disk4 (or with a new one in its place), then copying everything you can from the old disk4.

 

Posted

I'm a little confused as to why Disk9 is disabled yet the issues are appearing on Disk4?

 

The machine has been off since this issue appeared and I've not been able to access the shares, so no new files should have been created. 

 

Disk 4 is pretty small, so I'm happy to lose it completely. What are the steps for pulling disk4 out of the array and then attempting to recover data from it?

Posted
3 minutes ago, cybrey said:

I'm a little confused as to why Disk9 is disabled yet the issues are appearing on Disk4?

Two separate and unrelated issues.

 

3 minutes ago, cybrey said:

What are the steps for pulling disk4 out of the array and then attempting to recover data from it?

- Tools -> New Config -> Retain current configuration: All -> Apply
- Assign any missing disk(s)
- Unassign disk4 (you can leave slot 4 empty or assign one of the other disks to it)
- Start the array to begin the parity sync

 

Disk9 will likely mount, but if it's still unmountable don't format it; wait for the sync to finish and run xfs_repair at the end.

 

When done, use the Unassigned Devices (UD) plugin to mount disk4 and copy everything you can to the array.

Posted

Disk9 mounted without any issues and the parity is rebuilding. 

 

Many thanks as ever for all your help, greatly appreciated. 

 

Wish I could work out why I've been having so many issues recently. 

 

Posted

Except for disk4, which really is failing, most of your issues appear to be cable related. I recommend you update to v6.4.1 and make sure notifications are enabled, and you'll be warned about CRC errors, which are usually a sign of a bad cable. Acknowledge any existing values, since this attribute never resets; if it then increases for any disk there's still a problem, likely the SATA cable, but it can also be the backplane, the controller port or, in very rare cases, the disk itself.

Posted

You need to post on the UD plugin support thread, with the diagnostics.

 

P.S. There are still ATA errors on disk8 and the CRC error count increased from 295 to 300, so it's likely a bad SATA cable:

 

199 UDMA_CRC_Error_Count    -O--CK   200   198   000    -    300
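
Since the CRC counter never resets, what matters is the change against the last value you acknowledged, not the absolute number. A minimal sketch of that comparison, using the 295 and 300 readings from this thread (the dictionary layout and the function name are just assumptions for the example, not how Unraid stores anything):

```python
def crc_increases(acknowledged, current):
    """Return {disk: delta} for disks whose CRC count rose above the acknowledged value."""
    increases = {}
    for disk, count in current.items():
        baseline = acknowledged.get(disk, 0)
        if count > baseline:
            increases[disk] = count - baseline
    return increases


# Readings taken from the two SMART reports for disk8 in this thread.
acknowledged = {"disk8": 295}
current = {"disk8": 300}

print(crc_increases(acknowledged, current))
```

Any non-empty result after a cable swap means the problem is still there and the next suspect (backplane, controller port) is worth checking.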

 

Archived

This topic is now archived and is closed to further replies.
