cybrey Posted February 12, 2018

Have a new issue with my unraid server. I was away for a while so I powered down my server. On powering it up Disk9 (sdl) was unmountable and the shares weren't available. I ran a disk check on the missing disk and got the following:

Phase 1 - find and verify superblock...
superblock read failed, offset 562633482240, size 131072, ag 9, rval -1
fatal error -- Input/output error

Reading around the forum it seems the suggested fix for this is to create a new config but I'd like to just double check this before I move any further. Diagnostics attached below. Many thanks in advance.

tower-diagnostics-20180212-1409.zip
JorgeB Posted February 12, 2018

There's what looks like a problem with the disk: it dropped offline and was disabled, so there's no SMART report. Reboot and post new diags.
cybrey (Author) Posted February 12, 2018

Rebooted and attached.

tower-diagnostics-20180212-1457.zip
JorgeB Posted February 12, 2018

I was wrong, I only looked at the file size. Disk9 has SMART disabled, you need to enable it first:

smartctl -s on /dev/sdl

Then grab and post new diags.

Disk4 is failing, and this will make the rebuild of disk9 challenging. Do you have notifications enabled?

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       49109
  5 Reallocated_Sector_Ct   0x0033   057   057   140    Pre-fail  Always   FAILING_NOW 1143
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       627
197 Current_Pending_Sector  0x0032   199   192   000    Old_age   Always       -       185
198 Offline_Uncorrectable   0x0030   200   198   000    Old_age   Offline      -       1
200 Multi_Zone_Error_Rate   0x0008   190   001   000    Old_age   Offline      -       2135

Disk8 needs a new SATA cable.
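For anyone reading along, the FAILING_NOW flag on disk4 follows from smartctl's rule that an attribute has failed when its normalized VALUE is at or below THRESH. The sketch below only illustrates that rule on a few rows copied from the table in the post above; the function names `parse_rows` and `failing` are made up for this example and are not part of smartctl or unRAID.

```python
# Rows copied from the disk4 SMART table quoted in the thread.
SMART_ROWS = """\
  1 Raw_Read_Error_Rate     0x002f 199 199 051 Pre-fail Always - 49109
  5 Reallocated_Sector_Ct   0x0033 057 057 140 Pre-fail Always FAILING_NOW 1143
197 Current_Pending_Sector  0x0032 199 192 000 Old_age Always - 185
198 Offline_Uncorrectable   0x0030 200 198 000 Old_age Offline - 1
"""


def parse_rows(text):
    """Parse whitespace-separated SMART attribute rows into dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip headers and anything that is not a 10-column attribute row.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),   # normalized value, e.g. "057" -> 57
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "raw": parts[9],
        })
    return rows


def failing(rows):
    """Names of attributes whose normalized value is at or below a non-zero threshold."""
    return [r["name"] for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


if __name__ == "__main__":
    print(failing(parse_rows(SMART_ROWS)))
```

Attributes with a threshold of 000, such as Current_Pending_Sector, never trip this rule, which is why their raw values (185 pending sectors here) still need to be read by eye.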
cybrey (Author) Posted February 12, 2018

And new diagnostics. Just out of interest, how do you know disk8 needs a new SATA cable? It's in the same cage as 9, which is really inaccessible.

tower-diagnostics-20180212-1528.zip
JorgeB Posted February 12, 2018

14 minutes ago, cybrey said:
    and new diagnostics.

SMART is still disabled.

10 minutes ago, cybrey said:
    Just out of interest, how do you know disk8 needs a new SATA cable?

These:

Feb 12 14:49:30 Tower kernel: ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x400000 action 0x6 frozen
Feb 12 14:49:30 Tower kernel: ata9.00: irq_stat 0x08000000, interface fatal error
Feb 12 14:49:30 Tower kernel: ata9: SError: { Handshk }
Feb 12 14:49:30 Tower kernel: ata9.00: failed command: WRITE DMA
Feb 12 14:49:30 Tower kernel: ata9.00: cmd ca/00:10:d0:d2:00/00:00:00:00:00/e0 tag 20 dma 8192 out
Feb 12 14:49:30 Tower kernel: res 50/00:00:df:d2:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Feb 12 14:49:30 Tower kernel: ata9.00: status: { DRDY }
Feb 12 14:49:30 Tower kernel: ata9: hard resetting link
Feb 12 14:49:30 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 12 14:49:30 Tower kernel: ata9.00: configured for UDMA/133
Feb 12 14:49:30 Tower kernel: ata9: EH complete

plus this:

199 UDMA_CRC_Error_Count 0x0032 200 198 000 Old_age Always - 295

That makes it very likely it's a cable/connection issue, but keep monitoring that attribute: if it increases, there's a problem.
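As a rough illustration of how one might spot this pattern in a longer syslog, the sketch below counts link-level error lines per ATA port. The signature strings are simply taken from the log excerpt above and are an assumption, not an exhaustive list; `link_errors_by_port` is a hypothetical helper. Mapping a port such as ata9 back to an array slot (disk8 here) still has to be done from the diagnostics.

```python
import re
from collections import Counter

# A few lines copied from the syslog excerpt quoted in the thread.
LOG = """\
Feb 12 14:49:30 Tower kernel: ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x400000 action 0x6 frozen
Feb 12 14:49:30 Tower kernel: ata9.00: irq_stat 0x08000000, interface fatal error
Feb 12 14:49:30 Tower kernel: ata9: SError: { Handshk }
Feb 12 14:49:30 Tower kernel: res 50/00:00:df:d2:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Feb 12 14:49:30 Tower kernel: ata9: hard resetting link
"""

# Assumed signatures of link trouble, chosen from the excerpt above for illustration only.
SIGNATURES = ("interface fatal error", "ATA bus error", "hard resetting link")

# Matches "ata9:" as well as "ata9.00:" and captures just the port name.
PORT = re.compile(r"kernel: (ata\d+)(?:\.\d+)?:")


def link_errors_by_port(log_text):
    """Count signature lines per ATA port (hypothetical helper)."""
    counts = Counter()
    current = None
    for line in log_text.splitlines():
        match = PORT.search(line)
        if match:
            current = match.group(1)
        # Continuation lines such as "res ..." carry no port name, so they are
        # attributed to the most recently seen port.
        if current and any(sig in line for sig in SIGNATURES):
            counts[current] += 1
    return dict(counts)


if __name__ == "__main__":
    print(link_errors_by_port(LOG))
```

A port that keeps showing up in this count across reboots is a reasonable candidate for a cable or connector swap, in line with the advice above.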
cybrey (Author) Posted February 12, 2018

Really strange... the command worked, and the GUI confirms it's enabled. I've powered the box off and on, and attached new diagnostics.

tower-diagnostics-20180212-1554.zip
JorgeB Posted February 12, 2018

Hmm, maybe the diags don't work because this disk is disabled, try this instead:

smartctl -s on /dev/sdl

Then:

smartctl -a /dev/sdl

and post the output.
cybrey (Author) Posted February 12, 2018

Apologies, several screenshots. Couldn't think of an easy way of getting a file off the machine.
JorgeB Posted February 12, 2018

Disk looks fine. There's one CRC error, not a big deal, but it might indicate the cable is a problem and the reason why it got disabled.

The problem is that since you only have one parity disk and disk4 is failing, the rebuild will most likely result in some (or a lot of) corrupted files, and you shouldn't rebuild on top of the old disk, only on a new disk. Alternatively, if no new files have been written to that disk since it got disabled, the best way forward would be a new config without disk4 (or with a new one in its place), then copying everything you can from old disk4.
cybrey (Author) Posted February 12, 2018

I'm a little confused as to why Disk9 is disabled yet the issues are appearing on Disk4?

The machine has been off since this issue appeared and I've not been able to access the shares, so no new files should have been created. Disk 4 is pretty small so I'm happy to lose it completely. What are the steps for pulling disk4 out of the array and then attempting to recover data from it?
JorgeB Posted February 12, 2018

3 minutes ago, cybrey said:
    I'm a little confused as to why Disk9 is disabled yet the issues are appearing on Disk4?

Two separate and unrelated issues.

3 minutes ago, cybrey said:
    What are the steps for pulling disk4 out of the array and then attempting to recover data from it?

-Tools -> New Config -> Retain current configuration: All -> Apply
-assign any missing disk(s)
-unassign disk4 (you can leave slot 4 empty or assign one of the other disks to it)
-start array to begin parity sync

Disk9 will likely mount, but if it's still unmountable don't format it: wait for the sync to finish and run xfs_repair at the end.

When done, use the UD plugin to mount disk4 and copy everything you can to the array.
cybrey (Author) Posted February 12, 2018

Disk9 mounted without any issues and the parity is rebuilding. Many thanks as ever for all your help, greatly appreciated. Wish I could work out why I've been having so many issues recently.
JorgeB Posted February 12, 2018

Except for disk4, which is really failing, most of your issues appear to be cable related. I recommend you update to v6.4.1 and make sure notifications are enabled, and you'll be warned about CRC errors, which are usually a sign of a bad cable. Acknowledge any existing values, since this attribute never resets; if it increases for any disk there's still a problem, likely the SATA cable, but it can also be the backplane, controller port or in very rare cases the disk itself.
cybrey (Author) Posted February 13, 2018

Raid parity rebuilt successfully and I've updated to the latest version. However, the UD plugin is stuck at this:
JorgeB Posted February 13, 2018

Try rebooting and make sure you have the latest plugin version. If it's still the same, post in the UD support thread, and don't forget to post your diags.
cybrey (Author) Posted February 14, 2018

Just completed reboot, still in the same state. Diagnostics attached.

tower-diagnostics-20180214-0933.zip
JorgeB Posted February 14, 2018

You need to post on the UD plugin support thread, with the diagnostics.

P.S. There are still ATA errors on disk8 and the CRC error count increased from 295 to 300, likely a bad SATA cable:

199 UDMA_CRC_Error_Count -O--CK 200 198 000 - 300
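The "acknowledge the existing value, then watch for increases" advice comes down to comparing the current raw UDMA_CRC_Error_Count against a saved baseline. A minimal sketch, assuming the raw counts have already been read from smartctl by some other means; the function name and the dictionary layout are invented for this example, and the numbers are the ones mentioned in the thread (295 then 300 for disk8, one CRC error on disk9).

```python
def crc_increases(baseline, current):
    """Return {disk: delta} for disks whose raw CRC count rose above the baseline.

    Both arguments are hypothetical {disk_name: raw_count} mappings; disks
    missing from the baseline are ignored rather than treated as new errors.
    """
    return {
        disk: current[disk] - baseline[disk]
        for disk in current
        if disk in baseline and current[disk] > baseline[disk]
    }


if __name__ == "__main__":
    acknowledged = {"disk8": 295, "disk9": 1}   # values accepted on February 12
    latest = {"disk8": 300, "disk9": 1}         # values seen on February 14
    print(crc_increases(acknowledged, latest))
```

Because the attribute never resets, only the delta matters: disk9's single historical error stays quiet, while disk8's rise of 5 is what points at the cable.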
cybrey (Author) Posted February 14, 2018

OK, will post on the UD plugin support page and get that cable swapped out.
Archived
This topic is now archived and is closed to further replies.