
Woke up to 2 drives reporting errors at the same time. What to do next?


lennygman


root@Tower:/dev# xfs_repair -v /dev/md4
Phase 1 - find and verify superblock...
        - block cache size set to 1503368 entries
Phase 2 - using internal log
        - zero log...
* ERROR: mismatched uuid in log
*            SB : fbf0b82a-a72f-4e97-b8c6-bbd3be245bf9
*            log: 3481cfee-2864-430f-aa40-01ade1573116
zero_log: head block 503295 tail block 503291
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
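The message above is xfs_repair asking for the XFS log to be replayed by mounting the filesystem first. A minimal sketch of that sequence, assuming the array is started in maintenance mode and /dev/md4 is disk4's parity-protected device (the mount point is just an illustrative temporary directory):

mkdir -p /mnt/xfs_test        # temporary mount point (illustrative name)
mount /dev/md4 /mnt/xfs_test  # a successful mount replays the log
umount /mnt/xfs_test          # unmount again before repairing
xfs_repair -v /dev/md4        # re-run the repair with the log replayed

If the mount itself fails, xfs_repair -L /dev/md4 zeroes the log and attempts the repair anyway, at the cost of possibly losing the metadata changes the log contained, as the tool itself warns above.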
 

 

But I think I screwed up.. I misread the instructions. I thought I needed to substitute the Linux-based drive ID, which in my case is "sdf" for disk4. (I only recently installed Ver 6.0.)
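For reference: on Unraid, filesystem repairs on an array disk should target the parity-protected md device (/dev/md4 for disk4), not the raw sdX device such as /dev/sdf, otherwise parity is not kept in sync. A quick sketch of how to list both sets of device names from the console (assuming a stock Unraid shell):

lsblk -o NAME,SIZE,MODEL   # raw sdX devices and their partitions
ls -l /dev/md*             # the md devices Unraid creates for each array slot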


There were a bunch of these messages about "Metadata corruption detected":

 

Metadata corruption detected at xfs_dir3_block block 0x135438/0x1000
libxfs_writebufr: write verifer failed on xfs_dir3_block bno 0x135438/0x1000
Metadata corruption detected at xfs_dir3_block block 0x658/0x1000
libxfs_writebufr: write verifer failed on xfs_dir3_block bno 0x658/0x1000
Metadata corruption detected at xfs_dir3_block block 0x650/0x1000
libxfs_writebufr: write verifer failed on xfs_dir3_block bno 0x650/0x1000
Metadata corruption detected at xfs_dir3_block block 0x58/0x1000
libxfs_writebufr: write verifer failed on xfs_dir3_block bno 0x58/0x1000
cache_purge: shake on cache 0x6b5080 left 3 nodes!?

        XFS_REPAIR Summary    Sun Nov  5 17:11:37 2017

Phase           Start        End     Duration
Phase 1:        11/05 17:10:16  11/05 17:10:16
Phase 2:        11/05 17:10:16  11/05 17:10:38  22 seconds
Phase 3:        11/05 17:10:38  11/05 17:10:45  7 seconds
Phase 4:        11/05 17:10:45  11/05 17:10:45
Phase 5:        11/05 17:10:45  11/05 17:10:45
Phase 6:        11/05 17:10:45  11/05 17:10:45
Phase 7:        11/05 17:10:45  11/05 17:10:45

Total run time: 29 seconds
done
root@Tower:/dev/disk#
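One hedged way to double-check whether the corruption was actually cleared, while the array is still in maintenance mode, is xfs_repair's no-modify mode:

xfs_repair -n /dev/md4   # -n = check only, report any remaining problems without writing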
 

The drive still shows with a red X / disabled / contents emulated.


Don't think so.

 

When I start the array in regular mode I still see this:

 

Unmountable disk present:

Disk 4 • WDC_WD20EADS-00S2B0_WD-WCAVY5770247 (sdf)

Format will create a file system in all Unmountable disks, discarding all data currently on those disks.
Yes I want to do this

 

 


There is a lot of metadata corruption. Start the array one more time, and if the disk is still unmountable it's likely unfixable; in that case your best bet is to do a new config:

 

- Tools -> New Config -> Retain current configuration: All -> Apply
- If needed, assign any missing disk(s)
- Check "parity is already valid" before starting the array
- Start the array and check that all disks mount; if yes, run a correcting parity check (a quick console check for the mounts is sketched below)
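As a rough illustration of the last step, the mounts can be confirmed from the console after starting the array (disk numbers depend on your layout):

df -h /mnt/disk*   # every assigned data disk should show up mounted
ls /mnt/disk4      # spot-check that disk4's contents are visible again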

 

 


Sorry.. I want to confirm this and double-check what I am doing.

 

There is a message:

This is a utility to reset the array disk configuration so that all disks appear as "New" disks, as if it were a fresh new server.

This is useful when you have added or removed multiple drives and wish to rebuild parity based on the new configuration.

Use the 'Retain current configuration' selection to populate the desired disk slots after the array has been reset. By default no disk slots are populated.

DO NOT USE THIS UTILITY THINKING IT WILL REBUILD A FAILED DRIVE - it will have the opposite effect of making it impossible to rebuild an existing failed drive - you have been warned!

 

So if disk4 is corrupted, is this the right process to rebuild parity and restore the data on disk4? Would formatting disk 4 and rebuilding parity be the same thing?

 

 

4 minutes ago, lennygman said:

Would formatting disk 4 and rebuilding parity be the same thing?

 

No. Formatting is never part of a rebuild; it will delete all data on disk4 and update parity to reflect that.

 

New Config also isn't usually part of a rebuild, but if the filesystem on the emulated disk can't be repaired, rebuilding disk4 as it is now would just give you the same unmountable result. So, assuming disk4 itself is OK and it was disabled because of a cable or similar problem, a New Config will restore disk4 to how it was before it was disabled. It's also currently your only option to get that data back.

 

 

5 minutes ago, lennygman said:

I ran a SMART self-test and it does show completed with errors.. so I guess failing SMART?

 

There are no errors in the SMART report, at least not yet on the one you posted, but the disk could still be failing, or it could be a bad cable or whatever the initial problem was. Either way, since you currently can't rebuild it, the best option is to copy all important data to another disk, and only after that try to confirm where the problem is.
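A minimal sketch of that advice from the console, assuming /mnt/disk5 is another array disk with enough free space (the destination folder name is purely illustrative):

rsync -avh --progress /mnt/disk4/ /mnt/disk5/disk4-backup/   # copy everything off the emulated disk, preserving attributes
smartctl -a /dev/sdf                                         # review the full SMART attributes and self-test log
smartctl -t long /dev/sdf                                    # queue an extended self-test (can take many hours)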


