Disk unmountable after rebuild



Help!

 

I noticed that one of the data disks was showing a red cross. I ran an extended SMART test with no faults detected. After rebuilding the drive, I noticed that the parity drive had become disabled (red cross) at some point during the rebuild process. The original data disk is now not mountable. The file system is XFS with a single parity drive and about 40TB in total capacity. No other drive is showing any issues. As a precaution, I re-seated all disks, connectors and HBA cards.

 

I received the following when I ran xfs_repair from the GUI:

 

>>>

ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

>>>
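For anyone following along, the order of operations that message implies can be sketched as a small helper that only builds the command lines: a read-only check first, and `-L` only as a last resort. The device path `/dev/md12` is an assumption for disk 12 on this server, and the function name is made up for illustration; `-n` (no modify) and `-L` (zero the log) are the standard xfs_repair options.

```python
import shlex

def xfs_repair_command(device, modify=False, zero_log=False):
    """Build an xfs_repair argument list without running anything."""
    args = ["xfs_repair"]
    if not modify:
        # -n: check only, never write to the filesystem
        args.append("-n")
    elif zero_log:
        # -L: destroy the log; last resort, may lose recent metadata changes
        args.append("-L")
    args.append(device)
    return args

# Hypothetical device path; adjust to the disk actually being checked.
DEVICE = "/dev/md12"

print(shlex.join(xfs_repair_command(DEVICE)))                               # read-only check
print(shlex.join(xfs_repair_command(DEVICE, modify=True)))                  # normal repair
print(shlex.join(xfs_repair_command(DEVICE, modify=True, zero_log=True)))   # last resort
```

Nothing here touches the disk; it just prints the three command lines in the order one would normally consider them.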

 

This is from the read/parity check history.

 

>>>>

Date                 Duration                 Speed      Status  Errors
2021-04-09 15:55:44  27 min, 4 sec            3.7 GB/s   OK      1465130625
2021-02-01 06:56:03  18 hr, 3 min, 48 sec     92.3 MB/s  OK      0
2020-10-27 09:12:39  17 hr, 32 min, 20 sec    95.0 MB/s  OK      0
2020-09-22 04:44:56  16 hr, 58 min, 34 sec    98.2 MB/s  OK      0

>>>
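A quick sanity check on those numbers, as a rough sketch. It assumes the parity drive is 6 TB (6e12 bytes) and that the reported speed is simply size divided by duration; neither is confirmed by the diagnostics. The 4 KiB-per-error figure is also an assumption.

```python
# Assumed parity size, inferred from the "6T drive" mentioned in this thread.
PARITY_BYTES = 6e12

def duration_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s duration from the history table to seconds."""
    return hours * 3600 + minutes * 60 + seconds

def average_speed(total_bytes, seconds):
    """Average throughput in bytes per second."""
    return total_bytes / seconds

normal = average_speed(PARITY_BYTES, duration_seconds(18, 3, 48))
suspect = average_speed(PARITY_BYTES, duration_seconds(0, 27, 4))

print(f"2021-02-01 check: {normal / 1e6:.1f} MB/s")
print(f"2021-04-09 run:   {suspect / 1e9:.1f} GB/s")

# Assumption: each counted error covers one 4 KiB block.
errors = 1465130625
bytes_per_error = 4096
print(f"errors x 4 KiB:   {errors * bytes_per_error / 1e12:.2f} TB")
```

Under those assumptions the older check works out to about 92.3 MB/s and the 27-minute run to about 3.7 GB/s, which no spinning disk can do, and the error count times 4 KiB comes to roughly the whole 6 TB. That would be consistent with nearly every block failing rather than a real rebuild taking place.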

 

That's a 6TB drive which is about 95% full. Other than data that might have been written to the log by the rebuild process, nothing has been written there for weeks. I'm not sure what my options are if the drive can't be mounted and the log replayed as suggested, but there is a substantial amount of data that I'd really like to keep if possible.

 

I've included the diagnostics from before the rebuild.

 

Thanks!

nas1-diagnostics-20210409-1217.zip

Link to comment

Yep, there were write errors on parity before the rebuild even began:

 

Apr  9 15:28:31 NAS1 kernel: md: disk0 write error, sector=1953567832
Apr  9 15:28:31 NAS1 kernel: md: disk0 write error, sector=1953567840
Apr  9 15:28:31 NAS1 kernel: md: disk0 write error, sector=1953567848

 

So parity got disabled and the rebuild would be 100% corrupt:

 

Apr  9 15:28:40 NAS1 kernel: md: recovery thread: recon D12 ...
Apr  9 15:28:40 NAS1 kernel: md: recovery thread: multiple disk errors, sector=8
Apr  9 15:28:40 NAS1 kernel: md: recovery thread: multiple disk errors, sector=16

etc
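If you want to tally these from a saved syslog yourself, here is a minimal sketch. The regular expression covers only the two message shapes quoted above, and the function name is made up for illustration; other md messages may look different.

```python
import re
from collections import Counter

# Matches only the two message shapes quoted in this thread.
MD_EVENT = re.compile(
    r"kernel: md: (?:(?P<disk>disk\d+) (?P<kind>write error)"
    r"|recovery thread: (?P<multi>multiple disk errors)), sector=(?P<sector>\d+)"
)

SAMPLE = """\
Apr  9 15:28:31 NAS1 kernel: md: disk0 write error, sector=1953567832
Apr  9 15:28:31 NAS1 kernel: md: disk0 write error, sector=1953567840
Apr  9 15:28:31 NAS1 kernel: md: disk0 write error, sector=1953567848
Apr  9 15:28:40 NAS1 kernel: md: recovery thread: recon D12 ...
Apr  9 15:28:40 NAS1 kernel: md: recovery thread: multiple disk errors, sector=8
Apr  9 15:28:40 NAS1 kernel: md: recovery thread: multiple disk errors, sector=16
"""

def summarize(lines):
    """Count md error events per (source, kind) pair."""
    counts = Counter()
    for line in lines:
        match = MD_EVENT.search(line)
        if not match:
            continue  # e.g. the "recon D12 ..." line is not an error
        source = match.group("disk") or "recovery thread"
        kind = match.group("kind") or match.group("multi")
        counts[(source, kind)] += 1
    return counts

for (source, kind), count in summarize(SAMPLE.splitlines()).items():
    print(f"{source}: {count} x {kind}")
```

On a real syslog you would pass the file's lines instead of `SAMPLE`. Here disk0 is the parity disk, so write errors on it ahead of the "recon" line mean parity was already failing when the rebuild started.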

 

It's not clear to me how parity is enabled again and disk12 disabled in the first diags you posted. Or are they old diags? If not, what happened after this:

 

On 4/11/2021 at 1:15 PM, Craigb said:

After rebuilding the drive, I noticed that the parity drive had become disabled (red cross) at some point during the rebuild process.

 

Link to comment

The rebuild completed with errors at 15:55.

 

The first diagnostic file (nas1-diagnostics-20210409-1217) was after disk 12 was disabled, prior to the rebuild attempt.

 

The second diagnostic file (nas1-diagnostics-20210409-1458) is a continuation of the first diagnostic's syslog, after disk 12 was unassigned and reassigned, but still prior to the rebuild attempt.

 

The third file (syslog-20210409-161331) is the syslog from immediately after the rebuild. I did not get a diagnostic file at this point.

 

As of this morning, disk 12 is assigned but unmountable. The parity disk is disabled, red cross.

 

I'm doing a byte-level image of the data disk to preserve whatever data might still be recoverable and, barring any suggestions from the forum, will attempt to run xfs_repair.
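For reference, a byte-level image is conceptually just a chunked copy of the raw device into a file. The sketch below illustrates the idea with a checksum for later verification; the function name and paths are hypothetical, and it is not the tool actually used here. It stops at the first read error, so for a disk that is physically failing a dedicated tool such as GNU ddrescue is the better choice.

```python
import hashlib

def image_device(source_path, image_path, chunk_size=4 * 1024 * 1024):
    """Copy source_path to image_path in chunks; return (bytes copied, sha256)."""
    digest = hashlib.sha256()
    copied = 0
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of device or file
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()

# Hypothetical usage (needs root and a destination with enough free space):
# size, checksum = image_device("/dev/sdX", "/mnt/backup/disk12.img")
```

Repairs can then be attempted on the image or on the disk, with the untouched copy as a fallback.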

 

All the syslogs from before the failure through to today are available.

 

Thanks!

Link to comment

Parity appears OK. If you haven't yet, you should replace the cables; then you can try re-enabling parity to see if the emulated disk is better than the current one:

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments are correct
-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on parity disk(s) will be overwritten; this is normal as it doesn't account for the checkbox, but nothing will be overwritten as long as it's checked)
-Stop array
-Unassign disk12
-Start array (in normal mode now). Ideally the emulated disk will now mount and its contents will look correct; if it doesn't mount, run a filesystem check on the emulated disk
-If the emulated disk mounts and contents look correct, stop the array
-Data on the current disk12 is likely in very bad shape, but if you want to check it later, rebuild using a new disk.

 

Link to comment

Success!

 

The array started without problem. The faulty disk is being emulated and the data appears to be completely intact. The replacement drive goes in this morning along with a second parity drive.

 

Many thanks for your assistance! Very much appreciated!
