Metadata CRC error detected...Unmount and run xfs_repair



I hope someone can assist me with this problem. Note that I am NOT very good with Linux, but I am fairly technical overall. I am very comfortable using PuTTY to SSH into my system and move around, using MC, and other basic stuff.

 

If you don't care about the entire E! True Hollywood back story, skip ahead to the quoted section.

 

I am a dum-dum and didn't bother to properly power my 4x Norco SS-550 5-in-3 units. I am not 100% sure that that's what started the problem, but I would bet on it. I was running great with 2 parity disks and 8 data disks for a while, then I added a 9th data disk. After a few days, I had a disk drop out of the array as I was trying to shut the array down to restart the system. I didn't think anything of it and re-enabled the disk and brought the array back online and it started a rebuild. About halfway into the rebuild of the disk, it dropped out again. I assumed the disk was failing and thought, "no biggie, I've got two parity disks protecting my ass." I disabled the disk and ran the array degraded, down one disk. I immediately ordered a replacement disk (8 TB WD RED) and had it delivered same day (go Amazon!).

 

So then, COCKILY, I started writing new files to the array thinking it was no big deal. Almost immediately, parity disk 2 dropped out of the array and parity 1 started reporting read errors! I noticed that the activity lights on the parity disks were off, but the file kept writing to the array and I saw disk 9 with its activity light on. I immediately cancelled the file copy and then panicked. I should point out that all three disks (the failed disk 1 and both parity disks) were installed in the same 5-in-3. I would say this was likely a power issue.

 

Anywuts, I assumed that I was going to lose a small amount of data on disk 1 since it was running in an emulated state, a file was being written to disk 9, and neither parity disk was functioning. Sure enough, I did some testing and found that a bunch of TV episode files were missing in one season folder (I routinely save off directory and file listings for all disks in case something like this ever happens). After some playing around, I discovered that I could not write to that directory, or even delete it. The directory existed only on disk 1 BTW. I was able to move it and even rename it, but I cannot delete it. It keeps reappearing. I realize I may have some other files corrupted, but I will worry about all that later. It seems that the overwhelming majority of my 7+ TB of movies and TV episodes is fine.

 

So I redid the power cabling to the Norco units so that there are no cables being split, and disks are evenly distributed across all four units. I also have a new PSU being delivered tomorrow (going from 550W to 750W). I received the replacement disk last night (even though the original failed disk 1 may actually be okay...that's also to be determined) and installed it to replace disk 1 and let the rebuild run all night and most of today (no, I did not do a preclear on it...I just wanted to get my data back on a data disk!). It ran with parity 1 active and parity 2 still out of the array (I even pulled it from my system temporarily so it didn't draw power). I was nervous all day wondering if it would complete and if I would lose all that data (in case the failed disk 1 that I pulled really is damaged). Thankfully it finished (20 hours!), and I am almost back in business. I am now protected with one parity disk.

 

So now I need to know how to fix the corruption issue. My syslog.txt is reporting it over and over, and I have at least that one directory that I cannot delete. I don't know how to check or fix file system corruption. Can I do it without invalidating the parity disk?

 

Aug 29 08:24:45 unRAID kernel: XFS (md1): metadata I/O error: block 0x100000048 ("xfs_trans_read_buf_map") error 74 numblks 8

Aug 29 08:24:51 unRAID kernel: XFS (md1): Metadata corruption detected at xfs_dir3_data_reada_verify+0x73/0x76, xfs_dir3_data_reada block 0x100000048

Aug 29 08:24:51 unRAID kernel: XFS (md1): Unmount and run xfs_repair
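
For anyone hitting the same messages, the affected device can be pulled out of the syslog automatically. The following is only a minimal sketch in Python: the function name `devices_needing_repair` and the regular expression are illustrative assumptions, and the sample lines are the three syslog entries quoted above.

```python
import re

# Matches kernel lines such as:
#   "... kernel: XFS (md1): Unmount and run xfs_repair"
# The pattern is an assumption based only on the three lines quoted in this post.
XFS_LINE = re.compile(r"kernel: XFS \((?P<dev>[^)]+)\): (?P<msg>.*)")


def devices_needing_repair(lines):
    """Return the sorted device names for which the kernel suggested xfs_repair."""
    devices = set()
    for line in lines:
        match = XFS_LINE.search(line)
        if match and "run xfs_repair" in match.group("msg"):
            devices.add(match.group("dev"))
    return sorted(devices)


sample = [
    'Aug 29 08:24:45 unRAID kernel: XFS (md1): metadata I/O error: block 0x100000048 ("xfs_trans_read_buf_map") error 74 numblks 8',
    "Aug 29 08:24:51 unRAID kernel: XFS (md1): Metadata corruption detected at xfs_dir3_data_reada_verify+0x73/0x76, xfs_dir3_data_reada block 0x100000048",
    "Aug 29 08:24:51 unRAID kernel: XFS (md1): Unmount and run xfs_repair",
]

print(devices_needing_repair(sample))  # prints ['md1']
```

Here "md1" is the device name reported by the kernel; that it corresponds to disk 1 is what the rest of this post assumes.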

 

I'm assuming that I need to log into my box and run some commands. I just don't want to mess this up any more than I already have. Thanks for any help and sorry for writing so much. I tried to describe exactly what sequence of events led me here in case it matters. The bottom line is that I have corruption on my newly rebuilt disk 1.

unraid-diagnostics-20160829-2015.zip


NM. I searched and found the answer. I think I am back in business. I started the array in Maintenance mode, went into the settings for Disk1, and ran a file system check and then a repair. Afterwards, I was able to delete that directory, and the five files that were missing appeared in a directory called "lost+found". I am feeling so much better right now.
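
For readers who would rather use the console than the GUI, the check-then-repair sequence described above might look roughly like the sketch below. It only builds and prints the commands; it does not run them. Assumptions: disk 1 is `/dev/md1` (inferred from the "XFS (md1)" syslog lines), the array is started in Maintenance mode so the file system is not mounted, and the helper name `xfs_repair_plan` is made up for illustration. The `-n` option is xfs_repair's no-modify mode.

```python
def xfs_repair_plan(device="/dev/md1"):
    """Return the check-then-repair commands as argument lists (nothing is executed).

    The default device is an assumption taken from the "XFS (md1)" syslog lines.
    """
    return [
        ["xfs_repair", "-n", device],  # -n: no-modify mode, only reports problems
        ["xfs_repair", device],        # actual repair; file system must be unmounted
    ]


for cmd in xfs_repair_plan():
    print(" ".join(cmd))
# prints:
# xfs_repair -n /dev/md1
# xfs_repair /dev/md1
```

Doing the read-only pass first shows what would be changed before anything is written to the disk.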

 

Can someone confirm for me if using xfs_repair in the GUI in maintenance mode maintains parity? Do I need to rebuild parity? I would assume that if it didn't, there would be a strongly worded warning about it...
