AnnHashaway Posted August 27, 2017 Share Posted August 27, 2017

I switched from an older Xeon setup to an AMD FX CPU and corresponding motherboard. After moving all the drives over, Unraid booted up as normal, but two of the disk drives are labeled as unmountable. Every indicator is green, and there was no sign of corruption that I could tell.

Current Setup:
Parity - 8TB WD Red
Drive 1 - 4TB WD Red (UNMOUNTABLE)
Drive 2 - 4TB WD Red
Drive 3 - 4TB WD Red (UNMOUNTABLE)
Cache - 240GB Crucial SSD
UNRAID 6.2.4
GIGABYTE GA-990FXA-UD3 motherboard
AMD FX-8370 Black Edition 8-core CPU

After moving everything to the new box, I did have an additional SSD with a Windows 10 install on it. Win10 did boot up a few times while I was initially getting everything set up, so I don't know if Windows did anything to the drives in the process.

What I Have Done:
Swapped power cables to each drive
Swapped SATA cables to each drive
Ran xfs_repair through the GUI, with and without the -L flag

Error I received without -L:

Quote
verified secondary superblock...
writing modified primary superblock
        - block cache size set to 1478152 entries
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
fatal error -- ERROR: The log head and/or tail cannot be discovered. Attempt to mount the filesystem to replay the log or use the -L option to destroy the log and attempt a repair.

Error I received with -L:

Quote
Phase 1 - find and verify superblock...
        - block cache size set to 1478152 entries
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
Log inconsistent (didn't find previous header)
empty log check failed
fatal error -- failed to clear log

Diagnostics attached. Anyone know what my options are here? Thank you.

diagnostics-20170827-1038.zip

Link to comment
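For readers hitting the same errors: the usual xfs_repair escalation looks like the sketch below. The device name /dev/md1 is an assumption — on Unraid, data disk N appears as /dev/mdN when the array is started in maintenance mode, which keeps parity in sync during the repair. The -L flag zeroes the journal and discards any unreplayed metadata updates, so it should be the last resort. The DRY_RUN wrapper is only there so the sketch prints the commands instead of running them.

```shell
# Sketch of the usual xfs_repair escalation; /dev/md1 is a placeholder --
# substitute your own unmountable disk. DRY_RUN=1 (the default here) only
# prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

# 1. Read-only check: -n reports corruption without writing anything.
run xfs_repair -n /dev/md1

# 2. Normal repair: replays the journal first if it can.
run xfs_repair -v /dev/md1

# 3. Last resort: -L zeroes the journal, losing unreplayed metadata updates.
run xfs_repair -vL /dev/md1
```

Remove the DRY_RUN guard (or set DRY_RUN=0) only once you are sure you are pointed at the right device.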
SSD Posted August 27, 2017 Share Posted August 27, 2017

I would post the exact motherboard model you have in case anyone has any experience with it.

This is obviously very serious. You have two drives that are unmountable, and the data on both of those drives is at risk. I would recommend no writes to the array.

You could run a short UNCORRECTING parity check. Just 1 minute is enough. I'd be curious whether the disks are showing parity sync errors. Since it is a read-only operation, it should cause no harm.

There is a tool called "testdisk" that people have had some luck with in correcting corrupted partitions. I would expect that might be the way forward, but I suggest waiting for others to confirm and give any other options. I had an issue the other day and downloaded this package for Slackware, but I wound up not needing to use it. Other references on the forum require you to boot some other Linux to run the tool, but I expect this will work just fine.

The links I found are here:
Download from https://slackware.pkgs.org/14.2/slackers/testdisk-7.0-x86_64-8cf.txz.html
Testdisk wiki: http://www.cgsecurity.org/wiki/TestDisk_Step_By_Step

Once you download it, I suggest putting it on your flash, say in a "/packages" folder. To install it, run the command:

installpkg /boot/packages/testdisk-7.0-x86_64-8cf.txz

I installed it successfully yesterday. But while writing this post, I just tried to run testdisk and found it is missing a library. Maybe someone can explain how to install this library (see message below).

testdisk: error while loading shared libraries: libewf.so.2: cannot open shared object file: No such file or directory

I mentioned I didn't need it, and thought I would explain why in case future readers are interested. I had a disk that was showing unmountable, but the reason was that it was already mounted elsewhere and had an open file on it. (I was moving an unassigned device that I had prepared outside the array to the array, and forgot to close it out completely.) I was able to stop the array, kill the process that had the open file, and unmount the disk from its prior mountpoint. I then started the array and the disk mounted normally. So unmountable does not always mean corrupted, but in your case it sounds like you have partition corruption. Anyone reading with an unmountable situation, make sure that the disk is not unmountable for some other reason before trying testdisk!

Good luck. Referencing a few other forum experts that often chime in on these types of problems and may have another idea how to proceed ... (@johnnie.black, @jonathanm, @itimpi)

Link to comment
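The download-and-install steps above can be sketched as a short script. The package URL, version, and the /boot/packages folder all come from the post itself; treat the URL as an assumption that may have moved since 2017. As above, DRY_RUN keeps the sketch from touching anything until you opt in.

```shell
# Sketch: fetch testdisk onto the Unraid flash drive and install it.
# The URL/version are copied from the post above and may be outdated;
# /boot/packages is just a convention, not a requirement.
# DRY_RUN=1 (default) prints what would happen instead of doing it.
DRY_RUN=${DRY_RUN:-1}
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

PKG=testdisk-7.0-x86_64-8cf.txz
run mkdir -p /boot/packages
run wget -O "/boot/packages/$PKG" "https://slackware.pkgs.org/14.2/slackers/$PKG"
run installpkg "/boot/packages/$PKG"
```

Because the flash drive persists across reboots, the package only needs downloading once; rerun `installpkg` after each reboot (or from your `go` file) since the root filesystem is rebuilt in RAM.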
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

I have updated the post with additional information:
CPU and motherboard
Diagnostics attached
Additional steps I have tried (swapping cables)

Still plugging away.

Link to comment
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

1 hour ago, bjp999 said:

Quote
I would recommend no writes to the array. You could run a short UNCORRECTING parity check. Just 1 minute is enough. I'd be curious if the disks are showing parity sync errors. Since it is a read-only operation, it should cause no harm.

I ran an uncorrecting parity check for 5 minutes. Here was the output:

Quote
Total size: 8 TB
Elapsed time: 5 minutes
Current position: 49.9 GB (0.6 %)
Estimated speed: 167.2 MB/sec
Estimated finish: 13 hours, 13 minutes
Sync errors detected: 24

I also received these two email notifications:

Quote
Event: unRAID Parity disk error
Subject: Alert [SERVER] - Parity disk in error state (disk missing)
Description: No device identification ()
Importance: alert

Quote
Event: unRAID Disk 1 error
Subject: Alert [DENALI] - Disk 1 in error state (disk missing)
Description: No device identification ()
Importance: alert

Link to comment
JorgeB Posted August 27, 2017 Share Posted August 27, 2017

Both problem disks are connected on SATA ports configured for IDE mode. Go to your BIOS and change all SATA ports to AHCI.

Link to comment
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

18 minutes ago, johnnie.black said:

Quote
Both problem disks are connected on SATA ports configured for IDE mode. Go to your BIOS and change all SATA ports to AHCI.

The MOBO has six SATA connections: 0, 1, 2, 3, 4, 5. The problem disks were plugged into 4 and 5.

What I tried:
In the BIOS I found "OnChip SATA Port 4/5 Type" and changed it from IDE to "As SATA Type" (the only other option) - DID NOT WORK
Swapped one of the problem drives to SATA connection 2 - DID NOT WORK

I will continue trying BIOS options and post results.

Link to comment
JorgeB Posted August 27, 2017 Share Posted August 27, 2017

Make sure all disks are on ports using AHCI (post new diags if in doubt), update to v6.3.5 since it has many XFS fixes, and run xfs_repair again.

Link to comment
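If you want to confirm the controller mode from within Unraid rather than from the BIOS screens, the kernel log and `lspci` output usually give it away. The helper below is an illustrative sketch (the function name and the grep heuristic are mine, not an Unraid tool): it scans controller listings for IDE-mode entries.

```shell
# Sketch: flag SATA ports still in legacy IDE mode. check_mode reads
# lspci-style controller lines on stdin; the substring match is a
# heuristic, not a guarantee -- exact strings vary by chipset.
check_mode() {
    if grep -qi 'ide' ; then
        echo "IDE mode detected - switch these ports to AHCI in the BIOS"
    else
        echo "no IDE-mode controllers found"
    fi
}

# Typical use on a live system:
#   lspci | grep -i -e sata -e ide | check_mode
```

On an AHCI-configured board the controllers show up as "SATA controller ... AHCI"; legacy ports show a separate "IDE interface" entry, which is what this flags.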
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017 New diagnostics attached. diagnostics-20170827-1249.zip Link to comment
JorgeB Posted August 27, 2017 Share Posted August 27, 2017

All ports are configured correctly now; upgrade to v6.3.5 and run xfs_repair again.

Link to comment
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

Update:
Moved one of the bad drives to SATA connection 2
Changed "OnChip SATA Port 4/5 Type" from IDE to "As SATA Type"
Upgraded to 6.3.5
Ran xfs_repair with the -v flag - received this message:

Quote
Phase 1 - find and verify superblock...
        - block cache size set to 1478144 entries
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
fatal error -- ERROR: The log head and/or tail cannot be discovered. Attempt to mount the filesystem to replay the log or use the -L option to destroy the log and attempt a repair.

Now I am running xfs_repair -vL on one of the drives. Yesterday this took about 2 hours to complete. Will update. Thank you very much for your assistance @johnnie.black

Link to comment
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

Update: xfs_repair is currently running with the -vL flag.

Yesterday, when running this on 6.2.4, the system read the entire disk and didn't log more than about 2,000 writes according to the MAIN page. There seems to be a lot more happening today on 6.3.5 with the same flags: total writes are about 60% of total reads, and it appears to be writing the exact same number of times to the parity drive. Additionally, the output is MUCH more detailed than yesterday. Here is a snapshot of the current output: https://pastebin.com/Adj6zMCm

Will update when complete.

Link to comment
JorgeB Posted August 27, 2017 Share Posted August 27, 2017

The problem was most likely caused by the IDE mode. In theory it should only be slower than AHCI, but it's not the first time I've seen it cause serious issues on those AMD boards. The filesystem looks very corrupt, but maybe xfs_repair can still repair it. Good luck!

Link to comment
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

SUCCESS!!! All drives are up and running! I can access the shares, and will start performing tests to make sure all the data is available. You @johnnie.black are a scholar and a saint!

One side effect I am noticing, however, is that all of my docker containers are gone. They have actually been gone since I migrated everything over, but I was hoping they would show back up once the drives mounted correctly. Short of reinstalling them with the old configs, is there something I should be doing to get them to populate, or do I need to reinstall them? It looks like all the config files are intact, plus I have weekly backups I could pull from.

I will update the post with the solution once I get the docker issue figured out. Thank you.

Link to comment
JorgeB Posted August 27, 2017 Share Posted August 27, 2017

If appdata is OK, delete and recreate your docker image:

Link to comment
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

OK, so the docker containers were relatively easy to restore to how they were before. The only issue I see at this point is that there seems to be quite a bit of data loss across the two drives. I can't tell if this is permanent or just a glitch. There are lots of files missing from nearly every directory, and it seems to be centered on the drives that were impacted. It's certainly manageable, but before I close the book I just want to make sure I can't retrieve these. Any ideas?

Link to comment
JorgeB Posted August 27, 2017 Share Posted August 27, 2017

I'm not surprised given the level of corruption; look for a lost+found folder on those disks.

Link to comment
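To see what xfs_repair salvaged, a quick sweep over the lost+found folders helps. The helper below is a sketch: the /mnt/diskN paths are the Unraid defaults, and the two disks named in the usage line correspond to the drives repaired in this thread. Orphaned files keep their data but are renamed to their inode numbers, so they have to be identified by content.

```shell
# Sketch: print a per-disk summary of files xfs_repair moved into
# lost+found. Paths are Unraid's default /mnt/diskN mountpoints.
count_recovered() {
    for d in "$@"; do
        if [ -d "$d/lost+found" ]; then
            echo "$d: $(find "$d/lost+found" -type f | wc -l) recovered files"
        else
            echo "$d: no lost+found folder"
        fi
    done
}

# On this server the two repaired drives were disk 1 and disk 3:
count_recovered /mnt/disk1 /mnt/disk3

# Recovered files are named by inode number; 'file' can help identify them:
#   find /mnt/disk1/lost+found -type f -exec file {} +
```

Anything not in lost+found and not in the share tree is likely gone for good, which matches the data loss described above.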
This topic is now archived and is closed to further replies.