
[SOLVED] Changed Mobo and CPU. Two Data Drives Now Un-Mountable.


AnnHashaway

Recommended Posts

I switched from an older Xeon setup to an AMD FX CPU and corresponding motherboard. After moving all the drives over, Unraid booted up as normal, but two of the data drives are labeled as un-mountable. Every indicator is green, and there was no sign of corruption that I could see.

 

Current Setup:

  • Parity - 8TB WD Red
  • Drive 1 - 4TB WD Red (UN-MOUNTABLE)
  • Drive 2 - 4 TB WD Red
  • Drive 3 - 4 TB WD Red (UN-MOUNTABLE)
  • Cache - 240GB Crucial SSD

 

  • UNRAID 6.2.4
  • GIGABYTE GA-990FXA-UD3 Motherboard
  • AMD FX-8370 Black Edition 8 Core CPU

 

When moving everything to the new box, I also had an additional SSD with a Windows 10 install on it. Win10 did boot up a few times while I was initially getting everything set up, so I don't know whether Windows did anything to the drives in the process.

 

What I Have Done:

  • Swapped power cables to each drive
  • Swapped SATA cables to each drive
  • Ran xfs_repair through the GUI, with and without the -L flag (equivalent commands shown below).
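
For reference, this is roughly what those repair runs translate to on the command line (assuming Disk 1 corresponds to /dev/md1 on this system - adjust the number for the disk being repaired; running against the /dev/mdX device keeps parity in sync):

xfs_repair -n /dev/md1     # dry run: report problems, change nothing
xfs_repair -v /dev/md1     # verbose repair without zeroing the log
xfs_repair -vL /dev/md1    # verbose repair, zeroing the log first (last resort)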

 

Error I Received Without -L:

Quote

 

verified secondary superblock...
writing modified primary superblock
        - block cache size set to 1478152 entries
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)

fatal error -- ERROR: The log head and/or tail cannot be discovered. Attempt to mount the
filesystem to replay the log or use the -L option to destroy the log and
attempt a repair.

 

 

Error I Received With -L:
 

Quote

 

Phase 1 - find and verify superblock...
        - block cache size set to 1478152 entries
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
Log inconsistent (didn't find previous header)
empty log check failed

fatal error -- failed to clear log

 

 

Diagnostics Attached

 

Anyone know what my options are here? Thank you.

diagnostics-20170827-1038.zip

Link to comment

I would post the exact motherboard model you have in case anyone has any experience with it.

 

This is obviously very serious. You have two drives that are unmountable. The data on both of those drives is at risk.

 

I would recommend no writes to the array. You could run a short UNCORRECTING parity check. Just 1 minute is enough. I'd be curious if the disks are showing parity sync errors. Since it is a read-only operation, it should cause no harm.
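
If you prefer to start it from the console instead of the webGUI, I believe the command is something like this (going from memory, so double-check before running):

mdcmd check NOCORRECT    # start a read-only (non-correcting) parity check

and then just cancel it from the Main page after a minute or two.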

 

There is a tool called "testdisk" that people have had some luck with in correcting corrupted partitions. I would expect that might be the way forward, but I suggest waiting for others to confirm and offer any other options.

 

I had an issue the other day and downloaded this package for Slackware, but I wound up not needing to use it. Other references on the forum require you to boot some other Linux to run the tool, but I expect this one will work just fine. The link I found is here:

 

Download from https://slackware.pkgs.org/14.2/slackers/testdisk-7.0-x86_64-8cf.txz.html

Testdisk wiki http://www.cgsecurity.org/wiki/TestDisk_Step_By_Step

 

Once you download it, I suggest putting it on your flash drive, say in a "/packages" folder.

 

To install it, run the command:

 

installpkg /boot/packages/testdisk-7.0-x86_64-8cf.txz

 

I installed it successfully yesterday.
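
Once it's installed you just point it at the raw device and follow the menus, something like this (sdX is a placeholder for whichever disk you want to examine):

testdisk /dev/sdX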

 

But while writing this post, I just tried to run testdisk and found it is missing a library. Maybe someone can explain how to install this library (see message below).

testdisk: error while loading shared libraries: libewf.so.2: cannot open shared object file: No such file or directory

 

I mentioned I didn't need it, and thought I would explain why in case future readers are interested. I had a disk that was showing as unmountable, but the reason was that it was already mounted elsewhere and had an open file on it. (I was moving an unassigned device that I had prepared outside the array into the array, and forgot to close it out completely.) I was able to stop the array, kill the process that had the open file, and unmount the disk from its prior mountpoint. I then started the array and the disk mounted normally. So unmountable does not always mean corrupted, but in your case it sounds like you have partition corruption. Anyone reading this with an unmountable disk, make sure the disk is not unmountable for some other reason before trying testdisk!
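
For anyone who hits that situation, the cleanup I did looked roughly like this (the mountpoint here is just an example from my setup, not anything on your server):

lsof +D /mnt/disks/precleared    # find processes with files open under the old mountpoint
kill <PID>                       # stop the offending process
umount /mnt/disks/precleared     # release the old mountpoint, then start the array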

 

Good luck. Referencing a few other forum experts who often chime in on these types of problems and may have another idea of how to proceed ... (@johnnie.black, @jonathanm, @itimpi)

Link to comment
1 hour ago, bjp999 said:

I would recommend no writes to the array. You could run a short UNCORRECTING parity check. Just 1 minute is enough. I'd be curious if the disks are showing parity sync errors. Since it is a read-only operation, it should cause no harm.

 

I ran an un-correcting parity check for 5 minutes. Here was the output:

Quote


Total size: 8 TB  
Elapsed time: 5 minutes  
Current position: 49.9 GB (0.6 %)  
Estimated speed: 167.2 MB/sec  
Estimated finish: 13 hours, 13 minutes  
Sync errors detected:  24

 

 

I also received these two email notifications:

 

Quote

Event: unRAID Parity disk error
Subject: Alert [SERVER] - Parity disk in error state (disk missing)
Description: No device identification ()
Importance: alert

 

Quote

Event: unRAID Disk 1 error
Subject: Alert [DENALI] - Disk 1 in error state (disk missing)
Description: No device identification ()
Importance: alert

 

Link to comment
18 minutes ago, johnnie.black said:

Both problem disks are connected to SATA ports configured for IDE mode; go to your BIOS and change all SATA ports to AHCI.

 

The mobo has six SATA connections: 0, 1, 2, 3, 4, 5. The problem disks were plugged into 4 and 5.

 

What I Tried: 

  • In the BIOS I found "OnChip SATA Port 4/5 Type" and changed it from IDE to "As SATA Type" (the only other option) - DID NOT WORK
  • Swapped one of the problem drives to SATA connection 2 - DID NOT WORK

I will continue trying BIOS options and post results. 
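
In the meantime, a generic way to check from the console which mode the ports actually ended up in (just a sanity check, not specific to this board):

lspci | grep -i sata               # shows the SATA controller and whether it reports IDE or AHCI
dmesg | grep -iE 'ahci|pata'       # shows which driver (ahci vs. legacy pata_*) claimed each port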

Link to comment

Update:

  • Moved one of the bad drives to SATA Connection 2
  • Changed "OnChip SATA Port 4/5 Type" from IDE to "As SATA Type"
  • Upgraded to 6.3.5
  • Ran xfs_repair with the -v flag - received this message:
Quote


Phase 1 - find and verify superblock...
        - block cache size set to 1478144 entries
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)

fatal error -- ERROR: The log head and/or tail cannot be discovered. Attempt to mount the
filesystem to replay the log or use the -L option to destroy the log and
attempt a repair.

 

 

 

I am now running xfs_repair -vL on one of the drives. Yesterday this took about 2 hours to complete. Will update.

 

Thank you very much for your assistance @johnnie.black

Link to comment

Update:

 

xfs_repair is currently running with the -vL flag. Yesterday, when running this on 6.2.4, the system read the entire disk but logged no more than about 2,000 writes according to the MAIN page. There seems to be a lot more happening today on 6.3.5 with the same flags: the total writes are about 60% of the total reads, and it appears to be writing the exact same number of times to the parity drive. Additionally, the output is MUCH more detailed than yesterday.

 

Here is a snapshot of the current output: https://pastebin.com/Adj6zMCm

 

Will update when complete.

Link to comment

SUCCESS!!!

All drives are up and running! I can access the shares, and will start performing tests to make sure all the data is available. You @johnnie.black are a scholar and a saint!

 

One side effect I am noticing, however, is that all of my docker containers are gone. They have actually been gone since I migrated everything over, but I was hoping they would show back up once the drives mounted correctly. Is there something I should be doing to get them to populate, or do I need to reinstall them with the old configs? It looks like all the config files are intact, plus I have weekly backups I could pull from.

 

I will update the post with the solution once I get the docker issue figured out.

 

Thank you.

 

Link to comment

OK, so the docker containers were relatively easy to restore to how they were before.

 

The only issue I see at this point is that there seems to be quite a bit of data loss between the two drives. I can't tell if this is permanent or just a glitch. There are lots of files missing from nearly every directory, and it seems to be centered on the drives that were impacted. It's certainly manageable, but before I close the book I just want to make sure I can't retrieve these files.

 

Any ideas?

Link to comment
