AnnHashaway Posted August 27, 2017 Share Posted August 27, 2017

I switched from an older Xeon setup to an AMD FX CPU and corresponding motherboard. After moving all the drives over, Unraid booted up as normal, but two of the disk drives are labeled as unmountable. Every indicator is green, and there was no sign of corruption that I could tell.

Current Setup:
Parity - 8TB WD Red
Drive 1 - 4TB WD Red (UNMOUNTABLE)
Drive 2 - 4TB WD Red
Drive 3 - 4TB WD Red (UNMOUNTABLE)
Cache - 240GB Crucial SSD
UNRAID 6.2.4
GIGABYTE GA-990FXA-UD3 motherboard
AMD FX-8370 Black Edition 8-core CPU

After moving everything to the new box, I did have an additional SSD with a Windows 10 install on it. Win10 did boot up a few times while I was initially getting everything set up, so I don't know if Windows did anything to the drives in the process.

What I Have Done:
Swapped power cables to each drive
Swapped SATA cables to each drive
Ran xfs_repair through the GUI, with and without the -L flag

Error I received without -L:

Quote
verified secondary superblock...
writing modified primary superblock
        - block cache size set to 1478152 entries
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
fatal error -- ERROR: The log head and/or tail cannot be discovered. Attempt to mount the filesystem to replay the log or use the -L option to destroy the log and attempt a repair.

Error I received with -L:

Quote
Phase 1 - find and verify superblock...
        - block cache size set to 1478152 entries
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
Log inconsistent (didn't find previous header)
empty log check failed
fatal error -- failed to clear log

Diagnostics attached. Anyone know what my options are here? Thank you.

diagnostics-20170827-1038.zip

Link to comment
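For readers hitting the same errors: the usual xfs_repair escalation looks like the sketch below. The device name /dev/md1 is an assumption — on Unraid, data disk N appears as /dev/mdN when the array is started in maintenance mode, which keeps parity in sync during the repair. The -L flag zeroes the journal and discards any unreplayed metadata updates, so it should be the last resort. The DRY_RUN wrapper is only there so the sketch prints the commands instead of running them.

```shell
# Sketch of the usual xfs_repair escalation; /dev/md1 is a placeholder --
# substitute your own unmountable disk. DRY_RUN=1 (the default here) only
# prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

# 1. Read-only check: -n reports corruption without writing anything.
run xfs_repair -n /dev/md1

# 2. Normal repair: replays the journal first if it can.
run xfs_repair -v /dev/md1

# 3. Last resort: -L zeroes the journal, losing unreplayed metadata updates.
run xfs_repair -vL /dev/md1
```

Remove the DRY_RUN guard (or set DRY_RUN=0) only once you are sure you are pointed at the right device.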
SSD Posted August 27, 2017 Share Posted August 27, 2017

I would post the exact motherboard model you have in case anyone has any experience with it.

This is obviously very serious. You have two drives that are unmountable, and the data on both of those drives is at risk. I would recommend no writes to the array.

You could run a short UNCORRECTING parity check. Just 1 minute is enough. I'd be curious whether the disks are showing parity sync errors. Since it is a read-only operation, it should cause no harm.

There is a tool called "testdisk" that people have had some luck with in correcting corrupted partitions. I would expect that might be the way forward, but I suggest waiting for others to confirm and give any other options. I had an issue the other day and downloaded this package for Slackware, but I wound up not needing to use it. Other references on the forum require you to boot some other Linux to run the tool, but I expect this will work just fine.

The links I found are here:
Download from https://slackware.pkgs.org/14.2/slackers/testdisk-7.0-x86_64-8cf.txz.html
Testdisk wiki: http://www.cgsecurity.org/wiki/TestDisk_Step_By_Step

Once you download it, I suggest putting it on your flash, say in a "/packages" folder. To install it, run the command:

installpkg /boot/packages/testdisk-7.0-x86_64-8cf.txz

I installed it successfully yesterday. But while writing this post, I just tried to run testdisk and found it is missing a library. Maybe someone can explain how to install this library (see message below).

testdisk: error while loading shared libraries: libewf.so.2: cannot open shared object file: No such file or directory

I mentioned I didn't need it, and thought I would explain why in case future readers are interested. I had a disk that was showing unmountable, but the reason was that it was already mounted elsewhere and had an open file on it. (I was moving an unassigned device that I had prepared outside the array to the array, and forgot to close it out completely.) I was able to stop the array, kill the process that had the open file, and unmount the disk from its prior mountpoint. I then started the array and the disk mounted normally. So unmountable does not always mean corrupted, but in your case it sounds like you have partition corruption. Anyone reading with an unmountable situation, make sure that the disk is not unmountable for some other reason before trying testdisk!

Good luck. Referencing a few other forum experts that often chime in on these types of problems and may have another idea how to proceed ... (@johnnie.black, @jonathanm, @itimpi)

Link to comment
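The download-and-install steps above can be sketched as a short script. The package URL, version, and the /boot/packages folder all come from the post itself; treat the URL as an assumption that may have moved since 2017. As above, DRY_RUN keeps the sketch from touching anything until you opt in.

```shell
# Sketch: fetch testdisk onto the Unraid flash drive and install it.
# The URL/version are copied from the post above and may be outdated;
# /boot/packages is just a convention, not a requirement.
# DRY_RUN=1 (default) prints what would happen instead of doing it.
DRY_RUN=${DRY_RUN:-1}
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

PKG=testdisk-7.0-x86_64-8cf.txz
run mkdir -p /boot/packages
run wget -O "/boot/packages/$PKG" "https://slackware.pkgs.org/14.2/slackers/$PKG"
run installpkg "/boot/packages/$PKG"
```

Because the flash drive persists across reboots, the package only needs downloading once; rerun `installpkg` after each reboot (or from your `go` file) since the root filesystem is rebuilt in RAM.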
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

I have updated the post with additional information:
CPU and motherboard
Diagnostics attached
Additional steps I have tried (swapping cables)

Still plugging away.

Link to comment
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

1 hour ago, bjp999 said:

Quote
I would recommend no writes to the array. You could run a short UNCORRECTING parity check. Just 1 minute is enough. I'd be curious if the disks are showing parity sync errors. Since it is a read-only operation, it should cause no harm.

I ran an uncorrecting parity check for 5 minutes. Here was the output:

Quote
Total size: 8 TB
Elapsed time: 5 minutes
Current position: 49.9 GB (0.6 %)
Estimated speed: 167.2 MB/sec
Estimated finish: 13 hours, 13 minutes
Sync errors detected: 24

I also received these two email notifications:

Quote
Event: unRAID Parity disk error
Subject: Alert [SERVER] - Parity disk in error state (disk missing)
Description: No device identification ()
Importance: alert

Quote
Event: unRAID Disk 1 error
Subject: Alert [DENALI] - Disk 1 in error state (disk missing)
Description: No device identification ()
Importance: alert

Link to comment
JorgeB Posted August 27, 2017 Share Posted August 27, 2017

Both problem disks are connected on SATA ports configured for IDE mode. Go to your BIOS and change all SATA ports to AHCI.

Link to comment
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

18 minutes ago, johnnie.black said:

Quote
Both problem disks are connected on SATA ports configured for IDE mode. Go to your BIOS and change all SATA ports to AHCI.

The MOBO has six SATA connections: 0, 1, 2, 3, 4, 5. The problem disks were plugged into 4 and 5.

What I tried:
In the BIOS I found "OnChip SATA Port 4/5 Type" and changed it from IDE to "As SATA Type" (the only other option) - DID NOT WORK
Swapped one of the problem drives to SATA connection 2 - DID NOT WORK

I will continue trying BIOS options and post results.

Link to comment
JorgeB Posted August 27, 2017 Share Posted August 27, 2017

Make sure all disks are on ports using AHCI (post new diags if in doubt), update to v6.3.5 since it has many XFS fixes, and run xfs_repair again.

Link to comment
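If you want to confirm the controller mode from within Unraid rather than from the BIOS screens, the kernel log and `lspci` output usually give it away. The helper below is an illustrative sketch (the function name and the grep heuristic are mine, not an Unraid tool): it scans controller listings for IDE-mode entries.

```shell
# Sketch: flag SATA ports still in legacy IDE mode. check_mode reads
# lspci-style controller lines on stdin; the substring match is a
# heuristic, not a guarantee -- exact strings vary by chipset.
check_mode() {
    if grep -qi 'ide' ; then
        echo "IDE mode detected - switch these ports to AHCI in the BIOS"
    else
        echo "no IDE-mode controllers found"
    fi
}

# Typical use on a live system:
#   lspci | grep -i -e sata -e ide | check_mode
```

On an AHCI-configured board the controllers show up as "SATA controller ... AHCI"; legacy ports show a separate "IDE interface" entry, which is what this flags.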
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017 New diagnostics attached. diagnostics-20170827-1249.zip Link to comment
JorgeB Posted August 27, 2017 Share Posted August 27, 2017

All ports are configured correctly now; upgrade to v6.3.5 and run xfs_repair again.

Link to comment
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

Update:
Moved one of the bad drives to SATA connection 2
Changed "OnChip SATA Port 4/5 Type" from IDE to "As SATA Type"
Upgraded to 6.3.5
Ran xfs_repair with the -v flag - received this message:

Quote
Phase 1 - find and verify superblock...
        - block cache size set to 1478144 entries
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
fatal error -- ERROR: The log head and/or tail cannot be discovered. Attempt to mount the filesystem to replay the log or use the -L option to destroy the log and attempt a repair.

Now I am running xfs_repair -vL on one of the drives. Yesterday this took about 2 hours to complete. Will update. Thank you very much for your assistance @johnnie.black

Link to comment
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

Update: xfs_repair is currently running with the -vL flag.

Yesterday, when running this on 6.2.4, the system read the entire disk and didn't log more than about 2,000 writes according to the MAIN page. There seems to be a lot more happening today on 6.3.5 with the same flags: total writes are about 60% of total reads, and it appears to be writing the exact same number of times to the parity drive. Additionally, the output is MUCH more detailed than yesterday. Here is a snapshot of the current output: https://pastebin.com/Adj6zMCm

Will update when complete.

Link to comment
JorgeB Posted August 27, 2017 Share Posted August 27, 2017

The problem was most likely caused by the IDE mode. In theory it should only be slower than AHCI, but it's not the first time I've seen it cause serious issues on those AMD boards. The filesystem looks very corrupt, but maybe xfs_repair can still repair it. Good luck!

Link to comment
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

SUCCESS!!! All drives are up and running! I can access the shares, and will start performing tests to make sure all the data is available. You @johnnie.black are a scholar and a saint!

One side effect I am noticing, however, is that all of my docker containers are gone. They have actually been gone since I migrated everything over, but I was hoping they would show back up once the drives mounted correctly. Short of reinstalling them with the old configs, is there something I should be doing to get them to populate, or do I need to reinstall them? It looks like all the config files are intact, plus I have weekly backups I could pull from.

I will update the post with the solution once I get the docker issue figured out. Thank you.

Link to comment
JorgeB Posted August 27, 2017 Share Posted August 27, 2017

If appdata is OK, delete and recreate your docker image:

Link to comment
AnnHashaway Posted August 27, 2017 Author Share Posted August 27, 2017

OK, so the docker containers were relatively easy to restore to how they were before. The only issue I see at this point is that there seems to be quite a bit of data loss across the two drives. I can't tell if this is permanent or just a glitch. There are lots of files missing from nearly every directory, and it seems to be centered on the drives that were impacted. It's certainly manageable, but before I close the book I just want to make sure I can't retrieve these. Any ideas?

Link to comment
JorgeB Posted August 27, 2017 Share Posted August 27, 2017

I'm not surprised given the level of corruption; look for a lost+found folder on those disks.

Link to comment
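To see what xfs_repair salvaged, a quick sweep over the lost+found folders helps. The helper below is a sketch: the /mnt/diskN paths are the Unraid defaults, and the two disks named in the usage line correspond to the drives repaired in this thread. Orphaned files keep their data but are renamed to their inode numbers, so they have to be identified by content.

```shell
# Sketch: print a per-disk summary of files xfs_repair moved into
# lost+found. Paths are Unraid's default /mnt/diskN mountpoints.
count_recovered() {
    for d in "$@"; do
        if [ -d "$d/lost+found" ]; then
            echo "$d: $(find "$d/lost+found" -type f | wc -l) recovered files"
        else
            echo "$d: no lost+found folder"
        fi
    done
}

# On this server the two repaired drives were disk 1 and disk 3:
count_recovered /mnt/disk1 /mnt/disk3

# Recovered files are named by inode number; 'file' can help identify them:
#   find /mnt/disk1/lost+found -type f -exec file {} +
```

Anything not in lost+found and not in the share tree is likely gone for good, which matches the data loss described above.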
This topic is now archived and is closed to further replies.