spall Posted June 20, 2018 Hi all, I migrated my second server to new hardware. One disk on the old hardware was bad, and I replaced it during the migration. When I brought the array online, disk 5 showed as unmountable. I started the array in maintenance mode and ran a check on the filesystem:

    Phase 1 - find and verify superblock...
            - block cache size set to 751712 entries
    Phase 2 - using internal log
            - zero log...
    totally zeroed log
    zero_log: head block 0 tail block 0
            - scan filesystem freespace and inode maps...
            - found root inode chunk
    Phase 3 - for each AG...
            - scan and clear agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - check for inodes claiming duplicate blocks...
            - agno = 0
            - agno = 2
            - agno = 3
            - agno = 1
    Phase 5 - rebuild AG headers and trees...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - reset superblock...
    Phase 6 - check inode connectivity...
            - resetting contents of realtime bitmap and summary inodes
            - traversing filesystem ...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    Phase 7 - verify and correct link counts...
    Maximum metadata LSN (1:8366) is ahead of log (0:0).
    Format log to cycle 4.
    xfs_repair: libxfs_device_zero write failed: Input/output error

I'm not sure what (if anything) to do at this point. Sadly, I don't have a backup of the data on that disk. Any help appreciated. Diagnostics attached: data-diagnostics-20180619-2000.zip
JorgeB Posted June 20, 2018 Disk5 is failing; you need to replace it first, then run xfs_repair again.
JorgeB Posted June 20, 2018 Though since disk2 is currently disabled and you only have single parity, you'll need to rebuild that one first, and you're likely to get some data corruption during that rebuild because of the read errors on disk5.
spall (Author) Posted June 20, 2018 Disk 2 actually has no data on it. It was an empty disk that failed. Is there a way to go about this that would allow me to salvage disk 5? That aside, are you suggesting that I rebuild disk 2 and then replace and rebuild disk 5?
pwm Posted June 20, 2018 spall said: "Disk 2 actually has no data on it. It was an empty disk that failed. Is there a way to go about this that would allow me to salvage disk 5? That aside, are you suggesting that I rebuild disk 2 and then replace and rebuild disk 5?" Empty or not doesn't matter. Any data disk that has been added to the array contributes to parity. Only a disk that is empty in the sense of 100% zeroed doesn't contribute to parity, but a zeroed disk has no file system or partition table, so as soon as unRAID formats the drive it can no longer be all-zero.
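A toy model of single parity may help make pwm's point concrete, along with JorgeB's earlier warning about rebuilding disk2 while disk5 has read errors. This is a minimal Python sketch, assuming byte-wise XOR parity over a few made-up 4-byte "disks"; the disk contents and the helper name xor_bytes are hypothetical, and a read error is simplified to a byte that comes back wrong (in practice the affected sectors simply can't be reconstructed correctly).

```python
from functools import reduce

# Toy model (hypothetical data): each "disk" is a short byte string of equal length.
# Single parity is modelled as the byte-wise XOR of all data disks.


def xor_bytes(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


disk1 = b"\x10\x20\x30\x40"
disk5 = b"\x0a\x0b\x0c\x0d"
zeroed = bytes(4)                # 100% zeroed: no partition table, no file system
formatted = b"\x58\x46\x53\x42"  # hypothetical: "empty" but formatted, so it holds metadata bytes

# An all-zero disk leaves parity unchanged; a formatted "empty" disk does not.
parity_without = xor_bytes([disk1, disk5])
assert xor_bytes([disk1, zeroed, disk5]) == parity_without
parity = xor_bytes([disk1, formatted, disk5])
assert parity != parity_without

# Rebuilding the missing disk (here: the formatted one, standing in for disk2)
# needs parity plus a good read of every other data disk.
rebuilt = xor_bytes([parity, disk1, disk5])
assert rebuilt == formatted

# Simplified read error: one byte of disk5 comes back wrong during the rebuild.
bad_disk5 = b"\x0a\x00\x0c\x0d"
rebuilt_bad = xor_bytes([parity, disk1, bad_disk5])
wrong_positions = [i for i, (a, b) in enumerate(zip(rebuilt_bad, formatted)) if a != b]
print(wrong_positions)  # -> [1]
```

Only the position that disk5 misread ends up wrong on the rebuilt disk; everything else is reconstructed exactly.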
JorgeB Posted June 20, 2018 spall said: "are you suggesting that I rebuild disk 2 and then replace and rebuild disk 5?" It's the best option to recover as much as possible from disk5.
spall (Author) Posted June 20, 2018 johnnie.black said: "It's the best option to recover as much as possible from disk5" I'll give it a whirl and see what happens. Thanks.
Warrentheo Posted June 20, 2018 Just pointing out the simple answer, since those are the ones that always come back to bite me (I overthink this stuff way too much when it happens to me)... Make sure your system is set to AHCI in the BIOS, and try swapping the power and data cables with those of a drive known to be working before you assume a bad drive... Especially now that you seem to have two drives going bad at the same time, it's very unlikely that they're actually bad drives... Not impossible... Just unlikely...
spall (Author) Posted June 21, 2018 Hey Warrentheo, thanks, yeah, I hear you. It is set to AHCI; I actually checked that first. I'll change out the SATA cables and take a look. The drive is powered through a 5-bay SuperMicro cage that gets its power from two Molex connectors off the PSU. I can move it to a different bay, but otherwise, if it's a power issue, it would probably be the backplane.
spall (Author) Posted June 21, 2018 So I rebuilt drive 2 and then rebuilt drive 5. After repairing the file system on drive 5, I have data again. This drive contained a ton of movie files, so I have no idea how to tell what (if anything) is corrupt. But at least some of the data is accessible again.
JorgeB Posted June 21, 2018 spall said: "This drive contained a ton of movie files, so I have no idea how to tell what (if anything) is corrupt." If there were read errors during disk2's rebuild, there will likely be some corrupt file(s) on disk5. You'd need to already have checksums to check them, but if there were only a few errors, they should be mostly unnoticeable on video files, like a couple of glitches during playback.
spall (Author) Posted June 21, 2018 There were about 20 read errors from disk5 during both rebuilds. A handful of corrupt files won't be the end of the world; at least most of the data was recovered. Thanks for the help! Regarding checksums, that would be useful. Do you have any tips on the best way to get started with that for the files on my servers? Thanks again.
remotevisitor Posted June 21, 2018 The ‘Dynamix File Integrity’ plug-in.
pwm Posted June 21, 2018 spall said: "Regarding checksums, that would be useful." Keeping checksums is really very, very useful for files that are seldom or never updated. It's also very useful for files that are updated often, but then it's only practical with a file system that computes new checksums on every file write, like btrfs does. In the end, it's good to be able to run a scrub and have a program report back either that all file data is available and correct, or exactly which files and/or file system blocks contain incorrect content.
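pwm's suggestion can be sketched for the "seldom or never updated" case, such as a movie library: record a checksum for every file once, then later re-hash everything and report what is missing or no longer matches. The following is a minimal Python sketch using SHA-256 from the standard library. It is not how the Dynamix File Integrity plugin works internally, and the function names (sha256_of, build_manifest, verify, demo) are hypothetical. The demo runs in a temporary directory and simulates corruption by flipping one byte.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large video files never need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each regular file under root (as a path relative to root) to its SHA-256."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Scrub-style pass: return (missing, corrupt) lists of relative paths."""
    root = Path(root)
    missing, corrupt = [], []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            missing.append(rel)
        elif sha256_of(path) != expected:
            corrupt.append(rel)
    return missing, corrupt


def demo():
    """Hash a tiny fake library, flip one byte in a file, then verify again."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "movie.mkv").write_bytes(b"frame" * 100_000)
        (root / "extras").mkdir()
        (root / "extras" / "trailer.mkv").write_bytes(b"clip" * 1_000)

        manifest = build_manifest(root)
        clean = verify(root, manifest)

        # Simulate silent corruption in the middle of the movie file.
        data = bytearray((root / "movie.mkv").read_bytes())
        data[len(data) // 2] ^= 0xFF
        (root / "movie.mkv").write_bytes(bytes(data))

        damaged = verify(root, manifest)
    return manifest, clean, damaged


if __name__ == "__main__":
    manifest, clean, damaged = demo()
    print(json.dumps(manifest, indent=2))
    print("before:", clean)   # -> before: ([], [])
    print("after:", damaged)  # -> after: ([], ['movie.mkv'])
```

In real use, the manifest would be saved (for example with json.dump) somewhere other than the disk it describes, and verify would be run periodically, much like a scrub.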
spall (Author) Posted June 22, 2018 Thanks for all the help, guys. Data mostly recovered. Setting up Dynamix File Integrity to help in the future. Marking this as solved.
This topic is now archived and is closed to further replies.