Jax Posted February 6, 2020

Hi,

As per the title, I have two drives in my array that are currently unusable. The array consists of 12 disks + parity; disk 4 has errored out, while disk 8 is showing as "unmountable".

This originally started when disk 8 errored out a couple of days ago. I stopped the array, removed the disk (which felt loose in the hot-swap bay) and put it in my desktop caddy for testing. It all came back fine, so I followed the procedure to reintroduce it into the array (making sure the drive was properly seated in the bay). The rebuild appeared to start fine, so I went to bed; when I woke up this morning, the array was in the state it's in now. Am I screwed?

Diagnostics attached - thanks for any assistance that can be provided.

tc-nas-01-diagnostics-20200206-1259.zip
JorgeB Posted February 6, 2020

There were read errors on disk4 early in the rebuild, so the rebuilt disk will be mostly corrupt. This looks like one of the typical SASLP problems, but since disk4 dropped offline there's no SMART data available. Reboot/power cycle the server to see if disk4 comes back online, then post a SMART report. Avoid starting the array for now.
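For anyone following along, a SMART report like the one requested here can be pulled from the Unraid console with smartctl. This is a minimal sketch, not part of the original instructions; /dev/sdX is a placeholder for disk4's actual device node, which isn't shown in this thread:

```shell
# Print the full SMART report (identity, health status, attributes,
# error log) for the drive. Replace /dev/sdX with the real device node.
smartctl -a /dev/sdX

# Redirect the output to a file so it can be attached to a forum post
# (the destination path here is just an example):
smartctl -a /dev/sdX > /boot/smart-disk4.txt
```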
Jax Posted February 6, 2020

Thanks for looking. I've attached a SMART report for disk 4 after power cycling the server.

ST4000VN008-2DR166_ZGY110J8-20200206-1530.txt
JorgeB Posted February 7, 2020

The disk looks healthy, so the problem was most likely caused by the controller. The SAS2LP hasn't been recommended for a long time; you should replace them with LSI controllers. Then we can try re-enabling disk4 to rebuild disk8 again - or, if you want, we can try again with the current controllers.
Jax Posted February 8, 2020

Thanks - I'll just bite the bullet and replace the controllers first. It looks like the most reasonable options are only available overseas, so I'll keep the array shut down for a few weeks until the cards arrive. I will reach out to you once the cards are in and recognized by Unraid. Thanks again for your help.
Jax Posted February 22, 2020

Hello,

The new LSI controllers have been installed, and the system is powered back up and appears to be in the exact state it was left in. What would the next steps be to try and recover disks 4 & 8?

On 2/7/2020 at 3:03 AM, johnnie.black said: "The disk looks healthy, so the problem was most likely caused by the controller. The SAS2LP hasn't been recommended for a long time; you should replace them with LSI controllers. Then we can try re-enabling disk4 to rebuild disk8 again - or, if you want, we can try again with the current controllers."
JorgeB Posted February 22, 2020

Please post new diags just to make sure everything is as expected.
Jax Posted February 22, 2020

16 hours ago, johnnie.black said: "Please post new diags just to make sure everything is as expected."

Latest diag attached.

tc-nas-01-diagnostics-20200222-1857.zip
JorgeB Posted February 23, 2020

Checking the original diags to refresh my memory on what happened here, I just noticed that disk8 failed to mount even before there were read errors during the rebuild:

Feb 5 22:47:43 TC-NAS-01 emhttpd: shcmd (184): mount -t xfs -o noatime,nodiratime /dev/md8 /mnt/disk8
Feb 5 22:47:43 TC-NAS-01 kernel: XFS (md8): Mounting V5 Filesystem
Feb 5 22:47:43 TC-NAS-01 kernel: XFS (md8): Log inconsistent (didn't find previous header)
Feb 5 22:47:43 TC-NAS-01 kernel: XFS (md8): failed to find log head
Feb 5 22:47:43 TC-NAS-01 kernel: XFS (md8): log mount/recovery failed: error -5
Feb 5 22:47:43 TC-NAS-01 kernel: XFS (md8): log mount failed
Feb 5 22:47:43 TC-NAS-01 root: mount: /mnt/disk8: can't read superblock on /dev/md8.
Feb 5 22:47:43 TC-NAS-01 emhttpd: shcmd (184): exit status: 32
Feb 5 22:47:43 TC-NAS-01 emhttpd: /mnt/disk8 mount error: No file system
Feb 5 22:47:43 TC-NAS-01 emhttpd: shcmd (185): umount /mnt/disk8
Feb 5 22:47:43 TC-NAS-01 root: umount: /mnt/disk8: not mounted.

This suggests there were already filesystem issues. You can still continue, but success depends on how bad the corruption was - it might be easily fixable by xfs_repair. Everything else looks fine for now. The procedure is:

- Tools -> New Config -> Retain current configuration: All -> Apply
- Check all assignments and assign any missing disk(s) if needed.
- Important - after checking the assignments, leave the browser on that page, the "Main" page.
- Open an SSH session or use the console and type (don't copy/paste directly from the forum, as it can sometimes insert extra characters):

mdcmd set invalidslot 8 29

- Back in the GUI, without refreshing the page, just start the array. Do not check the "parity is already valid" box (the GUI will still show that data on the parity disk(s) will be overwritten; this is normal, since it doesn't account for the invalid slot command - parity won't actually be overwritten as long as the procedure was done correctly). Disk8 will start rebuilding. Normally the disk would mount immediately, but it likely won't in this case; if it's unmountable, don't format. Wait for the rebuild to finish and then run a filesystem check.
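To recap, the console step above boils down to a single command. The comments here are only a summary of the procedure, and the slot arguments "8 29" come straight from JorgeB's instruction for this specific array - don't reuse them on a different configuration:

```shell
# Run from SSH/console with the New Config applied and the array
# still stopped, per the steps above. mdcmd is Unraid's array
# management helper; "set invalidslot" marks the given slot(s) as
# invalid so the next array start rebuilds disk8 from parity
# instead of treating its current contents as valid.
mdcmd set invalidslot 8 29
```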
Jax Posted February 23, 2020

Thanks for taking the time to provide these instructions... very much appreciated. 🙂 I will try it in a few minutes and report back when it's done.
Jax Posted February 24, 2020

Update: Disk 8 has finished rebuilding with 0 errors (according to the GUI notification). I haven't refreshed the GUI, but it's still showing that drive 8 has an unmountable filesystem. I suspect that may change if I refresh the GUI, but I will leave it as is and wait for your next instructions. Thanks.

Edited February 24, 2020 by Jax
JorgeB Posted February 24, 2020

5 hours ago, Jax said: "I suspect that may change if I refresh the GUI"

It won't - you'll need to run a filesystem check. Remove -n or nothing will be done, and if it asks for -L, use it.
Jax Posted February 24, 2020

4 hours ago, johnnie.black said: "It won't - you'll need to run a filesystem check. Remove -n or nothing will be done, and if it asks for -L, use it."

You're right - and interestingly, I don't have the option to run a check on this drive from the GUI; the menu section to do the check is missing completely (it shows fine for disk 9). Is there a way to execute a check on it now, while it's showing "No file system"? The filesystem on drive 8 was definitely xfs prior to this issue. Latest diag attached.

tc-nas-01-diagnostics-20200224-0714.zip

My continued thanks for all of your help.
JorgeB Posted February 24, 2020

2 minutes ago, Jax said: "Is there a way to execute a check on it now if it's showing 'No file system'?"

Stop the array, click on disk8 and change the filesystem from auto to xfs - that should do it. If it doesn't, report back and I'll post instructions to run it from the CLI.
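For reference, the equivalent CLI check would look roughly like this. This is a sketch, not JorgeB's official instructions; it assumes the array is started in maintenance mode so that /dev/md8 (the device from the mount logs earlier in the thread) exists but is not mounted:

```shell
# Read-only check first: -n reports problems without changing anything.
xfs_repair -n /dev/md8

# If problems are reported, run the actual repair (remove -n);
# -v gives verbose progress output.
xfs_repair -v /dev/md8

# If it complains about a dirty log and asks for -L, zeroing the log
# discards any un-replayed metadata updates - a last resort.
xfs_repair -L /dev/md8
```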
Jax Posted February 24, 2020

Thanks - it's running now (with the -v option).
JorgeB Posted February 24, 2020

That's not a good sign, and I was afraid of this. As mentioned in the invalid slot instructions post, the disk was already unmountable before the first rebuild attempt because no superblock was found, which suggests parity wasn't 100% valid. Still, wait for xfs_repair to finish searching the disk for a valid superblock, but I wouldn't keep my hopes up.
Jax Posted February 24, 2020

9 hours ago, johnnie.black said: "Still, wait for xfs_repair to finish searching the disk for a valid superblock, but I wouldn't keep my hopes up."

Well - I left it running and went into the office... just got home now, and it has completed unsuccessfully, as you suggested it would. Here is the output from the status pane in the GUI (minus a gazillion "."s, for readability):

Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!
attempting to find secondary superblock...
found candidate secondary superblock...
unable to verify superblock, continuing...
found candidate secondary superblock...
unable to verify superblock, continuing...
found candidate secondary superblock...
unable to verify superblock, continuing...
Sorry, could not find valid secondary superblock
Exiting now.

So is all the data on the drive toast?
JorgeB Posted February 25, 2020

9 hours ago, Jax said: "So is all the data on the drive toast?"

Unfortunately, that's very likely. Whatever happened, it happened before the first rebuild you attempted - the filesystem was already corrupt at that time - but without prior logs I can't guess why; most likely parity wasn't 100% valid. You could try a file recovery utility like UFS Explorer, though it's difficult to guess how successful it would be. They do have a trial if you want to give it a shot.
Jax Posted February 25, 2020

7 hours ago, johnnie.black said: "You could try a file recovery utility like UFS Explorer, though it's difficult to guess how successful it would be. They do have a trial if you want to give it a shot."

Gotcha. Well, thanks again for all of your help - I'll check out UFS Explorer. I'll start the array and see exactly what files are lost. At this point, I think it will be best to just reformat disk 8 after seeing what can be salvaged using a recovery tool... We'll see.
Jax Posted February 26, 2020

Update: I've tried UFS Explorer and found hundreds of folders and files that were corrupted. I'm thinking of purchasing the software, as it's not very expensive and appears to be quite useful. That said, looking at what remains on the array, there is nothing critical missing.

While disk 8 was being scanned from my desktop, I assigned a fresh drive in its spot in the array. Of course it did the rebuild and came back with the same "unmountable" error. Since I am OK with losing the data that was on drive 8 - would there be any danger in formatting this new drive 8 to be used in the array? Or is there a better way to make the drive 8 slot usable again?
JorgeB Posted February 26, 2020

8 minutes ago, Jax said: "would there be any danger in formatting this new drive 8 to be used in the array?"

No danger - it will create a new empty filesystem that can be used immediately.
Jax Posted February 26, 2020

5 minutes ago, johnnie.black said: "No danger - it will create a new empty filesystem that can be used immediately."

Excellent - thanks. We can consider this "closed" now. Thanks again for all of your time and help on this - exceptional!