MooTheKow Posted December 23, 2022

Ok - apologies for posting what seems to have been posted many times before. I have a 7 drive array with 1 parity drive. I noticed a couple of days ago that one of my drives had failed - so I ordered a replacement. The replacement came today, so I shut down Unraid through the GUI, unplugged my old drive (Disk 1) and plugged in a new drive. The new drive is bigger than all the other drives - so I was going to attempt a parity swap (https://wiki.unraid.net/The_parity_swap_procedure).

Before getting to that -- I noticed that a 2nd disk (i.e. not the one I removed) was now reporting as unmountable. First screenshot is from prior to reboot, second screenshot is of after:

Struggling to figure out what I should do here. I tried swapping cables to the drive - no difference. I tried restarting the array in maintenance mode, went to the drive and clicked the 'Check' button in the 'Check Filesystem Status' area. The results were:

    Phase 1 - find and verify superblock...
    superblock read failed, offset 0, size 524288, ag 0, rval -1
    fatal error -- Input/output error

I did a 'SMART short self-test' and it returned 'Completed without error'. What other diagnostics can I generate to post in order to try to figure out how to proceed at this point?
MooTheKow Posted December 23, 2022

So -- further info ... for the heck of it, I tried reconnecting the drive that wasn't working (the old Disk 1). It is now being picked up (seemingly) fine. However - when I attempt to re-assign it to disk 1, Unraid seems to think it is a new drive. It says "Start will start Parity-Sync and/or Data-Rebuild." and "Stopped. Replacement disk installed" next to the array start button. Any way to say 'no - just use this as it is'??
MooTheKow Posted December 23, 2022 (Edited December 23, 2022 by MooTheKow)

Attaching diagnostics: kowunraid-diagnostics-20221222-2125.zip
SMART report from 'drive 3': kowunraid-smart-20221222-2125.zip
SMART report from 'drive 1': kowunraid-smart-20221222-2021 - Drive 1.zip
trurl Posted December 23, 2022

The reason it wants to rebuild disk1 is that you reassigned it. Unassign disk1, go to Disk Settings and turn off autostart.

You're still having connection problems on disk3. Shut down, check all connections again, SATA and power, including splitters. Reboot, start the array in normal mode with nothing assigned as disk1, then post new diagnostics and a screenshot of Main - Array Devices.
MooTheKow Posted December 23, 2022

Tried re-checking all the connections to all the drives (unplugging/re-plugging them all in). Attaching latest diagnostics: kowunraid-diagnostics-20221222-2208.zip

Screenshot after rebooting:
trurl Posted December 23, 2022

Disks 1 and 3 both have pending sectors. Also, disk3 is WD. You should add attributes 1 and 200 for monitoring on any WD disk. SMART attribute 1 for that WD might also be a reason for concern.

Neither disk has had an extended self-test run. It might be worth running an extended test on both to see if one is worse than the other. There is a way to get it to rebuild disk3 instead of disk1 if that seems like a good idea.

SMART for the other disks looks OK. Let's see whether the disks are mountable again after you recheck connections.
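As a rough illustration of the monitoring advice above, here is a minimal Python sketch that scans a SMART attribute table for non-zero raw values on attributes 1, 197 and 200. The sample table is made up for illustration (it is not taken from the attached reports) and only mimics the usual `smartctl -A` column layout; the function name `nonzero_watched` is hypothetical.

```python
# Attributes to watch on a WD disk, per the advice in the thread
# (197 is the pending-sector count mentioned for both disks).
WATCHED = {
    1: "Raw_Read_Error_Rate",
    197: "Current_Pending_Sector",
    200: "Multi_Zone_Error_Rate",
}

# Made-up sample in the usual `smartctl -A` layout; values are illustrative only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   050   050   000    Old_age   Always       -       36925
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
"""


def nonzero_watched(text):
    """Return {attribute_id: raw_value} for watched attributes with a non-zero raw value."""
    flagged = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) != 0:
            flagged[attr_id] = int(raw)
    return flagged


flagged = nonzero_watched(SAMPLE)
for attr_id, raw in sorted(flagged.items()):
    print(f"attribute {attr_id} ({WATCHED[attr_id]}): raw value {raw}")
```

Whether a non-zero raw value matters depends on the drive vendor; the thread only makes this recommendation for WD disks.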
MooTheKow Posted December 23, 2022 (Edited December 23, 2022 by MooTheKow)

When attempting to re-start the array, disk 3 was still in the same state as before, as best I could tell.

Also - as far as Disk 1 goes - I'm unclear what I should actually try doing with it at the moment. Should I be attempting to click the 'Mount' button in 'Unassigned Devices'?

Also - hugely appreciate your assistance. Have had a sick feeling in my stomach the last several hours here trying to figure this out :).
trurl Posted December 23, 2022

Looking at syslog again, looks like it is an actual disk problem with disk3. We might want to try to rebuild that one instead of disk1. And you might need to replace both, but of course you can only rebuild one at a time with single parity.

But your diagnostics and screenshot are without the array started. Can't tell whether filesystems are mountable until the array is started.

19 minutes ago, trurl said: "start the array in normal mode with nothing assigned as disk1, then post new diagnostics and a screenshot of Main - Array Devices."

Since you have 2 drives with SMART warnings, I have to ask. Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected? Don't let one unnoticed problem become multiple problems and possible data loss.
MooTheKow Posted December 23, 2022 (Edited December 23, 2022 by MooTheKow)

Thought I did have it set up to get emails - but I guess I either didn't get them, or didn't understand what the pop-ups were telling me. Just re-tested the email notifications in the setup - and it looks like my authorization is failing now.. apparently I need to set that up again. (Looks like my Gmail SMTP settings from a while back no longer work.. reading up on how to fix that now.)

UPDATE: generated a Gmail app password and that seems to be working now.

So - how can I go about attempting to rebuild disk 3? I mentioned earlier that Unraid wants to treat Disk 1 as a new drive -- I was unclear from your response what I can do about that.
trurl Posted December 23, 2022

We are posting at the same time. Your new screenshot shows what we need.

Probably a good idea to not allow any further writes to any disks in the array for now. Disable Docker and VM Manager in Settings. Doesn't look like you have a cache drive. If not, then we don't have to worry about Mover running.

Run an extended SMART self-test on disk3 and the disk formerly assigned as disk1. You can do both at the same time. It will take many hours unless the test fails before then. I will check back in the morning.
trurl Posted December 23, 2022

6 minutes ago, MooTheKow said: "So - How can I go about attempting to rebuild disk 3? I mentioned earlier about Disk 1 and it wanting to treat it as a new drive -- I was unclear from your response what I can do about that."

We will get to that after the extended self-tests complete.
MooTheKow Posted December 23, 2022

Starting tests - thank you so very much.
MooTheKow Posted December 23, 2022 (Edited December 23, 2022 by MooTheKow)

Disk 3 failed already - attaching SMART report. kowunraid-smart-20221222-2230.zip
trurl Posted December 23, 2022

2 hours ago, MooTheKow said: "noticed a couple days one of my drives failed"

7 minutes ago, MooTheKow said: "How can I go about attempting to rebuild disk 3?"

Even though disk1 was disabled, it is still emulated, and writes to the emulated disk are made by updating parity. If there were any writes to emulated disk1 while you were waiting to replace it, those writes will be lost if you rebuild disk3 instead. And since disk1 is the disk most likely to be written on a new system, and you have no cache, disk1 is probably where all your docker and VM data is, and where new files were probably written.

Also, any writes to emulated disk1 will have updated parity, which means physical disk1 and parity are out-of-sync, and so parity and physical disk1 together are also out-of-sync with disk3. All other disks are required to rebuild a disk, so this means a rebuild of disk3 will be compromised.

But, since disk3 has failed the extended self-test, we may not have any choice but to rebuild it instead of disk1. That is assuming disk1 doesn't also fail the extended self-test.

We may have to resort to cloning these bad disks before we can recover anything, which might mean you need more spare disks.

Do you have backups of anything important and irreplaceable?

I will ping @JorgeB to get more help but it is already way past bedtime in his timezone. We will take a look in the morning to see the results of the disk1 self-test and decide where to go from there.
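To make the out-of-sync argument above concrete, here is a toy Python sketch of single-parity reconstruction. It assumes plain XOR parity over equal-sized blocks and uses made-up byte values for three data disks; it only illustrates the reasoning and is not a model of Unraid's actual on-disk layout.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR several equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy array: three data disks holding one tiny block each (made-up values).
disk1 = bytes([0x11, 0x22])
disk2 = bytes([0x33, 0x44])
disk3 = bytes([0x55, 0x66])
parity = xor_blocks([disk1, disk2, disk3])

# With everything in sync, any one disk can be rebuilt from all the others.
assert xor_blocks([parity, disk1, disk2]) == disk3

# disk1 is disabled. A write to *emulated* disk1 changes only parity;
# the physical disk1 sitting on the shelf keeps its old contents.
new_disk1 = bytes([0x99, 0x22])
parity_after_write = xor_blocks([new_disk1, disk2, disk3])

# Emulated disk1 (parity + the other disks) reflects the write.
emulated_disk1 = xor_blocks([parity_after_write, disk2, disk3])
assert emulated_disk1 == new_disk1

# Rebuilding disk3 from the updated parity plus the stale physical disk1
# gives wrong data wherever emulated disk1 was written.
rebuilt_disk3 = xor_blocks([parity_after_write, disk1, disk2])
rebuilt_ok = rebuilt_disk3 == disk3

print("disk3 rebuilt correctly:", rebuilt_ok)
```

In this toy case only the byte that was written while disk1 was emulated comes back wrong on the rebuilt disk3, which is why the amount of damage depends on how much was written to emulated disk1 in the meantime.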
JorgeB Posted December 23, 2022

5 hours ago, trurl said: "We will take a look in the morning to see the results of disk1 self-test and decide where to go from there."

Yep, post this when available.
MooTheKow Posted December 23, 2022

So.. still no results on Disk 1 .. woke up at 4am and found it said it was interrupted or something: kowunraid-smart-20221222-2231.zip

Tried starting it again --- (this would be roughly 2 and a half hours ago.. maybe 3?) .. just got up and checked and it still says 'self-test in progress, 10% complete'..

Opened a new tab attempting to look at the disk and am seeing this -- is that because the test is in progress?

Also - spare disks were mentioned in an earlier post. Currently I have a couple of 4TB external drives and a new 14TB internal drive. If I need to pick up additional internal drives I can/will.
trurl Posted December 23, 2022

You may have to disable spindown on the disk to get the SMART test to complete.

1 hour ago, MooTheKow said: "Opened a new tab attempting to look at the disk and am seeing this -- that because the test is in progress?"

Just to make sure everything is consistent, only open one browser to your server and see what it looks like. That screenshot suggests the test is no longer running and the disk can't be communicated with.

Attach new diagnostics to your NEXT post in this thread.

9 hours ago, trurl said: "We may have to resort to cloning these bad disks before we can recover anything, which might mean you need more spare disks."
trurl Posted December 23, 2022

9 hours ago, trurl said: "Do you have backups of anything important and irreplaceable?"
MooTheKow Posted December 23, 2022

Think so (backup of irreplaceable stuff). I can still access some files -- so I'm going to try to back up a 'pictures' folder (though I think most are already backed up to Amazon Photos) as well as a videos folder. The majority of the data is backups of media (blu-ray rips, tv show rips, etc). I have backups of old hard drives (just in case -- most of it is things I've not accessed in years and may never need to). So (thankfully) at this point most data loss would just fall into the 'huge inconvenience' category instead of the 'now I need to cry that it is lost forever' category.

I did disable spin-down (as best I can tell) in the settings tab.

kowunraid-diagnostics-20221223-0845.zip
trurl Posted December 23, 2022

These latest diagnostics show no SMART report for disk1. I guess you could try connecting it again. It will have to be connected before we can attempt to clone it anyway.
MooTheKow Posted December 23, 2022

Ok - should be showing up now (I think?) kowunraid-diagnostics-20221223-1143.zip

Sorry for the delay - was waiting for pictures/videos to copy off onto an external drive before rebooting.
JorgeB Posted December 23, 2022

Try running a short SMART test. Since there are pending sectors, it should go directly to test those. If it fails, the disk is bad; if it passes, try the long test again.
MooTheKow Posted December 23, 2022 (Edited December 23, 2022 by MooTheKow)

Results from the short self-test:

    Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Interrupted (host reset)  00%        36925            -
    # 2  Extended offline    Interrupted (host reset)  00%        36919            -
    # 3  Short offline       Completed without error   00%        17972            -

kowunraid-smart-20221223-1150.zip

Going to attempt an extended self-test now.
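For reading a self-test log like the one above, a small Python sketch follows. It parses the three rows exactly as posted and compares the LifeTime(hours) column, which is the drive's power-on hour count when each test was logged. The regular expression only handles "Extended offline" and "Short offline" rows, which is an assumption made for this log, not a general parser for every self-test entry a drive can report.

```python
import re

# The three log rows as posted in the thread.
LOG = """\
# 1 Extended offline Interrupted (host reset) 00% 36925 -
# 2 Extended offline Interrupted (host reset) 00% 36919 -
# 3 Short offline Completed without error 00% 17972 -
"""

# Assumption: only "Extended offline" / "Short offline" rows need to match.
ROW = re.compile(
    r"^#\s*(\d+)\s+(Extended|Short)\s+offline\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftest_log(text):
    """Parse self-test log rows into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue
        num, kind, status, remaining, hours, lba = match.groups()
        rows.append({
            "num": int(num),
            "test": f"{kind} offline",
            "status": status,
            "remaining_pct": int(remaining),
            "lifetime_hours": int(hours),
            "lba_of_first_error": lba,
        })
    return rows


rows = parse_selftest_log(LOG)
latest_hours = max(row["lifetime_hours"] for row in rows)
for row in rows:
    age = latest_hours - row["lifetime_hours"]
    print(f'#{row["num"]} {row["test"]}: {row["status"]} '
          f'({age} power-on hours before the newest entry)')
```

Run on this log, the "Completed without error" short test turns out to have been logged at 17972 power-on hours, 18953 hours before the newest entry, which suggests it is an old result and that no freshly completed short test appears in this log.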
JorgeB Posted December 23, 2022

The short test also got interrupted. That's strange, since the short test is rather quick.
MooTheKow Posted December 23, 2022

Got an 'Interrupted (host reset)' error. I've read I need to turn off spin-down timers -- is this that setting? (was already set to 'Never')