Kudjo Posted July 31 I upgraded disk1 from a 10TB HDD to an 18TB HDD. That went well: the contents were emulated properly (I was able to use Plex, etc.) and the data rebuild completed without any issues that I could tell. I got the green dot and everything seemed fine. I chose this time to upgrade Unraid from 6.12.10 to 6.12.11. I rebooted and thought everything was fine. A few hours later (I don't know exactly how much later), users let me know that several of my services were down and had been for a while. I logged on and found that all my shares had disappeared. When I logged onto the Unraid machine directly, I couldn't enter the /mnt/user directory. Then I realized that I couldn't access /mnt/disk1 or its data either (it was no longer being emulated).

I read online that sometimes when the shares disappear, a reboot will fix it. So I tried that, and when the reboot was complete I COULD see the shares listed in the Unraid webGUI, but only for a few minutes before they disappeared again. At no time could I access any of the shares from the terminal or from a file manager. The only way I can access anything is by going directly to the disks (e.g. /mnt/disk2/stuff/thing/file.me) on the Unraid server itself.

When my system rebooted, it also started a parity check and immediately began reporting LOTS of "Sync errors corrected:". I let the parity check complete (it took over a day), and the result was over 2.4 BILLION sync errors corrected. (!!!) But when the parity check finally finished, I still couldn't see anything in disk1, and another reboot didn't help either (still no shares in the webGUI after a few minutes, and still no access to /mnt/user or /mnt/disk1). I was getting quite worried at this point. I decided the best thing to do would be to roll back to 6.12.10 and hope I had simply had the misfortune of hitting a 6.12.11 bug.
I finished the rollback (about an hour or two ago), and after the reboot I got the same behavior: no access to /mnt/user or /mnt/disk1, and no access to the shares in the webGUI (even though I can see the shares from Windows File Explorer, I still cannot access them). I am terrified to do anything without direction now. I really don't want to lose 10TB of stuff. I've never had trouble like this with Unraid before, and I have used it for MANY years. I've attached my latest diagnostics. Please tell me what to do. I'm happy to do the work; I'm just at a loss for how to proceed safely.

P.S. And before anyone says so: yes, I know the importance of backups. This is one of two Unraid servers and part of a personal backup improvement project. I was literally in the process of backing up everything important when this hiccup happened. But the backups target the shares, and I have no idea what was stored on disk1.

Attachment: unkudjo-diagnostics-20240731_0451.zip
JorgeB Posted July 31 Check the filesystem on disk1, and run it without -n. Also check/replace the cables for cache1.
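For anyone following along from the command line rather than the webGUI: the "-n" mentioned above is xfs_repair's no-modify flag, which only reports problems, so running without it lets xfs_repair write its fixes, and "-L" forces the metadata log to be zeroed. The Python sketch below only assembles the argument list for each case and does not run anything. The device path is a hypothetical placeholder, and the function name is made up for illustration; confirm the correct device for your Unraid release (or just use the webGUI's Check Filesystem option with the array in Maintenance mode) before running anything by hand.

```python
from typing import List

# Hypothetical placeholder: substitute the actual device for disk1 on your system.
DEVICE = "/dev/md1p1"


def xfs_repair_args(device: str, check_only: bool = True, zero_log: bool = False) -> List[str]:
    """Build an xfs_repair command line (this sketch does not execute it).

    check_only=True adds -n ("no modify"): report problems, change nothing.
    zero_log=True adds -L: destroy the metadata log before repairing.
    This sketch refuses to combine -L with -n, since zeroing the log is
    only meaningful as part of an actual repair.
    """
    if check_only and zero_log:
        raise ValueError("use -L only for a real repair, not together with -n")
    args = ["xfs_repair", "-v"]  # -v: verbose output
    if check_only:
        args.append("-n")
    if zero_log:
        args.append("-L")
    args.append(device)
    return args


if __name__ == "__main__":
    print(" ".join(xfs_repair_args(DEVICE)))                    # read-only check
    print(" ".join(xfs_repair_args(DEVICE, check_only=False)))  # repair
    print(" ".join(xfs_repair_args(DEVICE, check_only=False, zero_log=True)))  # repair, zeroing the log
```

The order in this thread maps onto the last two calls: a repair run without -n first, and -L added only after that run refused to continue.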
Kudjo Posted July 31 I just noticed that I have more diagnostics that were captured closer to the events described. I'm posting them in case they are helpful. Attachments: unkudjo-diagnostics-20240731-1053.zip, unkudjo-diagnostics-20240731-1000.zip, unkudjo-diagnostics-20240729-1016.zip
Kudjo Posted July 31 I just saw your post, @JorgeB. I'll start that now. It'll take me a few minutes to read your linked article and make sure I do it correctly. I'll post back here when it's completed. Thank you!
Kudjo Posted July 31 Here is what it looked like before I started the Check Filesystem operation(s). Stopping array: this took some time because it seemed to get stuck here... Should I force this, or is it best to wait? (I've attached diagnostics.) Attachment: unkudjo-diagnostics-20240731-1259.zip
JorgeB Posted July 31 Disk1 is busy, and so is disk6. Type reboot in the CLI; if it doesn't reboot after 5 minutes you will need to force it. Then check the filesystem.
Kudjo Posted August 1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
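The error message spells out a decision order: try to mount the filesystem so the log gets replayed, unmount it, re-run xfs_repair, and fall back to -L only if the mount fails, because destroying the log may cause corruption. Below is a small Python sketch of that order, assuming the xfs_repair output has been captured as a string. The function name, the marker constant, and the step wording are hypothetical and only restate what the message itself says.

```python
from typing import List, Optional

# Substring of the xfs_repair error shown when the XFS log still holds
# metadata changes that have not been replayed.
LOG_DIRTY_MARKER = "valuable metadata changes in a log"


def next_steps(repair_output: str, mount_succeeded: Optional[bool] = None) -> List[str]:
    """Return the recovery order that the xfs_repair log error describes.

    mount_succeeded stays None until a mount has actually been attempted.
    """
    if LOG_DIRTY_MARKER not in repair_output:
        # No dirty-log error in this output, so this message gives no steps.
        return []
    if mount_succeeded is None:
        # First choice: mounting lets the log be replayed without losing it.
        return ["mount the filesystem to replay the log", "unmount it", "re-run xfs_repair"]
    if mount_succeeded:
        # Log was replayed by the mount; finish the sequence.
        return ["unmount it", "re-run xfs_repair"]
    # Mount failed: -L is the last resort the message offers.
    return ["re-run xfs_repair with -L (destroys the log; may cause corruption)"]
```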
Kudjo Posted August 1 I ran the operation without -n and with -L. I think the operation is complete, because the read/write counts have stopped updating. But every time I try to load the page where I ran the Check Filesystem from, I get this, so I can't actually see what happened. I pulled diagnostics, but I don't know where to look in them to find out what to do next. According to what I'm reading online, others in a similar situation rebooted and started the array in normal mode (instead of Maintenance Mode, which is what I used for the Check Filesystem operation). Is that what I do next? Thank you very much for all of your help with this so far! I am very grateful. Attachment: unkudjo-diagnostics-20240801-0917.zip
JorgeB Posted August 1 Type reboot in the CLI, then post new diags after the array starts.
Kudjo Posted August 1 (edited) Operation complete. Oh, thank you, thank you, thank you! It looks like it worked!!! I can browse the shares again, and disk1 looks like it has all its data again! New diags posted. Did we do it?! Attachment: unkudjo-diagnostics-20240801-0949.zip (Edited August 1 by Kudjo)
JorgeB Posted August 1 Quoting JorgeB (21 hours earlier): "Also check/replace cables for cache1" Disk1 looks OK, but I'm still seeing issues with cache1.
Kudjo Posted August 1 I'll reseat it again, and then change the cable if that doesn't help. I'll post a summary and mark it as the solution. Thank you, @JorgeB.
Kudjo Posted August 1 (Solution) After reviewing the diagnostics, I followed the instructions on how to check the filesystem and first ran it without -n. That run stopped with the log-replay error posted above. I pulled diags again and then ran the filesystem check without -n again, but this time included -L. Once the operation was complete (the webGUI "Main" page no longer showed the disk read/write counts increasing), we pulled diags again, rebooted the machine, and started the array again. At this point, the shares had been restored and the data on the target disk was accessible again. No data loss as far as I can tell. We pulled diags one more time to verify that all looks good. Thank you for all your help, @JorgeB!