Replaced HDD, rebuilt successfully, but then data disappeared and Shares disappeared.

July 31, 2024

I upgraded disk1 from a 10TB HDD to an 18TB HDD. That went well. The contents were emulated properly (I was able to use Plex, etc.) and the data rebuild completed without any issues that I could tell. Got the green dot, everything seemed fine. I chose this time to upgrade Unraid from 6.12.10 to 6.12.11. I rebooted and I thought everything was fine. A few hours later (I don't know exactly how much later) users were letting me know that several of my services were down and had been for a while. I logged on and found all my shares had disappeared.

When I logged onto the Unraid machine directly, I couldn't enter the /mnt/user directory. And then I realized that I couldn't access /mnt/disk1 or its data either (it was no longer being emulated).

image.png.1797f52e7feffbbcb4b78c7bcdbba532.png
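For anyone trying to reproduce this kind of check from a terminal, here is a minimal Python sketch that reports which of the mount points mentioned above can actually be listed. The function name is made up for illustration, and the path list in the example run is only an assumption based on the paths named in this post.

```python
import os

def probe_paths(paths):
    """Return {path: status}, where status is 'ok', 'missing', or an error string."""
    results = {}
    for path in paths:
        try:
            os.listdir(path)  # raises if the path is gone or cannot be read
            results[path] = "ok"
        except FileNotFoundError:
            results[path] = "missing"
        except OSError as exc:
            # Any other failure (I/O error, permission problem, not a directory, ...)
            results[path] = f"error: {exc.strerror}"
    return results

if __name__ == "__main__":
    # Hypothetical path list; adjust to the disks in your own array.
    for path, status in probe_paths(["/mnt/user", "/mnt/disk1", "/mnt/disk2"]).items():
        print(f"{path}: {status}")
```

A result other than "ok" for /mnt/disk1 while /mnt/disk2 lists fine would match the symptoms described here, but it does not say why the disk is unreadable.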

I read online that sometimes when the shares disappear, a reboot will fix it. So I tried that, and when the reboot was complete I COULD see the shares listed in the Unraid webGUI, but only for a few minutes before they disappeared again. And at no time could I access any of the shares from the terminal or from a file manager. The only way I can access anything is by going directly to the disks (e.g. /mnt/disk2/stuff/thing/file.me) on the Unraid server itself.

When my system rebooted, it also started a parity-check and immediately started reporting LOTS of "Sync errors corrected:". I let the parity-check complete (took over a day) and the result was over 2.4 BILLION sync errors corrected. (!!!) But when the parity check was finally finished, I still couldn't see anything in disk1 and another reboot didn't help either. (Still no shares in the webGUI after a few minutes, and still no access to /mnt/user or /mnt/disk1.) I was getting quite worried at this point. I decided the best thing to do would be to roll back to 6.12.10 and hope I had just had the misfortune of finding a 6.12.11 bug. I finished the rollback (about an hour or two ago) and after the reboot I got the same behavior: no access to /mnt/user or /mnt/disk1, and no access to the shares in the webGUI (even though I can see the shares from my Windows file explorer, I still cannot access them).

I am terrified to do anything without direction now. I really don't want to lose 10TB of stuff. I've never had trouble like this with unraid before and have used it for MANY years.

I've attached my latest Diagnostics. Please advise me on what to do. I'm happy to do the work, I'm just at a loss for how to proceed safely.

P.S. And before anyone says so: yes, I know the importance of backups. This is 1 of 2 Unraid servers and part of a personal backup improvement project. I was literally in the process of backing up everything important when this hiccup happened. But the backups target the shares and I have no idea what was stored on disk1.

unkudjo-diagnostics-20240731_0451.zip

Quote

July 31, 2024

Community Expert

Check filesystem on disk1, run it without -n
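For anyone following along from the command line rather than the webGUI, here is a minimal Python sketch of how the check-only and repair invocations of xfs_repair differ. It only builds the argument lists and runs nothing. The /dev/mdXp1 device path is an assumption about how Unraid 6.12 names array disks (it is not stated in this thread), and the helper name is made up for illustration.

```python
def xfs_repair_command(disk_number, check_only=True, zero_log=False):
    """Build an xfs_repair argument list for an array disk. Nothing is executed."""
    if check_only and zero_log:
        # Design choice of this helper: -L changes the disk, so refuse to pair it with -n.
        raise ValueError("zero_log makes changes; use it only with check_only=False")
    # ASSUMPTION: array disk N is exposed as /dev/mdNp1 (Unraid 6.12 naming).
    device = f"/dev/md{disk_number}p1"
    args = ["xfs_repair"]
    if check_only:
        args.append("-n")  # no-modify mode: report problems, change nothing
    if zero_log:
        args.append("-L")  # force log zeroing: discards the journal, last resort
    args.append(device)
    return args

if __name__ == "__main__":
    print(" ".join(xfs_repair_command(1)))                    # check only
    print(" ".join(xfs_repair_command(1, check_only=False)))  # actual repair
```

Running "without -n" corresponds to check_only=False here: the tool is then allowed to write fixes to the disk, which is why the array is put in Maintenance Mode first.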

Also check/replace cables for cache1

Quote

July 31, 2024

Author

I just noticed that I have more Diagnostics that were closer to all the described events. I'm posting them, in case they are helpful.

unkudjo-diagnostics-20240731-1053.zip unkudjo-diagnostics-20240731-1000.zip unkudjo-diagnostics-20240729-1016.zip

Quote

July 31, 2024

Author

I just saw your post @JorgeB. I'll start that now. It'll take me a few minutes to read your linked article and make sure I do it correctly. I'll post back here when it's completed.

Thank you, @JorgeB.

Quote

July 31, 2024

Author

Here is what it looked like before I started the Check filesystem operation(s):

Stopping array:
This took some time because it seemed to get stuck here...
image.png.8e6daead1a3523e1cc8b900afa7cd25a.png
image.png.c19a6db188cb69e0216334afb2842ce7.png

Should I force this or is it best to wait? (I've attached Diagnostics)

unkudjo-diagnostics-20240731-1259.zip

Quote

July 31, 2024

Community Expert

Disk1 is busy, and so is disk6. Type reboot in the CLI; if it doesn't reboot after 5 minutes you will need to force it. Then check the filesystem.

Quote

August 1, 2024

Author

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
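The important part of this output is the ERROR paragraph: xfs_repair stopped because the journal still holds changes that were never replayed. As a minimal Python sketch, assuming the output has been saved as text, this is one way to spot that condition before deciding whether -L is even relevant. The function name is made up for illustration, and the matched phrase is taken from the output above.

```python
def needs_log_replay(output):
    """Return True if xfs_repair output contains the dirty-log error shown above."""
    # Collapse line breaks and repeated spaces so the wrapped message matches as one phrase.
    text = " ".join(output.split()).lower()
    return "valuable metadata changes in a log" in text

if __name__ == "__main__":
    sample = (
        "Phase 2 - using internal log\n"
        "        - zero log...\n"
        "ERROR: The filesystem has valuable metadata changes in a log which needs to\n"
        "be replayed.  Mount the filesystem to replay the log, and unmount it before\n"
        "re-running xfs_repair.\n"
    )
    print(needs_log_replay(sample))
```

As the message itself says, mounting the filesystem to replay the log is the preferred route, and -L is for the case where the mount fails, which is the situation disk1 was in here.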

Quote

August 1, 2024

Community Expert

Use -L

Quote

August 1, 2024

Author

I ran the operation without the "-n" and with "-L". I think the operation is complete, because the read/write count has stopped updating.

But every time I try to load the page where I ran the check filesystem from, I get this, so I can't actually see what happened.

I pulled Diagnostics, but don't know where to look to see instructions for what's next in them. According to what I'm reading online, others that were in a similar situation would reboot and start the array in normal mode (instead of Maintenance Mode, like I had it in for the check filesystem operation). Is that what I do next?

Thank you very much for all of your help with this so far! I am very grateful.

unkudjo-diagnostics-20240801-0917.zip

Quote

August 1, 2024

Community Expert

Type reboot in the CLI, then post new diags after array start.

Quote

August 1, 2024

Author

Understood. Doing it now.

Quote

August 1, 2024

Author

Operation complete.

Oh, Thank you, thank you, thank you!

It looks like it worked!!!

I can browse the shares again and disk1 looks like it has all its data again!

New diags posted.

Did we do it?!

unkudjo-diagnostics-20240801-0949.zip

Edited August 1, 2024 by Kudjo

Quote

August 1, 2024

Community Expert

21 hours ago, JorgeB said:

Also check/replace cables for cache1

Disk1 looks OK, still seeing issues with cache1

Quote

August 1, 2024

Author

I'll reseat it again and then change the cable if that doesn't help.

Thank you. I'll post a summary and mark it as the solution.

Thank you, @JorgeB.

Quote

August 1, 2024

Author
Solution

After reviewing the diagnostics, I followed the instructions on how to check filesystem and first ran without the "-n".

I pulled diags again and then ran the check filesystem without the "-n" again, but included "-L" this time. Once the operation was complete (the webGUI "Main" display no longer showed the disk reads/writes increasing), I pulled diags again, rebooted the machine, and started the array again.

At this point, the shares had been restored and the data on the target disk was accessible again. No data loss as far as I can tell. I pulled diags one more time to verify that all looks good.

Thank you for all your help, @JorgeB!

Quote

Solved by Kudjo
