
NVMe drive disappeared, cache pool now broken


Solved by JorgeB


I upgraded my server today -- replaced the motherboard and CPU, and added more RAM. When I came home I shut down UNRAID gracefully; however, even after a full graceful shutdown, my server's LED was still on and the fans were running. The monitor connected to the server was completely blank and unresponsive, so after waiting 15 minutes I force shut down the server by holding the power button.

 

I completed the upgrade and put everything back together, booted into the BIOS, updated to the latest firmware as of today (Gigabyte Z690 Gaming X, BIOS version F22), enabled CPU virtualisation, turned on XMP, and booted into UNRAID.

 

Unfortunately, one of my four cache drives was not detected. I reseated everything and rebooted multiple times, but it's simply not detected, even in the BIOS. I'm wondering if the forced shutdown corrupted the drive's firmware.

 

I inserted the drive into an NVMe caddy and connected it to my Windows PC, but it's unable to read the drive, reporting it as uninitialized. I haven't attempted to initialize it because I don't want to nuke the data on it. I connected a known-working NVMe from UNRAID to the same caddy and it was immediately detected, showing a healthy 1TB partition in Windows.

 

I've tried cleaning the gold contacts on the NVMe with electrical PCB cleaner as well, but that did not work.

 

My cache was set up as a four-drive BTRFS pool (balanced? striped? I'm not sure of the terminology).

 

After hours of searching, I tried following advice from some of the moderators on this forum, including starting the array with no cache drives assigned to forget the config, then starting it again with the drives, but unfortunately my cache drives/shares are not mountable. I got a bunch of errors from UNRAID about missing shares and files, and my Docker containers and VMs are all missing.

 

My "pool devices" was set to 4 devices, and UNRAID was complaining that a drive was missing and the config was not valid. I made the mistake of clicking "3 slots", and it immediately dropped the drive from the configuration. When I selected "4 slots" again, UNRAID was no longer complaining about an invalid configuration.

 


 

I'm unsure how a BTRFS pool works, but if it is striped then I assume my data is not recoverable.
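
(For reference: whether a btrfs pool stripes or mirrors data is determined by its data profile, and on a mounted pool that can be checked from the console. A minimal check, assuming the Unraid-default cache mount point /mnt/cache -- adjust the path to wherever the pool is mounted:

    # Show allocation profiles for a mounted btrfs pool
    # (mount point is an assumption; adjust to your system)
    btrfs filesystem df /mnt/cache
    # A line like "Data, RAID0: ..." or "Data, single: ..." means no
    # redundancy; "Data, RAID1: ..." means the data is mirrored.

This only works while the filesystem is mounted, so on a broken pool it would have to wait until a degraded read-only mount succeeds, as described later in the thread.)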

 

I am willing to pay for data recovery from a local, reputable data recovery company that specialises in flash storage; however, I want to know:

 

A) If anything is currently salvageable. My hope is that the 250GB virtual machine lived on one SSD and that I'd be able to recover it by mounting that SSD and copying the virtual disk file. My Docker containers are backed up; all I care about is my virtual machine.

 

B) If I do take the NVMe to a data recovery company and they manage to fix it, have I completely nuked my cache pool by starting it with 3 drives instead of 4? I DID NOT choose the option to format when starting the array.

 

Diagnostics attached.

 

Thank you in advance.

devoraid-diagnostics-20230205-2026.zip

3 minutes ago, LumpyCustard said:

create an entirely new pool, then assign the 3 alive SSD's to that new pool then start the array in maintenance mode?

No need for a new pool, just not a pool where it's detecting missing/replaced devices. You can start the array with all pool members unassigned and Unraid will forget the pool config; then stop the array, re-assign all 3 pool members to the same pool, and start the array in normal mode.

  • Solution
Feb  5 23:10:31 devoraid kernel: BTRFS warning (device sdc1): chunk 4163153952768 missing 1 devices, max tolerance is 0 for writable mount

This means your pool was not redundant, so it cannot mount read/write with a missing device. You can manually mount it degraded and read-only, but most files will likely be damaged or incomplete. If you want to try, see how to mount manually below:

 

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=543490

 

You'd need to use degraded,ro as mount options.
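
A minimal sketch of what that looks like from the console, assuming the surviving pool member shows up as /dev/sdc1 (as in the syslog line above) and using /temp as a scratch mount point -- both are assumptions, adjust them to your system:

    # Let the kernel register all btrfs member devices (usually done by udev)
    btrfs device scan
    # Make a temporary mount point (path is an assumption)
    mkdir -p /temp
    # Mount the degraded pool read-only; any surviving member device will do
    mount -o degraded,ro /dev/sdc1 /temp
    # Copy anything readable off, e.g. the VM vdisks; the source path assumes
    # the Unraid-default "domains" share layout, the destination is an example
    rsync -av /temp/domains/ /mnt/disk1/rescue/

When done, unmount with umount /temp before restarting the array.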


Just thought I'd update this post. I got my drive professionally repaired by a data recovery company -- they corrected an issue on the PCB of the M.2 SSD.

 

Despite my having started and stopped the array dozens of times, created new cache pools, swapped the drive order around in UNRAID, and so on, when I inserted the drive back in and selected the drives, my cache pool came back to life and all the data was intact -- Docker started, my VMs launched, etc.

 

I am in the process of breaking down my cache pool now -- I've purchased 4 new NVMe drives and I'll be rebuilding the pool with redundancy, as well as ditching the 2.5" drives I had mixed in with the NVMe drives.
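
(For anyone doing the same: a btrfs pool's data can be converted to a mirrored profile with a balance. A sketch, assuming the new pool is mounted at /mnt/cache -- Unraid's pool Balance options in the GUI run the equivalent operation:

    # Convert both data and metadata to raid1 (mirrored) on a mounted pool
    # (mount point is an assumption; needs at least two pool devices)
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache

With raid1 data and metadata, the pool tolerates one missing device instead of zero.)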

 

