PCIE to Sata failure -> Disk Disabled -> Emulated Disk "Unmountable: Wrong or no file system"

Anon · March 22, 2023

Hi,

I have lost access to part of my data and hope you guys can help me recover it.

TLDR: The disk should be healthy, but parity might have some slight errors, resulting in the emulated content being unmountable.

Scenario:

I run two Unraid servers on my ESXi server.
This morning I wanted to finally add a new 18TB drive to my Unraid server. To do that, I installed a new PCIe-to-SATA card and connected the 18TB drive to it, together with some other disks. The goal was to switch away from a different PCIe-to-SATA card that had caused problems in the past when I tried to add this 18TB drive (a full-on crash of the host server).
I pass the disk through to my secondary test Unraid server for preclearing.
The disk is available. To test that everything works, I mount it with a filesystem via Unassigned Devices and copy ~50GB of data onto it. All goes well.
I start the preclear. All seems well.
Suddenly my main Unraid server reports problems and I get the following two emails:
- Event: Unraid Disk 7 SMART health [85]
  
  Subject: Warning [TOWER] - unknown attribute (failing now) is 93824992236885
  
  Description: WDC_WD140EDGZ-11B1PA0_Y6GVG5JC (sdg)
- Event: Unraid array errors
  
  Subject: Warning [TOWER] - array has errors
  
  Description: Array has 1 disk with read errors
  
  Importance: warning
  
  Disk 1 - WDC_WD120EDAZ-11F3RA0_8CK6E17F (sdd) (errors 397)
While checking Unraid I notice 100% CPU usage.
Disk 1 gets disabled. It shows around 800 errors in the Unraid UI; Disk 7 shows around 600.
Shortly after, I shut down the Unraid server and the whole ESXi server, experiment a bit while booting ESXi a few times, and decide to go back to my old PCIe card.
I disconnect the 18TB drive completely, put the old PCIe-to-SATA card back in, and connect the drives again.
I boot everything up again and don't notice that the emulated disk was probably already unreadable at this point.
I remove the disabled disk, start the array, stop it immediately, and add the disk again.
Now I start the array to begin the rebuild onto the same disk.
About 1TB into the 12TB rebuild, which was running in the background without any read or write errors, I notice the "Unmountable" error. After reading up on it, I confirm that this is not good. I also notice that none of the files on the drive are available because of it.
I stop the array, restart the Unraid server, remove the disk again, and start the array with the disk missing to see if it still shows "Unmountable".
As it still showed "Unmountable", I left it that way, made sure no processes or Docker containers were running that could cause further damage, and wrote this post.

My personal interpretation:

Both Disk 7 and Disk 1 are totally fine physically, and any small SMART value change will have been caused by the PCIe-to-SATA controller error.
As both Disk 7 and Disk 1 had errors, I assume the filesystem of the emulated disk got jumbled.
I think 99.9% of parity and 99.9% of the disk's content should be fine.
I want to be sure that anything I do does not make the situation worse, so I am waiting for replies here before trying anything further.
I have a backup of the most critical data, but there is still a good chunk of data on that drive that would be very bad to lose.
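
To illustrate the reasoning above, here is a minimal Python sketch, assuming single XOR parity and made-up 4-byte "disks". It is not Unraid's code, and all names and values are hypothetical. It shows that an emulated disk is computed from parity plus every other data disk, so one bad read from another disk damages only the matching bytes of the emulated content while the rest stays intact.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three toy data "disks", one 4-byte block each (made-up contents).
disk1 = b"\x11\x22\x33\x44"
disk7 = b"\xaa\xbb\xcc\xdd"
disk3 = b"\x01\x02\x03\x04"

# Single parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk7, disk3])

# Disk 1 is disabled: its content is emulated from parity and the other disks.
emulated_ok = xor_blocks([parity, disk7, disk3])

# Simulated controller glitch: disk 7 returns one wrong byte during the read.
disk7_bad = b"\xaa\xbb\x00\xdd"
emulated_bad = xor_blocks([parity, disk7_bad, disk3])

print(emulated_ok == disk1)   # emulation matches the real disk 1
print(emulated_bad == disk1)  # one bad byte elsewhere breaks the emulation
```

If the damaged bytes happen to fall on filesystem metadata, the emulated disk can become unmountable even though almost all of its content is still correct.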

I hope I can still get some support on this topic even though I am using ESXi. I do not expect any help with the problems relating to ESXi itself; I just wanted to be transparent about what happened. Basically, I am asking for an isolated view on data recovery in this scenario, and I will figure out by myself why this happened in the first place.

tower-diagnostics-20230322-1902.zip

Edited March 22, 2023 by Anon

JorgeB · March 22, 2023

Ideally you do this before rebuilding on top, but check filesystem on disk1.

Anon · March 22, 2023

@JorgeB Thanks so much for your quick response.

I read through the link you sent and, since my filesystem is XFS, I ran the check with the parameters -nv.

I got the following output:

[Image: output of the check with -nv]

If I understand this correctly the command says it would write a modified primary superblock.

Is it correct that the way to move forward from here is to execute the Check again with the "-v" parameter alone?

Information linked to the shown error:

[Image: details of the shown error]

I apologize in advance for asking about every small step, but I just want to be absolutely certain I do not damage the files that might still be on the disk.

JorgeB · March 22, 2023

5 minutes ago, Anon said:

Is it correct that the way to move forward from here is to execute the Check again with the "-v" parameter alone?

Correct, and if it asks for -L use it.
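
The order of checks discussed here can be sketched in Python as follows. This is only an illustration that builds the command lines and prints them without running anything. The device path `/dev/md1` is an assumption (the thread does not show it; it stands for disk1's md device with the array started in maintenance mode, and the name may differ between Unraid versions), and the helper `xfs_repair_cmd` is hypothetical. The flags `-n`, `-v` and `-L` are standard `xfs_repair` options.

```python
import shlex

# Assumed device path, not taken from the thread.
DEVICE = "/dev/md1"

def xfs_repair_cmd(device, dry_run=False, zero_log=False):
    """Build (but do not run) an xfs_repair command as an argument list."""
    if dry_run and zero_log:
        raise ValueError("-n (no modify) and -L (zero log) cannot be combined")
    flags = "-"
    if dry_run:
        flags += "n"  # -n: check only, make no changes
    flags += "v"      # -v: verbose output
    if zero_log:
        flags += "L"  # -L: zero the log; recent metadata changes may be lost
    return ["xfs_repair", flags, device]

steps = [
    xfs_repair_cmd(DEVICE, dry_run=True),   # 1. read-only check
    xfs_repair_cmd(DEVICE),                 # 2. actual repair
    xfs_repair_cmd(DEVICE, zero_log=True),  # 3. only if step 2 asks for -L
]

for step in steps:
    print(shlex.join(step))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; step 3 is only relevant when the repair in step 2 refuses to proceed because of a dirty log.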

Anon · March 22, 2023

Thanks once again, you are awesome.

I ran the -v and got this:

As you correctly predicted, I got this message, so I will not follow its advice to "mount, replay the log, unmount it" but instead run the check directly with "-vL".

For other people in the future, here is a link to a different topic that explains why it's okay to run "-L" to delete the log:

I am going to run that now and report back afterwards.

Anon · March 22, 2023

Ran the command successfully. For documentation, in case other users ever need it:

However the disk still shows like this:

@JorgeB Is this expected? Can I just follow the guide you sent me, stop the array, and start it without maintenance mode?

JorgeB · March 22, 2023

You have to restart the array in normal mode.

Anon · March 22, 2023

You are an absolute hero. THANKS! The files seem to be there at first glance. I will check a bit more to be sure.

The next step would be to stop the array again and then rebuild onto my disk.
