PCIe-to-SATA failure -> Disk disabled -> Emulated disk "Unmountable: Wrong or no file system"


Anon
Solved by JorgeB


Hi,

 

I have lost access to part of my data and hope you can help me recover it.

 

TL;DR: The disk should be healthy, but parity may have some slight errors, which leaves the emulated disk unmountable.

 

Scenario:

  • I run two Unraid servers as VMs on my ESXi host.
  • This morning I finally wanted to add a new 18TB drive to my Unraid server. I connected its SATA cable to a new PCIe-to-SATA card, which I had installed (moving some other disks onto it as well) to replace a different PCIe-to-SATA card. That old card had caused problems in the past when I tried to add this 18TB drive (a full crash of the host server).
  • I pass the disk through to my secondary (test) Unraid server for preclearing.
  • The disk is available. To make sure everything works, I create a file system on it, mount it with Unassigned Devices, and copy ~50GB of data onto it. All goes well.
  • I start the preclear. All seems well.
  • Suddenly my main Unraid server reports problems, and I get the following two emails:
    • Event: Unraid Disk 7 SMART health [85]

      Subject: Warning [TOWER] - unknown attribute (failing now) is 93824992236885

      Description: WDC_WD140EDGZ-11B1PA0_Y6GVG5JC (sdg)

    • Event: Unraid array errors

      Subject: Warning [TOWER] - array has errors

      Description: Array has 1 disk with read errors

      Importance: warning

      Disk 1 - WDC_WD120EDAZ-11F3RA0_8CK6E17F (sdd) (errors 397)

  • While checking Unraid, I notice 100% CPU usage.

  • Disk 1 gets disabled; it shows around 800 errors in the Unraid UI, and Disk 7 shows around 600.

  • Shortly after, I shut down the Unraid server and the whole ESXi host, experiment a bit while rebooting ESXi a few times, and decide to go back to my old PCIe card.

  • I fully disconnect the 18TB drive, reinstall the old PCIe-to-SATA card, and reconnect the drives.

  • I boot everything up again and don't notice that, at this point, the emulated disk was probably already unreadable.

  • I remove the disabled disk, start the array, stop it immediately, and add the disk again.

  • Then I start the array to begin the rebuild onto the same disk.

  • About 1TB into the 12TB rebuild, which was running in the background without any read or write errors, I notice the "Unmountable" error. After reading up on it, I confirm that this is not good. I also notice that, because of this, none of the files on the drive are available.

  • I stop the array, restart the Unraid server, remove the disk again, and start the array with the disk missing to see whether it still shows as unmountable.

  • As it still shows as unmountable, I have left it in that state, made sure no processes or Docker containers are running that could cause further damage, and written this post.

 

My personal interpretation:

  • Disks 7 and 1 are both physically fine, and any small SMART value changes must have been caused by the PCIe-to-SATA controller error.
  • Because both Disk 7 and Disk 1 had errors, I assume the file system on the emulated disk got corrupted.
  • I think 99.9% of the parity and 99.9% of the disk's contents should be fine.
  • I want to be sure that nothing I do makes the situation worse, so I am waiting for replies here before trying anything further.
  • I have a backup of the most critical data, but there is still a good chunk of data on that drive that would be very bad to lose.

 

I hope I can still get some support on this topic even though I am using ESXi. I do not expect any help with problems relating to ESXi itself; I just wanted to be transparent about what happened. Basically, I am looking for an isolated view on data recovery in this scenario, and I will figure out myself what caused this in the first place.

 

 

tower-diagnostics-20230322-1902.zip


@JorgeB Thanks so much for your quick response.

I read through the link you sent, and since my file system is XFS, I ran the check with the -nv options.

 

I got the following output:

[Screenshot: output of the -nv check]

 

If I understand this correctly, the output says it would write a modified primary superblock.

Is the correct way forward to run the check again with only the "-v" option?

 

Information linked to the error shown:

[Screenshot: information linked to the error]

 

I apologize in advance for asking about every small step, but I just want to be absolutely certain I do not damage the files that may still be on the disk.
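For readers following along: as far as I understand, the filesystem check in the Unraid GUI runs `xfs_repair`, where `-n` means "no modify" (check only) and `-v` means verbose. The -nv then -v progression discussed here can be sketched as a small Python helper that only builds the command and interprets the exit status of the read-only pass; it never runs anything. The device path `/dev/md1` and the function names are hypothetical, and the assumption that the check should target the array device (so parity stays in sync) should be verified against the Unraid documentation before touching a real disk.

```python
from typing import List, Optional

# Hypothetical device path (assumption): Unraid's array device for disk 1.
# Targeting the array device rather than the raw disk is assumed to keep parity in sync.
DEVICE = "/dev/md1"


def xfs_repair_cmd(flags: str, device: str = DEVICE) -> List[str]:
    """Build an argument list for xfs_repair; nothing is executed here."""
    return ["xfs_repair", flags, device]


def after_check(exit_code: int) -> Optional[List[str]]:
    """Suggest the next step after a no-modify pass (xfs_repair -nv).

    Per the xfs_repair man page, -n mode exits with 1 when corruption is
    detected and 0 when none is found.
    """
    if exit_code == 0:
        return None  # no corruption reported, nothing to repair
    if exit_code == 1:
        # Same check without -n, so xfs_repair is now allowed to write fixes.
        return xfs_repair_cmd("-v")
    raise ValueError(f"unexpected exit status from -n check: {exit_code}")


if __name__ == "__main__":
    print(after_check(1))  # ['xfs_repair', '-v', '/dev/md1']
```

The split between a read-only pass and a writing pass mirrors the caution in this thread: nothing is modified until the -n output has been reviewed.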

 


Thanks once again, you are awesome.

I ran the check with -v and got this:

[Screenshot: output of the -v run]

 

As you correctly predicted, I got this message, so instead of following its advice to "mount, replay the log, unmount it", I should run the check directly with "-vL".

For future readers, here is a link to another topic that explains why it is okay to run with "-L" to clear the log:

 

I am going to run that now and will report back afterwards.
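The dirty-log decision above can be sketched the same way. This is a hypothetical helper, not an Unraid or xfs_repair API: it only returns the next `xfs_repair` argument list based on the documented exit statuses of a repair run (0 completed, 1 runtime error so re-run, 2 dirty log) and on whether the disk can be mounted. The device path `/dev/md1` is an assumption. Note that `-L` zeroes the log, so any metadata changes still sitting in it are lost; that is why mounting to replay the log is preferred whenever the disk can actually be mounted.

```python
from typing import List, Optional

# Hypothetical device path (assumption): Unraid's array device for disk 1.
DEVICE = "/dev/md1"


def xfs_repair_cmd(flags: str, device: str = DEVICE) -> List[str]:
    """Build an argument list for xfs_repair; nothing is executed here."""
    return ["xfs_repair", flags, device]


def after_repair(exit_code: int, can_mount: bool) -> Optional[List[str]]:
    """Suggest the next step after a repair pass (xfs_repair -v).

    Exit statuses per the xfs_repair man page:
      0 = completed, 1 = runtime error (re-run), 2 = dirty log.
    """
    if exit_code == 0:
        return None  # repair finished
    if exit_code == 1:
        return xfs_repair_cmd("-v")  # runtime error: run the repair again
    if exit_code == 2:
        if can_mount:
            # Preferred: mount and unmount to replay the log, then re-run -v.
            return xfs_repair_cmd("-v")
        # Fallback for an unmountable disk: zero the log with -L.
        # Metadata changes still sitting in the log are lost.
        return xfs_repair_cmd("-vL")
    raise ValueError(f"unexpected exit status from repair: {exit_code}")


if __name__ == "__main__":
    # The situation in this thread: dirty log on a disk that cannot be mounted.
    print(after_repair(2, can_mount=False))  # ['xfs_repair', '-vL', '/dev/md1']
```

In this thread the emulated disk is unmountable, so the mount-and-replay branch is not available, which matches the advice to go straight to "-vL".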

