Help!!! Failed Hard Drive. Now others show "Unmountable : No File System"



Had a 6TB WD Gold hard drive fail on me: it gave a read error and disabled itself.  Not exactly sure what happened.  I had a spare new 10TB that I threw in the system to replace it.  When I started the array, it showed two of my other drives and the new 10TB all as "Unmountable: No File System".  It's already started the Parity Sync / Data Rebuild.  Not sure what my options are now to get these back online.  The two drives hold my main Movie and TV show shares for my Plex server.  I have not started any Dockers or VMs and am just letting this thing go at this point.  Attached are my diagnostics.

 

Screen Shot 2020-09-05 at 9.42.31 PM.png

Screen Shot 2020-09-05 at 9.38.36 PM.png

ewing-diagnostics-20200905-2141.zip

Link to comment

@JorgeB Thanks for the pointers on fixing the XFS format.  I got it back up and running, at least as mountable.  I installed all new SATA cables to the motherboard SATA ports for all drives.  Then I booted into maintenance mode, connected to the terminal over SSH, and ran the following for the NEW 10TB drive.

 

# Destroy all partition data structures (just to be extra sure)
sgdisk -Z /dev/md5

 

# Create new UnRAID Partition
parted /dev/md5 --script -- mklabel gpt
parted -a optimal /dev/md5 --script -- mkpart primary xfs 0% 100%

 

# Format XFS
mkfs.xfs -f /dev/md5

 

Started the array again and now it is running the Parity Sync / Data Rebuild.  If it fails again I will grab the diags.  I am not going to start anything and will just let it run at this point.  This all happened when transferring a file onto that disk.  I'm a little worried the file was corrupt and couldn't be written correctly, and now every time it tries to put that file back it hits the same problem.  I don't know, that may just be paranoia, as it had just finished moving the file to my share folder when I got the disabled alert.

Screen Shot 2020-09-06 at 9.01.30 AM.png

Link to comment

One other thing I should note is that the file I was copying to my "VM Storage" share, located only on disk 5, was the VGA BIOS I had just dumped following SpaceInvader One's tutorial.  Everything was fine until I moved it from my root folder into a different subfolder.  It was not even 5 minutes after that when I got the original "Disk 5 disabled" read/write failure notice.

Link to comment

30% complete and going well.  I have not started anything and am just letting it do its thing.  One thing to note is that I did have a share called "VM Storage".  I noticed that even though disk 5 is currently "emulated", I cannot see my "VM Storage" share anymore.  I do, however, have a new share called "lost+found".  Any thoughts on what this could be?  The fact that I got a "Disk 5 Read error" not just on the existing 6TB drive but again on the 10TB makes me feel the issue is not the drive.  Now that I have already assigned the 10TB to disk 5's slot, I know I can't go back and put the 6TB in, but can I plug the 6TB drive into another PC over SATA and pull the data off it, just in case the parity rebuild doesn't rebuild the VM Storage data?

Screen Shot 2020-09-06 at 12.52.56 PM.png

Link to comment
43 minutes ago, Joshewing02 said:

I honestly feel like there is (1) corrupt File which is causing all this.

Not possible. If there was a corrupt file, it would be confined to a single disk, and wouldn't affect anything else on that disk or others. Each disk is an independent filesystem and each file exists completely on a single disk.

 

These are hardware problems. Are you sure you have good power to all disks? Do all SATA cables have enough slack so the connections don't have anything pulling on them? Are you bundling SATA cables in an attempt to make things "neat"?

Link to comment

I don't.  I do however have a 

NEW - LSI Broadcom SAS 9300-8i 8-port 12Gb/s SATA+SAS PCI-Express 3.0 Low Profile Host Bus Adapter 

and also a 

LSI 9211-8i P20 IT Mode for ZFS FreeNAS unRAID Dell H310 6Gbps SAS HBA

Neither has been put in the system yet.

 

Also, I just took the 6TB, rebooted, and left it in Unassigned Devices.  I booted my Windows VM on Unraid and shared the contents.  All was well and I could access the drive, and then I tried to delete the BIOS file I suspected of being corrupt.  It said the file was deleted, but then I lost all connection to the drive, and it would not remount in Unassigned Devices.  I'm going to see if I can grab the data off it and leave that file alone.  Do you suggest I try either of the HBAs?

Link to comment

It's a problem with the onboard SATA controller, which is quite common on AMD boards. There are reports that running the latest beta helps due to its newer kernel. Disabling IOMMU should also help, if it's not needed.

 

Sep  6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00000 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00180 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00280 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00300 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00400 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00480 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00580 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00680 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00700 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00800 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00880 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00980 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00a00 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00b00 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00c00 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00d00 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00e00 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00f00 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00f80 flags=0x0020]
Sep  6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff01000 flags=0x0020]
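
For anyone wanting to see how many of these faults a log contains and which device they point at, here is a rough sketch that counts IO_PAGE_FAULT lines per PCI device. It is not something taken from the diagnostics: the function name count_page_faults is made up for illustration, and it assumes only the two line shapes shown above. It drops the leading PCI domain (the 0000: part) so both shapes report the same device address.

```shell
# Count IO_PAGE_FAULT events per PCI device in syslog text read from stdin.
# Handles the two line shapes seen in the excerpt:
#   "ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT ..."
#   "AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 ..."
# count_page_faults is a hypothetical helper name, not an Unraid tool.
count_page_faults() {
  awk '
    /IO_PAGE_FAULT/ {
      dev = ""
      if (match($0, /device=[0-9a-fA-F:.]+/)) {
        # Strip the 7-character "device=" prefix.
        dev = substr($0, RSTART + 7, RLENGTH - 7)
      } else if (match($0, /[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]\.[0-9a-fA-F]/)) {
        # Strip the 5-character PCI domain prefix such as "0000:".
        dev = substr($0, RSTART + 5, RLENGTH - 5)
      }
      if (dev != "") count[dev]++
    }
    END { for (d in count) print d, count[d] }
  ' | sort
}
```

Something like count_page_faults < /var/log/syslog should work on a running server, though that log path is an assumption about your setup. If every fault lands on one address, as in the excerpt, that points at a single controller rather than at the individual disks.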

 

Link to comment

Thanks for this info.  Here's hoping this LSI HBA card solves the issue.  It's currently at 72%, so it's moving at least.  I'm thinking that, due to all the issues I have been having with this new setup, I may just turn it into a standalone TR gaming machine for my kids to share.  I'll go back to a dual Xeon setup for the server, since that has run great since day 1.

 

@JorgeB regarding the IOMMU groups, I did notice that almost every single item was broken out into its own group.  It made it very difficult to pass controllers through to my bare metal VM.  I appreciate that.  Maybe I'll play around with those settings first.

 

I'm getting ahead of myself.  Let’s see how this recovery goes first!

Link to comment

Well, it finished successfully.  However, I did appear to lose my "VM Storage" share on the rebuild.  I'm hoping I can use my virtual machine to get that data off the 6TB drive.  Just upgraded to 6.9 Beta 25.  IOMMU groups are WAY better this way.  Hoping I can make this thing stable enough to use.  First time with AMD and it's been a rough go.  Thank you.

Link to comment

Okay, now for the next issue.  It appears that since my disks were "Unmountable: No File System", some movies are now missing.  In Plex they show as "Unavailable", not all but some.  Lost and found appears to hold just TV shows, which also went missing.  Is there a way to force a scan on the other disks and see what it can recover?

Link to comment
3 minutes ago, Joshewing02 said:

Is there a way to force a scan on the other disks and see what it can recover?

You can check the filesystem on any disk using the same method you used before. Other than that, I don't know what you mean by a "scan". There isn't any database of your files or anything like that; there are just folders and files.
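
If "the same method" here means xfs_repair (an assumption, since the earlier pointers are not quoted in this thread), a read-only check uses its -n flag, which reports problems without modifying the disk. The helper below is only a sketch: xfs_check_cmd is a made-up name, and /dev/mdN as the array device for disk N is assumed from the commands earlier in the thread. It prints the command instead of running it, so it can be reviewed first and then run with the array started in maintenance mode.

```shell
# Print (do not run) a read-only XFS check command for an array disk.
# Assumptions: disk N maps to /dev/mdN as in the commands earlier in this
# thread, and xfs_check_cmd is a hypothetical helper name.
xfs_check_cmd() {
  disk="$1"
  case "$disk" in
    ''|*[!0-9]*)
      echo "usage: xfs_check_cmd <disk number>" >&2
      return 1
      ;;
  esac
  # -n = no-modify mode: scan and report only, never write to the device.
  printf 'xfs_repair -n /dev/md%s\n' "$disk"
}
```

For example, xfs_check_cmd 2 prints the check command for disk 2. Repeating that for each data disk is about as close to a "scan" as it gets.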

Link to comment

Yeah, not really sure what I mean either, since I haven't dealt with this before.  I wasn't sure how the lost and found folder even came to be, so I was wondering if I could force whatever I did previously to get those.  I also have not made it through every subfolder in lost and found yet to put things back where they go.

Link to comment
2 minutes ago, Joshewing02 said:

I wasn't sure how the lost and found folder even came to be

On 9/6/2020 at 4:39 PM, trurl said:

lost+found is where filesystem repair puts the data for files it can't figure out. Typically these will be difficult to use again, especially if there are a lot of them. The names of the files and the folders they belonged to are unknown.
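
Since the original names are gone, one way to start sorting through lost+found is to list what is in there by size, largest first, because movie and TV files will stand out from small leftovers. This is only a sketch: list_orphans is a made-up name, and the directory you pass in (for example a lost+found folder on one of your disks) is whatever applies on your system.

```shell
# List every regular file under a directory as "<bytes><TAB><path>",
# largest first. list_orphans is a hypothetical helper name.
# Note: paths containing newlines are not handled by this simple loop.
list_orphans() {
  dir="$1"
  find "$dir" -type f | while IFS= read -r f; do
    # wc -c prints the byte count; tr strips padding some versions add.
    printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done | sort -rn
}
```

Running the file command on the larger entries may then help identify their type (video, image, and so on) before renaming them by hand.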

 

 

Link to comment
