Joshewing02 Posted September 6, 2020 I had a 6TB WD Gold hard drive fail on me: it gave a read error and disabled itself. Not exactly sure what happened. I had a spare new 10TB drive that I threw in the system to replace it. When I started the array, it showed two of my other drives and the new 10TB all as "Unmountable: No File System". It's already started the Parity Sync / Data Rebuild. Not sure what my options are now to get these back online. The two drives hold my main Movie and TV show shares for my Plex server. I have not started any Dockers or VMs and am just letting this thing go at this point. Attached are my diagnostics. ewing-diagnostics-20200905-2141.zip
JorgeB Posted September 6, 2020 It's not normal to get so many unmountable disks. Did you by any chance save the diags when the disk got disabled? They might provide some clues. For now, when the rebuild finishes, check the filesystem on all the unmountable disks: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
Joshewing02 Posted September 6, 2020 Thanks. Just woke up to find that it says Disk Read error on Disk 5 again, even though the drive is brand new. Not really sure where to go from here. Going to swap out the SATA cables this morning. I also have an HBA controller I can throw in later, I guess.
JorgeB Posted September 6, 2020 2 minutes ago, Joshewing02 said: "Not really sure where to go from here." Post the diags before rebooting; they might show the problem.
Joshewing02 Posted September 6, 2020 Got it. I had already shut down. Will keep you posted.
Joshewing02 Posted September 6, 2020 @JorgeB Thanks for the pointers on fixing the XFS format. I got it back and running, at least as mountable. I installed all new SATA cables to the motherboard SATA ports for all drives. Then I booted into maintenance mode, connected over SSH, and ran the following for the NEW 10TB drive:

# Destroy all partition data structures (just to be extra sure)
sgdisk -Z /dev/md5
# Create new UnRAID partition
parted /dev/md5 --script -- mklabel gpt
parted -a optimal /dev/md5 --script -- mkpart primary xfs 0% 100%
# Format XFS
mkfs.xfs -f /dev/md5

Started the array again and now it is running Parity Sync / Data Rebuild. If it fails again I will grab the diags. I am not going to start anything and will just let it run at this point. This all happened when transferring a file onto that disk. I'm a little worried the file was corrupt and couldn't be written correctly, and now every time it tries to put that file back it hits the same problem. I don't know, that may just be paranoia, as it had just finished moving the file to my Share folder when I got the disabled alert.
Joshewing02 Posted September 6, 2020 One other thing I should note is that the file I was copying to my "VM Storage" share, located only on disk 5, was the VGA BIOS I had just dumped following SpaceInvader One's tutorial. Everything was fine until I moved it from my root folder into a different subfolder. Not 5 minutes after that I got the original Disk 5 disabled notice (read/write fail).
Joshewing02 Posted September 6, 2020 30% complete and going well. I have not started anything and am just letting it do its thing. One thing to note is that I did have a share called "VM Storage". Even though disk 5 is currently "emulated", I cannot see my "VM Storage" share anymore. I do however have a new share called "lost+found". Any thoughts on what this could be? The fact that I got a "Disk 5 Read error" not just on the existing 6TB drive but again on the 10TB makes me feel the issue is not the drive. Now that I have assigned the 10TB to disk 5's slot, I know I can't go back and put the 6TB in, but can I plug the 6TB drive into another PC and pull the data off it, just in case the parity rebuild doesn't rebuild the VM Storage data?
trurl Posted September 6, 2020 lost+found is where filesystem repair puts the data for files it can't figure out. Typically these will be difficult to use again, especially if there are a lot of them. The names of the files and the folders they belonged to are unknown.
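Since the original names are gone, file size is one of the few clues left for sorting through lost+found (a movie is far larger than a subtitle or metadata file). The helper below is only a sketch under that assumption: `largest_files` is a made-up name, and the lost+found path in the usage line is a guess at where the share would appear on this system.

```shell
#!/bin/sh
# Hypothetical helper: list the largest regular files under a directory,
# biggest first, as "size-in-bytes<TAB>path". Uses only POSIX tools.
largest_files() {
    dir=$1
    count=${2:-20}   # how many entries to show (default 20)
    find "$dir" -type f -exec sh -c '
        for f in "$@"; do
            # wc -c reports the byte count; tr strips any padding spaces
            printf "%s\t%s\n" "$(wc -c < "$f" | tr -d " ")" "$f"
        done' sh {} + |
        sort -rn | head -n "$count"
}
```

Example call, assuming the share is visible at that path: `largest_files "/mnt/user/lost+found" 30`. The listing only reads file sizes; it does not move or rename anything.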
Joshewing02 Posted September 7, 2020 Well, this totally sucks. It failed again, at 57% this time. I grabbed the attached diagnostics before shutting down. Where to go from here? ewing-diagnostics-20200906-1756.zip
trurl Posted September 7, 2020 Problems writing disk5 and reading multiple disks. How are these connected?
Joshewing02 Posted September 7, 2020 All SATA drives are connected to the onboard SATA ports on my new Asus Zenith II Extreme Alpha. I don't have. I honestly feel like there is (1) corrupt file which is causing all this when trying to recover. I also noticed disk 4 has a ton of errors now. Hopefully you can see that from the diagnostics file.
Joshewing02 Posted September 7, 2020 After the reboot I also copied the attached parity check history. Parity Read:Check History.xlsx
trurl Posted September 7, 2020 43 minutes ago, Joshewing02 said: "I honestly feel like there is (1) corrupt File which is causing all this." Not possible. If there was a corrupt file, it would be confined to a single disk, and wouldn't affect anything else on that disk or others. Each disk is an independent filesystem and each file exists completely on a single disk. These are hardware problems. Are you sure you have good power to all disks? Do all SATA cables have enough slack so the connections don't have anything pulling on them? Are you bundling SATA cables in an attempt to make things "neat"?
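Because each file lives entirely on one disk, it is possible to check which disk(s) actually hold a given file or share folder. The sketch below assumes Unraid's usual layout, where array disks are mounted as /mnt/disk1, /mnt/disk2, and so on; `which_disks` is a hypothetical helper name, and the second argument exists only so the mount root can be overridden.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every array disk that contains
# the given share-relative path (file or folder).
# Assumes disks are mounted as <root>/disk1, <root>/disk2, ...
which_disks() {
    rel_path=$1
    mnt_root=${2:-/mnt}   # assumed default mount root
    for d in "$mnt_root"/disk[0-9]*; do
        # -e is true for files and directories alike
        [ -e "$d/$rel_path" ] && basename "$d"
    done
    return 0
}
```

For example, `which_disks "VM Storage"` would print one line per disk that holds part of that share. A single file should only ever show up on one disk; a share folder can legitimately appear on several.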
trurl Posted September 7, 2020 48 minutes ago, Joshewing02 said: "the onboard sata ports on my new Asus Zenith II Extreme Alpha" Looks like there are multiple controllers on that motherboard. Do you have any spare ports?
Joshewing02 Posted September 7, 2020 I don't. I do however have a NEW LSI Broadcom SAS 9300-8i 8-port 12Gb/s SATA+SAS PCI-Express 3.0 Low Profile Host Bus Adapter and also an LSI 9211-8i P20 IT Mode for ZFS FreeNAS unRAID Dell H310 6Gbps SAS HBA that I have not put in the system. I also just took the 6TB drive, rebooted, and left it in Unassigned Devices. I booted my Windows VM on Unraid and shared the contents. All was well and I could access the drive, and then I tried to delete the BIOS file I suspected of being corrupt. The file showed as deleted, but then I lost all connection to the drive. It would not remount in Unassigned Devices. Going to see if I can grab the data off it and leave that file alone. Do you suggest I try either of the HBAs?
Joshewing02 Posted September 7, 2020 I also replaced all SATA cables this morning, not necessarily trying to make them neat. I'll grab a pic in a few minutes. I actually have (1) spare port on the motherboard, so I can swap to that.
Joshewing02 Posted September 7, 2020 After rebooting again I tried mounting the drive and noticed the following, since it would not mount again. Pic of monitor. Then I threw in the IT Mode HBA cards and some new cables again. I'll try rebuilding this way.
JorgeB Posted September 7, 2020 It's a problem with the onboard SATA controller, which is quite common on AMD boards. There are reports that running the latest beta helps due to the newer kernel; disabling IOMMU should also help, if it's not needed.

Sep 6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00000 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00180 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00280 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00300 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00400 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00480 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00580 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00680 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00700 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000007fffff00800 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00880 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00980 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00a00 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00b00 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00c00 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00d00 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00e00 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00f00 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff00f80 flags=0x0020]
Sep 6 16:48:22 Ewing kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=0x0000 address=0x000007fffff01000 flags=0x0020]
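To see at a glance whether every IO_PAGE_FAULT in a syslog comes from the same controller, the events can be tallied per PCI device. This is a rough sketch rather than an official Unraid tool: `count_page_faults` is a made-up name, and the two patterns only cover the two line shapes that appear in the excerpt JorgeB posted.

```shell
#!/bin/sh
# Hypothetical helper: count IO_PAGE_FAULT events per PCI device.
# Handles the two line shapes from the excerpt:
#   "ahci 0000:46:00.0: Event logged [IO_PAGE_FAULT domain=..."
#   "AMD-Vi: Event logged [IO_PAGE_FAULT device=46:00.0 domain=..."
# Output: one "device count" pair per line, e.g. "46:00.0 20".
count_page_faults() {
    grep 'IO_PAGE_FAULT' "$1" |
        sed -n -E \
            -e 's/.*IO_PAGE_FAULT device=([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]).*/\1/p' \
            -e 's/.* [0-9a-f]{4}:([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): Event logged \[IO_PAGE_FAULT.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Run it against the syslog from a diagnostics zip, for example `count_page_faults syslog.txt` (the file name is an assumption). If all events map to one address such as 46:00.0, that points at a single controller rather than at individual disks.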
Joshewing02 Posted September 7, 2020 Thanks for this info. Here's hoping this LSI HBA card solves the issue. It's currently at 72%, so it's moving at least. Given all the issues I have been having with this new setup, I'm thinking I may just turn it into a standalone TR gaming machine for my kids to share. I'll go back to a dual Xeon setup for the server, since that ran great from day 1. @JorgeB In regards to the IOMMU groups, I did notice that almost every single item was broken out into its own group. That made it very difficult to pass controllers through to my bare metal VM. I appreciate that. Maybe I'll play around with those settings first. I'm getting ahead of myself. Let's see how this recovery goes first!
Joshewing02 Posted September 7, 2020 Well, it finished successfully. However, I did appear to lose my "VM Storage" share in the rebuild. But I'm hoping I can get that data off the 6TB drive from my virtual machine. Just upgraded to 6.9 Beta 25. IOMMU groups are WAY better this way. Hoping I can make this thing stable enough to use. First time with AMD and it's been a rough go. Thank you.
Joshewing02 Posted September 8, 2020 Okay, now for the next issue. It appears that since my disks were "Unmountable: No file system", some movies are now missing. In Plex they show as "Unavailable"; not all, but some. lost+found appears to hold just TV shows, which also went missing. Is there a way to force a scan on the other disks and see what it can recover?
trurl Posted September 8, 2020 3 minutes ago, Joshewing02 said: "Is there a way to force a scan on the other disks and see what it can recover?" You can check the filesystem on any disk using the same method you used before. Other than that, I don't know what you mean by a "scan". There isn't any database of your files or anything like that; there are just folders and files.
Joshewing02 Posted September 8, 2020 Yeah, not really sure what I mean either, since I haven't dealt with this before. I wasn't sure how the lost+found folder even came to be, so I was wondering if I could force whatever I did previously to get those files. I also have not made it through every subfolder in lost+found yet to put things back where they go.
trurl Posted September 8, 2020 2 minutes ago, Joshewing02 said: "I wasn't sure how the lost and found folder even came to be" On 9/6/2020 at 4:39 PM, trurl said: "lost+found is where filesystem repair puts the data for files it can't figure out. Typically these will be difficult to use again, especially if there are a lot of them. The names of the files and the folders they belonged to are unknown."