Drives Randomly Dropping



Hi All, 

 

I've been having some real trouble with Unraid recently and could use some help. I recently built a new system: Ryzen 3900X, 3x 4TB drives, and 1x NVMe SSD cache (full diagnostics are attached). It had been up and running fine for about 2 weeks before these issues started. All 3 disks were working beautifully in my previous NAS until 2 weeks ago, and the extended SMART tests I ran on all of them last week came back clean, except for a UDMA CRC error count of 1. The array held about 3TB of data as of yesterday.

 

Disk SDD had a red X next to it yesterday. This disk previously had no errors, so I attempted to restore the array, hoping that a quick fluke in the write process had disabled it. I went through a lot of troubleshooting steps and am trying to include as much information as I can remember.

 

Here's how I went from there:

 

Shut down the system. Verified that all SATA and power connections were solid.

Rebooted and created a new config with disks SDB and SDD. It ran overnight, all SMART results were good, parity was built on SDB from the data on SDD, and everything worked.

So I had a good array with a cache, a parity drive (SDB), and a data drive (SDD). All green, 0 errors.

My other disk, SDC, was showing up as problematic. I ran xfs_repair, but it couldn't fix anything: bad superblock, bad magic number, everything. Since that disk didn't actually have any data on it, I deleted the partition and was going to have Unraid reformat it.

 

SDC showed up as unmountable with no filesystem, and the checkbox to format unmountable disks was available. Unraid started formatting SDC, and then the array crashed. SDC never got formatted and still showed up as bad, and SDD had a red X next to it again. When I try to redo the configuration, everything just seems to break: all my disks disappear (screenshot attached). On reboot the disks come back, but they all disappear again as soon as I use the dropdown menu to reassign them.

 

I'm not sure how to troubleshoot this any further. The problem came out of the blue, and all the disks show up in the BIOS every time. If I can restore the array without losing the data on SDD, that would be great, but it's not the end of the world if not.

 

 

What's my next step? I appreciate all the help.

 

Sorry if this is all over the place, I've been troubleshooting this nonstop for hours. You guys know how it gets. Let me know if you need any other information!

 

 

 

 

 

Unraid Disk Missing.jpg

atlantis-diagnostics-20200415-1023.zip

Edited by akashb1
grammar
Link to comment

On reboot, the missing drives show again. Screenshot attached.

 

Theoretically, all my data is on the disk SDD, with an accurate parity in SDB. Any suggestions?

SDC was the unformatted disk that caused the red X to appear next to SDD when I tried to reformat it. 

 

I'm not going to rebuild or take any steps from here until I get some advice. Thanks!

Unraid Disks Come Back.jpg

Edited by akashb1
add
Link to comment

Haven't looked at diagnostics yet. But I notice you keep referring to disks as SDx. That is not a useful way to identify disks, as those designations can change on reboot, and are especially likely to change if you add or remove disks.

 

Unraid tracks disk assignments by disk serial number. I am concerned that you may already have confused your parity and data disks, since you have been looking at their SDx designations.

Link to comment

It is more useful to refer to disks by some unique portion of the serial number when discussing them. The last 4 characters are usually sufficient.
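As a rough illustration of why SDx names are unreliable, here is a minimal Python sketch that lists which serial-based name currently maps to which SDx name. It assumes a Linux system that populates `/dev/disk/by-id` with symlinks (as udev normally does). The function names `map_disk_ids` and `short_serial` are made up for this example and are not part of Unraid.

```python
import os
from pathlib import Path


def map_disk_ids(by_id_dir="/dev/disk/by-id"):
    """Return {by-id name: kernel device name} for whole disks.

    Assumes the usual udev layout, where each entry is a symlink
    whose target is a device node such as ../../sdb.
    """
    mapping = {}
    base = Path(by_id_dir)
    if not base.is_dir():
        return mapping
    for link in sorted(base.iterdir()):
        name = link.name
        # Skip partition links ("...-part1") and wwn- aliases so that
        # each disk is listed once, under its model/serial name.
        if "-part" in name or name.startswith("wwn-"):
            continue
        if not link.is_symlink():
            continue
        mapping[name] = os.path.basename(os.path.realpath(link))
    return mapping


def short_serial(by_id_name, length=4):
    """Return the last few characters of the name, usually the serial tail."""
    return by_id_name[-length:]


if __name__ == "__main__":
    for name, dev in map_disk_ids().items():
        print(f"{dev:8} {short_serial(name)}  {name}")
```

Running this before and after a reboot shows whether the same serial is still attached to the same SDx letter. If the letters moved, any notes written in terms of SDB/SDC/SDD no longer describe the same physical disks.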

 

Diagnostics indicate connection problems with all disks, likely a controller or power problem. There is no SMART data in those diagnostics for any disk, since they were disconnected. Please post new diagnostics.
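For anyone who wants to check a syslog for this kind of connection trouble themselves, the following Python sketch counts link-related kernel messages per ATA port. The message fragments in `LINK_ERROR_PATTERNS` are an assumption based on wording the Linux libata driver commonly uses; they are not taken from the diagnostics attached to this thread, and the list is not exhaustive. `count_link_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed fragments of typical libata link-trouble messages (not exhaustive).
LINK_ERROR_PATTERNS = [
    r"hard resetting link",
    r"SATA link down",
    r"failed command",
    r"link is slow to respond",
]

# Matches a port prefix such as "ata3:" or "ata3.00:" and captures "ata3".
_ATA_PORT = re.compile(r"\b(ata\d+)(?:\.\d+)?:")


def count_link_errors(lines):
    """Count lines matching a link-error pattern, grouped by ATA port."""
    counts = Counter()
    for line in lines:
        if any(re.search(p, line) for p in LINK_ERROR_PATTERNS):
            match = _ATA_PORT.search(line)
            counts[match.group(1) if match else "unknown"] += 1
    return counts
```

To use it, pass the lines of a syslog file, for example `count_link_errors(open(path))` where `path` points at the syslog inside an extracted diagnostics zip. Errors spread across every port at once point toward a shared cause such as the controller or power, rather than one bad cable or disk.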

Link to comment

I think I found my problem. 

 

I had been using a plugin called VFIO-PCI CFG to isolate my GPU, along with its audio and USB devices, as well as a USB card, for my Win10 VM.

 

I reviewed the settings page for that plugin by chance and noticed the SATA controller was checked. I unchecked it and things are working much better now. The array is currently rebuilding with no loss of data.
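In case it helps someone else catch the same mistake, here is a minimal Python sketch that looks for storage controllers bound to the vfio-pci driver. It assumes the standard Linux sysfs layout, where `/sys/bus/pci/drivers/vfio-pci` holds one entry per bound device and each device exposes a `class` file. PCI class codes starting with `0x01` are mass storage, and `0x0106` is SATA. The function name `storage_devices_bound_to_vfio` is made up for this example and has nothing to do with how the plugin itself works.

```python
from pathlib import Path

# PCI base class 0x01 is mass storage; 0x0106 is a SATA controller.
STORAGE_CLASS_PREFIX = "0x01"


def storage_devices_bound_to_vfio(sysfs_pci="/sys/bus/pci"):
    """Return [(pci_address, class_code)] for storage devices on vfio-pci.

    An empty list means either nothing is bound or the vfio-pci
    driver directory does not exist on this system.
    """
    driver_dir = Path(sysfs_pci) / "drivers" / "vfio-pci"
    found = []
    if not driver_dir.is_dir():
        return found
    for entry in sorted(driver_dir.iterdir()):
        class_file = entry / "class"
        # Non-device entries such as "bind" and "unbind" have no class file.
        if not class_file.is_file():
            continue
        pci_class = class_file.read_text().strip()
        if pci_class.startswith(STORAGE_CLASS_PREFIX):
            found.append((entry.name, pci_class))
    return found


if __name__ == "__main__":
    for address, pci_class in storage_devices_bound_to_vfio():
        print(f"WARNING: storage controller {address} (class {pci_class}) is bound to vfio-pci")
```

Any result here deserves a second look: a controller bound to vfio-pci is reserved for passthrough, so the host cannot reliably use the disks attached to it. That would be consistent with the disks dropping out as described above.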

 

 

Link to comment
