How to detect a corrupted cache disk?



I've been chasing stability problems in my Unraid system for almost a year now. I have replaced almost every component of the system, but I still get hangs on VMs, especially at load time and when detecting new devices. I have one final theory: the cache drive is unreliable.

 

My array is as follows:

Cache: 1 TB Intel M.2 SSD

Disk 1: 2 TB Western Digital Red

Disk 2: 1 TB Western Digital Black (high performance)

Parity: 2 TB Western Digital Red

 

I noticed a while back that almost no space was actually used on either of the WD disks. My VM HDD allocation was about 1.5 TB before I recently deleted and rebuilt my VMs. The cache was shown as fully used, but the array disks showed very low utilization.

 

Yesterday, I decided to do a Windows 10 installation on the cache drive. I deleted all the partitions from the Windows installation SW and installed it in the resulting unallocated space. Things worked fine for the installation and the upgrade to the Win10 fall update. Then I decided to plug in my other 256GB SSD with Win10 on it. I figured I could mount the disk and re-install SW and drivers from its "downloads" directory. The system hung at a black screen with spinning circles. I told it to boot from the 256GB SSD. Boots fine.

 

So my thought is the SSD is actually bad, but I am not sure how to confirm this other than replacing the component.  Replacing a 1TB M2 SSD is not cheap, so I was hoping some diagnostics might help me determine if I need to or not.

 

--Harper


If the drive is bad, then it would normally produce error messages in the log files: you might see read errors, command timeouts, and so on.
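As a rough illustration of checking the logs, here is a minimal Python sketch that filters syslog lines for disk-related error messages. The pattern list and the `find_disk_errors` and `scan_log_file` names are my own assumptions for illustration, and `/var/log/syslog` is only assumed to be where your system keeps its log. Real kernel messages differ between drivers and kernel versions, so treat a match as a hint to read the surrounding lines, not as proof of a bad drive.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive list);
# actual kernel messages vary by driver and kernel version.
ERROR_PATTERNS = re.compile(
    r"I/O error|medium error|timeout|failed command|controller is down",
    re.IGNORECASE,
)


def find_disk_errors(lines, device_hint=""):
    """Return lines that match an error pattern and mention device_hint.

    device_hint is a plain substring such as "nvme" or "sdb";
    an empty hint keeps every line that matches a pattern.
    """
    hits = []
    for line in lines:
        if ERROR_PATTERNS.search(line) and device_hint.lower() in line.lower():
            hits.append(line.rstrip("\n"))
    return hits


def scan_log_file(path="/var/log/syslog", device_hint=""):
    """Hypothetical helper: scan a log file on disk (the path is an assumption)."""
    with open(path, errors="replace") as handle:
        return find_disk_errors(handle, device_hint)
```

Calling `scan_log_file(device_hint="nvme")` would list only the suspicious lines that mention an NVMe device. If the hangs take the whole machine down, the log may be lost on reboot, so it helps to copy it somewhere persistent first.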

 

Running the SMART diagnostics on the drive would normally show errors too, unless the problem is just a cabling fault that makes it hard for the drive to send data to the machine.
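For the SMART side, a sketch along these lines might help. It assumes the M.2 drive is NVMe (a SATA M.2 drive reports a different attribute table), that smartmontools 7.0 or newer is installed so `smartctl --json` is available, and that the device is `/dev/nvme0`, which is a placeholder. The JSON field names below are the ones I believe smartctl emits for NVMe drives; check them against your own output before relying on this.

```python
import json
import subprocess


def nvme_health_warnings(report):
    """Return human-readable warnings from a parsed smartctl JSON report (a dict)."""
    warnings = []
    if report.get("smart_status", {}).get("passed") is False:
        warnings.append("overall SMART health check failed")

    log = report.get("nvme_smart_health_information_log", {})
    if log.get("critical_warning", 0) != 0:
        warnings.append(f"critical_warning flags set: {log['critical_warning']}")
    if log.get("media_errors", 0) > 0:
        warnings.append(f"media errors reported: {log['media_errors']}")

    spare = log.get("available_spare")
    threshold = log.get("available_spare_threshold")
    if spare is not None and threshold is not None and spare < threshold:
        warnings.append(f"available spare {spare}% is below threshold {threshold}%")
    if log.get("percentage_used", 0) >= 100:
        warnings.append("rated endurance fully consumed (percentage_used >= 100)")
    return warnings


def read_smart_report(device="/dev/nvme0"):
    """Run smartctl and parse its JSON output. The device path is a placeholder.

    smartctl's exit status is a bitmask, so a nonzero code is not treated
    as a failure to run the command.
    """
    result = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Something like `nvme_health_warnings(read_smart_report())` returns an empty list when none of these checks trip. A clean result does not rule out an intermittent fault, so it is worth reading alongside the system log.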


Your post is highly confusing. You have an SSD but you refer to your VM "HDD", which is larger than your cache SSD etc.

 

If I try to understand your post, it sounds like you have stuff from the VM written to the array (with parity)? It will always be slow writing to the array and reading will certainly be slower than an SSD.


When I refer to VM HDD, I mean the individual VM disk allocation size.  My theory was that an intermittent cache SSD caused occasional system hangs, and system hangs caused data corruption that wasn't caught by parity, since the hang occurred before parity was updated.  

 

A bad SSD was my last hope of a "fixable" problem with my system, which is meant to run 2x VR sessions on VMs. The hangs are intermittent and happen at different times, regardless of whether PCI is remapped or not. If the disks aren't bad, the only culprits left are Unraid, the KVM SW, and the actual virtualization tech in the chip.

 

In any case, the system is running solidly on the former cache disk as a standalone Win10 gaming build. I may give this one more try using my 256GB SATA SSD as the new cache. But I'm pretty tired of debugging this issue; it's simply not converging. Much as I like virtualization tech, I like platforms that don't hang even better.

 

--Brad
