SSD cache disk throwing lots of errors: disk or cabling issue?



My server has undergone a lot of changes lately, mostly to take advantage of virtualization.

I got myself an SSD cache disk, and for the first few days everything went well. It was attached directly to the motherboard controller. Then I decided to make it hot-swappable by using an SSD converter, and slapped everything into my trayless 5x3 HDD cage, where the array disks have been for some time now.

 

When I assembled everything back and started the server again the cache disk was detected, the array could start and everything seemed well. I tinkered with my virtual machine, installed software and everything till the day before yesterday.

 


 

Yesterday I wanted to use my VM, and although the Windows desktop was visible I could not interact with anything. The VM seemed half frozen. I took a look at the syslog and found it flooded with error messages regarding my cache disk.
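For anyone triaging a similar flood, here is a minimal Python sketch that tallies suspicious kernel ATA messages per port, so you can see at a glance which link is misbehaving. The marker substrings and the sample lines are assumptions for illustration, not excerpts from the attached diagnostics; adjust them to whatever your own syslog shows.

```python
import re
from collections import Counter

# Matches kernel libata prefixes such as "ata3:" or "ata3.00:".
ATA_PREFIX = re.compile(r"\b(ata\d+)(?:\.\d+)?:\s*(.*)")

# Substrings assumed to indicate trouble (illustrative, not exhaustive).
TROUBLE_MARKERS = ("exception", "failed command", "hard resetting link", "error")


def count_ata_trouble(lines):
    """Count lines per ATA port whose message contains a trouble marker."""
    counts = Counter()
    for line in lines:
        match = ATA_PREFIX.search(line)
        if match is None:
            continue  # not a libata message
        port, message = match.groups()
        lowered = message.lower()
        if any(marker in lowered for marker in TROUBLE_MARKERS):
            counts[port] += 1
    return counts


# Hypothetical sample lines, written in the usual libata style.
SAMPLE_LOG = [
    "kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen",
    "kernel: ata3: hard resetting link",
    "kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "kernel: ata1.00: configured for UDMA/133",
    "kernel: ata5: hard resetting link",
]

if __name__ == "__main__":
    for port, hits in count_ata_trouble(SAMPLE_LOG).most_common():
        print(port, hits)
```

To run it against a live system you could pass an open file instead of the sample list, for example `count_ata_trouble(open("/var/log/syslog"))`, assuming that is where your syslog lives.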

 

I decided to turn the vm off and try to understand what went wrong (see diag1.zip).

After many minutes the vm would not shut down gracefully so I killed it.

 

I stopped the array and turned the server off. I thought the problem was the SSD converter the cache disk was in.

I removed the disk and reattached it directly to the motherboard.

I started the server again.

The SSD was there, but the file system was now corrupt, as I had expected. I removed it as cache drive, since unRAID reported it as unmountable and offered to format it.

I was able to get it back with btrfsck. I then mounted it manually and copied everything to my array (it worked, and the data seemed OK to me).

I unmounted the disk and then ran btrfs check --repair.

It worked again, so I set the disk as cache drive once more. unRAID read it correctly.
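The recovery order described above (check, mount, copy the data off, unmount, and only then repair) can be sketched as a small Python helper that builds the command list. The device path, mount point and backup directory are hypothetical placeholders, and the read-only check and read-only mount are my assumptions, not necessarily the exact options used in this post.

```python
import subprocess


def rescue_plan(device, mount_point, backup_dir):
    """Return the rescue commands, in order, as argv lists. Nothing runs here."""
    return [
        # 1. Inspect the file system without modifying it.
        ["btrfs", "check", "--readonly", device],
        # 2. Mount read-only (assumption) so the copy cannot make things worse.
        ["mount", "-o", "ro", device, mount_point],
        # 3. Copy everything off the disk before attempting any repair.
        ["cp", "-a", f"{mount_point}/.", backup_dir],
        # 4. The repair needs the file system unmounted.
        ["umount", mount_point],
        # 5. Only now attempt the repair, with the data already safe elsewhere.
        ["btrfs", "check", "--repair", device],
    ]


def run_plan(plan):
    """Execute the plan step by step, stopping at the first failing command."""
    for argv in plan:
        print("+", " ".join(argv))
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder paths: print the plan instead of running it.
    for argv in rescue_plan("/dev/sdX1", "/mnt/rescue", "/mnt/user/rescue"):
        print(" ".join(argv))
```

The point of the ordering is that the repair step comes last, after the data has been copied to the array.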

 

At this point I really wanted to make sure it was the SSD converter, so I stopped the server and put the disk in the converter and again in the cage.

I booted up the server, unRAID was OK with it.

I decided to test the cache disk by converting a raw image file on the array to qcow2, writing the output to the cache disk. It wrote 3.5 GB and then froze: errors again... I thought it definitely was the SSD converter.
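A raw-to-qcow2 conversion of this kind is typically done with `qemu-img convert`. Here is a small Python sketch of it as a repeatable write test; both paths are hypothetical placeholders, not the ones used in this post.

```python
import subprocess


def qcow2_convert_command(raw_path, qcow2_path):
    """Build the qemu-img command line that converts a raw image to qcow2."""
    return [
        "qemu-img", "convert",
        "-p",           # show progress, handy for spotting where a write stalls
        "-f", "raw",    # input format
        "-O", "qcow2",  # output format
        raw_path,
        qcow2_path,
    ]


def convert(raw_path, qcow2_path):
    """Run the conversion; raises CalledProcessError if qemu-img fails."""
    subprocess.run(qcow2_convert_command(raw_path, qcow2_path), check=True)


if __name__ == "__main__":
    # Placeholder paths: source on the array, destination on the cache disk.
    print(" ".join(qcow2_convert_command("/mnt/disk1/vm.img", "/mnt/cache/vm.qcow2")))
```

Because it reads from one disk and writes several gigabytes to another, it makes a reasonable stress test for the destination link.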

 

I stopped everything again, attached the disk directly to the motherboard once more, and tried the image conversion again.

This time it didn't even start... errors again (see diag2.zip).

 

So, now I don't know what to think. Is it the file system? Has the disk gone bad? The SATA cable? The mb port? Can you tell from the logs?

 

I attached 2 sets of logs.

diag1.zip

diag2.zip

Link to comment

I don't know why, but it turns out the SSD is unstable when attached to the primary 6 Gb/s motherboard SATA controller.

If I move the SSD to one of the 8 ports of my 3 Gb/s Supermicro controller, it works OK.

If I put a regular HDD on the same channel where the SSD was throwing a gazillion errors, the HDD works just fine.
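Since the behaviour differs between a 6 Gb/s and a 3 Gb/s port, it can help to check what speed each link actually negotiated. The kernel logs a "SATA link up ... Gbps" line per port, and after repeated errors it may renegotiate at a lower speed. A minimal Python sketch follows; the sample lines are hypothetical, written in the usual libata style.

```python
import re
from collections import defaultdict

# e.g. "ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)"
LINK_UP = re.compile(r"\b(ata\d+): SATA link up (\d+(?:\.\d+)?) Gbps")


def link_speeds(lines):
    """Map each ATA port to the list of link speeds (Gbps) it negotiated, in order."""
    speeds = defaultdict(list)
    for line in lines:
        match = LINK_UP.search(line)
        if match:
            speeds[match.group(1)].append(float(match.group(2)))
    return dict(speeds)


def downshifted_ports(speeds):
    """Ports whose most recent link speed is lower than their best one."""
    return sorted(port for port, seen in speeds.items() if seen[-1] < max(seen))


# Hypothetical sample lines.
SAMPLE_LOG = [
    "kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "kernel: ata3: hard resetting link",
    "kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)",
]

if __name__ == "__main__":
    seen = link_speeds(SAMPLE_LOG)
    print(seen)
    print("downshifted:", downshifted_ports(seen))
```

A port that starts at 6.0 and later comes back at 3.0 or 1.5 is a hint that the link itself (cable, connector, backplane) is marginal at the higher speed, which would also explain why a slower hard disk is happy on the same channel.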

 

So, I can't use my SSD at its full speed with unRAID. Why? It makes me lose confidence in the overall stability of the system.

I'm too old for this crap.

 

 

Link to comment

I swapped the cable, I did every possible test.

As I wrote, what I know is that a platter drive has no problem using the same channel and cable.

Could it really be that a cable makes the difference when it comes to an SSD, despite using cables that have been tested and used for years?

 

All I can say is the cable had worked for a long time, and likely had worked before on a different disk as I tend to hoard them and only toss if proven faulty. If you have a brand new cable try that.

Link to comment

 

I can confirm it definitely was the SATA cable. I had 4 different SATA cables and all of them proved to be unreliable, at least for a SATA 3.1 device.
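One way to watch for cable trouble is SMART attribute 199, usually reported as UDMA_CRC_Error_Count, which counts transfer errors on the link between drive and controller. Here is a minimal Python sketch that pulls its raw value out of `smartctl -A` output. The sample text is hypothetical, and not every SSD exposes this attribute or uses this name, so treat the ID-based lookup as an assumption.

```python
CRC_ATTRIBUTE_ID = "199"  # conventionally UDMA_CRC_Error_Count


def crc_error_count(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if it is not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[0] == CRC_ATTRIBUTE_ID and fields[9].isdigit():
            return int(fields[9])
    return None


# Hypothetical excerpt in the layout printed by `smartctl -A`.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1234
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       57
"""

if __name__ == "__main__":
    print("CRC errors:", crc_error_count(SAMPLE_OUTPUT))
```

The counter does not reset when the cable is replaced, so what matters is whether it keeps climbing: record the value, swap the cable, run a heavy write test, and compare.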

 

What was left was a couple of errors that, elsewhere in this forum, were considered negligible.

However, since I was still in time to have the SSD replaced, I swapped it for a SanDisk Extreme Pro. The log is now clean with this one.

Link to comment
