
Cache-disk drops?


Tyrrandion


Hi, new user here, so bear with me.


I'm having an issue when doing larger writes to my array. From what I understand, the cache drive cuts out, hangs or otherwise misbehaves, so the system ends up trying to write to a drive that no longer exists. If I reboot, the system works fine until I do another large transfer. I've tried moving the cache SSD from the LSI card to the onboard SATA controller, and I've tried different cables for both power and SATA, but I cannot get it to work. Is the cache drive the issue? Is my install configured incorrectly? Could the old PSU be the issue? I've just bought the license along with two 4 TB drives, so I'd rather not spend more money without knowing what causes the issues.

 

I've attached the diagnostics, let me know if I need to provide any further information.


Best regards.

tower-diagnostics-20180510-2218.zip


Your cache disk dropped offline. Before that, the controller reset the link several times and dropped the speed, but it couldn't maintain communication. This is usually caused by a cable fault (but you've changed cables). It could also be caused by a controller fault (but you've moved to a different controller). It could be the SSD itself, or...

 

1 hour ago, Tyrrandion said:

Could the old PSU be the issue?

 

Yes, it could be. What PSU are you using?


Hi John,

 

I'm using quite an old PSU. While it's technically not a no-name unit, I'd never heard of the brand. I guess this is probably the culprit? The SSD was bought second-hand, but it has fewer than 10k hours on it, which from my understanding is quite all right.

 

Best regards.

20180511_063352.jpg


Hmmm. I think the biggest problem with that one is that it has split 12 volt rails. I would not be happy using it. Only units with a single high-current 12 volt rail are recommended. A bad power supply is a false economy and can cause all sorts of otherwise unexplained problems. Have a look at the PSU Tier List and choose one from near the top. I use only Corsair AX, RMx and SF models, depending on the application.

 

The power on hours of a disk or SSD are irrelevant as long as it is in good working order.


Hi, 

 

I see, thanks for the explanation. I'll look into that list after work. My guess is that I'll have to order a PSU, since there are hardly any physical stores in Sweden selling these kinds of products.

Hopefully the new PSU will fix the issue. Thanks again for the help! 

2 hours ago, Tyrrandion said:

my guess is that I'll have to order a PSU, since there are hardly any physical stores in Sweden selling these kinds of products.

 

Net-on-net might have a local shop with Corsair RM or possibly HX or AX models. But all the other chains I know of limit themselves to either unknown brands or lower-end model lines. You might find a local store that happens to stock a better PSU, but they are more likely to try to convince you that the models they have in stock are "just as good". In my view, you want brands and model lines that have proven themselves. The salespeople's view may be "we have sold 10 and not heard any complaints (yet)", which is a very small statistical base, and you don't even know whether those 10 units are running 24/7/365 in server-class hardware.

 

So in the end, mail order is in general your only option if you want to be able to select exactly what you want.


Hi pwm, 

Yes, I looked into Netonnet, and they had 10 different models to choose from. I found an EVGA Supernova 650GQ, which seems to be in tier two on the list, so I'm thinking about picking one of those up. I really want to find one locally; weekend builds are the best builds.


If it isn't a power problem or a cable problem, then it's time to consider a temperature problem.

 

Do the controller and the drive get enough ventilation that neither of them overheats?

 

But unless I have missed something, you haven't posted any SMART data for the cache drive, only logs in which SMART data isn't available because the drive has already dropped out.

1 hour ago, pwm said:

If it isn't a power problem or a cable problem, then it's time to consider a temperature problem.

 

Do the controller and the drive get enough ventilation that neither of them overheats?

 

But unless I have missed something, you haven't posted any SMART data for the cache drive, only logs in which SMART data isn't available because the drive has already dropped out.

 

The disks hover around 30 °C, so I don't think temperature is the issue. I've got two fans in the front and one in the bottom pulling air in, and two at the top/back blowing it out.

 

You certainly haven't missed anything; I have not posted any SMART data. I started a "SMART short self-test" before going out for a few hours. About 2 hours and 20 minutes later it's still at 10%, which I think we can all agree is not good...


Afraid so. I aborted the test, tried again, and it's stuck at 10%.

 

Edit: Left it on during the night. Still at 10%. Anyone got any other ideas?

 

Edit 2: I'm trying to move all data off the cache drive, but when I invoke the mover, the log says

"May 12 10:07:33 Tower emhttpd: req (4): cmdStartMover=Move+now&csrf_token=***************
May 12 10:07:33 Tower emhttpd: shcmd (141): /usr/local/sbin/mover &> /dev/null & "

and the mover does not start. What's wrong now?


Archived

This topic is now archived and is closed to further replies.
