Tyrrandion Posted May 10, 2018
Hi, new user here, so bear with me. I'm having an issue when doing larger writes to my array. What happens, as far as I understand, is that the cache drive cuts out or hangs, causing the system to try to write to a non-existent drive. If I reboot, the system works all right until I do another large transfer. I've tried moving the cache SSD from the LSI card to the onboard SATA controller and swapping cables, both power and SATA, but I cannot get it to work. Is the cache drive the issue? Is my install incorrectly configured? Could the old PSU be the issue? I've just bought the license along with two 4TB drives, so I'd rather not spend more money without knowing what causes the issue. I've attached the diagnostics; let me know if I need to provide any further information. Best regards.
tower-diagnostics-20180510-2218.zip
John_M Posted May 10, 2018
Your cache disk dropped offline. Before that, the controller reset the link several times and dropped the link speed, but it couldn't maintain communication. This is usually caused by a cable fault (but you've changed the cable). It could also be caused by a controller fault (but you've moved to a different controller). It could be the SSD itself, or...
1 hour ago, Tyrrandion said: "Could the old PSU be the issue?"
Yes, it could be. What PSU are you using?
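For anyone wanting to check their own syslog for the pattern described above (repeated link resets, a speed downgrade, then the device dropping out), here is a minimal Python sketch. The message wordings in the patterns are the ones the Linux libata driver commonly prints, but treat them as assumptions and compare against your own log. The sample lines are made up for illustration and are not taken from the attached diagnostics.

```python
import re
from collections import Counter

# Assumed wording of common libata kernel messages; verify against your own syslog.
PATTERNS = {
    "link_reset": re.compile(r"\b(ata\d+): (?:hard|soft) resetting link"),
    "speed_limited": re.compile(r"\b(ata\d+): limiting SATA link speed to ([\d.]+ Gbps)"),
    "device_disabled": re.compile(r"\b(ata\d+)\.\d+: disabled"),
}


def summarize_ata_events(lines):
    """Count link resets, speed downgrades and disabled devices per ATA port."""
    counts = Counter()
    for line in lines:
        for event, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), event)] += 1
    return counts


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "May 10 21:58:01 Tower kernel: ata3: hard resetting link",
    "May 10 21:58:07 Tower kernel: ata3: limiting SATA link speed to 3.0 Gbps",
    "May 10 21:58:12 Tower kernel: ata3: hard resetting link",
    "May 10 21:58:18 Tower kernel: ata3.00: disabled",
]

if __name__ == "__main__":
    for (port, event), n in sorted(summarize_ata_events(SAMPLE).items()):
        print(f"{port}: {event} x{n}")
```

To run it on a real log, pass an open file instead of `SAMPLE`, for example `summarize_ata_events(open("syslog.txt"))`, where `syslog.txt` is a placeholder name for the syslog extracted from the diagnostics zip.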
Tyrrandion Posted May 11, 2018
Hi John, I'm using quite an old PSU. While it's technically not a no-name unit, I've never heard of the brand. I guess this is probably the culprit? The SSD was bought second-hand, but it has fewer than 10k hours on it, which from my understanding is quite all right. Best regards.
John_M Posted May 11, 2018
Hmmm. I think the biggest problem with that one is the fact that it has split 12 volt rails. I would not be happy using it. Only PSUs with a single high-current 12 volt rail are recommended. A bad power supply is a false economy and can cause all sorts of otherwise unexplained problems. Have a look at the PSU Tier List and choose one from near the top. I use only Corsair AX, RMx and SF models, depending on the application. The power-on hours of a disk or SSD are irrelevant as long as it is in good working order.
Tyrrandion Posted May 11, 2018
Hi, I see, thanks for the explanation. I'll look into that list after work. My guess is that I'll have to order a PSU, since there are hardly any physical stores in Sweden selling these kinds of products. Hopefully the new PSU will fix the issue. Thanks again for the help!
pwm Posted May 11, 2018
2 hours ago, Tyrrandion said: "my guess is that I have to order a PSU, since there's hardly any physical stores in Sweden selling these kinds of products."
Net-on-net might have some local shop with Corsair RM or possibly HX or AX models. But all other chains I know about limit themselves to either unknown brands or lower-end model lines. You might have some local store that happens to have a better PSU, but they are more likely to try to convince you that the models they have in stock are "just as good". In my view, you want brands and model lines that have proven themselves. The salespeople's view can be "we have sold 10 and not heard any complaint (yet)", which is a very small statistical base, and you don't even know if these 10 units are used 24/7/365 in server-class hardware. So in the end, mail order is in general your only option if you want to be able to select exactly what you want.
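To put a rough number on why "sold 10, no complaints" is a small statistical base, here is a short Python sketch. It assumes each unit fails independently with the same probability, which is a simplification, and the function name is made up for this example. It computes the highest failure rate that is still consistent, at a chosen confidence level, with seeing zero failures.

```python
def max_failure_rate(units_sold, confidence=0.95):
    """Upper bound on the true failure rate when zero failures were observed.

    Solves (1 - p) ** units_sold = 1 - confidence for p, assuming that
    units fail independently with the same probability p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / units_sold)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n} units, no failures: rate could still be up to {max_failure_rate(n):.1%}")
```

Under these assumptions, ten trouble-free units cannot rule out a failure rate of roughly one in four, while a thousand trouble-free units bring the bound well below one percent.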
Tyrrandion Posted May 11, 2018
Hi pwm, Yes, I looked into Netonnet, and they had 10 different models to choose from. I found an EVGA Supernova 650GQ, which seems to place in tier two on the list, so I'm thinking about picking one of those up. I really want to find one locally; weekend builds are the best builds.
Tyrrandion Posted May 11, 2018
Welp, that did not solve it. I just installed the new power supply, and after less than 5 minutes it's broken again. I guess this means that the disk is broken? Is there something that can be done? I've attached the logs again; I don't know if that'll help. Right now I feel like just throwing the entire tower out the window.
tower-diagnostics-20180511-1727.zip
JorgeB Posted May 11, 2018
It looks more like a SATA cable problem. If you haven't done so yet, replace the cable first.
Tyrrandion Posted May 11, 2018
Thanks Johnnie, but I'm currently on the third SATA cable.
pwm Posted May 11, 2018
If it isn't a power problem and not a cable problem, then it's time to consider a temperature problem. Do the controller and the drive get enough ventilation that neither of them overheats? But unless I have missed something, you haven't posted any SMART data for the cache drive, only logs where SMART isn't available because the drive has already dropped out.
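As a rough illustration of what to look for once SMART data is available, here is a small Python sketch that pulls the raw values out of an attribute table in the layout printed by `smartctl -A /dev/sdX` (with `sdX` standing in for the cache drive). The sample report and all values in it are invented for the example and do not come from this thread's diagnostics. Attribute names also vary between SSD vendors, so treat the names used here as assumptions.

```python
def parse_smart_attributes(report):
    """Return {attribute_name: raw_value} from a smartctl -A style attribute table."""
    attributes = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns;
        # the header row starts with "ID#" and is skipped by the isdigit() check.
        if len(fields) >= 10 and fields[0].isdigit():
            attributes[fields[1]] = " ".join(fields[9:])
    return attributes


# Hypothetical report, for illustration only.
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       9850
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       37
"""

if __name__ == "__main__":
    for name, raw in parse_smart_attributes(SAMPLE_REPORT).items():
        print(f"{name}: {raw}")
```

A CRC error count that keeps rising between reports usually points at the cable or connection rather than the flash itself, while reallocated sectors point at the drive, so comparing two reports taken before and after a large transfer can help narrow things down.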
Tyrrandion Posted May 11, 2018
1 hour ago, pwm said: "If it isn't a power problem and not a cable problem then it's time to consider a temperature problem. Does the controller and the drive get enough ventilation so controller or drive doesn't get overheated? But unless I have missed something, you haven't posted any SMART data for the cache drive. Only logs where SMART isn't available after the drive has dropped out."
The disks hover around 30 °C, so I don't think temperature is the issue. I've got two fans in the front and one in the bottom pulling air in, and two at the top and back blowing it out. You have certainly not missed anything; I have not posted any SMART data. I started the "SMART short self-test" before going out for a few hours. About 2 hours and 20 minutes later it's still at 10%, which I think we can all agree is not good...
pwm Posted May 11, 2018
Are you sure you selected the short test? That's normally about a 2-minute test. But even the extended test is normally quite fast for an SSD.
Tyrrandion Posted May 11, 2018
Afraid so. I aborted the test, tried again, and it's stuck at 10%.
Edit: Left it on during the night. Still at 10%. Anyone got any other ideas?
Edit 2: I'm trying to move all data off the cache drive, but when I invoke the mover, the log says:
May 12 10:07:33 Tower emhttpd: req (4): cmdStartMover=Move+now&csrf_token=***************
May 12 10:07:33 Tower emhttpd: shcmd (141): /usr/local/sbin/mover &> /dev/null &
and the mover does not start. What's wrong now?