demonmaestro Posted August 23, 2015
I keep having issues with this darn server left and right. It seems to work, so I put data on the drives. The next thing I know, 6 drives "go bad". I check them with a SMART report and they come back good, so I clear the red ball, add them back to the set, and do a parity rebuild. The next thing I know, half my darn data is missing or corrupted. What the heck do I do to get this server acting correctly so I stop losing data? I have LOST at least 10TB worth of data. http://lilnetwork.com/download/nas/tower-diagnostics-20150823-0242.zip
CHBMB Posted August 23, 2015
Well, I'm no expert, but if you're sure the drives are good, then the problem has got to be either faulty SATA cables or a dodgy controller. Are the drives all on the same drive controller? If they are and it's onboard, then I guess you need a new MB.
demonmaestro Posted August 23, 2015
No, they are all on the SuperMicro AOC-SAS2LP-MV8, and I have already replaced this card once before. The ones that are connected to the MoBo are usually fine.
itimpi Posted August 23, 2015
Another thing that occurs to me is a RAM-related issue. These are always hard to diagnose because they can cause all sorts of random errors. The main thing to try is a long memtest. There also appears to be 32GB of RAM installed. It might be worth removing some of it, as stability often improves if you do not have all RAM slots populated. It is also worth checking whether there is a BIOS update for the motherboard.
demonmaestro Posted August 23, 2015
That was the first thing I tried when I started having these issues. Besides, it shouldn't give me as many issues as this, since it is ECC.
CHBMB Posted August 23, 2015
Well, that does leave the PCIe slot on the MB as a possibility...
demonmaestro Posted August 23, 2015
Well, what do I do to fix this issue?
CHBMB Posted August 23, 2015
Two options: replace the motherboard or change the PCIe slot. Sorry, I thought that went without saying.
demonmaestro Posted August 23, 2015
I was just thinking that you might have another idea besides that. I guess I am going to replace the card yet again and hope things go well. I really don't want to replace the MoBo.
CHBMB Posted August 23, 2015
Can you test the card on another machine?
demonmaestro Posted August 23, 2015
Yeah, but I'm not really sure how to test an add-on card. I have another card on order.
JorgeB Posted August 23, 2015
Have you checked this thread? Maybe related. http://lime-technology.com/forum/index.php?topic=40683.0
demonmaestro Posted August 23, 2015
I ain't getting those kinds of errors in my logs. I get "cannot access diskxx" or "memblockxx" messages. Thank you for looking at that, though!
mr-hexen Posted August 23, 2015
Did you remove that zip tie holding the SATA cables together, like garycase suggested in your build thread?
demonmaestro Posted August 23, 2015
Yes, I did. I removed all the bread ties.
mr-hexen Posted August 23, 2015
Are the failed drives all in the Icy Dock cages?
demonmaestro Posted August 23, 2015
No, just the 8 that are connected to the card.
mr-hexen Posted August 23, 2015
Well, you've tried replacing the card already, and that did not fix it. So your next steps in narrowing this down are to replace the MB (potentially bad PCIe slot) or replace the 8087-SATA cables attached to the card. Do you have the latest BIOS for the mainboard as well?
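The narrowing-down reasoning in the posts above (which component do all the failing drives share that the healthy drives do not?) can be sketched in Python. This is only an illustration: the disk names, cable labels, and the `inventory` layout are hypothetical and are not taken from the poster's diagnostics. Only the controller model comes from the thread.

```python
def shared_components(inventory, failed):
    """Return {attribute: value} for every attribute whose value is identical
    across all failed disks and is not used by any healthy disk."""
    suspects = {}
    attrs = set()
    for parts in inventory.values():
        attrs.update(parts)
    for attr in sorted(attrs):
        failed_vals = {inventory[d].get(attr) for d in failed}
        healthy_vals = {inventory[d].get(attr) for d in inventory if d not in failed}
        # A suspect is a single value common to all failed disks
        # that no healthy disk depends on.
        if len(failed_vals) == 1 and not (failed_vals & healthy_vals):
            suspects[attr] = next(iter(failed_vals))
    return suspects


if __name__ == "__main__":
    # Hypothetical inventory: three disks on the add-on card, one on the motherboard.
    inventory = {
        "disk1": {"controller": "AOC-SAS2LP-MV8", "cable": "8087-A"},
        "disk2": {"controller": "AOC-SAS2LP-MV8", "cable": "8087-A"},
        "disk3": {"controller": "AOC-SAS2LP-MV8", "cable": "8087-B"},
        "disk4": {"controller": "onboard", "cable": "sata-1"},
    }
    print(shared_components(inventory, {"disk1", "disk2", "disk3"}))
```

In this made-up layout the failures span two cables but a single controller, so the shared factor is everything on the controller path: the card itself, its PCIe slot, or its driver. That is the same conclusion the thread reaches by hand.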
demonmaestro Posted August 23, 2015
Yeah, the BIOS is fully updated. I have flipped and flopped the cables. So I am going to try the card one more time and hope for the best. I really don't want to replace the MoBo due to the cost.
mr-hexen Posted August 23, 2015
Indeed. Have you read the manual for the MB? It said something about two jumpers that provide better management for the PCIe slot. You might want to look into that and flip the jumpers to try the other setting.
CHBMB Posted August 23, 2015
If you're getting a third controller card, I'd suspect that the controller card isn't the source of the problem.
demonmaestro Posted August 23, 2015
Aug 23 14:56:49 Tower kernel: md: disk8: ATA_OP e3 ioctl error: -5
Aug 23 14:56:49 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
[the previous line repeats dozens of times with the same 14:56:49 timestamp]
Aug 23 14:57:00 Tower emhttp: /usr/bin/tail -n 42 -f /var/log/syslog 2>&1
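A log excerpt like the one above is easier to read once the repeated smartctl notice is filtered out and the md errors are counted per disk. The following Python sketch does that. The regular expression is written against the single md line quoted in this post, so treating it as the general format of unRAID md messages is an assumption, and `summarize` is a hypothetical helper name. The one firm fact used is that errno 5 is `EIO` (input/output error), which is what the `-5` in the log line means.

```python
import errno
import re
from collections import Counter

# Matches lines such as:
#   "Aug 23 14:56:49 Tower kernel: md: disk8: ATA_OP e3 ioctl error: -5"
# (pattern inferred from the one line quoted in the thread)
MD_ERROR = re.compile(r"kernel: md: (disk\d+): ATA_OP (\w+) ioctl error: (-?\d+)")

# Substring of the repeated notice that carries no diagnostic value here.
NOISE = "is using a deprecated SCSI ioctl"


def summarize(lines):
    """Return (Counter keyed by (disk, errno name), number of noise lines skipped)."""
    errors = Counter()
    skipped = 0
    for line in lines:
        if NOISE in line:
            skipped += 1
            continue
        match = MD_ERROR.search(line)
        if match:
            disk = match.group(1)
            code = abs(int(match.group(3)))
            # Kernel messages report errno values as negative numbers.
            errors[(disk, errno.errorcode.get(code, "unknown"))] += 1
    return errors, skipped


if __name__ == "__main__":
    sample = [
        "Aug 23 14:56:49 Tower kernel: md: disk8: ATA_OP e3 ioctl error: -5",
        "Aug 23 14:56:49 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO",
        "Aug 23 14:56:49 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO",
    ]
    errors, skipped = summarize(sample)
    for (disk, name), count in sorted(errors.items()):
        print(f"{disk}: {count} x {name}")
    print(f"skipped {skipped} smartctl notices")
```

To run it against a real log, pass an open file instead of the sample list, for example `summarize(open("/var/log/syslog"))`, using the syslog path that appears in the excerpt.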
CHBMB Posted August 23, 2015
Go to Tools ==> Diagnostics ==> Collect, and post them here. I'm not very good at interpreting them, but it's worth putting them up so someone more knowledgeable can have a look and advise.
tr0910 Posted August 23, 2015
I haven't heard anyone here question the power supply. This is the one server weakness that can be the most puzzling to diagnose. Can you replace the power supply?
demonmaestro Posted August 23, 2015
Yes, I can, but the PSU is not the issue at hand, because it is not under any kind of load and the server has been behind an APC battery all of its life.