August 23, 2015 I keep having issues with this darn server left and right. It seems to work, and I put data on the drives. Then the next thing I know, I have 6 drives "go bad". So I check them via a SMART report and they come back good. So I clear the red ball, add them back to the set, and do a parity rebuild. Next thing I know, half my darn data is missing or corrupted. What the heck do I do to get this server acting correctly so I stop losing data? I have LOST at least 10TB worth of data. http://lilnetwork.com/download/nas/tower-diagnostics-20150823-0242.zip
August 23, 2015 Well, I'm no expert, but if you're sure the drives are good, then the problem has got to be either faulty SATA cables or a dodgy controller. Are the drives all on the same drive controller? If they are and it's onboard, then I guess you need a new MB.
August 23, 2015 Author No, they are all on the SuperMicro AOC-SAS2LP-MV8, and I have already replaced this card once before. The ones that are connected to the MoBo are usually fine.
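One way to double-check which drives sit behind which controller on a Linux-based server is to resolve each entry under /sys/block and look at the PCI address in the path. The following is only a minimal sketch under that assumption: the helper names and the example paths are hypothetical illustrations, not output from this server.

```python
import os
import re
from collections import defaultdict

# A PCI address looks like 0000:01:00.0 (domain:bus:device.function).
PCI_ADDR_RE = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_of(sysfs_path):
    """Return the PCI address nearest the disk in a resolved sysfs path."""
    matches = PCI_ADDR_RE.findall(sysfs_path)
    return matches[-1] if matches else None


def group_by_controller(resolved):
    """Group disk names by the PCI address of the controller they hang off."""
    groups = defaultdict(list)
    for disk, path in resolved.items():
        groups[controller_of(path)].append(disk)
    return {ctrl: sorted(disks) for ctrl, disks in groups.items()}


def resolve_block_devices(sys_block="/sys/block"):
    """Map each sdX device to its resolved sysfs path (empty off Linux)."""
    if not os.path.isdir(sys_block):
        return {}
    return {
        name: os.path.realpath(os.path.join(sys_block, name))
        for name in os.listdir(sys_block)
        if name.startswith("sd")
    }


# Hypothetical paths: one onboard port and two ports on an add-on card.
EXAMPLE_PATHS = {
    "sda": "/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/"
           "target0:0:0/0:0:0:0/block/sda",
    "sdc": "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host6/"
           "port-6:0/end_device-6:0/target6:0:0/6:0:0:0/block/sdc",
    "sdd": "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host6/"
           "port-6:1/end_device-6:1/target6:1:0/6:1:0:0/block/sdd",
}

if __name__ == "__main__":
    live = resolve_block_devices()
    print(group_by_controller(live or EXAMPLE_PATHS))
```

If every drive that drops out lands in the same group, that points at the card, its cables, or the slot it sits in rather than at the drives themselves.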
August 23, 2015 Community Expert Another thing that occurs to me is a RAM-related issue. These are always hard to diagnose, as they can cause all sorts of random errors. The main thing to try for RAM is a long memtest. There also appears to be 32GB of RAM installed. It might be worth removing some of it, as stability often improves if you do not have all RAM slots populated. It is also worth checking whether there is a BIOS update for the motherboard.
August 23, 2015 Author That was the first thing I tried when I started having these issues in the first place. Besides, it shouldn't be giving me as many issues as it is, since it is ECC.
August 23, 2015 "No, they are all on the SuperMicro AOC-SAS2LP-MV8, and I have already replaced this card once before. The ones that are connected to the MoBo are usually fine." Well, that does leave the PCIe slot on the MB as a possibility...
August 23, 2015 Author Well, what do I do to fix this issue?
August 23, 2015 Two options: replace the motherboard or change the PCIe slot. Sorry, I thought that went without saying.
August 23, 2015 Author I was just thinking that you might have another idea besides that. I guess I am going to replace the card yet again and hope things go well. I really don't want to replace the MoBo.
August 23, 2015 Author "Can you test the card on another machine?" Yeah, but I'm not really sure how to test an add-on card. I have another card on order.
August 23, 2015 Community Expert Have you checked this thread? It may be related. http://lime-technology.com/forum/index.php?topic=40683.0
August 23, 2015 Author I'm not getting those kinds of errors in my logs. I get "cannot access diskxx" or "memblockxx" stuff. Thank you for looking at that, though!
August 23, 2015 Did you remove that zip tie holding the SATA cables together like garycase suggested in your build thread?
August 23, 2015 Author Yes, I did. I removed all the bread ties.
August 23, 2015 Well, you've tried replacing the card already, and that did not fix it. So your next steps in narrowing this down are to replace the MB (potentially bad PCIe slot) or replace the 8087-SATA cables attached to the card. Do you have the latest BIOS for the mainboard as well?
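Before buying new cables, it can help to look past the overall SMART pass/fail result. A rising raw value for the UDMA_CRC_Error_Count attribute is commonly read as a hint of link or cabling trouble rather than a failing disk. The sketch below parses the attribute table printed by `smartctl -A`. The sample text and its values are made up for illustration, and the function names are hypothetical.

```python
def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the 10th column is the raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


# Attributes treated here as link/cable indicators (an assumption).
LINK_ATTRS = ("UDMA_CRC_Error_Count",)


def link_error_hints(attrs):
    """Return the link-related attributes whose raw value is above zero."""
    return {name: attrs[name] for name in LINK_ATTRS if attrs.get(name, 0) > 0}


# Hypothetical sample, NOT taken from the poster's drives.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       37
"""

if __name__ == "__main__":
    print(link_error_hints(parse_smart_attributes(SAMPLE)))
```

Comparing this value before and after a cable swap, on the same drive, gives a rough signal of whether the swap helped.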
August 23, 2015 Author Yeah, the BIOS is fully updated. I have flipped and flopped the cables. So I am going to try the card one more time and hope for the best. I really don't want to replace the MoBo due to the cost.
August 23, 2015 Indeed. Have you read the manual for the MB? It says something about two jumpers that provide better management for the PCIe slot. You might want to look into that and flip the jumpers around to try the other way.
August 23, 2015 If you're getting a third controller card, I'd suspect that the controller card isn't the source of the problem.
August 23, 2015 Author
Aug 23 14:56:49 Tower kernel: md: disk8: ATA_OP e3 ioctl error: -5
Aug 23 14:56:49 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
(the same smartctl message repeats many more times, all stamped 14:56:49)
Aug 23 14:57:00 Tower emhttp: /usr/bin/tail -n 42 -f /var/log/syslog 2>&1
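A log excerpt like this is easier to read once the repeats are collapsed and the md errors are pulled out. The sketch below is one hedged way to do that in Python. It assumes classic syslog lines of the form "Mon DD HH:MM:SS host source: message", and the function name `summarize` is made up for illustration. The kernel reports errors as negative errno values, so -5 is looked up as errno 5.

```python
import errno
import re
from collections import Counter

# Classic syslog layout: "Aug 23 14:56:49 Tower kernel: message..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"(?P<src>[^:]+):\s+(?P<msg>.*)$"
)
# unRAID md driver error, e.g. "md: disk8: ATA_OP e3 ioctl error: -5"
MD_ERR_RE = re.compile(r"md: (?P<disk>disk\d+): .*ioctl error: -(?P<code>\d+)")


def summarize(lines):
    """Count identical (source, message) pairs and list md ioctl errors."""
    counts = Counter()
    disk_errors = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a syslog line
        counts[(m["src"], m["msg"])] += 1
        err = MD_ERR_RE.search(m["msg"])
        if err:
            code = int(err["code"])
            disk_errors.append((err["disk"], errno.errorcode.get(code, "unknown")))
    return counts, disk_errors


# Lines copied from the post above (the repeated message is shown twice only).
SAMPLE = [
    "Aug 23 14:56:49 Tower kernel: md: disk8: ATA_OP e3 ioctl error: -5",
    "Aug 23 14:56:49 Tower kernel: program smartctl is using a deprecated "
    "SCSI ioctl, please convert it to SG_IO",
    "Aug 23 14:56:49 Tower kernel: program smartctl is using a deprecated "
    "SCSI ioctl, please convert it to SG_IO",
    "Aug 23 14:57:00 Tower emhttp: /usr/bin/tail -n 42 -f /var/log/syslog 2>&1",
]

if __name__ == "__main__":
    counts, disk_errors = summarize(SAMPLE)
    for (src, msg), n in counts.most_common():
        print(f"{n:4d}x {src}: {msg}")
    print(disk_errors)
```

Read this way, the one line that matters in the excerpt is the disk8 ioctl failure with errno 5 (EIO, an I/O error); the smartctl deprecation messages are noise from the SMART polling.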
August 23, 2015 Go to Tools ==> Diagnostics ==> Collect, and post the results here. I'm not very good at interpreting them, but it's worth putting them up so someone more knowledgeable can have a look and advise.
August 23, 2015 I haven't heard anyone here question the power supply. This is the one server weakness that can be the most puzzling to diagnose. Can you replace the power supply?
August 23, 2015 Author Yes, I can, but the PSU is not the issue at hand, because it is not under any kind of load and the server has been behind an APC Battery all of its life.