January 11, 2013 The server has been humming along nicely for some time now. The issues started with strange plug-in behavior (not writing to log files, etc.). I assumed I might be having drive issues, since my cache is an old laptop drive that I always expected to be the first to go. When I went to take a look, I was getting errors on the drive sdd1. I attempted to remove the drive, and after restarting the server I am getting all sorts of new errors. Unfortunately, I am struggling to decode the log file. I am currently running 7 2T drives (most are WD Green drives from what I remember, the others are Seagate) and the 160G 2.5" drive. I have been using the Corsair CX430 without any issues. After reading this http://lime-technology.com/wiki/index.php/The_Analysis_of_Drive_Issues#Drive_Interface_Issues I started thinking, maybe the PS is overworked? The server was constructed mid 2010 and the most recent drive was added 6ish months ago. Thanks for the help, hopefully this syslog has helpful info. I wish I had thought to grab one before taking the drive out....lesson learned. syslog-2013-01-10.txt
January 11, 2013 The syslog indicates communication errors with the drives. Since several drives are affected, there is likely a common hardware component that is faulty. Do the affected drives share a common component?
January 11, 2013 Author I currently have 6 drives plugged into the mobo directly. I am also using a SATA controller card. The SAS adapter is using 2 of the 4 connections and I am not using the other port. I removed the controller card and reseated it, and I also swapped ports for the SAS. I then checked all SATA cables for secure connections; 1 seemed a little loose but looks good now. After restarting, I am still getting a "red ball" on drive 4. I did reconnect the cache drive and it started right up again, complete with working plug-ins. Attached is a new syslog. syslog-2013-01-11.txt
January 12, 2013 2 drives are having communication errors. One is the non-array disk WDC WD1600BEVT-22ZCT0.
January 25, 2013 Author I have had my server offline for a little bit and am trying to get it back in tip-top shape here. I removed the drives to check for any obvious connection issues; none found. I also replaced the 2 drives that I believe are throwing errors: WDC_WD1600BEVT-22ZCT0 & ST2000DL003-9VT166_5YD4P8DY. They were both previously connected directly to the motherboard. While troubleshooting I connected them to my SATA controller card, and I have left them there. Upon restarting, it appears I am still getting errors. Where do I go from here? Thanks for your help! Attached is a new syslog. syslog-2013-01-25.txt
January 25, 2013 Author If I use the Syslog page from unMENU it reports these "Errors" and "Minor Issues". The cache drive is working just fine. I just want to make sure I am good before clearing the red ball then. How can I troubleshoot at this point to verify that the SATA cables were the issue? Attached is another syslog.

Jan 24 23:57:04 Tower kernel: pci0000:00: ACPI _OSC request failed (AE_SUPPORT), returned control mask: 0x0d (Minor Issues)
Jan 24 23:57:04 Tower kernel: ACPI Warning: 0x00000400-0x0000041f SystemIO conflicts with Region \SMRG 1 (20120320/utaddress-251) (Minor Issues)
Jan 24 23:57:04 Tower kernel: sas: ata7: end_device-0:0: dev error handler (Errors)
Jan 24 23:57:04 Tower kernel: sas: ata7: end_device-0:0: dev error handler (Errors)
Jan 24 23:57:04 Tower kernel: sas: ata8: end_device-0:1: dev error handler (Errors)
Jan 24 23:57:04 Tower kernel: sas: ata7: end_device-0:0: dev error handler (Errors)
Jan 24 23:57:04 Tower kernel: sas: ata8: end_device-0:1: dev error handler (Errors)
Jan 24 23:57:04 Tower kernel: sas: ata9: end_device-0:2: dev error handler (Errors)
Jan 24 23:57:04 Tower kernel: sas: ata7: end_device-0:0: dev error handler (Errors)
Jan 24 23:57:04 Tower kernel: sas: ata8: end_device-0:1: dev error handler (Errors)
Jan 24 23:57:04 Tower kernel: sas: ata9: end_device-0:2: dev error handler (Errors)
Jan 24 23:57:04 Tower kernel: sas: ata10: end_device-0:3: dev error handler (Errors)
Jan 24 23:57:13 Tower logger: # * Extensive error-handling mechanism, mirroring OpenSSL's error codes (Errors)
Jan 24 23:58:51 Tower emhttp: shcmd (53): killall -HUP smbd (Minor Issues)

syslog-2013-01-25_1.txt
January 25, 2013 Those are not errors. Any line including the word "error", e.g., "terror" or "This is not an error", will be marked (Errors) by unMENU.
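To illustrate why those lines get tagged, here is a minimal Python sketch of that kind of keyword matching. This is not unMENU's actual code; the function name `naive_is_error` and the case-insensitive comparison are assumptions made for illustration. The sample lines are copied from the syslog excerpt posted above.

```python
# Hypothetical sketch of keyword-based log tagging; NOT unMENU's real implementation.

def naive_is_error(line: str) -> bool:
    """Flag a line if it contains the substring "error" anywhere.

    Case-insensitive matching is an assumption made for this example.
    """
    return "error" in line.lower()


# Lines taken from the syslog excerpt in this thread.
SAMPLE = [
    "Jan 24 23:57:04 Tower kernel: sas: ata7: end_device-0:0: dev error handler",
    "Jan 24 23:57:13 Tower logger: # * Extensive error-handling mechanism, "
    "mirroring OpenSSL's error codes",
    "Jan 24 23:58:51 Tower emhttp: shcmd (53): killall -HUP smbd",
]

# Keep only the lines the naive matcher would tag as (Errors).
flagged = [line for line in SAMPLE if naive_is_error(line)]

for line in flagged:
    print(line, "(Errors)")
```

With a matcher like this, the "dev error handler" line and the plug-in banner mentioning "error-handling" are both tagged even though neither reports a failure, which is why the tag alone is not evidence of a drive problem.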
January 25, 2013 Author Just to be sure then... you are saying I am good? The issues you were commenting on last week are no longer showing in my syslog? If you ARE saying that I am OK... should I just follow the http://lime-technology.com/wiki/index.php/FAQ#What_does_the_Red_Ball_mean.3F How do I re-enable a failed disk? instructions to clear the red ball? Thanks again for your help!