RADIatiON Posted September 28, 2015
Hi guys,

So on the weekend I noticed that one of my drives (Drive 6) had gone redball. I powered off the system and restarted for good measure. I ran a smartctl against it and everything looked OK (as far as I could tell... it "passed" anyway). I was heading out for golf, so I decided to run the long test just for good measure. When I checked after it finished, again I could find no problems, so I initiated a rebuild of the drive on the array.

After the rebuild started, everything proceeded fine until about 4% or 5% completed, and then another drive (Drive 5) started reporting a bunch (1000s) of errors. I stopped the array and checked, and the system could no longer find Drive 5 (missing). I rebooted again, and the drive was back. I ran a smartctl short test on that drive and it all showed OK, so I started the rebuild again. As before, it bombed out at about 5% and Drive 5 was "missing" again.

So I powered off, rebooted and started the array, but stopped the rebuild until I hear from you guys. I've attached a syslog and the smartctl logs from the two drives in question. I'm running unRaid 5.0.6.

What do you guys suggest for a course of action?

thanks,
Sam

Syslog_Smartctl.zip
trurl Posted September 28, 2015
Are these disks on the same controller? What is the exact model of your power supply?
RADIatiON Posted September 28, 2015
Hi, thanks for the reply. Yes, they are on the same controller (controller to 5in3 cages). Not sure on the PS. It might be a Corsair Professional Series Gold AX850 CMPSU-850AX; I'm working off of old receipts. I'll confirm in 3 or 4 hours when I get home.

Sam
SSD Posted September 28, 2015
I wonder if you might have some loose connections to your 5in3 cage. Once the connections are made and burned in, and if you never open the case, they are usually good for years. But your symptoms point to something wrong with the cabling to the cages.

Some cages require a little extra shove to ensure that the tray (with drive attached) is seated securely against the backplane. My Supermicros snap the drive securely in place, but the latch on my Rosewills is not quite as sure, and I always give it an extra push. Not sure about your cages, but that might also be the issue.
RADIatiON Posted September 28, 2015
Thanks for that. I haven't touched this computer in months... if not years. It's just been sitting there happily and quietly! I bought a replacement for the drive that was questionable. I'll run it through a preclear and then try rebuilding with it.

Am I right that there is nothing of concern in the smartctl reports? I noticed the drive in question had this:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       32

I only noticed this because a drive in my HTPC failed last week. Maybe there is something in the water!

Sam
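Editor's note: one way to read the attribute line quoted in the post above is to compare the normalized VALUE column with THRESH. The smartctl documentation describes an attribute as failed when the normalized value is less than or equal to its threshold; the RAW_VALUE column is vendor-specific and is not compared against the threshold. The sketch below is only an illustration of that comparison, not a diagnosis of this drive. The names `SmartAttribute`, `parse_attribute_line` and `is_failing` are made up for this example.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    """One row of the smartctl attribute table (hypothetical helper type)."""
    attr_id: int
    name: str
    flag: str
    value: int      # normalized current value
    worst: int      # lowest normalized value recorded
    thresh: int     # failure threshold for the normalized value
    attr_type: str  # "Pre-fail" or "Old_age"
    updated: str
    when_failed: str
    raw_value: str  # vendor-specific; kept as text


def parse_attribute_line(line: str) -> SmartAttribute:
    """Split one attribute row into its ten whitespace-separated columns."""
    parts = line.split(None, 9)
    if len(parts) != 10:
        raise ValueError(f"expected 10 columns, got {len(parts)}: {line!r}")
    return SmartAttribute(
        int(parts[0]), parts[1], parts[2],
        int(parts[3]), int(parts[4]), int(parts[5]),
        parts[6], parts[7], parts[8], parts[9].strip(),
    )


def is_failing(attr: SmartAttribute) -> bool:
    """An attribute counts as failed when VALUE <= THRESH."""
    return attr.value <= attr.thresh


# The row quoted in the post above.
line = "  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       32"
attr = parse_attribute_line(line)
print(attr.name, "value:", attr.value, "thresh:", attr.thresh,
      "failing:", is_failing(attr))
```

For the quoted row, the normalized value (200) is well above the threshold (51), so by this rule the attribute has not failed, which is consistent with the drive reporting "passed".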
RADIatiON Posted September 29, 2015
Still preclearing. I just wanted to correct some information. My power supply is a Corsair TX650 (CP-9020038-NA). This server has been running day and night since around August 2011.

Sam
RADIatiON Posted September 30, 2015
OK, well, that didn't work. I tried rebuilding with the brand new drive to replace the one that was redball (Drive 6), but the rebuild process still knocked Disk 5 offline (missing).

Can I move Disk 5 to another slot/controller and still rebuild from there, or will the array think that it's a different disk because it will change from /dev/sdh to /dev/sde or something similar? That's all I can think of trying for now. Any other suggestions?

Sam
trurl Posted September 30, 2015
RADIatiON wrote: "...Can I move Disk 5 to another slot/controller and still rebuild from there, or will the array think that it's a different disk because it will change from /dev/sdh to /dev/sde or something similar..."

Since you are on v5, unRAID will remember the disks by their serial number, so you can try a different port.
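Editor's note: the idea in the reply above, that a disk is identified by its serial number and not by its /dev/sdX name, can be sketched as a simple lookup. This is not unRAID's actual implementation; the function `match_slots_by_serial` and all serial numbers below are hypothetical and exist only for illustration.

```python
from typing import Dict, Optional


def match_slots_by_serial(
    assignments: Dict[str, str],
    detected: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Map each array slot to the device that currently carries its serial.

    assignments: slot name -> serial number recorded for that slot
    detected:    device name -> serial number seen on this boot
    Returns slot name -> device name, or None if the disk is missing.
    """
    serial_to_device = {serial: dev for dev, serial in detected.items()}
    return {slot: serial_to_device.get(serial)
            for slot, serial in assignments.items()}


# Hypothetical recorded assignments (serials are invented).
assignments = {"disk5": "SERIAL-AAA", "disk6": "SERIAL-BBB"}

# Before the move: Disk 5 shows up as /dev/sdh.
before = {"/dev/sdh": "SERIAL-AAA", "/dev/sdi": "SERIAL-BBB"}
# After moving Disk 5 to another port: it now shows up as /dev/sde.
after = {"/dev/sde": "SERIAL-AAA", "/dev/sdi": "SERIAL-BBB"}

print(match_slots_by_serial(assignments, before))
print(match_slots_by_serial(assignments, after))
```

In this sketch the slot for Disk 5 follows the serial number from /dev/sdh to /dev/sde, and a disk whose serial is not detected at all maps to `None`, which corresponds to the "missing" state described earlier in the thread.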
RADIatiON Posted September 30, 2015
Awesome, trurl. Thanks. I will give that a try later tonight then.

Sam
RADIatiON Posted October 1, 2015
So it looks like I've got something else going on: either a controller, cable, or cage problem.

I moved Disk 5 to another slot in a separate drive tray that I use when I preclear drives. I know that tray is connected directly to the motherboard. I started up the system and checked the file system of Disk 5 just because I could (no problems). I then started another rebuild. Now another drive is getting knocked offline. I'm not sure if that drive is in the same 5in3 cage that Disk 5 was in, but I don't think it is.

The system has not been moved, touched, or disturbed in months, so I don't think I have spontaneously loose cables. Plus, I think I'm using locking cables (at least coming from the add-in controller card). I'm starting to suspect the controller card (Supermicro AOC-SASLP-MV8).

So, based on that information, what do you guys think? Any other thoughts on how to isolate the problem? Looks like I will have to open the system up!

Sam