JustinChase Posted October 7, 2014
So, I had my first red-ball, then another, then a drive went unformatted, then my cache did the same. Once I had most/all of that repaired, I decided to test a 3TB drive that had redballed by running a preclear. It came back clean, so I decided to use it to replace a 1TB drive that has been giving me occasional errors, and is much older.
Now that the 3TB drive is in the machine, and the rebuild is in process, it's going REALLY slow. Current speed is 2.2MB/sec. It says it will take 19 days to finish the rebuild. The preclear went just fine, and at a 'normal' speed. I think it averaged around 90MB/sec for the preclear, and took about a day.
My cache drive has completely died in all this, and can't be recognized in another machine either. I'm wondering if something else is causing all these issues in my server, and how I might get to the bottom of all these errors. I've spent the last 2 weeks copying data and moving drives and preclearing and rebuilding and checking parity, and I really just want to get back to having a server that serves my media, and not have to spend all my time checking on drives and processes.
Suggestions, thoughts, ideas?
Attachment: syslog.txt
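As a rough sanity check on the numbers in this post: a rebuild has to cover the full capacity of the disk, so the estimate is essentially capacity divided by speed. A minimal Python sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant speed; the function name is made up for illustration.

```python
def rebuild_eta_days(capacity_tb: float, speed_mb_per_s: float) -> float:
    """Estimated days to read or write a whole disk at a constant speed."""
    capacity_bytes = capacity_tb * 1e12           # decimal terabytes (assumption)
    seconds = capacity_bytes / (speed_mb_per_s * 1e6)
    return seconds / 86400                        # 86400 seconds per day

# Figures quoted in the post: a 3TB disk at 2.2MB/sec and at 90MB/sec.
print(round(rebuild_eta_days(3, 2.2), 1), "days at 2.2 MB/s")
print(round(rebuild_eta_days(3, 90) * 24, 1), "hours per pass at 90 MB/s")
```

At 2.2MB/sec this comes out to a bit under 16 days for 3TB, the same ballpark as the 19 days reported, so the estimate is plausible for that speed and the real question is why the speed is so low.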
StevenD Posted October 7, 2014
I noticed all of your issues and almost said something in another thread. Are you certain your power supply is good?
JustinChase Posted October 7, 2014
The power supply is fairly new, less than 2 years old anyway. I don't know how to test it though. Is there some utility or bench test I can try? I don't really want to buy another power supply to test, since there's no 'specific' issue that would indicate if the new power supply was working any better.
StevenD Posted October 7, 2014
I have one of these. It seems to be fairly reliable. http://www.amazon.com/Rosewill-Digital-Supply-Tester-RTK-PST/dp/B004Q7FUGM
bkastner Posted October 7, 2014
Alternatively, find somewhere with a good return policy and buy a new one to test. I've had to do this with several items to validate a hypothesis. Sometimes even the 5-15% restocking fee is a reasonable cost to prove a theory.
jphipps Posted October 7, 2014
From your syslog, it looks like ata5 is having some I/O issues, which I think is sdc (disk2). I wonder if that is causing your slowness.
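One way to check a guess like "ata5 is sdc" is to look at where each /sys/block entry points, since on many Linux kernels the resolved sysfs path of a SATA disk includes its ataN port. A small Python sketch, assuming that path layout (it may differ by kernel version or controller driver); the helper names and the sample path are illustrative and not taken from this server.

```python
import os
import re
from typing import Dict, Optional

def ata_port_from_sysfs_path(path: str) -> Optional[str]:
    """Return the 'ataN' component of a resolved sysfs path, or None."""
    match = re.search(r"/(ata\d+)/", path)
    return match.group(1) if match else None

def map_disks_to_ata_ports(sys_block: str = "/sys/block") -> Dict[str, Optional[str]]:
    """Map each sdX device to its ataN port, where the kernel exposes one."""
    mapping: Dict[str, Optional[str]] = {}
    if not os.path.isdir(sys_block):
        return mapping                      # not Linux, or sysfs not mounted
    for name in sorted(os.listdir(sys_block)):
        if name.startswith("sd"):
            resolved = os.path.realpath(os.path.join(sys_block, name))
            mapping[name] = ata_port_from_sysfs_path(resolved)
    return mapping

# Hypothetical example path, for illustration only.
sample = "/sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0/block/sdc"
print(ata_port_from_sysfs_path(sample))
print(map_disks_to_ata_ports())
```

Run on the server itself, the second call would list every sdX next to its port, which also helps answer whether the troubled drives share a controller.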
JustinChase Posted October 7, 2014
Figures. That's not a disk I've had any problems with before today. Of course, I've moved around disks so often, I could have unseated the cable on that drive to cause this. I suppose it's probably okay to stop the rebuild, shut down the machine, check all connections, and fire it up again. That's not going to cause any problems with the rebuild, is it? Sadly, the drive it's rebuilding was 'empty' before the disk was replaced, since I'd already copied everything to other drives, so I actually thought the rebuild might go fairly quickly. Oh well.
garycase Posted October 7, 2014
Quote from JustinChase: Sadly, the drive it's rebuilding was 'empty' before the disk was replaced, since I'd already copied everything to other drives, so I actually thought the rebuild might go fairly quickly. oh well.
The speed of a rebuild has absolutely nothing to do with how much data is on the disk. The entire disk is rebuilt from the other drives + parity, regardless of whether it contains data, is all zeroes, or anything in-between.
As for using a PSU tester -- not likely to be helpful, since they only show if the various power buses are present and are powering up within the time limits for the "power good" signal. They don't show issues that may be related to bus loading. The only reasonable way to confirm whether it's a power issue or not is to try another power supply. If you don't want to buy a spare unit (I find it handy to have a spare one around, but if you don't "fiddle" a lot with PCs you probably wouldn't have much use for it), then I'd exhaust all other possibilities (cables, etc.) first.
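To illustrate why the amount of data doesn't matter: with single parity, each block of the missing disk is recomputed from the same block on every other disk plus parity, so the rebuild walks the full capacity either way. A toy Python sketch, assuming plain XOR parity and tiny byte strings standing in for disks (an illustration only, not unRAID's actual code).

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three toy "data disks"; the last one is empty (all zeroes).
disks = [b"\x12\x34\x56\x78", b"\xab\xcd\xef\x01", b"\x00\x00\x00\x00"]
parity = xor_blocks(disks)

def rebuild(missing_index, disks, parity):
    """Recompute one disk from all surviving disks plus parity."""
    survivors = [d for i, d in enumerate(disks) if i != missing_index]
    return xor_blocks(survivors + [parity])

# The same work is done for the empty disk as for a full one.
print(rebuild(2, disks, parity) == disks[2])
print(rebuild(0, disks, parity) == disks[0])
```

Every byte position goes through the same XOR whether the result turns out to be zero or not, which is why an 'empty' disk rebuilds no faster than a full one.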
JustinChase Posted October 7, 2014
It's kind of funny, since this power supply was purchased because I thought my last power supply might be borderline, so I still have it sitting here, ready for the next PC I build. I think it's only a 400W unit, so it wouldn't be a good 'test' unit to put in this machine anyway.
I decided to stop the rebuild, check all the cables and restart. That was about an hour ago and it's currently chugging along at 100MB/sec, with an estimated finish of about 6 hours from now. Hopefully once finished, I'll run a parity check, then not have any more issues for a good long time. Fingers crossed!
Thanks again to everyone for all the help and suggestions.
JustinChase Posted October 8, 2014
Okay, I finally got all drives back into the server, set the cache drive, then reinstalled a few dockers. While going through and setting them up, I ran into some more issues. Now, it seems my parity is unrecognized, and unRAID thinks it's a new drive, and is forcing me to do a parity check. It's going really slow. Looking at the syslog, it seems another different drive is giving errors/having issues.
I'm so close to just taking everything apart, selling all the parts, and not having a server any more. Does anyone have any suggestions on how to get this F...... server to just F...... run without problems, or do I just sell the parts, and spend all my newfound free time outside, in the real world?
Attachment: syslog.zip
jphipps Posted October 8, 2014
Not sure if this would help you or not, but a few of us found if you boot up under xen, some drive and parity issues seem to go away.
bkastner Posted October 8, 2014
Quote from JustinChase: Okay, I finally got all drives back into the server, set the cache drive, then reinstalled a few dockers. While going thru and setting them up, I ran into some more issues. now, it seems my parity is unrecognized, and unRAID thinks it's a new drive, and is forcing me to do a parity check. It's going really slow. looking at the syslog, it seems another different drive is giving errors/having issues. I'm so close to just taking everything apart, selling all the parts, and not having a server any more. Does anyone have any suggestions on how to get this F...... server to just F...... run without problems, or do I just sell the parts, and spend all my newfound free time outside, in the real world?
Wow... you just can't catch a break, can you?
Based on issues I've had recently, I would suggest that you do your work under the Xen config, since under non-Xen I was getting errors on the parity drive and parity corrections. Under Xen, however, the parity check runs smoothly without issue. Not sure why, but I added a post under defects about this.
Second, I still think your power supply is the most likely culprit. Not quite having enough power to all your drives would cause all sorts of inconsistent issues, and the problems could move from drive to drive as there is no guarantee which drive is being short-changed by the PSU. I may be completely wrong here, but I know I've had random errors with different drives when I've added more HDs than my PSU can support.
dgaschk Posted October 8, 2014
Post a SMART report for disk2. Check for BIOS and firmware updates.
JustinChase Posted October 8, 2014
Quote from bkastner: Second, I still think your power supply is the most likely culprit. Not quite having enough power to all your drives would cause all sorts of inconsistent issues, and the problems could move from drive to drive as there is no guarantee which drive is being short changed by the PSU.
I suspect the power supply also. So, if that's correct, what power supply should I buy? How much power do I need to run 12 drives and a decent video card? I have a gold certified 650 Watt, single rail power supply currently, and it seems that's not enough. The UPS says I'm only using 129 watts with all drives spun up.
I just don't want to/can't spend another $500 or more to get a new power supply, and a couple more disks in the hopes that this will fix everything.
archedraft Posted October 8, 2014
Quote from bkastner: Second, I still think your power supply is the most likely culprit. Not quite having enough power to all your drives would cause all sorts of inconsistent issues, and the problems could move from drive to drive as there is no guarantee which drive is being short changed by the PSU.
Quote from JustinChase: I suspect power supply also. So, if correct what power supply should I buy? How much power do I need to run 12 drives and a decent video card? I have a gold certified 650Watt, single rail power supply currently, and it seems that's not enough. the UPS says I'm only using 129 watts with all drives spun up. I just don't want to/can't spend another $500 or more to get a new power supply, and a couple more disks in the hopes that this will fix everything.
Is the video card a new addition to your server? If so, maybe removing it will free up enough power for the drives?
JustinChase Posted October 8, 2014
Quote from dgaschk: Post a SMART report for disk2. Check for BIOS and firmware updates
Here are SMART reports for the drives that look like they are having issues, disk 4 (WD20EACS) and parity (ST4000DM000). I didn't see anything wrong with disk 2 (), but it's attached also. Also, most recent syslog.
I don't know how to read these, so please let me know if I need to act on any of this.
Thanks again for all the help.
Attachments: smartWD20EACS.txt, smartST4000DM000.txt, smartHDS5C3030ALA630.txt, syslog.txt
bkastner Posted October 8, 2014
Quote from bkastner: Second, I still think your power supply is the most likely culprit. Not quite having enough power to all your drives would cause all sorts of inconsistent issues, and the problems could move from drive to drive as there is no guarantee which drive is being short changed by the PSU.
Quote from JustinChase: I suspect power supply also. So, if correct what power supply should I buy? How much power do I need to run 12 drives and a decent video card? I have a gold certified 650Watt, single rail power supply currently, and it seems that's not enough. the UPS says I'm only using 129 watts with all drives spun up. I just don't want to/can't spend another $500 or more to get a new power supply, and a couple more disks in the hopes that this will fix everything.
I have 13 drives currently, but am only using on-board video, and have an AX860, which likely gives me lots of headroom. However, it may not be that the 650 you have isn't giving sufficient power - it may just be faulty.
Mine is around $180 USD on Newegg, but as mentioned I think it may be overkill. It's only because I've had PSU issues before that I wanted to give myself a lot of headroom so that as I expand my array I will (hopefully) not have similar issues.
archedraft Posted October 8, 2014
The only drive that would have me worried is the "smartST4000DM000", specifically the Reported_Uncorrect value:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       27141960
  3 Spin_Up_Time            0x0003   098   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1393
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       40
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       42275386
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7665
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       187
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   094   094   000    Old_age   Always       -       6
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   095   095   000    Old_age   Always       -       5
190 Airflow_Temperature_Cel 0x0022   069   050   045    Old_age   Always       -       31 (Min/Max 31/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       226
193 Load_Cycle_Count        0x0032   095   095   000    Old_age   Always       -       10892
194 Temperature_Celsius     0x0022   031   050   000    Old_age   Always       -       31 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1311h+16m+55.411s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       76567108688
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       335881793285
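For anyone who, like the original poster, isn't sure how to read a report like this, the attribute rows can be screened mechanically. A short Python sketch that parses rows in this layout and flags a handful of attributes when their raw value is non-zero. The watch list (IDs 5, 187, 197, 198, 199) is an assumption about which counters are usually worth a look, not an official rule, and the function names are made up; the sample rows are copied from the ST4000DM000 report in this post.

```python
WATCHED_IDS = {5, 187, 197, 198, 199}  # assumed watch list, adjust to taste

def parse_smart_attributes(text):
    """Parse smartctl-style attribute rows into dicts (id, name, raw)."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 9)           # 10 columns; raw may contain spaces
        if len(parts) == 10 and parts[0].isdigit():
            first = parts[9].split()[0]       # leading number of the raw value
            rows.append({
                "id": int(parts[0]),
                "name": parts[1],
                "raw": int(first) if first.isdigit() else None,
            })
    return rows

def flag_nonzero(rows, watched=WATCHED_IDS):
    """Return watched attributes whose raw counter is above zero."""
    return [r for r in rows if r["id"] in watched and r["raw"]]

sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 40
187 Reported_Uncorrect 0x0032 094 094 000 Old_age Always - 6
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
"""

for row in flag_nonzero(parse_smart_attributes(sample)):
    print(row["id"], row["name"], row["raw"])
```

On this sample the sketch flags Reallocated_Sector_Ct (40) and Reported_Uncorrect (6). A non-zero count is a prompt to keep watching whether it grows, not proof on its own that the drive is failing.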
JustinChase Posted October 8, 2014
Quote from archedraft: Is the video card a new addition to your server? If so, maybe removing it will give it enough power??
Actually, yes, it is new to the server. I'll pull it out and see if that changes anything. I wonder if any of the SMART reports indicate a fatal problem with any of the drives, or if it might just be a power supply issue.
The video card says it only draws a maximum of 116 Watts, and needs a minimum 400W PS, so maybe I am just running out of clean power here. Although, I'm not even using the video card at this point. I was hoping to set up a Windows VM to see if I could use it, but I guess that's not going to happen.
The really sad thing is that I bought a new motherboard and CPU to prepare for the possibility of GPU passthru, which is pretty much all wasted money at this point. I may take everything apart, put the old motherboard and CPU back into the machine, replace the power supply, and rebuild my HTPC and just go back to having 2 computers, which worked just fine until I decided to 'make it better'. Huge FAIL!!
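For a back-of-the-envelope view of whether 650W is plausibly too small, the worst case is usually all drives spinning up at once with the video card at full load. A rough Python sketch: the drive count, PSU rating and 116W video card figure come from this thread, while the per-drive spin-up draw (25W) and the base system draw (80W) are guesses for illustration only and should be replaced with datasheet or measured values.

```python
def peak_power_estimate(drive_count, spinup_watts_per_drive, gpu_max_watts, base_watts):
    """Worst-case draw: every drive spinning up while the GPU is at full load."""
    return drive_count * spinup_watts_per_drive + gpu_max_watts + base_watts

PSU_WATTS = 650                      # from the thread
DRIVE_COUNT = 12                     # from the thread
GPU_MAX_WATTS = 116                  # from the thread
ASSUMED_SPINUP_WATTS_PER_DRIVE = 25  # assumption, not a measured value
ASSUMED_BASE_WATTS = 80              # assumption: motherboard, CPU, fans

peak = peak_power_estimate(
    DRIVE_COUNT, ASSUMED_SPINUP_WATTS_PER_DRIVE, GPU_MAX_WATTS, ASSUMED_BASE_WATTS
)
headroom = PSU_WATTS - peak
print(f"estimated peak: {peak} W, headroom on a {PSU_WATTS} W unit: {headroom} W")
```

Under these assumed figures the total stays below 650W, which would fit the suggestion that the unit may be faulty rather than undersized. A total wattage figure also says nothing about how the load is spread across individual rails or connectors, so it is only a first check.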
archedraft Posted October 8, 2014
Oops, another thing just popped out. On drive "smartHDS5C3030ALA630":
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       46
I believe this indicates that the SATA cable is bad or loose?
JustinChase Posted October 8, 2014
Quote from archedraft: The only drive that would have me worried is the "smartST4000DM000" specifically the Reported_Uncorrect value
I don't know what that means, but that's my parity drive. I have a new 4TB drive I just installed to replace a different failed drive, and it has almost no data on it right now. I wonder if I should remove it, set it to be parity, then pre-clear/test the current parity drive to see how it looks.
If I decided to do that, I honestly don't even know how that would work, as I'd have to remove that drive from the array, then I'd have a parity and that drive red-balled, which I don't think would allow unRAID to even start the array. AARRRGGGGHHHHHH!!!!!! I hate computers sometimes.
Also, the other 2TB drive is continually getting reset as per the syslog, so I suspect it's either a drive problem, or perhaps the PS.
I'm going to shut down, remove the video card, then restart and see how things look then.
jphipps Posted October 8, 2014
If you search Amazon, I have: Corsair RM Series 1000 Watt ATX/EPS 80PLUS Gold-Certified Power Supply - CP-9020062-NA RM1000
I am running 20 hard drives and haven't had any issues, and it is only $149 and it is modular... 1000W should easily handle the 12 drives.
I would try the Xen boot before anything else...
Some of those counts may not mean anything if they are old. My parity drive has a few errors on it, but they occurred years ago and haven't increased, and I haven't seen any issues from it.
JustinChase Posted October 8, 2014
Okay, I removed the video card, changed the default boot to Xen, replaced the SATA cable going to Disk4 - WDC_WD20EACS-11BHUB0_WD-WCAZA3758422 (sdd) drive (ata4) and rebooted.
Disk 4 still throws lots of errors in the syslog, and the parity check is running at less than 1MB/sec.
I don't have a drive here that I can use to replace disk4, but if I need to, I could go buy one at Best Buy, which is about the same price as Amazon or Newegg. Honestly, at this point I'd rather just remove the drive and move the data onto one of my other drives, but as slow as it's going, that will take many days to complete.
Any ideas/suggestions on how to proceed from here?
Attachment: syslog.txt
garycase Posted October 8, 2014
Since you have your old (400W) power supply lying around, switch to that and see if things change. You may simply have a faulty bus on your new power supply. The 400W should be plenty as long as you don't reinstall the video card. If it resolves things, then you'll at least know that a new PSU should both resolve the issues and allow you to use the video card.
garycase Posted October 8, 2014
By the way, is there any correlation between the drives that are having issues and which SATA port they're connected to? Just curious if the add-in SATA cards you're using might have driver issues with v6. Did you have these issues with v5?