jj0076 Posted November 16, 2014 I've tried searching the forum but only managed to confuse myself more. A while ago (5+ weeks) my parity drive showed a red ball and a bunch of errors. So I shut down, removed the drive, rebooted, shut down again, reinstalled the same drive, booted up and brought the array online to start a parity sync. It ran overnight and all appeared well the next morning, so I ran a parity check, which ran while I was at work, and all was fine when I returned home. Last night I noticed that it had red-balled again. I followed the same procedure and it is now just over 10% into the sync, running at less than 10 MB/sec and looking like it will take just under 2 days to complete. Is it time to pull that drive out and replace it with a new one? (And if I do put in a new parity drive, am I running a risk by putting the old drive back into the array as a data drive?) Thanks in advance.
Lacehim Posted November 17, 2014 Run a SMART test and post the results. Personally I would have replaced the drive on the first red ball and then investigated the old drive. Until you've given the old drive a good checkout I wouldn't put it anywhere near your array as a data drive. Old drives are great for doorstops, paperweights etc.
SSD Posted November 17, 2014 The most common cause of red balls is cabling problems. My guess is that's what you have. If that's the problem, replace the SATA cable with a known good one, or at least reseat both ends of the current cable. Also reseat the power connection. This is a very, very common issue. By the way, the way to tell whether the drive is failing or there is a cabling problem is to get a SMART report and post the results.
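A rough way to apply that advice to a SMART report is sketched below, as a rule of thumb rather than a diagnosis. It assumes you have already pulled the RAW_VALUE column into a dict keyed by attribute name. The attribute names are smartctl's, but the grouping into "media" and "link" attributes and the function name `likely_cause` are this sketch's own. Interface CRC errors (UDMA_CRC_Error_Count) usually point at the SATA link (cable, connector, backplane), while reallocated, pending and offline-uncorrectable sectors point at the disk surface itself.

```python
from __future__ import annotations

# Attributes whose raw counts point at the disk surface itself.
MEDIA_ATTRS = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")
# Interface CRC errors: usually cabling, connectors or the controller link.
LINK_ATTRS = ("UDMA_CRC_Error_Count",)


def likely_cause(raw: dict[str, int]) -> str:
    """Rule-of-thumb triage of a red-balled drive from SMART raw values.

    Missing attributes are treated as 0, since not every drive reports all of them.
    """
    media = any(raw.get(name, 0) > 0 for name in MEDIA_ATTRS)
    link = any(raw.get(name, 0) > 0 for name in LINK_ATTRS)
    if media and link:
        return "media and cabling"
    if media:
        return "media"
    if link:
        return "cabling"
    # Nothing in SMART: still reseat the data and power cables.
    return "inconclusive"


if __name__ == "__main__":
    # Raw values from the failing parity drive's report posted later in this thread.
    print(likely_cause({"Reallocated_Sector_Ct": 372, "Current_Pending_Sector": 880,
                        "Offline_Uncorrectable": 40606, "UDMA_CRC_Error_Count": 0}))
```

UDMA_CRC_Error_Count is a lifetime counter that does not reset, so after reseating a cable what matters is whether it keeps climbing; comparing two reports taken a few days apart says more than a single nonzero value.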
jj0076 Posted November 17, 2014 Author Thanks for the help. Following the parity sync it red-balled again, so I'm on the way to get some new drives now. No idea how to do a SMART test, so I'll look that up later to run on the old drive.
dikkiedirk Posted November 17, 2014 Time to learn how to get a SMART report. You should preclear the new disk anyhow. smartctl -a /dev/sdX I believe. You could throw in a LONG test too.
Frank1940 Posted November 17, 2014 Here is a link to the manual page for smartctl: http://smartmontools.sourceforge.net/man/smartctl.8.html Look toward the bottom of the page for examples of typical command lines. Also, both the Dynamix and unMENU plugins contain built-in SMART report apps. Installing either one will give you a virtually idiot-proof series of mouse clicks to run all the tests and get all the reports.
SSD Posted November 17, 2014 Unless you need a new drive it is better to get the SMART report first. Failed drives are far less common than loose cables.
jj0076 Posted November 17, 2014 Author Well, I'm nearly out of storage space anyway, so if it hasn't failed then that solves that issue as well!! I'll get on with the preclear, SMART report etc. later tonight.
SSD Posted November 17, 2014 For now just run a SMART report. It's very quick. There are GUI tools like myMain (see my sig), but the quickest and easiest way is to run it from the command line, e.g. smartctl -A -a /dev/sdz (where sdz is the SATA device ID for the drive in question).
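If you would rather script these checks, the sketch below only assembles the smartctl command lines discussed here; it does not run them, so it is safe to try anywhere. `-a` prints the full report (it already includes the attribute table that `-A` adds), `-t long` starts the extended self-test in the background, and `-l selftest` shows the self-test log once the test has finished. The helper name `smartctl_commands` and the `/dev/sdX` placeholder are illustrative, and smartctl itself normally needs root.

```python
from __future__ import annotations

import shlex


def smartctl_commands(device: str) -> dict[str, list[str]]:
    """Build argv lists for common smartctl tasks on one device (nothing is executed)."""
    if not device.startswith("/dev/"):
        raise ValueError(f"expected a /dev path, got {device!r}")
    return {
        # Full report: identity, health, attribute table, error and self-test logs.
        "report": ["smartctl", "-a", device],
        # Start the extended (long) self-test; the drive runs it in the background.
        "long_test": ["smartctl", "-t", "long", device],
        # Read the self-test log after the estimated test time has passed.
        "test_log": ["smartctl", "-l", "selftest", device],
    }


if __name__ == "__main__":
    for name, argv in smartctl_commands("/dev/sdX").items():
        print(f"{name:10} {shlex.join(argv)}")
```

To actually run one, pass the list to `subprocess.run(argv, capture_output=True, text=True)` as root and read `.stdout`. smartctl's exit status is a bit mask that also reports drive problems, so a nonzero status does not necessarily mean the command itself failed; avoid `check=True` here.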
jj0076 Posted November 17, 2014 Author I've attached the short SMART report for the drive in question. Looking at the reallocated sector count and the current pending sector count, things don't look good. Is there anything that can be done with this drive? smart.txt
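Reading those counts out of a saved report can be automated. The sketch below parses the attribute table from `smartctl -a` output (any saved report, such as the attached smart.txt) and lists which of three watched raw counters are nonzero. It assumes smartctl's default table layout, the one quoted later in this thread, with a hex FLAG column; the class and function names and the choice of watched attributes are this sketch's own.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Attribute:
    id: int
    name: str
    value: int
    worst: int
    thresh: int | None
    when_failed: str
    raw: int | None


# Raw counters that should stay at zero on a healthy drive.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def _leading_int(text: str) -> int | None:
    """Return the integer at the start of `text`, or None (e.g. for '---')."""
    match = re.match(r"\d+", text)
    return int(match.group()) if match else None


def parse_attributes(report: str) -> dict[str, Attribute]:
    """Parse the 'Vendor Specific SMART Attributes' table of `smartctl -a` output."""
    attrs: dict[str, Attribute] = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have a hex FLAG in column 3;
        # this skips headers and error-log register dumps that also start with digits.
        if len(fields) < 10 or not fields[0].isdigit() or not fields[2].startswith("0x"):
            continue
        attrs[fields[1]] = Attribute(
            id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=_leading_int(fields[5]),
            when_failed=fields[8],
            # RAW_VALUE may carry extra text, e.g. "33 (Min/Max 16/39)".
            raw=_leading_int(fields[9]),
        )
    return attrs


def media_warnings(attrs: dict[str, Attribute]) -> list[str]:
    """Names of watched attributes whose raw count is above zero."""
    return [name for name in WATCHED if name in attrs and (attrs[name].raw or 0) > 0]
```

For example, `media_warnings(parse_attributes(open("smart.txt").read()))` would list the problem counters in that file; the path is just the attachment name used as a placeholder.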
Frank1940 Posted November 17, 2014 If it were me, I'd order a new drive ASAP. Personally, I would shut the server down until I got it. (The last thing you want is problems with a second drive.) I would run the new drive through three preclear cycles. (If you don't have a second computer that you can use to do this, keep the array offline.) Install it and allow parity to rebuild. If you really want to see whether the old drive is salvageable, you could take it and run several cycles of preclear on it and see what happens. In some cases, the 'Current_Pending_Sector' and 'Offline_Uncorrectable' counts might drop to zero and stay there, and the 'Reallocated_Sector_Ct' count remain stable. If these conditions don't happen, then my next use for the drive would be as a doorstop. (My last failed drive actually prevented two different computers from even booting into unRAID!!!)
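The salvage test in that post can be written down as a small check over SMART raw values read after each preclear cycle. This is a minimal sketch: "stable" is taken to mean the reallocated count is identical from one cycle to the next, the function name is hypothetical, and the readings in the usage lines are made up.

```python
from __future__ import annotations


def looks_salvageable(after_each_cycle: list[dict[str, int]]) -> bool:
    """Check SMART raw values read after each preclear cycle against the rule of thumb.

    Passes only if pending and offline-uncorrectable counts are zero after every
    cycle and the reallocated count does not change between cycles.
    """
    if len(after_each_cycle) < 2:
        raise ValueError("need readings from at least two preclear cycles")
    for reading in after_each_cycle:
        # A KeyError here means the reading is missing an attribute the check relies on.
        if reading["Current_Pending_Sector"] or reading["Offline_Uncorrectable"]:
            return False
    reallocated = [reading["Reallocated_Sector_Ct"] for reading in after_each_cycle]
    return all(prev == cur for prev, cur in zip(reallocated, reallocated[1:]))


if __name__ == "__main__":
    # Made-up readings from two cycles, for illustration only.
    cycles = [
        {"Current_Pending_Sector": 0, "Offline_Uncorrectable": 0, "Reallocated_Sector_Ct": 1252},
        {"Current_Pending_Sector": 0, "Offline_Uncorrectable": 0, "Reallocated_Sector_Ct": 1252},
    ]
    print(looks_salvageable(cycles))  # True
```

Only post-cycle readings are compared, because the first cycle is expected to turn some pending sectors into reallocated ones; the question is whether things settle afterwards.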
jj0076 Posted November 17, 2014 Author Cheers for confirming what I already thought from the SMART report. I bought a new drive this afternoon, so the unRAID box will be shut down overnight and the new one installed tomorrow afternoon. Thanks again.
JonathanM Posted November 17, 2014 I strongly urge you to run the new drive through some full-surface verification routine (preclear preferably, or at least a SMART long test) before trusting it with your data. A new drive isn't necessarily a good drive; a small percentage of new drives fail within the first few days of service.
jj0076 Posted November 18, 2014 Author OK, I'll do that with the new drive. Am I right in thinking that the parameters I mentioned above are the main ones to look at?
Lacehim Posted November 18, 2014 Yes, roughly speaking. SMART data isn't 100% foolproof because manufacturers use different values, but any reallocated or pending sectors aren't a good thing. Some drives fail while their SMART data still looks good; others give you warning. For example, here's your drive:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   165   162   021    Pre-fail  Always       -       6741
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2207
  5 Reallocated_Sector_Ct   0x0033   181   181   140    Pre-fail  Always       -       372
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       15059
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       706
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       41
193 Load_Cycle_Count        0x0032   198   198   000    Old_age   Always       -       7956
194 Temperature_Celsius     0x0022   123   112   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   051   051   000    Old_age   Always       -       149
197 Current_Pending_Sector  0x0032   198   196   000    Old_age   Always       -       880
198 Offline_Uncorrectable   0x0030   076   076   000    Old_age   Offline      -       40606
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   001   001   000    Old_age   Offline      -       275728

And here's an old drive from my PC that I just precleared and added to the array.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       825
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   067   065   025    Pre-fail  Always       -       10157
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1274
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       10883
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1161
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       3021256
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       145
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   061   000    Old_age   Always       -       33 (Min/Max 16/39)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       90
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1286

It's old, but it has no current pending sectors or reallocated event counts. Not bad for a drive with over 10,000 power-on hours, eh? I would advise you to preclear the new drive, running the preclear inside screen, before you use it.
It will test the drive properly and prepare it for immediate use in the array (when you add it, it takes a few seconds instead of unRAID having to clear the whole disk first). http://lime-technology.com/wiki/index.php/Configuration_Tutorial#Preclearing_With_Screen In my experience a 2 TB drive takes roughly 20 hours. Just make sure you preclear the right drive! I always take a screenshot of the main menu so that, if I do anything with the other drives like unplugging cables, I can make sure they are put back in the correct order.
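As a sanity check on that 20-hour figure: assuming one preclear cycle makes three full passes over the disk (pre-read, zeroing, post-read), and counting 2 TB as 2 × 10^12 bytes as drive makers do, 20 hours implies an average of about 83 MB/s. The helper names below are hypothetical, and the three-pass count is an assumption about the preclear script rather than something stated in this thread.

```python
def implied_mb_per_s(capacity_tb: float, hours: float, passes: int = 3) -> float:
    """Average throughput (decimal MB/s) implied by `passes` full passes taking `hours`."""
    total_bytes = capacity_tb * 1e12 * passes
    return total_bytes / (hours * 3600) / 1e6


def estimated_hours(capacity_tb: float, mb_per_s: float, passes: int = 3) -> float:
    """Hours needed for `passes` full passes over `capacity_tb` at `mb_per_s`."""
    total_bytes = capacity_tb * 1e12 * passes
    return total_bytes / (mb_per_s * 1e6) / 3600


if __name__ == "__main__":
    print(f"{implied_mb_per_s(2, 20):.1f} MB/s")        # 83.3 MB/s
    print(f"{estimated_hours(2, 10, passes=1):.1f} h")  # 55.6 h
```

The second line puts the slow parity sync from the opening post in perspective: the post doesn't give the parity drive's size, but if it were 2 TB, a single pass at 10 MB/s would already take over two days.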
jj0076 Posted November 19, 2014 Author OK, so I'm all up and running again thanks to all the help here. I have precleared and SMART-tested the new drive, and the parity sync is complete, with the first parity check due to run overnight. I put the old drive into the array (unassigned) with a view to running a few preclear cycles on it to see if it's fit for anything, and encountered something odd. I was expecting it to show up as sdf, but it got listed as hdf instead. When it was the parity drive it was sdb, which has now been taken by the new drive. From what I found by searching, if a drive shows as hdX rather than sdX it is a BIOS setting issue? Is that correct? So I guess my next question is, should I be concerned by this, and if so, what changes do I need to be looking into? Thanks again for the continued support.
itimpi Posted November 19, 2014 If the disk is showing up as an 'hd' device, then this normally means that the disk controller is configured in the BIOS to run in IDE emulation mode rather than AHCI. This tends to lead to lower performance. You need to look into your BIOS settings to see what mode the disks are set to run in.
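A quick way to spot drives affected by this is to list the kernel's block devices and look for the hdX prefix, which comes from Linux's legacy IDE driver; disks handled through libata (including SATA controllers in AHCI mode) appear as sdX. The sketch assumes a Linux machine with /sys mounted, and the function name is illustrative.

```python
import os
import re

# Legacy IDE driver names look like hda, hdb, ...; libata/SCSI names look like sda, sdb, ...
_IDE_NAME = re.compile(r"hd[a-z]+")


def ide_named_devices(names):
    """Return the block device names that use the legacy hdX naming, sorted."""
    return sorted(name for name in names if _IDE_NAME.fullmatch(name))


if __name__ == "__main__":
    # On Linux, /sys/block lists whole block devices (sda, hdf, loop0, md1, ...), not partitions.
    names = os.listdir("/sys/block") if os.path.isdir("/sys/block") else []
    flagged = ide_named_devices(names)
    if flagged:
        print("Check the BIOS SATA mode (IDE vs AHCI) for:", ", ".join(flagged))
    else:
        print("No hdX devices found.")
```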
jj0076 Posted November 20, 2014 Author Great, thanks. Is this something that I can safely change without affecting anything else?
itimpi Posted November 20, 2014 You should be able to.
jj0076 Posted November 21, 2014 Author All working perfectly now, with all drives showing up as sdX. Thanks again.
Archived
This topic is now archived and is closed to further replies.