June 13, 2012
I have run the parity check twice with the correct option checkmarked. I recently replaced a drive that the SMART logs showed was failing with another drive. The new-to-me drive passed the Preclear check. Other than that, the only other thing I did was update from 5.0-beta14 to 5.0-rc4. Thanks for your help.
syslog-2012-06-12.zip
June 13, 2012 (Author)
The 3rd scan took away the error. If someone sees anything important in the log, please let me know, but the error is now gone... thanks
June 13, 2012
You have problems related to 2 drives. You have a bad sector (media error, UNC) on the parity drive (sda, WDC_WD30EZRX-00MMMB0_WD-WCAWZ1942728), which is probably responsible for the single parity error that has been occurring. Although you may have a successful parity check now, I would still obtain a SMART report for the drive, and probably run a SMART long test on it (see the Troubleshooting page, the Obtaining a SMART report section and the SMART test section that follows it).
You also have numerous 'device errors' spewing from Disk 3 (sdd, WDC_WD20EARS-00S8B1_WD-WCAVY4213919, ata4 on the fourth onboard SATA port). Unfortunately, I cannot tell exactly what the problem is from the error messages. Something may be configured wrong inside the drive, or it may be failing to seek. I say that because all or most of the errors come from a failed attempt to read sector 3907029167, which just happens to be the very last sector on the drive! Why does it want the last sector? I suspect that when the kernel first sets up the drive, it does a quick test by reading the first sector, then seeking to the last track and reading the very last sector, which would be a very quick way to check that a drive is fully operational.
Other possibilities: a bad SATA port. If you have another SATA port available, I would recommend switching to it, in case something is wrong with that port. I believe the cables are fine. Another possibility is the power to that drive; perhaps it is sharing a power line (rail) with too many other drives.
I would get an hdparm report on the drive, checking the total number of sectors (it should be 3907029168), and I would get a SMART report on the drive, plus a SMART short test. We can decide what to do next after checking these results.
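The last-sector reasoning above is simple arithmetic: LBAs are numbered from 0, so a drive reporting 3907029168 sectors has 3907029167 as its final LBA. A minimal Python sketch of that check, run against a saved "hdparm -I" report. The "LBA48 user addressable sectors:" line format is an assumption based on typical hdparm output, and total_sectors and is_last_sector are hypothetical helper names, not part of any tool.

```python
import re

# Assumed format of the relevant "hdparm -I" line, e.g.
#   "LBA48  user addressable sectors: 3907029168"
LBA48_RE = re.compile(r"LBA48\s+user addressable sectors:\s*(\d+)")

def total_sectors(hdparm_text):
    """Return the LBA48 sector count found in saved 'hdparm -I' output."""
    match = LBA48_RE.search(hdparm_text)
    if match is None:
        raise ValueError("no LBA48 sector count found")
    return int(match.group(1))

def is_last_sector(lba, sectors):
    """LBAs start at 0, so the last addressable sector is sectors - 1."""
    return lba == sectors - 1

sample = """
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors: 3907029168
"""
n = total_sectors(sample)
print(n, is_last_sector(3907029167, n))  # 3907029168 True
```

If the hdparm total came back different from 3907029168, the failing LBA would no longer be the final sector, which is exactly what the requested report is meant to rule out.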
June 14, 2012 (Author)
Thank you. I will run a SMART test on the parity drive. That's a brand-new drive; I ran 3 Preclear cycles on it and it passed each one. I didn't bother to do a SMART test on it, but I will. As for Drive 3, I am aware of the problems. I currently have a drive being shipped to me from WD for an RMA, and once I get it I will replace Disk 3. Which drive do you want an hdparm test on?
June 14, 2012
Quote: Thank you. I will run a SMART test on the parity drive. That's a brand-new drive; I ran 3 Preclear cycles on it and it passed each one. I didn't bother to do a SMART test on it, but I will.
If you have just recently Precleared the parity drive, see if you can locate the Preclear reports for it. I would not be surprised to see a Current_Pending_Sector value of 1 or more on the last Preclear SMART report. The commands you want are (assuming the drive ID stays sda; change if necessary):
    smartctl -a -d ata /dev/sda >/boot/smart_WD30EZRX-2728_pre-test.txt
    smartctl -d ata -tlong /dev/sda
    # now wait 2 to 12 hours depending on the size and speed of the drive; about 10 hours, I think, for this drive
    smartctl -a -d ata /dev/sda >/boot/smart_WD30EZRX-2728_post-test.txt
Quote: Which drive do you want an hdparm test on?
Disk 3.
Quote: As for Drive 3, I am aware of the problems. I currently have a drive being shipped to me from WD for an RMA, and once I get it I will replace Disk 3.
I think it's premature for an RMA yet, as we don't know for sure that there are any problems with the drive. Many drive errors are actually interface issues, such as the cabling, disk controller, port, or the power to the drive. Let's check the hdparm and SMART reports and the result of the SMART test first. Commands should be (assuming the drive ID stays sdd; change if necessary):
    hdparm -I /dev/sdd >/boot/hdparm_WD20EARS-3919.txt
    smartctl -a -d ata /dev/sdd >/boot/smart_WD20EARS-3919_pre-test.txt
    smartctl -d ata -tshort /dev/sdd
    # now wait 5 to 10 minutes
    smartctl -a -d ata /dev/sdd >/boot/smart_WD20EARS-3919_post-test.txt
The 5 reports will be in the root of the flash drive. Please attach them to another post (zipping them together creates an easy package to copy and attach). Add the Preclear reports if you like, especially if they show Current_Pending_Sector and/or Reallocated_Sector_Ct changes.
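For anyone who would rather script the pre-test report / start test / post-test report sequence above, here is a minimal Python sketch. It assumes Python 3.7+ is available wherever smartctl runs. The helper names (report_paths, smart_report, start_self_test) are hypothetical. The smartctl arguments are the ones used in the commands above, with the test type passed as a separate "-t short" / "-t long" argument.

```python
import subprocess
from pathlib import Path

def report_paths(label, root=Path("/boot")):
    """Pre- and post-test file names in the same style as the commands above."""
    return (root / f"smart_{label}_pre-test.txt",
            root / f"smart_{label}_post-test.txt")

def smart_report(device, out_file):
    """Save 'smartctl -a -d ata DEVICE' to out_file and return smartctl's exit code.

    smartctl's exit code is a bit mask (for example, bit 7 is set when the
    self-test log contains errors), so non-zero does not mean the report failed.
    """
    result = subprocess.run(["smartctl", "-a", "-d", "ata", device],
                            capture_output=True, text=True)
    Path(out_file).write_text(result.stdout)
    return result.returncode

def start_self_test(device, kind="short"):
    """Ask the drive to start a background self-test ('short' or 'long')."""
    subprocess.run(["smartctl", "-d", "ata", "-t", kind, device], check=False)
```

For Disk 3, the sequence would be: take the two paths from report_paths("WD20EARS-3919"), call smart_report("/dev/sdd", pre), then start_self_test("/dev/sdd", "short"), wait for the test to finish, and finally call smart_report("/dev/sdd", post).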
June 14, 2012 (Author)
Quote: If you have just recently Precleared the parity drive, see if you can locate the Preclear reports for it. I would not be surprised to see a Current_Pending_Sector value of 1 or more on the last Preclear SMART report. The commands you want are (assuming the drive ID stays sda; change if necessary):
How recent is recent? I did this a good 2 weeks ago, maybe even 3...
Quote: I think it's premature for an RMA yet, as we don't know for sure that there are any problems with the drive. Many drive errors are actually interface issues, such as the cabling, disk controller, port, or the power to the drive. Let's check the hdparm and SMART reports and the result of the SMART test first. Commands should be (assuming the drive ID stays sdd; change if necessary):
Will the SMART test I ran for my other post still be relevant? I have changed the SATA cables (purchased from Monoprice), and the SATA power comes directly from my Corsair PSU, not through an extender or splitter. In the other thread I was advised to replace two drives: Disk 1 (which is now a different disk) and Disk 3, which I will be replacing soon, pending the current findings, I reckon... http://lime-technology.com/forum/index.php?topic=19475.15
I'll attach the results in the morning once they finish.
June 14, 2012 (Author)
Here are the results... thanks again! The new drive came in, so let me know if I need to replace Disk 3.
hdparm_WD20EARS-3919.zip
June 15, 2012
I apologize, but I clearly underestimated how long the tests would take; neither test was done when you obtained the final SMART reports. Both claimed to have around 10% of testing left, but that has to be interpreted very loosely, as less than 20% remaining. The long test probably needed close to 15 hours, and the short test 10 to 12 minutes. Guess I'm not used to the huge drives users have nowadays! Could you obtain one more SMART report for each of those 2 drives, with names similar to the post-test names, and attach them too?
What is apparent already, though, is that the parity drive seems to have dealt successfully with that bad sector, without having to reallocate it.
The WD20EARS, though, has had a problem with that very last sector for quite a while. You ran SMART tests about 1100 and 2800 operational hours ago on that drive, and each time it reported a read failure on that very last sector. I have to assume, therefore, that ever since you introduced this drive to unRAID, it has been spewing those errors into the syslog every single time, and has probably been affecting server performance too. I suspect you did not Preclear this drive. It's a bizarre problem. I would like to see its final SMART report.
I would start Preclearing your new drive.
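The "10% remaining" figure comes from the drive's self-test execution status byte, which smartctl prints in parentheses after "Self-test execution status:". In the ATA specification, the upper four bits give the state and the lower four give the remaining work in 10% steps, which is why the figure is so coarse. Here is a minimal Python decoder. The table lists the standard state codes; decode_self_test_status is a hypothetical helper name.

```python
# Upper nibble of the ATA self-test execution status byte -> state.
SELF_TEST_STATES = {
    0x0: "completed without error (or no test run yet)",
    0x1: "aborted by host",
    0x2: "interrupted by host with a hard or soft reset",
    0x3: "fatal error, unable to complete",
    0x4: "completed with an unknown test element failure",
    0x5: "completed with an electrical element failure",
    0x6: "completed with a servo/seek element failure",
    0x7: "completed with a read element failure",
    0x8: "completed with handling damage suspected",
    0xF: "in progress",
}

def decode_self_test_status(value):
    """Return (state, percent_remaining) for a status byte such as 241 or 249.

    The lower nibble counts remaining work in 10% steps, so the percentage is
    only a coarse estimate.
    """
    state_code, tenths = value >> 4, value & 0x0F
    state = SELF_TEST_STATES.get(state_code, f"reserved code {state_code}")
    return state, tenths * 10

print(decode_self_test_status(241))  # ('in progress', 10)
print(decode_self_test_status(249))  # ('in progress', 90)
```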
June 15, 2012 (Author)
Quote: I apologize, but I clearly underestimated how long the tests would take; neither test was done when you obtained the final SMART reports. Both claimed to have around 10% of testing left, but that has to be interpreted very loosely, as less than 20% remaining. The long test probably needed close to 15 hours, and the short test 10 to 12 minutes. Guess I'm not used to the huge drives users have nowadays! Could you obtain one more SMART report for each of those 2 drives, with names similar to the post-test names, and attach them too?
I will run them again... and make sure I give them both a decent amount of time.
Quote: What is apparent already, though, is that the parity drive seems to have dealt successfully with that bad sector, without having to reallocate it.
So what does this mean for me? Will this be a problem later on... RMA?
Quote: The WD20EARS, though, has had a problem with that very last sector for quite a while. You ran SMART tests about 1100 and 2800 operational hours ago on that drive, and each time it reported a read failure on that very last sector. I have to assume, therefore, that ever since you introduced this drive to unRAID, it has been spewing those errors into the syslog every single time, and has probably been affecting server performance too. I suspect you did not Preclear this drive. It's a bizarre problem. I would like to see its final SMART report.
You are correct, I did not Preclear this drive. I only started doing so with the last few drives I have introduced, because I was unaware of Preclearing before that.
Quote: I would start Preclearing your new drive.
Actually, I am out of SATA ports on my board, so at the moment I would have to replace a drive to add this one... so we will see what comes of Disk 3.
June 15, 2012 (Author)
I am running the scan from the command prompt over telnet... I accidentally just put my computer to sleep and woke it right back up... will this stop the scan? :'(
June 15, 2012
Just as a side note: if you have a spare computer and a USB stick, you can Preclear the drive on any other computer that can boot unRAID; it does not have to be in the same system. Just in case you wanted to get a head start.
June 15, 2012
Use "screen" as well, to avoid the headaches of disconnecting from a telnet session. Check the configuration tutorial for setup.
June 15, 2012 (Author)
Damn it, I checked the results and it said aborted by user at 90%!!! I'll look into screen, thank you @mbryanr. LOL, I do not have a spare computer at home that I can use to Preclear the drive, but that's a good idea; I can probably do it at work, thanks! ... Starting the scans again )=
**Turns out I have already installed screen... the problem is I'm not that bright. I see the instructions for Preclearing with screen, but currently I am just running SMART scans via telnet. Can a SMART scan be used with screen?
June 15, 2012
SMART does not require screen. It runs in the background, and the results are shown in the normal smartctl report. Make sure the disk is set to never spin down in unRAID. A spin-down request will abort the test.
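Because the test runs inside the drive, a script only has to check back periodically. Here is a minimal Python sketch. It assumes Python 3.7+ on the server and relies on the "Self-test execution status: ( N)" line that smartctl -c prints, where an upper nibble of 15 (0xF) means the test is still in progress. The function names and the 10-minute polling interval are illustrative choices. The spin-down setting still has to be changed as described above.

```python
import re
import subprocess
import time

# e.g. "Self-test execution status:      ( 249)	Self-test routine in progress..."
STATUS_RE = re.compile(r"Self-test execution status:\s*\(\s*(\d+)\)")

def self_test_status_byte(smartctl_output):
    """Extract the numeric status shown in 'smartctl -c' (or -a) output."""
    match = STATUS_RE.search(smartctl_output)
    if match is None:
        raise ValueError("no 'Self-test execution status' line found")
    return int(match.group(1))

def is_test_running(smartctl_output):
    # Upper nibble 0xF (values 240-255) means a self-test is in progress.
    return self_test_status_byte(smartctl_output) >> 4 == 0xF

def wait_for_self_test(device, poll_seconds=600):
    """Re-read the capabilities page until the drive stops reporting a running test."""
    while True:
        output = subprocess.run(["smartctl", "-c", "-d", "ata", device],
                                capture_output=True, text=True).stdout
        if not is_test_running(output):
            return output
        time.sleep(poll_seconds)
```

Once wait_for_self_test returns, the post-test report can be taken. Whether the test passed, failed or was aborted shows up in the self-test log section of smartctl -a.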
June 16, 201214 yr Author SMART does not require screen. It runs in the background and results are shown in the normal smartctl report. Make sure the disk is set to never spin down in unRAID. A spin down request will abort the test. ahhh ok thanks ill do that...trying to install smart history in unmenu but i screwed something up lol
June 18, 2012 (Author)
OK, so I let the short scan go for almost 4 hours and the long scan go for 24 hours... hope it finished :'(
smart_WD30EZRX-2728_pre-test.zip
June 18, 2012
EARS-3919: the test didn't finish, due to a read failure.
    Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline     Completed: read failure  60%        13908            3907029167
    197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always   -  1
    198 Offline_Uncorrectable   0x0030  200  200  000  Old_age  Offline  -  1
Your long test on the EZRX-2728 was aborted by host.
    # 1  Extended offline  Aborted by host  60%  1166  -
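Both kinds of lines quoted above can be picked apart mechanically. Here is a minimal Python sketch that assumes the column layout shown in the quoted smartctl output; column widths vary, so the parser splits on whitespace. parse_entry and raw_value are hypothetical helper names. The first-error LBA in the short-test entry, 3907029167, is exactly the last sector of a 3907029168-sector drive.

```python
import re

# One self-test log entry, e.g.
# "# 1  Short offline  Completed: read failure  60%  13908  3907029167"
ENTRY_RE = re.compile(
    r"#\s*(?P<num>\d+)\s+(?P<test>\w+ (?:offline|captive))\s+"
    r"(?P<status>.+?)\s+(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)"
)

def parse_entry(line):
    """Split a self-test log entry into its columns."""
    match = ENTRY_RE.search(line)
    if match is None:
        raise ValueError(f"not a self-test log entry: {line!r}")
    lba = match["lba"]
    return {
        "test": match["test"],
        "status": match["status"],
        "remaining_pct": int(match["remaining"]),
        "lifetime_hours": int(match["hours"]),
        # "-" means the entry recorded no failing LBA
        "first_error_lba": None if lba == "-" else int(lba),
    }

def raw_value(attribute_line):
    """RAW_VALUE is the 10th column of a SMART attribute row (assumed plain integer)."""
    return int(attribute_line.split()[9])
```

Applied to the 197 and 198 rows above, raw_value returns 1 for each: the single pending, offline-uncorrectable sector.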
June 18, 2012 (Author)
Damn it, I don't know why it keeps saying I aborted it. I stopped the disk from spinning down, and I didn't close telnet or shut off the computer... )= As for the other drive, the one that failed with a read error, are we thinking replacement on that one??
June 18, 2012
Run Preclear on both of the drives. It may resolve the read error on the first drive (the EARS-3919).
June 23, 2012 (Author)
OK, sorry I have been MIA from this. I was running the scans individually, based on the wiki info, just in case something went wrong running the commands all at once... I Precleared my parity drive (sda) 3 times before I added it to unRAID, and it passed all three times, so I'm not sure Preclearing that drive would do anything at this point? I'm all for trying to Preclear Disk 3 (sdd), though... Here are the tests, run individually... thanks again.
SMART.zip
June 23, 2012
Your parity drive looks great. Whatever issue it had with a sector was completely resolved, without even having to remap it. It is a little odd how quickly it was dealt with. None of the SMART tests you ran on that drive showed any issue at all, yet the syslog clearly indicates a media error with flag UNC, about 2 hours into a 12-hour parity check. I mention those times just in case it ever happens again, because it's possible there is a marginal sector about a sixth of the way across that 3TB drive.
Disk 3 still has a bizarre issue, one that seriously impacts your performance. In your first hdparm report, I missed the fact that it is running at UDMA2 speed (the same as UDMA/33), which is about a quarter of the UDMA/133 speed a SATA drive normally reports. Every test reports the same thing: it cannot read the final sector of the drive, and none of the tests were able to convince the drive to deal with it. You don't need that sector, and the Reiser file system is not using it (the drive mounts without issue), but the kernel drive modules insist on accessing that sector at first setup, and because they cannot access it, they insist that the speed to that drive be slowed way down, wrongly thinking that this might help! It IS still possible that a Preclear will force the drive to fix the issue. Otherwise, the drive is in effect not suitable for unRAID, or for Linux, actually. I suspect it might be fine in Windows. If you were to resize the partition, trimming it a little at the end, you should be able to reintroduce it to unRAID, but I'm not sure it's worth the effort. Thank you anyway for an interesting problem, though I doubt that is any comfort!
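The slowed-down link can also be spotted in the syslog. Here is a minimal Python sketch. It assumes the kernel logs libata lines of the form "ata4.00: configured for UDMA/33", though the exact wording may differ between kernel versions. The sketch records the last mode each port was configured for and flags ports below UDMA/133. The threshold and the function names are illustrative assumptions.

```python
import re

# Assumed libata message shape, e.g. "kernel: ata4.00: configured for UDMA/33"
CONFIGURED_RE = re.compile(r"\b(ata\d+)\.\d+: configured for (\S+)")

def last_modes(syslog_lines):
    """Map each ATA port (e.g. 'ata4') to the mode it was last configured for."""
    modes = {}
    for line in syslog_lines:
        match = CONFIGURED_RE.search(line)
        if match:
            modes[match.group(1)] = match.group(2)
    return modes

def slow_ports(modes, expected=133):
    """Return ports whose UDMA mode is below `expected` MB/s (e.g. UDMA/33)."""
    slow = {}
    for port, mode in modes.items():
        if mode.startswith("UDMA/") and int(mode[len("UDMA/"):]) < expected:
            slow[port] = mode
    return slow
```

Feeding it the lines of a saved syslog file and printing slow_ports(last_modes(lines)) would list any port that was configured below UDMA/133. On this server, that would presumably be ata4, the port Disk 3 is on.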
June 23, 2012 (Author)
Thank you @Robj. So here's what I'll do: I'll Preclear Disk 3, then I'll post the SMART reports and syslog again. I had started another thread, and the thinking there was that the problem was related to Disk 3; let me know what you think. It appears that since I updated to unRAID v5.0-rc4, the WebGUI has been REALLY slow to respond. http://lime-technology.com/forum/index.php?topic=20835.msg184878#msg184878
I'll get the Preclear started... should I do multiple Preclears, or would one be sufficient? Thanks again.
June 23, 2012
Quote: Thank you @Robj. So here's what I'll do: I'll Preclear Disk 3, then I'll post the SMART reports and syslog again. I had started another thread, and the thinking there was that the problem was related to Disk 3; let me know what you think. It appears that since I updated to unRAID v5.0-rc4, the WebGUI has been REALLY slow to respond. http://lime-technology.com/forum/index.php?topic=20835.msg184878#msg184878 I'll get the Preclear started... should I do multiple Preclears, or would one be sufficient? Thanks again.
Just a quick response: one Preclear is enough. It either fixes it or it doesn't. There is no point in beating your head against the same wall. Now I'll check that syslog from the other thread...
June 23, 2012 (Author)
Thanks! Something I just thought about... this is an EARS drive, and it used to be that I had to put a jumper on the drive, which I don't believe you have to do any more with the newer unRAID OS... so the jumper is still on there. Should I use preclear_disk.sh -t /dev/sdX rather than ./preclear_disk.sh /dev/sdX because of the jumper? Or should I just remove the jumper?
June 23, 2012
Quote: Something I just thought about... this is an EARS drive, and it used to be that I had to put a jumper on the drive, which I don't believe you have to do any more with the newer unRAID OS... so the jumper is still on there. Should I use preclear_disk.sh -t /dev/sdX rather than ./preclear_disk.sh /dev/sdX because of the jumper?
I have not had to deal with the WD jumpers, so others are more knowledgeable, but I believe they would say "once jumpered, always jumpered, nothing to change". The -t option is just a quick test of whether the drive is already Precleared, probably not the option you wanted, and I don't think you need any options.
I looked at the other syslog, and yes, this same drive is probably causing all your problems. Disk issues often involve small waits (typically 5 seconds, though yours are 2 to 3 seconds), which generally lead to system delays and slow operation. Your Disk 3 spends long periods trying and failing to read that last sector, causing very slow operation during those periods. In that syslog, you booted at Jun 19 20:06:08, and the drive kept spewing errors until Jun 19 20:40:11. It took a short break, then spewed again from Jun 19 21:17:08 until Jun 19 21:19:36. Then it took some time off until Jun 21 07:12:03, and resumed spewing through the end of the syslog at Jun 21 07:14:14. The sooner you fix or remove that drive, the better for your system!
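Reading the error periods out of a syslog, as done above, amounts to grouping error timestamps wherever the gap between consecutive ones is small. Here is a minimal Python sketch of that grouping. The 5-minute gap, the assumed year (classic syslog timestamps such as "Jun 19 20:06:08" carry none) and the function names are all illustrative assumptions. Selecting which lines count as errors is left to the caller.

```python
from datetime import datetime, timedelta

def parse_stamp(stamp, year=2012):
    """Parse a classic syslog timestamp such as 'Jun 19 20:06:08'.

    These timestamps carry no year, so one is supplied (an assumption).
    """
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

def stamp_of(line, year=2012):
    # The timestamp occupies the first 15 characters of a classic syslog line.
    return parse_stamp(line[:15], year)

def bursts(stamps, max_gap=timedelta(minutes=5)):
    """Group timestamps into (start, end) runs; a gap over max_gap starts a new run."""
    runs = []
    for ts in sorted(stamps):
        if runs and ts - runs[-1][1] <= max_gap:
            runs[-1][1] = ts          # same burst: extend its end time
        else:
            runs.append([ts, ts])     # gap too long: start a new burst
    return [(start, end) for start, end in runs]
```

With a 5-minute gap, errors logged every few seconds from 20:06:08 to 20:40:11 would form one burst, and the ones from 21:17:08 to 21:19:36 would form the next.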