wsume99 Posted October 29, 2010

Well, I am by no means an expert, but there are lots of I/O errors in there, and it looks to me like at some point unRAID had to reset the link to the drive because of them. Here are some of the lines from the report:

Oct 29 02:47:09 Titan kernel: ata5.00: failed command: CHECK POWER MODE
Oct 29 02:47:09 Titan kernel: ata5.00: cmd e5/00:00:00:00:00/00:00:00:00:00/40 tag 0
Oct 29 02:47:09 Titan kernel: res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Oct 29 02:47:09 Titan kernel: ata5.00: status: { DRDY }
Oct 29 02:47:09 Titan kernel: ata5: hard resetting link

Maybe you have a bad power connection and/or SATA cable. Check the connections and cables. I'd recommend swapping them for a known-good power connector and SATA cable, perhaps ones that are connected to another drive that is working fine now. Then try the preclear again.

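For anyone who wants to scan a whole syslog for the kind of lines quoted above, here is a minimal sketch. The function name `summarize_ata_errors` and the regular expressions are illustrative assumptions based only on the log lines in this post; they are not an official description of the kernel's message format.

```python
import re
from collections import Counter

# Patterns modeled on the log lines quoted in the post above.
# They are assumptions for illustration, not a complete libata grammar.
RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")
FAILED_RE = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")
TIMEOUT_RE = re.compile(r"Emask 0x[0-9a-f]+ \(timeout\)")


def summarize_ata_errors(lines):
    """Count link resets, failed commands, and timeouts in syslog lines."""
    resets = Counter()    # port -> number of hard link resets
    failed = Counter()    # (port, command) -> number of failures
    timeouts = 0          # result lines flagged "(timeout)"
    for line in lines:
        line = line.rstrip()
        match = RESET_RE.search(line)
        if match:
            resets[match.group(1)] += 1
            continue
        match = FAILED_RE.search(line)
        if match:
            failed[(match.group(1), match.group(2))] += 1
            continue
        if TIMEOUT_RE.search(line):
            timeouts += 1
    return {"resets": dict(resets), "failed": dict(failed), "timeouts": timeouts}
```

Feeding it the five lines above would report one reset on ata5, one failed CHECK POWER MODE command, and one timeout. A high reset count on a single port is only a hint to check cabling and power first; it does not by itself prove the drive is bad.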
Seven Posted October 29, 2010

I checked the cables just now and even moved the drive to a different SATA port on the motherboard, but unRAID still throws errors about this disk when I reboot. I pulled the disk out of the array, put it in the external SATA dock on my Windows 7 machine, and tried to check it with CrystalDiskInfo, but it didn't even see the disk. The disk isn't showing up in the BIOS of my Windows PC at boot time either. So all signs point to a bad disk. Time to start the RMA process with Newegg.

Thanks for your help!

Seven

Joe L. Posted October 29, 2010

Quote:
Looks like preclear failed on a brand new disk I purchased to keep on hand as a spare drive. I've attached all of the logfiles and output I could find.... I did not find a smart_finish report in the /tmp directory. This is a new WD20EARS drive from Newegg and I did install a jumper over pins 7/8 before attaching to my server. Any thoughts or suggestions would be appreciated. Thanks!

Either that or the SATA or POWER cable to it came loose. If it is the drive, it is good that it failed before you started using it for your data. FAR easier to RMA before you put it in the array.

Joe L.

mangledjustice Posted October 30, 2010

Seven...... I experienced that exact same issue with a brand new WD20EARS with jumpers on. Here is the thread where I was asking if someone could help me out: http://lime-technology.com/forum/index.php?topic=8506.0

All the errors started showing up about 60% into the post-read phase. The process just hung there and would not proceed, so I stopped it. I really cannot pinpoint what the problem was, because no one chimed in. But from looking at the syslog, it might have been some kind of connection issue, be it data or power. Anyway, I restarted the preclear process and it completed fully, and the drive was OK as far as I know how to interpret the results. Good luck....

JohnnyDrama Posted October 30, 2010

Hey guys,

Just completed my first ever unRAID build, specs are in my sig. Quite pleased at the ease of use, and speed at getting this up and running.

I ran the preclear script on my 3 WD10EARS (jumpered) drives by opening 3 shells and running the script for each drive on a different shell, before adding them to the array and building parity for the first time. I noticed my third drive (sdc) logged some errors and I shrugged them off at the time as a couple of simple read errors, but now I'm wondering if I should have paid more attention before adding it to the array and letting it rebuild parity (sda). Just wondering if anyone can put my mind at ease? I've attached my full syslog as a ZIP in case it's needed or if anyone is just curious.

== Disk /dev/sdc has been successfully precleared
==
== Ran 1 preclear-disk cycle
==
== Using :Read block size = 8225280 Bytes
== Last Cycle's Pre Read Time : 7:01:54 (59 MB/s)
== Last Cycle's Zeroing time : 7:27:25 (55 MB/s)
== Last Cycle's Post Read Time : 16:58:26 (24 MB/s)
== Last Cycle's Total Time : 31:28:54
==
== Total Elapsed Time 31:28:54
==
== Disk Start Temperature: 30C
==
== Current Disk Temperature: 35C,
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
19,20c19,20
< Offline data collection status: (0x80)^IOffline data collection activity
< ^I^I^I^I^Iwas never started.
---
> Offline data collection status: (0x84)^IOffline data collection activity
> ^I^I^I^I^Iwas suspended by an interrupting command from host.
54c54
< 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
---
> 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 2
58c58
< 7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
---
> 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
63c63
< 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 29
---
> 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 31
============================================================================

syslog.zip

Joe L. Posted October 30, 2010

All drives have raw read errors... Some report them, some do not. The actual number reported is meaningful only to the manufacturer. Notice that the "Normalized" value of 200 is unchanged and nowhere near the failure threshold of 51. Your drive looks fine.

The other changes were the values changing from the factory-initialized 253 value to the starting "normalized" value of 200.

Joe L.

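The comparison described above (normalized value against failure threshold, ignoring the raw count) can be sketched in a few lines. This assumes the attribute rows follow the usual smartctl column order seen in the reports in this thread (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value). The names `SmartAttr`, `parse_attr`, and `margin` are hypothetical helpers made up for this illustration.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int    # normalized current value
    worst: int    # lowest normalized value recorded so far
    thresh: int   # failure threshold for the normalized value
    raw: str      # vendor-specific raw value, kept as text


def parse_attr(row: str) -> SmartAttr:
    """Parse one attribute row, assuming the ten-column smartctl layout."""
    fields = row.split(None, 9)  # the raw value may itself contain spaces
    return SmartAttr(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        raw=fields[9],
    )


def margin(attr: SmartAttr) -> int:
    """Distance between the normalized value and the failure threshold."""
    return attr.value - attr.thresh


row = "1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 2"
attr = parse_attr(row)
print(attr.name, "margin above threshold:", margin(attr))
```

For the row from the report, the normalized value is 200 against a threshold of 51, so the margin is 149 even though the raw count moved from 0 to 2.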
abs0lut.zer0 Posted November 5, 2010

Disk Temperature: 28C, Elapsed Time: 18:11:33
============================================================================
==
== Disk /dev/sdc has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
71c71
< 190 Airflow_Temperature_Cel 0x0022 076 070 000 Old_age Always - 24 (Lifetime Min/Max 24/24)
---
> 190 Airflow_Temperature_Cel 0x0022 072 070 000 Old_age Always - 28 (Lifetime Min/Max 24/30)
78c78
< 201 Soft_Read_Error_Rate 0x000a 253 253 000 Old_age Always - 0
---
> 201 Soft_Read_Error_Rate 0x000a 100 100 000 Old_age Always - 0
============================================================================

This is the output for one of my drives, and I am not sure what I am looking for. I have read some of the preclear thread, but it's very involved. Is there any guide to what comes out of this script and what to look for? I hope it's not somewhere obvious, because I did look. Thanks.

Quote:
Hi Joe, The preclear_disk.sh script finally completed one cycle and below is the result. I suppose it is OK, right? Thanks, --Tom

===========================================================================
= unRAID server Pre-Clear disk /dev/sdj
= cycle 1 of 1
= Disk Pre-Clear-Read completed DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward. DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE
= Step 5 of 10 - Clearing MBR code area DONE
= Step 6 of 10 - Setting MBR signature bytes DONE
= Step 7 of 10 - Setting partition 1 to precleared state DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries DONE
= Step 10 of 10 - Testing if the clear has been successful. DONE
= Disk Post-Clear-Read completed DONE
Elapsed Time: 14:36:38
============================================================================
==
== Disk /dev/sdj has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
71c71
< 190 Airflow_Temperature_Cel 0x0022 075 075 000 Old_age Always - 25 (Lifetime Min/Max 25/26)
---
> 190 Airflow_Temperature_Cel 0x0022 072 072 000 Old_age Always - 28 (Lifetime Min/Max 25/28)
77c77
< 200 Multi_Zone_Error_Rate 0x000a 253 253 000 Old_age Always - 0
---
> 200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 0
============================================================================

Quote:
After 14+ hours your disk temperature went from 25C to 28C. I'd say that in itself is not too serious but it does say you have some serious fans.

The S.M.A.R.T. wiki here indicates that attribute 200 is

200 C8 Write Error Rate / Multi-Zone Error Rate: The total number of errors when writing a sector.

You started with the default initialized value of 253, and after a full 14+ hour pre-clear cycle, it has a normalized value of 100. The failure threshold is 0. You are nowhere close to the failure threshold value, so unless it changes over time, you are fine there too.

Joe L.

This answered my question... sorry if I wasted anyone's time...

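When reading the before/after pairs in these preclear diffs, it can help to see exactly which columns moved. The following is a rough sketch that compares two attribute rows column by column, assuming the same ten-column smartctl layout as the rows quoted above. The function name `changed_columns` and the column labels are made up for this example.

```python
# Column labels for a smartctl attribute row; the labels themselves are
# illustrative choices, only the column order is taken from the reports above.
COLUMNS = [
    "id", "name", "flag", "value", "worst",
    "thresh", "type", "updated", "when_failed", "raw",
]


def changed_columns(before: str, after: str) -> dict:
    """Return {column: (old, new)} for every column that differs."""
    old_fields = before.split(None, 9)  # keep a multi-word raw value whole
    new_fields = after.split(None, 9)
    return {
        name: (old, new)
        for name, old, new in zip(COLUMNS, old_fields, new_fields)
        if old != new
    }


before = "200 Multi_Zone_Error_Rate 0x000a 253 253 000 Old_age Always - 0"
after = "200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 0"
print(changed_columns(before, after))
```

For the Multi_Zone_Error_Rate pair, only the normalized value and worst columns change (253 to 100); the threshold and raw value stay at 0, which matches the reading given in the quoted reply.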
burnaby_boy Posted November 5, 2010

Greetings,

Last week I started to get errors on one drive in the array. There were no errors in the parity check, but the drive itself was showing errors: 67 and then 98. I replaced the drive with a new one and the data was rebuilt successfully. I put the questionable drive in a test unRAID server and did a preclear. These are the results:

============================================================================
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 42
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 78
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 5673
193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 5674
197 Current_Pending_Sector 0x0032 200 197 000 Old_age Always - 10
197 Current_Pending_Sector 0x0032 200 197 000 Old_age Always - 1
============================================================================

Do I have any reason to be concerned?

Cheers

Joe L. Posted November 5, 2010

Probably... I would never expect to see a "Current Pending Sector" in the post-clear SMART report. That's because they should all have been identified in the pre-read phase and re-allocated during the writing of zeros. The post-zeroing phase should therefore not have detected any additional un-readable sectors, yet it appears it has.

I'd run it through another cycle or two and see if that last sector pending re-allocation gets re-allocated and no others show themselves. Sorry to say, but a continual trickle of un-readable sectors is not an indication of a healthy drive. If you only have a few, and the number does not increase as you continue to use the drive, then you should be OK. Otherwise, you are asking for constant random read errors.

All un-readable sectors should re-allocate themselves when you preclear a drive, as it writes to every sector on the disk.

burnaby_boy Posted November 5, 2010

Thanks for the feedback, Joe. It's a relatively new WD 1.5TB EARS (jumpered). Sounds like I should start an RMA. I certainly wouldn't feel comfortable putting it back in the array.

Joe L. Posted November 5, 2010

OK, your choice, but they will not consider it failed... Most disks have several thousand spare sectors, and you've apparently used NONE of them.

Look closer... you had 10 sectors pending re-allocation at the start, and most, if not all, were NOT re-allocated but were instead re-written in place in their original sectors. (You did not show any output for the Reallocated Sector counter, so it must not have changed.)

To me that indicates a different class of problem, one where the original write to the sector was poorly done and a re-write worked. That could easily be a vibration issue or a power supply issue, with noise on the power supply leading to a poorly written sector. Of course it could be poor electronics in the drive itself too, but it points less towards a defective magnetic surface on the platters.

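The reasoning in the post above can be written down as a small decision helper. This is only a sketch: the function name `classify_pending_change` and its labels are invented for illustration, and the reallocated-sector counts passed in the example are an assumption, since that counter was not shown in the posted report.

```python
def classify_pending_change(pending_before, pending_after,
                            realloc_before, realloc_after):
    """Label what happened to pending sectors across one preclear cycle."""
    cleared = pending_before - pending_after    # pending sectors resolved
    remapped = realloc_after - realloc_before   # sectors moved to spares
    if cleared < 0:
        return "pending grew"
    if cleared == 0:
        return "unchanged"
    if remapped <= 0:
        # Pending count dropped without using spare sectors, so the
        # sectors were rewritten successfully in their original location.
        return "rewritten in place"
    if remapped >= cleared:
        return "reallocated"
    return "mixed"


# Pending went from 10 to 1; the reallocated count is ASSUMED to be an
# unchanged 0 here because it did not appear in the posted diff.
print(classify_pending_change(10, 1, 0, 0))
```

With those inputs the helper returns "rewritten in place", which is the case that points away from a damaged platter surface and towards power, vibration, or drive electronics.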
burnaby_boy Posted November 5, 2010

OK, well I will try a couple more rounds with preclear and see what the results are. Thanks again for your analysis.

Traxxus Posted November 6, 2010

I'm having problems preclearing a new disk I got. It starts out fine on step one, but after about 10 minutes it slows to a crawl, from ~100 MB/s down to ~4 MB/s. At that point it also only updates the screen every 5 minutes or so; it should be doing so every 10 seconds, I believe. I stopped and restarted the process a couple of times, and restarted the server, but it is still doing the same thing. This isn't how it should go, is it? Syslog attached; let me know if there is another log somewhere that will help.

Edit: It's been an hour now and it's only 3% into the pre-read.

Edit part deux: Now it's back up to 100 MB/s. Am I just being too impatient? Getting errors now; updated syslog here: http://pastebin.ca/1983708

Joe L. Posted November 6, 2010

You are getting tons of "media errors" (un-readable sectors). If you get a SMART report on the drive you'll see them as sectors pending re-allocation (or you can wait; the final preclear report will show them too).

Joe L.

Traxxus Posted November 6, 2010

I'll just wait; I'm not in a huge hurry to add it, and it's the weekend anyway. It's kind of interesting to see what happens with bad hard drives so I can recognize them in the future. I'll RMA it on Monday.

This is the third hard drive I have acquired for my server, but the first one I have used preclear on. It's a good thing I did; I intended to use this one as a parity drive.

Joe L. Posted November 7, 2010

You would not have been happy having that drive in your server. Yes, most drives preclear just fine... and then there are those, like yours, that show their true colors.

Just think, 99.99% of the people installing drives have no idea at all if they are readable, and most will only know when their programs and/or data are subsequently unreadable. Most will blame it on Microsoft...

Joe L.

Traxxus Posted November 7, 2010

Indeed, just glad I decided to give it a try. The preclear finally moved on to the writing-to-disk stage and started putting out a different error. This error kept repeating in the syslog until it was over a GB in size, and it was only 7% done, so I had to close it out; I didn't want it to crash the server. Obviously I can't post it all, but it's basically just this repeating over and over, added onto the previous syslog:

Nov 6 18:17:39 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Nov 6 18:17:39 Tower kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Nov 6 18:17:39 Tower kernel: 05 ef 2f 68
Nov 6 18:17:39 Tower kernel: sd 1:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Nov 6 18:17:39 Tower kernel: sd 1:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 05 ef 2f 68 00 04 00 00
Nov 6 18:17:39 Tower kernel: end_request: I/O error, dev sda, sector 99561320
Nov 6 18:17:39 Tower kernel: ata1: EH complete
Nov 6 18:17:39 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov 6 18:17:39 Tower kernel: ata1.00: irq_stat 0x40000001
Nov 6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT
Nov 6 18:17:39 Tower kernel: ata1.00: cmd 35/00:00:68:33:ef/00:04:05:00:00/e0 tag 0 dma 524288 out
Nov 6 18:17:39 Tower kernel: res 61/04:00:68:33:ef/4d:04:05:00:00/e0 Emask 0x1 (device error)
Nov 6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }
Nov 6 18:17:39 Tower kernel: ata1.00: error: { ABRT }
Nov 6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov 6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov 6 18:17:39 Tower kernel: ata1.00: configured for UDMA/133 (device error ignored)
Nov 6 18:17:39 Tower kernel: ata1: EH complete
Nov 6 18:17:39 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov 6 18:17:39 Tower kernel: ata1.00: irq_stat 0x40000001
Nov 6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT
Nov 6 18:17:39 Tower kernel: ata1.00: cmd 35/00:00:68:33:ef/00:04:05:00:00/e0 tag 0 dma 524288 out
Nov 6 18:17:39 Tower kernel: res 61/04:00:68:33:ef/4d:04:05:00:00/e0 Emask 0x1 (device error)
Nov 6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }
Nov 6 18:17:39 Tower kernel: ata1.00: error: { ABRT }
Nov 6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov 6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov 6 18:17:39 Tower kernel: ata1.00: configured for UDMA/133 (device error ignored)
Nov 6 18:17:39 Tower kernel: ata1: EH complete
Nov 6 18:17:39 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov 6 18:17:39 Tower kernel: ata1.00: irq_stat 0x40000001
Nov 6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT
Nov 6 18:17:39 Tower kernel: ata1.00: cmd 35/00:00:68:33:ef/00:04:05:00:00/e0 tag 0 dma 524288 out
Nov 6 18:17:39 Tower kernel: res 61/04:00:68:33:ef/4d:04:05:00:00/e0 Emask 0x1 (device error)
Nov 6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }
Nov 6 18:17:39 Tower kernel: ata1.00: error: { ABRT }
Nov 6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov 6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov 6 18:17:39 Tower kernel: ata1.00: configured for UDMA/133 (device error ignored)
Nov 6 18:17:39 Tower kernel: ata1: EH complete
Nov 6 18:17:39 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov 6 18:17:39 Tower kernel: ata1.00: irq_stat 0x40000001
Nov 6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT
Nov 6 18:17:39 Tower kernel: ata1.00: cmd 35/00:00:68:33:ef/00:04:05:00:00/e0 tag 0 dma 524288 out
Nov 6 18:17:39 Tower kernel: res 61/04:00:68:33:ef/4d:04:05:00:00/e0 Emask 0x1 (device error)
Nov 6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }
Nov 6 18:17:39 Tower kernel: ata1.00: error: { ABRT }
Nov 6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov 6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov 6 18:17:39 Tower kernel: ata1.00: configured for UDMA/133 (device error ignored)
Nov 6 18:17:39 Tower kernel: ata1: EH complete
Nov 6 18:17:39 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov 6 18:17:39 Tower kernel: ata1.00: irq_stat 0x40000001
Nov 6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT
Nov 6 18:17:39 Tower kernel: ata1.00: cmd 35/00:00:68:33:ef/00:04:05:00:00/e0 tag 0 dma 524288 out
Nov 6 18:17:39 Tower kernel: res 61/04:00:68:33:ef/4d:04:05:00:00/e0 Emask 0x1 (device error)
Nov 6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }
Nov 6 18:17:39 Tower kernel: ata1.00: error: { ABRT }
Nov 6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov 6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov 6 18:17:39 Tower kernel: ata1.00: configured for UDMA/133 (device error ignored)
Nov 6 18:17:39 Tower kernel: ata1: EH complete
Nov 6 18:17:39 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov 6 18:17:39 Tower kernel: ata1.00: irq_stat 0x40000001
Nov 6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT
Nov 6 18:17:39 Tower kernel: ata1.00: cmd 35/00:00:68:33:ef/00:04:05:00:00/e0 tag 0 dma 524288 out
Nov 6 18:17:39 Tower kernel: res 61/04:00:68:33:ef/4d:04:05:00:00/e0 Emask 0x1 (device error)
Nov 6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }
Nov 6 18:17:39 Tower kernel: ata1.00: error: { ABRT }
Nov 6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov 6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov 6 18:17:39 Tower kernel: ata1.00: configured for UDMA/133 (device error ignored)
Nov 6 18:17:39 Tower kernel: sd 1:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
Nov 6 18:17:39 Tower kernel: sd 1:0:0:0: [sda] Sense Key : 0xb [current] [descriptor

However, the SMART test for the drive shows up as fine, with no errors and no sectors pending or reallocated. There are a bunch of other fault codes though. It is attached.

smart.txt

Joe L. Posted November 7, 2010

As you said, writes to the disk were failing... and eventually the syslog might have used up all RAM and crashed the server. Good you killed it.

Joe L.

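When a syslog balloons with the same few messages, counting distinct messages makes it much easier to post a summary instead of the whole file. Below is a minimal sketch. The function name `count_messages` and the prefix pattern are assumptions based on the timestamp layout of the lines posted in this thread, and may need adjusting for other syslog formats.

```python
import re
from collections import Counter

# Matches a prefix such as "Nov 6 18:17:39 Tower kernel: ".
# Assumed from the log lines posted above; other systems may differ.
PREFIX_RE = re.compile(r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+kernel:\s*")


def count_messages(lines):
    """Count how often each kernel message appears, ignoring timestamps."""
    counts = Counter()
    for line in lines:
        message = PREFIX_RE.sub("", line.strip())
        if message:
            counts[message] += 1
    return counts
```

Calling `count_messages(open(path))` on a saved copy of the log and printing `counts.most_common(10)` would show the handful of messages responsible for most of the growth. Iterating over the file object reads it line by line, so even a very large log does not need to fit in memory at once.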
Traxxus Posted November 7, 2010

All right, thanks for the clarifications. I haven't seen a normal preclear yet, so I didn't really have a baseline for it. Hopefully the RMA will be up to snuff.

SSD Posted November 7, 2010

Just wanted to point out that you cannot draw conclusions about whether a drive is failing by looking at the syslog. Only by seeing reallocated sectors or failed attributes in a SMART report will you know the drive itself is the problem. It is MUCH more common for syslog errors to be traced back to a cabling / backplane issue. So look at your SMART report and confirm the reallocated sectors are increasing. Otherwise you may have cabling issues in addition to a suspect disk.

burnaby_boy Posted November 7, 2010

I ran preclear 2 more times - on the first, the current pending sectors dropped by one, but then this is what I got on the second:

============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
54c54
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 96
---
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 132
63c63
193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 5682
---
193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 5683
65c65
197 Current_Pending_Sector 0x0032 200 197 000 Old_age Always - 9
---
197 Current_Pending_Sector 0x0032 200 197 000 Old_age Always - 46
============================================================================

It seems to be getting worse. Do I just keep running preclear until the drive fails? - and then RMA it?

Cheers

SSD Posted November 7, 2010

I would not keep running. RMA the disk.

Joe L. Posted November 7, 2010

Since the sectors seem to be re-written to their existing locations (I see no change in the re-allocated sector count, only "pending sector"), I would not be so quick to point the finger at the disk drive. It could be the drive, but it could just as easily be the power supply.

Basically, the drive seems to be able to re-write the sectors in their existing locations. The question is what made the "writing" of the sector un-reliable the first time? Was it the disk itself? Vibration? Noise on the power supply? Almost impossible to tell from an outsider's point of view.

Since there are continual sectors pending re-allocation, I'd just start an RMA stating that fact. You'll probably never get the drive to fail a SMART test, at least not in the next weeks/months.

Joe L.

burnaby_boy Posted November 7, 2010

The PSUs in both computers that this HDD has been connected to seem to be sound. In the main unRAID unit (using a Corsair TX650W PSU) there are 11 other HDDs that seem to be free of the issues that this one has. In terms of vibration, when it was part of the main array it was secured using silicone grommets.

As you suggest, I will RMA, but include a note mentioning the continual sectors pending re-allocation.

Thanks again for the assistance.

Cheers

slarco Posted November 10, 2010

I just precleared a new drive. Can anyone take a look at this, please? It seems normal to me, but I just want to make sure before adding the disk to the array. Syslog attached.

Thanks in advance!!

syslogData3-10-11-2010.txt