funhur Posted September 14, 2012
Hi, my disk4 (sdg) was red-balled during a mover operation. From my research, I should trust my parity drive, because the mover completed after the disk was disabled. Here is what I did:
Took the disk out of the array and checked the cable connections.
Started the tower up; the disk was still unassigned in the array config.
Ran a preclear on the disk to see if there was any problem with it. The report looks normal to me.
Assigned the disk back to the array and started a data rebuild.
The data rebuild completed successfully.
Started a parity check, and the disk was red-balled almost immediately.
I am using 5.0-rc6-r8168-test, and disk 4 is connected to a Supermicro AOC-SAS2LP-MV8. The disk is a Samsung F4 2TB.
I would appreciate any advice on what could be wrong. Thanks in advance.
Attaching the preclear reports and syslog: preclear_reports_S2H7J90B203119_2012-09-13.zip, syslog.20120914.txt

kenoka Posted September 14, 2012
Your syslog doesn't say a whole lot. It looks like it was captured just after your most recent reboot. What we need to see is the syslog from when the failure occurred.
Whenever a drive is questionable, you should run a SMART check. Please run one and post the results.
Your preclear report has a couple of worrying items. The first is "Program_Fail_Cnt_Total", which has a crazy high raw number: 16937351. After some internet snooping, it looks like a common complaint for this drive, and I'm not sure how much of a worry it actually is. The second is "G-Sense Error Rate", which should probably not be showing up at all in a desktop machine, except in an earthquake.
However, there are no reallocated sectors, pending sectors, or UDMA CRC errors.

funhur (Author) Posted September 14, 2012
@kenoka Is the failure you are referring to the one where the disk was red-balled during the mover operation? Let me check for older syslogs; I had shut down the tower to check the cable connections.
Noted on the "G-Sense Error Rate". I will run a SMART check when I get home later. Thanks.

dgaschk Posted September 14, 2012
Both G-Sense Error Rate and Program_Fail_Cnt_Total are unchanged from their initial VALUEs of 100. The raw numbers have meaning only to Samsung and should not be used to determine drive health. Run a long SMART test.

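To make the VALUE-versus-raw distinction concrete, here is a minimal Python sketch that reads one row of the attribute table and checks the normalized VALUE against THRESH, leaving the raw number as opaque text. The column order is assumed to be the usual `smartctl -A` layout, the names `SmartAttribute`, `parse_attribute`, and `below_threshold` are made up for illustration, and the sample row is the Current_Pending_Sector line funhur posts later in this thread.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized VALUE column
    worst: int   # normalized WORST column
    thresh: int  # vendor failure threshold (THRESH column)
    raw: str     # vendor-specific RAW_VALUE, kept as text on purpose


def parse_attribute(line: str) -> SmartAttribute:
    """Parse one attribute row.

    Assumed column order (hypothetical helper, not part of smartmontools):
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    fields = line.split()
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        raw=" ".join(fields[9:]),
    )


def below_threshold(attr: SmartAttribute) -> bool:
    """True when the normalized VALUE has dropped to or below THRESH.

    A THRESH of 0 means the attribute can never trip, so it is skipped.
    """
    return attr.thresh > 0 and attr.value <= attr.thresh


# Row copied from the report posted later in this thread.
row = parse_attribute(
    "197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 2"
)
print(row.name, "VALUE", row.value, "THRESH", row.thresh, "raw", row.raw)
print("below threshold:", below_threshold(row))
```

The normalized comparison only covers the vendor's own pass/fail logic. For a few attributes, such as pending or reallocated sectors, the raw count is still worth reading directly.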
funhur (Author) Posted September 14, 2012
I can't find any previous syslog in /var/log. The log file looks to be overwritten every time the tower reboots. Are older logs kept in another directory?
I just started the long SMART test.

kenoka Posted September 14, 2012
No, you need to capture the current syslog prior to shutdown. It restarts with every boot.

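Since the log starts fresh on every boot, the capture has to happen before the shutdown. A minimal Python sketch of that step follows; it assumes the live log is at `/var/log/syslog` and that `/boot/logs` sits on persistent storage (both paths are assumptions, as is the helper name `save_syslog`). A stock unRAID install may not ship Python, in which case a plain copy of the file from the console does the same job.

```python
import shutil
import time
from pathlib import Path


def save_syslog(src: str = "/var/log/syslog",
                dest_dir: str = "/boot/logs") -> Path:
    """Copy the live syslog to persistent storage under a timestamped name.

    Both default paths are assumptions for illustration; adjust them to
    wherever the log and the persistent storage live on your system.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # The timestamp keeps earlier captures from being overwritten.
    target = dest / f"syslog.{time.strftime('%Y%m%d-%H%M%S')}.txt"
    shutil.copy2(src, target)
    return target


if __name__ == "__main__":
    print("saved to", save_syslog())
```

Running it right after a red ball, and before any reboot, preserves the write errors that the thread is asking for.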
funhur (Author) Posted September 14, 2012
Attaching the long SMART results. Is this normal? I see some read errors towards the end, but the overall health check passed. Thanks.
Attachment: smart-sdg-rpt.txt

dgaschk Posted September 15, 2012
The long test failed:
# 1 Extended offline Completed: read failure 90% 2300 51264

Joe L. Posted September 15, 2012
dgaschk wrote: "The long test failed: # 1 Extended offline Completed: read failure 90% 2300 51264"
It would fail if the disk spun down during the test. Did you disable the spin-down feature of unRAID while the test was in progress?

funhur (Author) Posted September 15, 2012
I did not disable spin-down, but I had taken the disk out of the array before the long test. Does this exclude the spin-down factor?
So does the failed test mean I have a spoilt disk? Thanks.

Joe L. Posted September 15, 2012
funhur wrote: "I did not disable spin down, however I have took the disk out of array before the long test. Does this exclude the spin down factor?"
Probably.
funhur wrote: "So does the failed test means I have a spoilt disk?"
No, because there are no sectors pending re-allocation, nor any re-allocated. It meant that one sector's contents did not match the checksum at the end of that sector, and it was apparently successfully re-written in place.
Joe L.

funhur (Author) Posted September 17, 2012
Hi, I repeated the test and got the same result.
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed: read failure  90%        2348             51264
# 2  Extended offline  Completed: read failure  90%        2300             51264
# 3  Short offline     Completed without error  00%        2258             -
Any
Attachment: smart-sdg-rpt1.txt

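The self-test log above can also be checked mechanically. Below is a rough Python sketch that parses the three rows and reports which tests failed and at which LBA. The regular expression is an assumption fitted to the rows quoted in this thread and only recognizes the "Extended offline" and "Short offline" test types; `SelfTest` and `parse_self_tests` are illustrative names, not anything provided by smartmontools.

```python
import re
from typing import List, NamedTuple, Optional


class SelfTest(NamedTuple):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when the log shows "-"


# Fitted to the rows quoted in this thread; other test types would need
# to be added to the alternation.
ROW = re.compile(
    r"#\s*(\d+)\s+"
    r"(Extended offline|Short offline)\s+"
    r"(.+?)\s+"
    r"(\d+)%\s+"
    r"(\d+)\s+"
    r"(\S+)\s*$"
)

SAMPLE_LOG = """\
# 1 Extended offline Completed: read failure 90% 2348 51264
# 2 Extended offline Completed: read failure 90% 2300 51264
# 3 Short offline Completed without error 00% 2258 -
"""


def parse_self_tests(text: str) -> List[SelfTest]:
    """Return one SelfTest per recognizable row, skipping everything else."""
    tests = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue
        lba = m.group(6)
        tests.append(SelfTest(
            num=int(m.group(1)),
            description=m.group(2),
            status=m.group(3),
            remaining_pct=int(m.group(4)),
            lifetime_hours=int(m.group(5)),
            first_error_lba=None if lba == "-" else int(lba),
        ))
    return tests


failed = [t for t in parse_self_tests(SAMPLE_LOG) if "failure" in t.status]
for t in failed:
    print(f"test #{t.num} failed at LBA {t.first_error_lba} "
          f"after {t.lifetime_hours} power-on hours")
```

Both extended tests stop at the same LBA, 48 power-on hours apart, which suggests the same location is involved each time rather than a one-off glitch.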
funhur (Author) Posted September 17, 2012
I found the following article, but my drive was manufactured in Feb 2011. The article states that drives manufactured in December 2010 or later already include the firmware patch, so I guess I do not need this patch, right?
http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks

dgaschk Posted September 17, 2012
Check on the Seagate site.

funhur (Author) Posted September 20, 2012
Hi, here's an update. The drive gave me false hope, but I now have the syslog from the failure.
Checked that my F4 does not need the firmware patch.
Ran preclear on the drive.
Ran the long SMART test <-- completed without error
Performed the data rebuild.
Ran the long SMART test <-- completed without error
Performed a parity check (NOCORRECT) <-- completed without error
Rebooted the tower to check for problems <-- array started well, no errors
Copied some files over to the cache disk.
Manually ran the mover script.
Disk4 was red-balled and disabled.
Any idea why it was red-balled? There are lots of write errors in the syslog.
Attachments: smart-sdg-aft-preclear.txt, smart-sdg-long-aft-rebuilt.txt, syslog.20120920.txt.zip

dgaschk Posted September 20, 2012
funhur wrote: "Disk4 become red balled and disabled Any idea why it becomes red balled ? lotsa write errors in syslog"
What does a current SMART report show?

funhur (Author) Posted September 20, 2012
A SMART report cannot be run immediately after the error. I will reboot the tower and test again.
root@Tower:~# smartctl -a /dev/sdg
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
root@Tower:~# smartctl -a -T permissive /dev/sdg
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
Short INQUIRY response, skip product id
Log Sense failed, IE page [scsi response fails sanity test]
defect list format 6 unknown
Grown defect list length=12078 bytes [unknown number of elements]
Error Counter logging not supported
Device does not support Self Test logging

funhur (Author) Posted September 20, 2012
Here's the long SMART report. As before, the test stopped without completing, but there is a difference this time. Does it mean the HDD is failing?
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 2
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
Attachment: smart-sdg-0921.txt

dgaschk Posted September 21, 2012
This drive is problematic. Run one or more preclear cycles on it. If the pending sector count goes to zero and stays there, the disk should be OK. An RMA is also an option.

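One way to apply the "goes to zero and stays there" rule is to pull the raw Current_Pending_Sector count out of each SMART report taken between preclear cycles and look at the trend. The Python sketch below does that under a few assumptions: attribute 197 appears as a row in the usual ten-column layout, and "stays there" is read as zero in the last two reports (the `stable_runs=2` default is an arbitrary choice, and both function names are hypothetical).

```python
from typing import Optional, Sequence


def pending_count(report: str) -> Optional[int]:
    """Return the raw Current_Pending_Sector count (ID 197), if present."""
    for line in report.splitlines():
        fields = line.split()
        # Ten columns expected: ID NAME FLAG VALUE WORST THRESH TYPE
        # UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == "197":
            return int(fields[9])
    return None


def cleared_and_stable(counts: Sequence[int], stable_runs: int = 2) -> bool:
    """True if the most recent `stable_runs` readings are all zero.

    `stable_runs` is an illustrative default, not a rule from the thread.
    """
    if len(counts) < stable_runs:
        return False
    return all(c == 0 for c in counts[-stable_runs:])


# Attribute rows from the report posted above.
latest = """\
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 2
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
"""
history = [pending_count(latest)]
print("pending sectors:", history, "cleared:", cleared_and_stable(history))
```

Appending the count from each new report to `history` after every preclear cycle gives a simple yes/no answer on whether the drive has settled.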
funhur Posted September 21, 2012 Author Share Posted September 21, 2012 Thank you all for the help rendered. I will RMA the drive, far too much time is spent on this Link to comment