danioj Posted January 12, 2015

Hello all,

It's always the way: you start a little project and then something comes along to put a spanner in the works!

I woke up dying this morning. Dog sick. So much so that I've taken a day off work. For some reason, my therapy at 5am, while I tried to get into a state to go back to bed, was to sit at the computer and look at my Unraid home page. All good. For WHATEVER reason I clicked the syslog (I still don't know why) and read this:

tail -n 40 -f /var/log/syslog
Jan 13 05:04:18 nas kernel: ASC=0x11 ASCQ=0x4
Jan 13 05:04:18 nas kernel: sd 1:0:0:0: [sdb] CDB:
Jan 13 05:04:18 nas kernel: cdb[0]=0x28: 28 00 10 f4 57 80 00 01 00 00
Jan 13 05:04:18 nas kernel: end_request: I/O error, dev sdb, sector 284448672
Jan 13 05:04:18 nas kernel: ata1: EH complete
Jan 13 05:04:21 nas kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 13 05:04:21 nas kernel: ata1.00: irq_stat 0x40000001
Jan 13 05:04:21 nas kernel: ata1.00: failed command: READ DMA EXT
Jan 13 05:04:21 nas kernel: ata1.00: cmd 25/00:08:a0:57:f4/00:00:10:00:00/e0 tag 0 dma 4096 in
Jan 13 05:04:21 nas kernel: res 51/40:08:a0:57:f4/00:00:10:00:00/e0 Emask 0x9 (media error)
Jan 13 05:04:21 nas kernel: ata1.00: status: { DRDY ERR }
Jan 13 05:04:21 nas kernel: ata1.00: error: { UNC }
Jan 13 05:04:21 nas kernel: ata1.00: configured for UDMA/133
Jan 13 05:04:21 nas kernel: sd 1:0:0:0: [sdb] Unhandled sense code
Jan 13 05:04:21 nas kernel: sd 1:0:0:0: [sdb]
Jan 13 05:04:21 nas kernel: Result: hostbyte=0x00 driverbyte=0x08
Jan 13 05:04:21 nas kernel: sd 1:0:0:0: [sdb]
Jan 13 05:04:21 nas kernel: Sense Key : 0x3 [current] [descriptor]
Jan 13 05:04:21 nas kernel: Descriptor sense data with sense descriptors (in hex):
Jan 13 05:04:21 nas kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jan 13 05:04:21 nas kernel: 10 f4 57 a0
Jan 13 05:04:21 nas kernel: sd 1:0:0:0: [sdb]
Jan 13 05:04:21 nas kernel: ASC=0x11 ASCQ=0x4
Jan 13 05:04:21 nas kernel: sd 1:0:0:0: [sdb] CDB:
Jan 13 05:04:21 nas kernel: cdb[0]=0x28: 28 00 10 f4 57 a0 00 00 08 00
Jan 13 05:04:21 nas kernel: end_request: I/O error, dev sdb, sector 284448672
Jan 13 05:04:21 nas kernel: Buffer I/O error on device sdb1, logical block 35556076
Jan 13 05:04:21 nas kernel: ata1: EH complete
Jan 13 05:06:21 nas logger: ./media/Movies/HD Movies/Movie 1 (1900)
Jan 13 05:06:21 nas logger: .d..t...... media/Movies/HD Movies/Movie 1 (1900)/
Jan 13 05:06:21 nas logger: ./media/Movies/HD Movies
Jan 13 05:06:21 nas logger: .d..t...... media/Movies/HD Movies/
Jan 13 05:06:21 nas logger: ./media/Movies
Jan 13 05:06:21 nas logger: .d..t...... media/Movies/
Jan 13 05:06:21 nas logger: ./media/
Jan 13 05:06:21 nas logger: .d..t...... media/
Jan 13 05:06:21 nas logger: mover finished
Jan 13 06:06:53 nas kernel: mdcmd (318): spindown 0
Jan 13 06:59:17 nas kernel: mdcmd (319): spindown 3
Jan 13 06:59:18 nas kernel: mdcmd (320): spindown 4

How would the experienced heads in the Unraid crowd interpret that log? Is sdb dying?

Ta,
Daniel

EDIT: sdb is the cache drive, one I only installed a few weeks back. It appears this was happening while the mover was running. The GUI doesn't report anything, though (see newly attached).
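For anyone decoding a log like this by hand: the hex dumps all point at the same spot on the disk. Below is a minimal Python sketch that cross-checks them. It assumes the SCSI READ(10) CDB layout, the descriptor-format sense data layout, and libata's `cmd` taskfile print order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device). The function names are hypothetical helpers, not part of Unraid or any kernel tool. Sense key 0x3 with ASC/ASCQ 0x11/0x04 is a medium error, "unrecovered read error, auto reallocate failed". The last line assumes sdb1 starts at sector 64 and that its filesystem uses 4096-byte blocks; that is consistent with the logged numbers but not shown in the log itself.

```python
# Cross-check the sector numbers in the sdb errors (hypothetical helpers).

SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED"}


def parse_hex(dump: str) -> bytes:
    """Turn a space-separated hex dump such as '28 00 10' into bytes."""
    return bytes(int(tok, 16) for tok in dump.split())


def decode_read10(cdb: bytes) -> tuple[int, int]:
    """Return (starting LBA, block count) from a SCSI READ(10) CDB."""
    if cdb[0] != 0x28:
        raise ValueError(f"not READ(10): opcode {cdb[0]:#04x}")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


def decode_descriptor_sense(sense: bytes) -> dict:
    """Decode descriptor-format sense data (response code 0x72 or 0x73)."""
    if (sense[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    key, asc, ascq = sense[1] & 0x0F, sense[2], sense[3]
    info = None
    # Descriptors follow the 8-byte header; byte 7 gives their total length.
    pos, end = 8, min(len(sense), 8 + sense[7])
    while pos + 2 <= end:
        dtype, dlen = sense[pos], sense[pos + 1]
        if dtype == 0x00:  # Information descriptor: 8-byte field at offset 4
            info = int.from_bytes(sense[pos + 4:pos + 12], "big")
        pos += 2 + dlen
    return {
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"{asc:#04x}/{ascq:#04x}"),
        "information": info,
    }


def decode_ata_taskfile(tf: str) -> tuple[int, int, int]:
    """Decode libata's 'cmd' dump into (command, 48-bit LBA, sector count)."""
    cmd, low, high, _device = tf.split("/")
    _feat, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _hfeat, hnsect, hlbal, hlbam, hlbah = (int(x, 16) for x in high.split(":"))
    lba = (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    return int(cmd, 16), lba, (hnsect << 8) | nsect


if __name__ == "__main__":
    print(decode_read10(parse_hex("28 00 10 f4 57 80 00 01 00 00")))  # first failed read
    print(decode_read10(parse_hex("28 00 10 f4 57 a0 00 00 08 00")))  # later 8-block read
    print(decode_descriptor_sense(parse_hex(
        "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 10 f4 57 a0")))
    print(decode_ata_taskfile("25/00:08:a0:57:f4/00:00:10:00:00/e0"))
    # Assumes sdb1 starts at sector 64 and uses 4096-byte (8-sector) blocks.
    print((284448672 - 64) // 8)
```

Run as-is, the first read decodes to (284448640, 256), a 256-block read that started 32 sectors before the bad spot, while the 8-block read, the sense Information field and the ATA READ DMA EXT (0x25) taskfile all land on LBA 284448672 (0x10f457a0), the sector in the `end_request` lines. The last line prints 35556076, matching the "logical block" in the Buffer I/O error under the stated partition assumption.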
dgaschk Posted January 12, 2015

Attach a SMART report and the entire syslog. Zip them if needed.
danioj (Author) Posted January 12, 2015

Of course, apologies. Here they are. Ignore the network stuff in the syslog, as I was having router problems on Sunday. I have also stripped out the log entries for some FTP transfers and the mover's confirmations of file moves. Note that despite these errors, no file failed to move from the cache drive to the array. They are all there.

P.S.: I had not thought to run a SMART report on sdb, as the Unraid status was "thumbs up". I am shocked to see that there are in fact errors in the SMART report.

Attachments: syslog_extract_20150113_0950.txt, sdb_smart_report_20150113_0945.txt
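Capturing a SMART report like the attached one can be scripted. A minimal Python sketch, assuming smartmontools is installed and the script runs as root; `save_smart_report` and `report_filename` are hypothetical helpers that only mirror the attachment's naming pattern.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def report_filename(device: str, when: datetime) -> str:
    """Name like 'sdb_smart_report_20150113_0945.txt' for device '/dev/sdb'."""
    return f"{Path(device).name}_smart_report_{when:%Y%m%d_%H%M}.txt"


def save_smart_report(device: str, out_dir: Path) -> Path:
    """Run 'smartctl -a DEVICE' and write its full output into out_dir."""
    result = subprocess.run(["smartctl", "-a", device],
                            capture_output=True, text=True, check=False)
    path = out_dir / report_filename(device, datetime.now())
    path.write_text(result.stdout)
    # smartctl's exit status is a bitmask; some bits only flag findings in an
    # otherwise complete report, so a non-zero status is printed, not raised.
    if result.returncode != 0:
        print(f"smartctl exit status {result.returncode:#04x}: read the report")
    return path
```

For example, `save_smart_report("/dev/sdb", Path("/tmp"))` would write a file such as `/tmp/sdb_smart_report_20150113_0945.txt`; the output directory is an arbitrary choice. The point of not raising on a non-zero exit status is that a drive with logged errors is exactly the case where you still want the report saved.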
danioj (Author) Posted January 13, 2015

OK. While I am dying here, I decided to read about SMART reports and realised I had been ignorant about what they actually meant. I read this: http://lime-technology.com/wiki/index.php/Understanding_SMART_Reports

Firstly, I think I now understand why the SMART status is showing as OK: according to the SMART report, the overall status is OK. I also didn't realise that only 3 attributes contribute to that status (i.e. those marked Pre-fail, which are the critical ones): Raw_Read_Error_Rate, Spin_Up_Time and Reallocated_Sector_Ct.

So, if I am reading this right:

Of the critical attributes: I have started to get more raw read errors, there are no reallocated sectors, and the spin-up time is fine.

Of the non-critical attributes: clearly it has been powered on for a good number of hours (not surprising, as it was a drive I used to have in my old ReadyNAS NV+), but all the others seem OK.

So what I am thinking here is: yes, there were some read errors, but that's OK, it happens. I should keep an eye on the drive, specifically the raw read errors, but I shouldn't feel the need to replace it straight away.

Is this stab at it right?
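The idea that the overall status only reflects the Pre-fail attributes can be sketched in Python. This assumes smartctl's default attribute table layout (as printed by `smartctl -A`), reads the TYPE column rather than hard-coding a list (which attributes are Pre-fail is set by the drive, so it varies by model), and flags a Pre-fail attribute whose normalized VALUE has dropped to its THRESH. The helper names are hypothetical, and the sample rows are made up for illustration; they are not the values from the attached report.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    id: int
    name: str
    value: int     # normalized VALUE (higher is better)
    thresh: int    # THRESH: failure point for the normalized VALUE
    prefail: bool  # TYPE column says Pre-fail (vs Old_age)
    raw: str       # RAW_VALUE, kept as text since it can carry extra detail


def parse_attributes(table: str) -> list[Attribute]:
    """Parse rows of smartctl's default attribute table:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE"""
    attrs = []
    for line in table.splitlines():
        fields = line.split(maxsplit=9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank or unrelated line
        attrs.append(Attribute(
            id=int(fields[0]), name=fields[1], value=int(fields[3]),
            thresh=int(fields[5]), prefail=fields[6] == "Pre-fail", raw=fields[9],
        ))
    return attrs


def failing_prefail(attrs: list[Attribute]) -> list[str]:
    """Pre-fail attributes whose normalized VALUE has reached THRESH,
    roughly what the drive's overall PASSED/FAILED verdict is based on."""
    return [a.name for a in attrs if a.prefail and a.thresh > 0 and a.value <= a.thresh]


# Made-up rows for illustration only (not from the attached report).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       12
  3 Spin_Up_Time            0x0027   180   179   021    Pre-fail  Always       -       5983
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       4
"""

if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    print(failing_prefail(attrs))
    print([(a.name, a.raw) for a in attrs if not a.prefail])
```

With these made-up rows, `failing_prefail` returns an empty list, so the overall verdict would read as passing even though the Old_age pending-sector row has a raw count of 4. A healthy overall status does not rule out a non-zero raw count on an attribute that is not Pre-fail.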
danioj (Author) Posted January 14, 2015

Hmmm, I must be wrong!!!
dgaschk Posted January 14, 2015

http://lime-technology.com/wiki/index.php/Troubleshooting#Resolving_a_Pending_Sector
danioj (Author) Posted January 14, 2015

Thanks for the reply. I don't understand, though. Based on my interpretation of the SMART report, there are no pending sectors? I refer to this from the wiki:

"VALUE is almost always used as a normalized scale of perfectly good to perfectly bad, usually starting at VALUE=100, then dropping toward a worst case of VALUE=1. You can generally think of it as representing a scale starting at 100% good, then slowly dropping until failure at some predetermined percentage number, in the THRESHOLD column. Someone realized that if the values only run from 100 to 1, then they are wasting the possible values from 101 to 252, so some SMART programmers have decided to stretch the scale for certain attributes to start at 200 instead of 100, providing twice the data points. Unfortunately, which attributes are scaled from 200 to 1 is completely inconsistent, with almost all SMART reports showing some attributes starting at 100, and other attributes starting at 200. In addition, there are a few Maxtor and Samsung drives that took the start of the scale all the way to 252 or 253! Above [in the wiki's example report], you see all but 1 attribute using 100, the exception being attribute 199 which starts at 200. In general, you can think of 200-type scales as 100 times 2 (just divide the number by 2), and from now on, that is what we are going to do in most of the discussion."

So, on the basis that most of my attributes are at 100 (including pending sectors), doesn't that mean it's OK?
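The wiki's "divide by 2" rule of thumb for 200-based scales amounts to a simple rescaling. A minimal Python sketch, where `as_percent` is a hypothetical helper; note that it only applies to the normalized VALUE column and says nothing about an attribute's raw count.

```python
def as_percent(value: int, scale_start: int) -> float:
    """Put a normalized SMART VALUE on a 0-100 scale, following the wiki's
    rule of thumb (a 200-based attribute is read as 'divide by 2').

    smartctl does not report where an attribute's scale starts, so
    scale_start (100, 200, 253, ...) has to be inferred per attribute,
    e.g. from the values a new drive of the same model shows."""
    if scale_start <= 0:
        raise ValueError("scale_start must be positive")
    return value * 100 / scale_start
```

So `as_percent(200, 200)` and `as_percent(100, 100)` both come out as 100.0, the "perfectly good" end of the scale, whatever count sits in RAW_VALUE.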
itimpi Posted January 14, 2015

The pending sector count is always an absolute value (read it from the RAW_VALUE column, not the normalized VALUE), and ideally it should be 0. A non-zero value does not mean that the disk is faulty, just that there are sectors which gave unreliable values when they were last read. This matters because pending sectors can stop a rebuild from working perfectly, so you definitely want them cleared. Pending sectors are either cleared or changed to reallocated sectors the next time the sectors in question are written.
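Checking the raw counts rather than the normalized values can be sketched in Python. This assumes you have already pulled RAW_VALUE out of the report for each attribute ID; the attribute names are smartctl's default labels for IDs 5, 197 and 198, and `sector_warnings` is a hypothetical helper.

```python
# Attributes whose RAW_VALUE is a count of bad or suspect sectors.
SECTOR_COUNTERS = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def sector_warnings(raw_counts: dict[int, int]) -> list[str]:
    """Report every sector counter with a non-zero raw count.

    raw_counts maps attribute ID -> RAW_VALUE. The normalized VALUE column is
    deliberately ignored: it often stays at 100 or 200 while the raw count is
    already non-zero."""
    return [
        f"{SECTOR_COUNTERS[attr_id]}: raw count {raw_counts[attr_id]}"
        for attr_id in sorted(SECTOR_COUNTERS)
        if raw_counts.get(attr_id, 0) > 0
    ]
```

A drive whose pending-sector row shows VALUE 100 but RAW_VALUE 4 would produce one warning for Current_Pending_Sector, which is the case described above.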
danioj (Author) Posted January 14, 2015

Ahhh, thank you. I misunderstood the wiki. It is only my cache drive, so I've not been overly worried, as the mover eventually succeeded in moving the files it needed to (and I've been limiting use somewhat). But it is out of warranty, so I'll back up the apps folders to my array and pre-clear the sucker at the weekend.

Will update once I'm done. Thanks again!
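Since pending sectors are either cleared or reallocated the next time they are written, a full write pass such as a pre-clear gives a simple before/after check. A minimal Python sketch, assuming you note the raw counts of Current_Pending_Sector (197) and Reallocated_Sector_Ct (5) from SMART reports taken before and after; `compare_counts` is a hypothetical helper, and it assumes every new reallocation came from a previously pending sector.

```python
from dataclasses import dataclass


@dataclass
class WritePassOutcome:
    cleared: int        # pending sectors the rewrite fixed in place
    remapped: int       # pending sectors the drive moved to spare sectors
    still_pending: int  # pending sectors left after the pass


def compare_counts(pending_before: int, pending_after: int,
                   realloc_before: int, realloc_after: int) -> WritePassOutcome:
    """Compare raw Current_Pending_Sector and Reallocated_Sector_Ct counts
    taken before and after a full write pass such as a pre-clear.

    Assumption: every new reallocation came from a previously pending sector.
    A pass over a weak disk can also reallocate other sectors, so treat the
    split as an estimate."""
    resolved = max(0, pending_before - pending_after)
    new_realloc = max(0, realloc_after - realloc_before)
    remapped = min(resolved, new_realloc)
    return WritePassOutcome(cleared=resolved - remapped, remapped=remapped,
                            still_pending=pending_after)
```

For example, going from 4 pending / 0 reallocated to 0 pending / 1 reallocated reads as 3 sectors cleared in place and 1 remapped. Anything left in `still_pending`, or a reallocated count that keeps climbing on later passes, would be the signal to retire the drive.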
Archived
This topic is now archived and is closed to further replies.