Errors in Syslog - help to interpret please

danioj · January 12, 2015

Hello All,

As always happens, you start a little project and then something comes along to put a spanner in the works!

I woke up dying this morning. Dog Sick. So much so I've taken a day off work. For some reason my therapy at 5am this morning while I tried to get into a state to go back to bed was to sit at the computer and look at my Unraid home page. All good. For WHATEVER reason I clicked the syslog - I still don't know why, and I read this:

tail -n 40 -f /var/log/syslog

Jan 13 05:04:18 nas kernel: ASC=0x11 ASCQ=0x4

Jan 13 05:04:18 nas kernel: sd 1:0:0:0: [sdb] CDB:

Jan 13 05:04:18 nas kernel: cdb[0]=0x28: 28 00 10 f4 57 80 00 01 00 00

Jan 13 05:04:18 nas kernel: end_request: I/O error, dev sdb, sector 284448672

Jan 13 05:04:18 nas kernel: ata1: EH complete

Jan 13 05:04:21 nas kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Jan 13 05:04:21 nas kernel: ata1.00: irq_stat 0x40000001

Jan 13 05:04:21 nas kernel: ata1.00: failed command: READ DMA EXT

Jan 13 05:04:21 nas kernel: ata1.00: cmd 25/00:08:a0:57:f4/00:00:10:00:00/e0 tag 0 dma 4096 in

Jan 13 05:04:21 nas kernel: res 51/40:08:a0:57:f4/00:00:10:00:00/e0 Emask 0x9 (media error)

Jan 13 05:04:21 nas kernel: ata1.00: status: { DRDY ERR }

Jan 13 05:04:21 nas kernel: ata1.00: error: { UNC }

Jan 13 05:04:21 nas kernel: ata1.00: configured for UDMA/133

Jan 13 05:04:21 nas kernel: sd 1:0:0:0: [sdb] Unhandled sense code

Jan 13 05:04:21 nas kernel: sd 1:0:0:0: [sdb]

Jan 13 05:04:21 nas kernel: Result: hostbyte=0x00 driverbyte=0x08

Jan 13 05:04:21 nas kernel: sd 1:0:0:0: [sdb]

Jan 13 05:04:21 nas kernel: Sense Key : 0x3 [current] [descriptor]

Jan 13 05:04:21 nas kernel: Descriptor sense data with sense descriptors (in hex):

Jan 13 05:04:21 nas kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00

Jan 13 05:04:21 nas kernel: 10 f4 57 a0

Jan 13 05:04:21 nas kernel: sd 1:0:0:0: [sdb]

Jan 13 05:04:21 nas kernel: ASC=0x11 ASCQ=0x4

Jan 13 05:04:21 nas kernel: sd 1:0:0:0: [sdb] CDB:

Jan 13 05:04:21 nas kernel: cdb[0]=0x28: 28 00 10 f4 57 a0 00 00 08 00

Jan 13 05:04:21 nas kernel: end_request: I/O error, dev sdb, sector 284448672

Jan 13 05:04:21 nas kernel: Buffer I/O error on device sdb1, logical block 35556076

Jan 13 05:04:21 nas kernel: ata1: EH complete

Jan 13 05:06:21 nas logger: ./media/Movies/HD Movies/Movie 1 (1900)

Jan 13 05:06:21 nas logger: .d..t...... media/Movies/HD Movies/Movie 1 (1900)/

Jan 13 05:06:21 nas logger: ./media/Movies/HD Movies

Jan 13 05:06:21 nas logger: .d..t...... media/Movies/HD Movies/

Jan 13 05:06:21 nas logger: ./media/Movies

Jan 13 05:06:21 nas logger: .d..t...... media/Movies/

Jan 13 05:06:21 nas logger: ./media/

Jan 13 05:06:21 nas logger: .d..t...... media/

Jan 13 05:06:21 nas logger: mover finished

Jan 13 06:06:53 nas kernel: mdcmd (318): spindown 0

Jan 13 06:59:17 nas kernel: mdcmd (319): spindown 3

Jan 13 06:59:18 nas kernel: mdcmd (320): spindown 4

How would the experienced heads in the Unraid crowd interpret that log? sdb dying?

Ta,

Daniel

EDIT: sdb is the cache drive. In fact, it is one I installed only a few weeks back. It appears this was happening while the mover was running. The GUI doesn't report anything though (see newly attached) ....
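For anyone reading along, the numbers in the log excerpt above can be cross-checked with a few lines of Python. This is only a rough sketch: the three log lines are copied from the excerpt, while the 512-byte sector size and 4096-byte buffer block size are assumptions, not something the log states. The two lookup tables hold just the standard SCSI meanings of the codes seen here (sense key 0x3, ASC/ASCQ 0x11/0x04).

```python
import re

# Three lines copied verbatim from the syslog excerpt above.
LOG = """\
Jan 13 05:04:21 nas kernel: cdb[0]=0x28: 28 00 10 f4 57 a0 00 00 08 00
Jan 13 05:04:21 nas kernel: end_request: I/O error, dev sdb, sector 284448672
Jan 13 05:04:21 nas kernel: Buffer I/O error on device sdb1, logical block 35556076
"""

# Standard SCSI meanings for the codes that appear in the excerpt.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED"}


def decode_read10(cdb_hex):
    """Decode a 10-byte SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: starting sector
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: sectors requested
    return lba, count


cdb_hex = re.search(r"cdb\[0\]=0x28: ([0-9a-f ]+)", LOG).group(1)
lba, count = decode_read10(cdb_hex)
sector = int(re.search(r"sector (\d+)", LOG).group(1))
block = int(re.search(r"logical block (\d+)", LOG).group(1))

SECTOR_BYTES = 512   # assumption: 512-byte logical sectors
BLOCK_BYTES = 4096   # assumption: 4 KiB blocks in the "Buffer I/O error" line
sectors_per_block = BLOCK_BYTES // SECTOR_BYTES

# If both assumptions hold, this is the sector at which sdb1 would begin.
implied_partition_start = sector - block * sectors_per_block

print("READ(10) start sector:", lba, "count:", count)
print("kernel-reported bad sector:", sector)
print("sense:", SENSE_KEYS[0x3], "/", ASC_ASCQ[(0x11, 0x04)])
print("implied start of sdb1:", implied_partition_start)
```

The decoded start sector matches the sector in the "I/O error" line, so every message in the second burst points at the same spot on the disk: one unreadable sector, not a general controller or cabling fault.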

dgaschk · January 12, 2015

Attach a SMART report and the entire syslog. Zip them if needed.

danioj · January 12, 2015

Of course. Apologies. Here they are.

Ignore the network stuff in the syslog as I was having router problems on Sunday. In addition I have stripped out logs pertaining to some ftp moves and the confirmation of file moves by the mover.

Note that despite these errors no file failed to move from the Cache drive to the Array. They are all there.

P.S.: I had not thought to run a SMART report on sdb as the Unraid status was "thumbs up". I am shocked to see that there are in fact errors in the SMART report.

syslog_extract_20150113_0950.txt

sdb_smart_report_20150113_0945.txt

danioj · January 13, 2015

OK. While I am dying here I decided to read about SMART reports and realised I had been ignorant about what they actually meant.

Read this: http://lime-technology.com/wiki/index.php/Understanding_SMART_Reports

Firstly, I think I understand now why the SMART STATUS is showing as OK. That's because, according to the SMART report, the overall STATUS is OK. I also didn't realise that only 3 variables contribute to that status (i.e. those marked Pre-fail are the critical ones): Raw_Read_Error_Rate, Spin_Up_Time and Reallocated_Sector_Ct.

So if I am reading this right:

Of the critical attributes: I have started to have more Raw Read Errors. There are no Reallocated Sectors and Spin Up Time is fine too.

Of the non-critical attributes: clearly it has been powered on for a good number of hours (not surprising, as it is a drive I used to have in my old ReadyNAS NV+), but all the others seem OK.

So what I am thinking here is: yes, there were some read errors, but that's OK, it happens. I should keep an eye on the drive, specifically the Raw Read Errors, but I shouldn't feel the need to replace it straight away.

Is this stab at it right?

danioj · January 14, 2015

Hmmm - I must be wrong!!! :-[

dgaschk · January 14, 2015

http://lime-technology.com/wiki/index.php/Troubleshooting#Resolving_a_Pending_Sector

danioj · January 14, 2015

http://lime-technology.com/wiki/index.php/Troubleshooting#Resolving_a_Pending_Sector

Thanks for the reply. I don't understand though.

Based on my interpretation of the SMART report, there are no pending sectors? I refer to this from the Wiki ....

VALUE is almost always used as a normalized scale of perfectly good to perfectly bad, usually starting at VALUE=100, then dropping toward a worst case of VALUE=1. You can generally think of it as representing a scale starting at 100% good, then slowly dropping until failure at some predetermined percentage number, in the THRESHOLD column.

Someone realized that if the values only run from 100 to 1, then they are wasting the possible values from 101 to 252, so some SMART programmers have decided to stretch the scale for certain attributes to start at 200 instead of 100, providing twice the data points. Unfortunately, which attributes are scaled from 200 to 1 is completely inconsistent, with almost all SMART reports showing some attributes starting at 100, and other attributes starting at 200. In addition, there are a few Maxtor and Samsung drives that took the start of the scale all the way to 252 or 253! Above, you see all but 1 attribute using 100, the exception being attribute 199 which starts at 200. In general, you can think of 200-type scales as 100 times 2 (just divide the number by 2), and from now on, that is what we are going to do in most of the discussion.

So ..... on the basis that most of my attributes are at 100 (including pending sectors), doesn't that mean it is OK?

itimpi · January 14, 2015

Pending sectors is always an absolute value and ideally should be 0. A non-zero value does not mean that the disk is faulty - just that there are sectors which gave unreliable values when they were last read. This matters as pending sectors can stop a rebuild working perfectly, so you definitely want them to be cleared. Pending sectors are either cleared or changed to reallocated sectors the next time the sectors in question are written.
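To make the difference between the normalized VALUE column and the raw count concrete, here is a small sketch that reads an attribute table in the layout `smartctl -A` prints. The sample rows are hypothetical, made up for illustration and not taken from the report attached to this thread. The helper names `parse_attributes` and `pending_sectors` are likewise just illustrative.

```python
# Hypothetical excerpt in the column layout printed by `smartctl -A`.
# The numbers are invented for illustration, NOT from the attached report.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       12
  5 Reallocated_Sector_Ct   0x0033   100   100   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       3
"""


def parse_attributes(text):
    """Return {attribute_id: {...}} for every data row of the table."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split(None, 9)  # RAW_VALUE may itself contain spaces
        if len(parts) == 10 and parts[0].isdigit():
            attrs[int(parts[0])] = {
                "name": parts[1],
                "value": int(parts[3]),   # normalized, vendor-scaled
                "thresh": int(parts[5]),
                "raw": parts[9].strip(),  # the absolute figure
            }
    return attrs


def pending_sectors(attrs):
    """Raw count for attribute 197 (Current_Pending_Sector)."""
    return int(attrs[197]["raw"].split()[0])


attrs = parse_attributes(SAMPLE)
print("normalized VALUE:", attrs[197]["value"])
print("raw pending sectors:", pending_sectors(attrs))
```

In this made-up sample the normalized VALUE is still sitting at 100 while the raw count is 3, which is the situation described above: for pending sectors it is the last column that should be read, and it should be 0.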

danioj · January 14, 2015

Pending sectors is always an absolute value and ideally should be 0. A non-zero value does not mean that the disk is faulty - just that there are sectors which gave unreliable values when they were last read. This matters as pending sectors can stop a rebuild working perfectly, so you definitely want them to be cleared. Pending sectors are either cleared or changed to reallocated sectors the next time the sectors in question are written.

Ahhh, thank you. I misunderstood the wiki. It is only my cache drive, so I've not been overly worried, as the mover eventually succeeded in moving the files it needed to (and I've been limiting use somewhat) - but it is out of warranty, so I'll back up the apps folders to my array and pre-clear the sucker at the weekend.

Will update once I'm done. Thanks again!
