Identifying which drive is "ata9" -- throwing errors into syslog....


I'm getting errors like:

 

May 13 08:29:33 unraid kernel: ata9: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 (Errors)

May 13 08:29:33 unraid kernel: ata9: irq_stat 0x40000001 (Drive related)

May 13 08:32:06 unraid kernel: ata9: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 (Errors)

May 13 08:32:06 unraid kernel: ata9: irq_stat 0x40000001 (Drive related)

 

How do I identify which disk is ata9?  As I understand it, the 'ata' number refers to the disk channel, not necessarily "disk9".

 

Attached syslog

syslog-2013-05-13.zip

Link to comment

ATA9 in your system is the Seagate "ST3000DM001-1CH166" disk.
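A minimal sketch of how such a line can be found: the kernel logs messages tagged with the port name, so filtering the syslog for a single port should surface the model string it reported at boot. The helper name `ata_lines` and the default log path `/var/log/syslog` are assumptions for illustration, not something taken from the attached syslog.

```shell
#!/bin/sh
# Print every syslog line that mentions one ATA port (e.g. ata9 or ata9.00).
# ata_lines is a hypothetical helper name; the default log path is an assumption.
ata_lines() {
    port=$1
    logfile=${2:-/var/log/syslog}
    # Leading whitespace and the trailing colon keep "ata9" from matching
    # "ata19" or "ata90".
    grep -E "[[:space:]]${port}(\.[0-9]+)?:" "$logfile"
}

# Usage: ata_lines ata9
```

Note that the model string alone cannot distinguish several drives of the same model; it only narrows the search.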

 

Yea, I searched and saw that line in the syslog....

 

I have several ST3000DM001 disks.  And none show 1CH166 -- in fact, the drive identifiers shown on the Unraid main page are longer -- like "ST3000DM001-9YN166_Z1F0MZ6Y" and "ST3000DM001-9YN166_Z1F0L360" -- etc...

 

So, how do I match 1CH166 to something?  Where should I be looking to find the "1CH166"...

Link to comment

As you've noted, the drive IDs on the Web GUI main page show the drive serial numbers -- which clearly let you identify exactly which drive is which (although in most cases you have to remove the drives to read the serial numbers).

 

Are the errors in the Syslog showing as errors in the Web GUI display?  If so, then clearly the disk that's got a non-zero entry in the "Errors" column is the one you're looking for  :)

 

... it's not clear, however, that this will be the case.  Many errors are correctable ... and if these are in that category, then there won't be any errors displayed in the Web GUI.  However, that also means the errors aren't causing data loss  :)

 

I'd check the SMART reports for your drives as well, to see if one is showing signs of failure (doesn't mean it won't "Pass" SMART ... but if you look at the detailed parameters, there may be some signs of pending failure -- post a SMART report here).
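As a rough sketch of what "looking at the detailed parameters" could mean in practice, the filter below reduces the attribute table printed by `smartctl -A` to a handful of attributes that are commonly watched. The helper name `smart_watchlist` and the choice of attributes are assumptions; which attributes a drive reports varies by vendor.

```shell
#!/bin/sh
# Filter the attribute table of `smartctl -A` output (read from stdin) down to
# a few commonly watched attributes, printing "name raw_value" for each.
# smart_watchlist is a hypothetical helper name; the attribute list is a guess
# at what is useful, not an authoritative set.
smart_watchlist() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ { print $2, $10 }'
}

# Usage (sdX is a placeholder): smartctl -A /dev/sdX | smart_watchlist
```

Non-zero raw values here are worth a closer look; a growing UDMA_CRC_Error_Count in particular tends to point at the cable or connection rather than the disk itself.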

 

Link to comment

 

Are the errors in the Syslog showing as errors in the Web GUI display? 

 

 

Ugh, no.  I'm showing zeros across the board for errors...

 

I'd check the SMART reports for your drives as well, to see if one is showing signs of failure (doesn't mean it won't "Pass" SMART ... but if you look at the detailed parameters, there may be some signs of pending failure -- post a SMART report here).

 

I was afraid of that.  I'll have to get deeper into my SMART checks.  I at least know it's one of my five 3TB ST3000DM001 Seagates, out of 13 drives (the rest are other manufacturers, sizes, or model numbers).

 

Ugh.  It "seems" to me there ought to be a better way to do a definitive match from "ataX" number to "it's this drive" -- but oh well...  I'll quit expecting an easy way to do this ;)

 

Once I track down the drive, I want to "trade" that drive into another slot/controller -- and see if the problem follows the drive, or stays with the controller. 

 

I am assuming that "ata6" would stay consistent -- that is, it's a "hardware" descriptor of a SATA port...  or am I asking too much that "ata6" would mean the same thing (data path) between reboots?

Link to comment

I am assuming that "ata6" would stay consistent -- that is, it's a "hardware" descriptor of a SATA port...  or am I asking too much that "ata6" would mean the same thing (data path) between reboots?

Not a good assumption.  unRAID used to use the SATA ports to tie a disk assignment to a logical slot on the array, until one user's motherboard kept changing the assignment between boots.  I suspect the ataX designations are assigned as the disk controller ports initialize, and when there are multiple identical disk controllers, the order in which they initialize can vary.

 

I found this script on the web.  Give it a try: download it, unzip it, copy it to your flash drive, and invoke it as

ata_devices.sh

 

output will look like this:

root@Tower:/# ata_devices.sh

sda: ata1.00

sdb: ata2.00

sdc: ata3.00

sdd: ata4.00

(Device sde is not an ATA device, but a USB device [e. g. a pen drive])

sdf: ata5.00

sdg: ata6.00

sdh: ata7.00

sdi: ata8.00

sdj: ata10.00

sdk: ata11.00

sdl: ata12.00

 

Then you can type

hdparm -i /dev/sdX

(where sdX = the three letter drive identifier) to identify the specific drive make/model/serial number.
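For readers without the attachment, here is a minimal sketch of the same idea. It is not the attached script. It assumes the kernel places an "ataN" directory in the resolved sysfs path of each disk, which is typical for ports driven by libata but may not hold on older kernels or for disks behind SAS or USB controllers. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: print "sdX: ataN" for every sd* disk, using the resolved sysfs path.
# Assumption: libata-driven ports show up as an "ataN" directory in that path;
# disks behind SAS HBAs or USB bridges will not have one.

# Extract the "ataN" path component; prints nothing when there is none.
ata_port_from_path() {
    printf '%s\n' "$1" | sed -n 's#^.*/\(ata[0-9][0-9]*\)/.*$#\1#p'
}

list_ata_disks() {
    for dev in /sys/block/sd*; do
        [ -e "$dev" ] || continue            # the glob matched nothing
        path=$(readlink -f "$dev")           # follow the symlink into /sys/devices
        port=$(ata_port_from_path "$path")
        echo "${dev##*/}: ${port:-no ata port in sysfs path}"
    done
}

list_ata_disks
```

Once a port such as ata9 is matched to an sdX name, `hdparm -i` on that device gives the serial number to compare against the Web GUI.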

ata_devices.zip

Link to comment

Thanks Joe, for my purposes that script solves the mystery.  ata9 = sdf -- happens to be my cache drive, in an external eSATA enclosure.  Ugh.  Will work on that issue....  I have enough info now to trade the 'cache disk' with another drive, and see if the 'problem' moves with the drive (i.e., it's a drive issue) or stays with the eSATA port -- ergo, is a controller/cable/eSATA issue...  {I think the latter}

 

However, I do have 13 drives in my array (12 + cache) -- and the script didn't identify everything.  So others' use of it may not be 100% successful for this exercise.

 

As a side note -- my 'suggestion' would be for UnRaid -- in some future version -- to include this (or an enhanced script) and have it run automatically as part of the boot-load -- so it puts these "rosetta stone" (ataX=sdX) messages into the syslog.  Would help a lot ;)

 

~Update:  Ah, I bet the 6 drives that aren't identified -- are the ones that are on the 6-port SuperMicro controller  (AOC-SASLP-MV8) -- so perhaps the script could be tweaked {uh, not in my skill set} by someone, eventually...  being that's a common/recommended controller

 

...Chuck

 

root@unraid:/boot# chmod +x ata_devices.sh

root@unraid:/boot# ./ata_devices.sh

(Device sda is not an ATA device, but a USB device [e. g. a pen drive])

sdb: ata3.00

sdc: ata4.00

sdd: ata5.00

sde: ata6.00

sdf: ata9.00

sdg: ata10.00

sdh: ata19.00

sdi: ata20.00

cat: /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/host1/port-1:0/end_device-1:0/target1:0:0/hostsdj/scsi_host/hostsdj/unique_id: No such file or directory

: ata.

cat: /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/host1/port-1:1/end_device-1:1/target1:0:1/hostsdk/scsi_host/hostsdk/unique_id: No such file or directory

: ata.

cat: /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/host1/port-1:2/end_device-1:2/target1:0:2/hostsdl/scsi_host/hostsdl/unique_id: No such file or directory

: ata.

cat: /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/host1/port-1:3/end_device-1:3/target1:0:3/hostsdm/scsi_host/hostsdm/unique_id: No such file or directory

: ata.

cat: /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/host1/port-1:4/end_device-1:4/target1:0:4/hostsdn/scsi_host/hostsdn/unique_id: No such file or directory

: ata.

cat: /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/host1/port-1:5/end_device-1:5/target1:0:5/hostsdo/scsi_host/hostsdo/unique_id: No such file or directory

: ata.

root@unraid:/boot#

Link to comment

I'm guessing some of your drives are on a port-multiplier???
Link to comment

I'm guessing some of your drives are on a port-multiplier???

 

You typed, while I updated my post!

 

 

~Update:  Ah, I bet the 6 drives that aren't identified -- are the ones that are on the 6-port SuperMicro controller  (AOC-SASLP-MV8) -- so perhaps the script could be tweaked {uh, not in my skill set} by someone, eventually...  being that's a common/recommended controller

Link to comment

See here for more info:

http://utcc.utoronto.ca/~cks/space/blog/linux/LinuxSATANames

Link to comment

One reason not to run that script at boot and store its output in the syslog is that you shouldn't depend on it.  In general, ATA (and SATA) designations are assigned during the initial boot-up and scanning of the system, and these can (and will) VARY depending on the order in which the drives respond to the bus scans.  So what's ATA9 today might be ATA7 tomorrow, etc.

 

But it's certainly a handy script to run while the system's booted -- and you can see what the CURRENT designations are;  which lets you do what you needed here -- i.e. figure out which drive is causing the issue.

 

Link to comment

Having it in the syslog seems like a good idea to me. The information will be correct in relation to the rest of the syslog contents.

Link to comment

Agreed, it's handy -- as long as you remember to look at the CURRENT log.  But if you happen to look at the wrong log, or simply have a copy of an earlier one and don't think about the fact that the assignments may be different, the results can be BAD  :)  I think being forced to Telnet in and execute the command virtually eliminates the possibility of using "stale" (i.e. wrong) data regarding the assignment.

 

Link to comment

Given that this will be an entry in the current syslog that applies to the current syslog, I don't see the issue.

Link to comment

I think that's easily fixed by including in the syslog a warning "**** THIS INFORMATION VALID FOR THIS BOOT ONLY ****" or something above/below the text block of ata?=sd? messages.
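A sketch of how that could look, assuming the mapping lines come from the script posted earlier and are written to the syslog with `logger`. The helper name `wrap_mapping` and the `ata_map` tag are arbitrary choices for illustration; nothing here is part of unRAID.

```shell
#!/bin/sh
# Wrap "sdX: ataN" lines (read from stdin) in a banner stating that the
# mapping only holds for the current boot. wrap_mapping is a hypothetical name.
wrap_mapping() {
    banner='**** THIS INFORMATION VALID FOR THIS BOOT ONLY ****'
    echo "$banner"
    cat
    echo "$banner"
}

# Usage, e.g. from a boot-time script (the "ata_map" tag is an arbitrary choice):
#   /boot/ata_devices.sh 2>/dev/null | wrap_mapping | logger -t ata_map
```

Because the banner and the mapping land in the same syslog as the ataX error messages from that boot, anyone reading an archived log sees the mapping that applied when those errors were logged.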

Link to comment
