Device is disabled - understanding the SMART report


Gog

SMART for that disk looks OK. But you should always go to Tools - Diagnostics and attach the complete diagnostics zip file to your NEXT post.

 

Diagnostics include SMART for all disks, syslog that might give a better idea of what happened (if you haven't rebooted), and many other things that give a more complete understanding of your situation.

 

I will wait on the diagnostics before making any recommendations about how to proceed.

Most of your disks are very full, and some are still ReiserFS. Why are you logging Mover?

3 hours ago, Gog said:

I just noticed a disabled drive

Jan 15 03:53:42 Tower kernel: md: disk3 write error, sector=1953336760

Looks like disk3 got disabled Jan 15. Do you not have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?
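If you want to check a saved syslog for this kind of event yourself, here is a minimal Python sketch that picks out `md: diskN write error` lines. The regular expression is based only on the single line quoted above, and the function name `find_md_write_errors` is made up for illustration, so treat it as an assumption rather than a complete parser for md messages.

```python
import re

# Pattern modelled on the one line quoted in this thread:
#   Jan 15 03:53:42 Tower kernel: md: disk3 write error, sector=1953336760
MD_WRITE_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel: md: "
    r"(?P<disk>disk\d+) write error, sector=(?P<sector>\d+)"
)

def find_md_write_errors(lines):
    """Return (timestamp, disk, sector) for every md write error line."""
    hits = []
    for line in lines:
        match = MD_WRITE_ERROR.match(line)
        if match:
            hits.append(
                (match.group("stamp"), match.group("disk"), int(match.group("sector")))
            )
    return hits

sample = [
    "Jan 15 03:53:42 Tower kernel: md: disk3 write error, sector=1953336760",
    "Jan 23 07:08:31 Tower kernel: ata1: hard resetting link",
]
print(find_md_write_errors(sample))
```

The first hit for a given disk is the useful one, since it marks roughly when the array stopped writing to that disk.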

 

You also have problems communicating with cache and disk1. Are these all the same controller? Disk3 needs to be rebuilt of course, especially since it has been out of sync for more than a week. But I'm not confident about the rebuild with these other issues.

Jan  8 19:31:51 Tower kernel: ata1.00: ATA-10: KINGSTON SA400S37480G, 50026B778227C383, SBFK71B1, max UDMA/133
Jan  8 19:32:09 Tower emhttpd: import 30 cache device: (sdi) KINGSTON_SA400S37480G_50026B778227C383
Jan 23 07:08:31 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x1800000 SErr 0x280100 action 0x6 frozen
Jan 23 07:08:31 Tower kernel: ata1: hard resetting link
...
Jan  8 19:31:51 Tower kernel: ata2.00: ATA-9: HGST HDN726060ALE614, K8H5GNMD, APGNW7JH, max UDMA/133
Jan  8 19:32:08 Tower kernel: md: import disk1: (sdj) HGST_HDN726060ALE614_K8H5GNMD size: 5860522532 
Jan 23 03:38:34 Tower kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Jan 23 03:38:34 Tower kernel: ata2: hard resetting link
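For anyone reading along, the way to tie an `ataN` port to a physical drive is to find the identification line for that port and then count the exception and reset lines for the same port. Below is a rough Python sketch of that bookkeeping. The patterns are derived only from the lines pasted above, and `summarize_ata` is a hypothetical helper name, so other kernel versions or message variants may need different patterns.

```python
import re
from collections import defaultdict

# Patterns modelled on the syslog lines quoted in this post.
IDENT = re.compile(r"kernel: (ata\d+)\.\d+: ATA-\d+: ([^,]+),")
EXCEPTION = re.compile(r"kernel: (ata\d+)\.\d+: exception Emask")
RESET = re.compile(r"kernel: (ata\d+): hard resetting link")

def summarize_ata(lines):
    """Map each ATA port with errors to its drive model and error counts."""
    models = {}
    counts = defaultdict(lambda: {"exceptions": 0, "resets": 0})
    for line in lines:
        ident = IDENT.search(line)
        if ident:
            models[ident.group(1)] = ident.group(2).strip()
            continue
        exc = EXCEPTION.search(line)
        if exc:
            counts[exc.group(1)]["exceptions"] += 1
            continue
        reset = RESET.search(line)
        if reset:
            counts[reset.group(1)]["resets"] += 1
    return {
        port: dict(model=models.get(port, "unknown"), **c)
        for port, c in sorted(counts.items())
    }

sample = [
    "Jan  8 19:31:51 Tower kernel: ata1.00: ATA-10: KINGSTON SA400S37480G, 50026B778227C383, SBFK71B1, max UDMA/133",
    "Jan 23 07:08:31 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x1800000 SErr 0x280100 action 0x6 frozen",
    "Jan 23 07:08:31 Tower kernel: ata1: hard resetting link",
    "Jan  8 19:31:51 Tower kernel: ata2.00: ATA-9: HGST HDN726060ALE614, K8H5GNMD, APGNW7JH, max UDMA/133",
    "Jan 23 03:38:34 Tower kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen",
    "Jan 23 03:38:34 Tower kernel: ata2: hard resetting link",
]
for port, info in summarize_ata(sample).items():
    print(port, info)
```

Run against the full syslog from the diagnostics zip, this gives a quick per-port tally to compare against the drive assignments shown in the emhttpd and md import lines.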

Let's see if @johnnie.black is still awake and if he has anything to say about your controller or suggestions about how to proceed.

Quote

Most of your disks are very full, and some are still ReiserFS

Yes, new drives are XFS, but I'm not actively migrating data. I just remove the smallest drive when I add a new one.

Quote

Why are you logging Mover?

I was tracking an odd behavior a while ago and forgot to mute the mover.

Quote

Looks like disk3 got disabled Jan 15. Do you not have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?

I do, but I missed that email. I did an inbox cleanup and here we are.

Quote

You also have problems communicating with cache and disk1

Yes, these are the CRC errors I mentioned.

Quote

Are these all the same controller?

Not sure. I'll have to power down the server and pull the drawers to verify.

Disk1 and cache need a new SATA cable

Jan  9 03:23:49 Tower kernel: ata1: SError: { UnrecovData 10B8B BadCRC }
...
Jan 12 11:40:04 Tower kernel: ata2: SError: { UnrecovData 10B8B BadCRC }
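The flag names in those SError lines are the decoded form of the `SErr 0x280100` value that appears in the ata1 and ata2 exception lines earlier in the thread. As a minimal sketch, the decoding can be reproduced in Python. The bit table is my understanding of the SATA SError register layout and the short names the kernel's libata prints; treat it as an assumption and check it against the kernel source before relying on it. `decode_serr` is a name made up for this example.

```python
# Assumed bit positions for the SATA SError register, with the short
# names printed in kernel "SError: { ... }" lines.
SERR_BITS = [
    (0, "RecovData"), (1, "RecovComm"), (8, "UnrecovData"), (9, "Persist"),
    (10, "Proto"), (11, "HostInt"), (16, "PHYRdyChg"), (17, "PHYInt"),
    (18, "CommWake"), (19, "10B8B"), (20, "Dispar"), (21, "BadCRC"),
    (22, "Handshk"), (23, "LinkSeq"), (24, "TrStaTrns"), (25, "UnrecFIS"),
    (26, "DevExch"),
]

def decode_serr(value):
    """Return the names of all SError flags set in value, lowest bit first."""
    return [name for bit, name in SERR_BITS if value & (1 << bit)]

# Value taken from the "SErr 0x280100" exception lines quoted in this thread.
print(decode_serr(0x280100))
```

With that table, 0x280100 comes out as UnrecovData, 10B8B and BadCRC, which matches the flags in the lines above. BadCRC on the link is why the cable is the first suspect.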

It's also highly recommended to update the LSI to the latest firmware, p20.00.07.00. All earlier p20 releases have known issues, and that is possibly what got disk3 disabled.

On 1/25/2020 at 2:54 AM, johnnie.black said:

Disk1 and cache need a new SATA cable


Jan  9 03:23:49 Tower kernel: ata1: SError: { UnrecovData 10B8B BadCRC }
...
Jan 12 11:40:04 Tower kernel: ata2: SError: { UnrecovData 10B8B BadCRC }

 

I replaced those cables.

 

Disk 3 is on the LSI but disk 1 and cache were not.

Quote

It's also highly recommended to update the LSI to the latest firmware, p20.00.07.00. All earlier p20 releases have known issues, and that is possibly what got disk3 disabled.

I'm on p20.00.02.00 and trying to get p20.00.07.00 from a reliable source, but Supermicro's FTP server is refusing connections from my IP.
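For what it's worth, comparing these version strings is safer done numerically than by eye. This is only a small Python sketch, assuming the version always has the dotted four-part form seen in this thread, with an optional leading "p"; `parse_fw` is a helper name I made up.

```python
def parse_fw(version):
    """Turn a string like 'p20.00.07.00' into a tuple such as (20, 0, 7, 0)."""
    return tuple(int(part) for part in version.lower().lstrip("p").split("."))

installed = parse_fw("p20.00.02.00")    # what the controller reports here
recommended = parse_fw("p20.00.07.00")  # the release recommended above

if installed < recommended:
    print("Firmware is older than the recommended release")
else:
    print("Firmware is at or above the recommended release")
```

Tuples compare element by element, so this also handles a future release where a field grows past one digit.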

 

These instructions are bang on except I can't get the binaries: https://www.ixsystems.com/community/threads/flashing-the-lsi2308-firmware-on-a-supermicro-x10sl7-f-motherboard.38884/

 

I found https://www.mediafire.com/?py9c1w5u56xytw2

that gives a procedure to upgrade an LSI SAS 9211-8i to p20.00.07.00. Do you know if the same firmware works on my controller (LSI 2308)? The Broadcom website is not really helpful.
