Device is Disabled - Contents Emulated


Netbug

After upgrading my parity drive to 4TB and one of my data drives from 1TB to 4TB, I'm now getting a red X on another drive.

 

Screenshot

 

I ran a SMART test on the drive (results attached).

 

So, 2 questions...

 

  • Is this ANOTHER drive I need to replace (4 drives swapped in one weekend is getting expensive)?
  • How can I get the drive back into the array and have the data rebuilt?

 

From my digging (and apparently I encountered a similar problem before), I need to create a new config under "Tools." Is this correct? How do I go about doing that while preserving/rebuilding my data?

 

Thanks.

 

tower-smart-20170802-2338.zip

tower-diagnostics-20170803-0710.zip


While the 3TB Seagates do not have a sterling record, I do not see anything wrong with that drive.

 

Are you using hot-swap style drive cages? My guess is that every time you open the server, you are jiggling wires enough to create intermittent connections. Yours are the typical signs.

 

Look at the CSE-M35T-1B cages.


Your log is full of these errors:

 

Aug  2 22:21:13 Tower kernel: Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012
Aug  2 22:21:13 Tower kernel: Workqueue: scsi_wq_1 sas_destruct_devices [libsas]
Aug  2 22:21:13 Tower kernel: ffffc9000bcb3b28 ffffffff813a4a1b ffffc9000bcb3b78 ffffffff81944854
Aug  2 22:21:13 Tower kernel: ffffc9000bcb3b68 ffffffff8104d0d9 000000ed0bcb3be0 0000000000000000
Aug  2 22:21:13 Tower kernel: ffffffff81c7b220 ffff880211e05820 0000000000800070 ffff88001eb50000
Aug  2 22:21:13 Tower kernel: Call Trace:
Aug  2 22:21:13 Tower kernel: [<ffffffff813a4a1b>] dump_stack+0x61/0x7e
Aug  2 22:21:13 Tower kernel: [<ffffffff8104d0d9>] __warn+0xb8/0xd3
Aug  2 22:21:13 Tower kernel: [<ffffffff8104d13a>] warn_slowpath_fmt+0x46/0x4e
Aug  2 22:21:13 Tower kernel: [<ffffffff81176a20>] sysfs_remove_group+0x4d/0x80
Aug  2 22:21:13 Tower kernel: [<ffffffff81491bb6>] dpm_sysfs_remove+0x4b/0x50
Aug  2 22:21:13 Tower kernel: [<ffffffff81488604>] device_del+0x44/0x1f0
Aug  2 22:21:13 Tower kernel: [<ffffffff814bde8a>] sd_remove+0x50/0xb7
Aug  2 22:21:13 Tower kernel: [<ffffffff8148b96f>] __device_release_driver+0x98/0x11c
Aug  2 22:21:13 Tower kernel: [<ffffffff8148ba11>] device_release_driver+0x1e/0x2b
Aug  2 22:21:13 Tower kernel: [<ffffffff8148acb9>] bus_remove_device+0xf6/0x109
Aug  2 22:21:13 Tower kernel: [<ffffffff81488721>] device_del+0x161/0x1f0
Aug  2 22:21:13 Tower kernel: [<ffffffff814b9f62>] __scsi_remove_device+0x5e/0xcb
Aug  2 22:21:13 Tower kernel: [<ffffffff814b9ff0>] scsi_remove_device+0x21/0x2e
Aug  2 22:21:13 Tower kernel: [<ffffffff814ba178>] scsi_remove_target+0x151/0x1ad
Aug  2 22:21:13 Tower kernel: [<ffffffffa006f8a8>] sas_rphy_remove+0x26/0x6b [scsi_transport_sas]
Aug  2 22:21:13 Tower kernel: [<ffffffffa006f8fa>] sas_rphy_delete+0xd/0x18 [scsi_transport_sas]
Aug  2 22:21:13 Tower kernel: [<ffffffffa0081191>] sas_destruct_devices+0x63/0x85 [libsas]
Aug  2 22:21:13 Tower kernel: [<ffffffff8105ed53>] process_one_work+0x192/0x295
Aug  2 22:21:13 Tower kernel: [<ffffffff8105f752>] worker_thread+0x27d/0x369
Aug  2 22:21:13 Tower kernel: [<ffffffff8105f4d5>] ? rescuer_thread+0x2b1/0x2b1
Aug  2 22:21:13 Tower kernel: [<ffffffff81063939>] kthread+0xdb/0xe3
Aug  2 22:21:13 Tower kernel: [<ffffffff8106385e>] ? kthread_park+0x52/0x52
Aug  2 22:21:13 Tower kernel: [<ffffffff8167f785>] ret_from_fork+0x25/0x30
Aug  2 22:21:13 Tower kernel: ---[ end trace c8cceb7943eb323b ]---
Aug  2 22:21:13 Tower kernel: ------------[ cut here ]------------

They appear to be related to the SASLP, and the failed disk is connected to it. You can try using it in a different slot or, better yet, get rid of it, as there are known issues with these controllers.
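To get a rough sense of how often these warnings occur and which driver they point at, the syslog can be scanned for the "Workqueue:" line that appears near the top of each trace. The following is only a minimal sketch: the function name `count_traces` and the regular expression are illustrative assumptions, not part of any Unraid tool, and the sample lines are copied from the trace above.

```python
import re
from collections import Counter

# Matches the "Workqueue:" line printed near the top of each kernel warning
# trace, capturing the queue name, the work function, and the optional module.
WORKQUEUE_RE = re.compile(r"kernel: Workqueue: (\S+) (\S+)(?: \[(\w+)\])?")


def count_traces(lines):
    """Count warning traces per (work function, module) in syslog lines.

    Hypothetical helper for illustration; a missing module name is
    reported as "core".
    """
    counts = Counter()
    for line in lines:
        match = WORKQUEUE_RE.search(line)
        if match:
            _queue, func, module = match.groups()
            counts[(func, module or "core")] += 1
    return counts


# Sample lines taken from the trace quoted in this post.
sample = [
    "Aug  2 22:21:13 Tower kernel: Workqueue: scsi_wq_1 sas_destruct_devices [libsas]",
    "Aug  2 22:21:13 Tower kernel: Call Trace:",
    "Aug  2 22:21:13 Tower kernel: ---[ end trace c8cceb7943eb323b ]---",
]

for (func, module), n in count_traces(sample).items():
    print(f"{n} trace(s) in {func} [{module}]")
```

On a real system the lines would come from the syslog file in the diagnostics zip rather than from a hard-coded list. A large count under `libsas` would be consistent with the controller being the common factor rather than any single disk.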

52 minutes ago, bjp999 said:

While the 3TB Seagates do not have a sterling record, I do not see anything wrong with that drive.

 

Are you using hot-swap style drive cages? My guess is that every time you open the server, you are jiggling wires enough to create intermittent connections. Yours are the typical signs.

 

Look at the CSE-M35T-1B cages.

 

I am using hot-swap bays. The cages you recommend are actually the cages I'm using. They've been very good so far, but I suppose the wires could be loose somehow.

 

30 minutes ago, johnnie.black said:

Your log is full of these errors:

 


Aug  2 22:21:13 Tower kernel: Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012
Aug  2 22:21:13 Tower kernel: Workqueue: scsi_wq_1 sas_destruct_devices [libsas]
Aug  2 22:21:13 Tower kernel: ffffc9000bcb3b28 ffffffff813a4a1b ffffc9000bcb3b78 ffffffff81944854
Aug  2 22:21:13 Tower kernel: ffffc9000bcb3b68 ffffffff8104d0d9 000000ed0bcb3be0 0000000000000000
Aug  2 22:21:13 Tower kernel: ffffffff81c7b220 ffff880211e05820 0000000000800070 ffff88001eb50000
Aug  2 22:21:13 Tower kernel: Call Trace:
Aug  2 22:21:13 Tower kernel: [<ffffffff813a4a1b>] dump_stack+0x61/0x7e
Aug  2 22:21:13 Tower kernel: [<ffffffff8104d0d9>] __warn+0xb8/0xd3
Aug  2 22:21:13 Tower kernel: [<ffffffff8104d13a>] warn_slowpath_fmt+0x46/0x4e
Aug  2 22:21:13 Tower kernel: [<ffffffff81176a20>] sysfs_remove_group+0x4d/0x80
Aug  2 22:21:13 Tower kernel: [<ffffffff81491bb6>] dpm_sysfs_remove+0x4b/0x50
Aug  2 22:21:13 Tower kernel: [<ffffffff81488604>] device_del+0x44/0x1f0
Aug  2 22:21:13 Tower kernel: [<ffffffff814bde8a>] sd_remove+0x50/0xb7
Aug  2 22:21:13 Tower kernel: [<ffffffff8148b96f>] __device_release_driver+0x98/0x11c
Aug  2 22:21:13 Tower kernel: [<ffffffff8148ba11>] device_release_driver+0x1e/0x2b
Aug  2 22:21:13 Tower kernel: [<ffffffff8148acb9>] bus_remove_device+0xf6/0x109
Aug  2 22:21:13 Tower kernel: [<ffffffff81488721>] device_del+0x161/0x1f0
Aug  2 22:21:13 Tower kernel: [<ffffffff814b9f62>] __scsi_remove_device+0x5e/0xcb
Aug  2 22:21:13 Tower kernel: [<ffffffff814b9ff0>] scsi_remove_device+0x21/0x2e
Aug  2 22:21:13 Tower kernel: [<ffffffff814ba178>] scsi_remove_target+0x151/0x1ad
Aug  2 22:21:13 Tower kernel: [<ffffffffa006f8a8>] sas_rphy_remove+0x26/0x6b [scsi_transport_sas]
Aug  2 22:21:13 Tower kernel: [<ffffffffa006f8fa>] sas_rphy_delete+0xd/0x18 [scsi_transport_sas]
Aug  2 22:21:13 Tower kernel: [<ffffffffa0081191>] sas_destruct_devices+0x63/0x85 [libsas]
Aug  2 22:21:13 Tower kernel: [<ffffffff8105ed53>] process_one_work+0x192/0x295
Aug  2 22:21:13 Tower kernel: [<ffffffff8105f752>] worker_thread+0x27d/0x369
Aug  2 22:21:13 Tower kernel: [<ffffffff8105f4d5>] ? rescuer_thread+0x2b1/0x2b1
Aug  2 22:21:13 Tower kernel: [<ffffffff81063939>] kthread+0xdb/0xe3
Aug  2 22:21:13 Tower kernel: [<ffffffff8106385e>] ? kthread_park+0x52/0x52
Aug  2 22:21:13 Tower kernel: [<ffffffff8167f785>] ret_from_fork+0x25/0x30
Aug  2 22:21:13 Tower kernel: ---[ end trace c8cceb7943eb323b ]---
Aug  2 22:21:13 Tower kernel: ------------[ cut here ]------------

They appear to be related to the SASLP, and the failed disk is connected to it. You can try using it in a different slot or, better yet, get rid of it, as there are known issues with these controllers.

 

I'm not quite sure what you mean by the SASLP; is that the SATA controller?

 

At this point, it is detecting the drive again, and as you said, there don't appear to be any errors with the drive.

 

How can I get it back in the array?

4 minutes ago, johnnie.black said:

You can rebuild to the same disk, but with all those errors it's just a matter of time until that or another disk is disabled again.

 

Ok. I'll add it to the list of hardware I need to replace ($500 on hard drives this week alone :( ).

 

For now, though, how do I go about safely getting the disk to be read by the array again?

5 minutes ago, johnnie.black said:

https://wiki.lime-technology.com/Troubleshooting#Re-enable_the_drive

 

The disk wasn't unmountable when you posted the screenshot and diags, but it looks to be empty; if so, you can just format it.

 

That's got it. Sorry, I was 100% sure I saw it as unmountable when I left the house this morning. It's reconstructing now.

 

I really appreciate all the help.

 

So as of right now, I've got a new motherboard, RAM, and processor on the way, and I will be getting two additional hot-swap bays and a controller. Now I need to add a second SATA controller to that list.

 

An expensive week.

 

Thank you again.

3 minutes ago, Netbug said:

I assume the disks will need to be re-assigned in their proper positions after the controller is replaced.

 

No, disks are tracked by serial number. As long as the controller is supported and is not a RAID controller, it's just plug and play. LSI HBAs are currently recommended, e.g., the 9201-8i, 9211-8i, and clones.
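As a rough illustration of why tracking by serial number makes a controller swap plug and play, here is a minimal sketch. It is not how Unraid is actually implemented; the function `resolve_slots`, the serial numbers, and the device paths are all made up for the example.

```python
def resolve_slots(assignments, detected):
    """Map each array slot to its current device path using serial numbers.

    assignments: slot name -> disk serial (what the array remembers)
    detected:    device path -> disk serial (what the system sees right now)
    A slot whose disk is not detected maps to None.
    """
    by_serial = {serial: dev for dev, serial in detected.items()}
    return {slot: by_serial.get(serial) for slot, serial in assignments.items()}


# Hypothetical serials and device paths, for illustration only.
assignments = {"parity": "SER-A", "disk1": "SER-B"}
before_swap = {"/dev/sdb": "SER-A", "/dev/sdc": "SER-B"}
after_swap = {"/dev/sdf": "SER-B", "/dev/sdg": "SER-A"}

print(resolve_slots(assignments, before_swap))
print(resolve_slots(assignments, after_swap))
```

The device paths change after the swap, but because the lookup goes through the serial number, each slot still resolves to the same physical disk.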


Archived

This topic is now archived and is closed to further replies.
