
Disabled Drive After Successful Parity Check?


Hi, hoping to get some advice/next steps - my server just completed a successful parity check this morning (0 errors), but less than 15 hours later, one of my drives became disabled (red 'X'), and I'm seeing the following in the disk log:

 

Sep 3 16:07:16 Proteus kernel: sd 1:0:1:0: [sde] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Sep 3 16:07:16 Proteus kernel: sd 1:0:1:0: [sde] tag#2 Sense Key : 0x5 [current]
Sep 3 16:07:16 Proteus kernel: sd 1:0:1:0: [sde] tag#2 ASC=0x21 ASCQ=0x0
Sep 3 16:07:16 Proteus kernel: sd 1:0:1:0: [sde] tag#2 CDB: opcode=0x8a 8a 00 00 00 00 00 ae a9 eb c8 00 00 00 08 00 00
Sep 3 16:07:16 Proteus kernel: print_req_error: critical target error, dev sde, sector 2930371528
Sep 3 16:07:16 Proteus kernel: print_req_error: critical target error, dev sde, sector 2930371528
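For anyone reading the log above, the CDB line can be decoded to see which command failed and where. The sketch below assumes the standard 16-byte READ(16)/WRITE(16) layout (opcode in byte 0, big-endian LBA in bytes 2-9, big-endian transfer length in bytes 10-13). The function name `decode_cdb16` and the small opcode table are illustrative and not part of any tool.

```python
# Decode a 16-byte SCSI CDB as printed in the kernel log.
# Assumed layout (READ(16)/WRITE(16)): byte 0 = opcode,
# bytes 2-9 = big-endian LBA, bytes 10-13 = big-endian transfer length.

# Only the two opcodes that appear in this thread are listed.
OPCODES = {0x88: "READ(16)", 0x8A: "WRITE(16)"}


def decode_cdb16(hex_bytes):
    """Return opcode name, starting LBA and block count for a 16-byte CDB."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    if len(raw) != 16:
        raise ValueError("expected a 16-byte CDB")
    return {
        "opcode": OPCODES.get(raw[0], hex(raw[0])),
        "lba": int.from_bytes(raw[2:10], "big"),
        "blocks": int.from_bytes(raw[10:14], "big"),
    }


first = decode_cdb16("8a 00 00 00 00 00 ae a9 eb c8 00 00 00 08 00 00")
print(first)
# {'opcode': 'WRITE(16)', 'lba': 2930371528, 'blocks': 8}
```

The decoded LBA matches the sector in the `print_req_error` line, so the failed request was an 8-block write starting at sector 2930371528.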

 

What are my next steps for troubleshooting/repair?  Is the disk toast, or should I attempt a repair and put it back into service?

 

Thanks!

proteus-diagnostics-20180903-1956.zip

Edited by quinnjudge

  • Community Expert

SMART for disk1 looks OK. Might just be a connection issue.

 

You can rebuild to a spare disk if you have one. That would allow you to keep the original in reserve in case there is a problem rebuilding.

 

Or you can rebuild to the same disk. Do you know the procedure?

 

Do you have backups of any important and irreplaceable files?

  • Author

Thanks for the quick reply!

 

I don't have a spare disk, so I'll have to rebuild the existing one...I'll shut down, check the connections, and bring the server back up...can you point me to the rebuild procedure? (having a spare sitting around is on my to-do list, lol!)

 

I do have good backups; just did a test restore :)

Once you're happy with the connections, power up, and if the array is set to auto-start, stop it. (At this point you can check the SMART status again and run a SMART self-test if you want.) Unassign the disk. Start the array. Stop the array. Re-assign the disk. Start the array and the rebuild will begin.
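The optional SMART check mentioned above can also be done from a shell with `smartctl` from smartmontools. The following is only a sketch that builds the commands without running them. The helper name `smartctl_plan` is hypothetical, and the device path is a placeholder that has to be looked up each time, since device letters can change between boots.

```python
# Build (but do not run) the smartctl invocations for a basic health check.
# The device path passed in is a placeholder; confirm the real one first.

def smartctl_plan(device, test="short"):
    """Return argv lists: health summary, start a self-test, show the self-test log."""
    if test not in ("short", "long"):
        raise ValueError("test must be 'short' or 'long'")
    return [
        ["smartctl", "-H", device],              # overall health assessment
        ["smartctl", "-t", test, device],        # start the self-test in the background
        ["smartctl", "-l", "selftest", device],  # read results once the test has finished
    ]


plan = smartctl_plan("/dev/sdX", "short")
for argv in plan:
    print(" ".join(argv))
```

Each list can be passed to `subprocess.run` as root. The self-test runs inside the drive, so the log only shows a result after the test has had time to finish.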

  • Author

Server restarted, disk rebuilding...looks like I have ~8 hours until rebuild is complete; I'll go grab some popcorn and cross my fingers :)

 

Thank you @trurl and @John_M for your quick help, it is appreciated!

  • Author

Good news - rebuild completed without errors.  Bad news - now I have a reported error on the same disk:

 

Sep 3 21:28:14 Proteus kernel: mdcmd (2): import 1 sdf 64 2930266532 0 WDC_WD30EFRX-68EUZN0_WD-WMC4N0862856
Sep 3 21:28:14 Proteus kernel: md: import disk1: (sdf) WDC_WD30EFRX-68EUZN0_WD-WMC4N0862856 size: 2930266532

Sep 3 21:28:49 Proteus emhttpd: shcmd (886): /usr/local/sbin/set_ncq sdf 1
Sep 3 21:28:49 Proteus emhttpd: shcmd (887): echo 128 > /sys/block/sdf/queue/nr_requests
Sep 5 20:57:41 Proteus kernel: sd 9:0:2:0: [sdf] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Sep 5 20:57:41 Proteus kernel: sd 9:0:2:0: [sdf] tag#0 Sense Key : 0x3 [current]
Sep 5 20:57:41 Proteus kernel: sd 9:0:2:0: [sdf] tag#0 ASC=0x11 ASCQ=0x0
Sep 5 20:57:41 Proteus kernel: sd 9:0:2:0: [sdf] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 2f 2a 88 98 00 00 00 08 00 00
Sep 5 20:57:41 Proteus kernel: print_req_error: critical medium error, dev sdf, sector 5086283928
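The kernel printed the sense data here as raw numbers rather than names. A minimal sketch for translating them follows, with lookup tables limited to the codes that appear in this thread (names as given in the SCSI sense code tables). The helper `describe_sense` is illustrative, not an existing tool.

```python
import re

# Only the codes seen in this thread; the full SCSI tables are much larger.
SENSE_KEYS = {0x3: "MEDIUM ERROR", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x21, 0x00): "LOGICAL BLOCK ADDRESS OUT OF RANGE",
}


def describe_sense(log_text):
    """Pull the sense key and ASC/ASCQ out of kernel log text and name them."""
    key = int(re.search(r"Sense Key : (0x[0-9a-fA-F]+)", log_text).group(1), 16)
    m = re.search(r"ASC=(0x[0-9a-fA-F]+) ASCQ=(0x[0-9a-fA-F]+)", log_text)
    asc, ascq = int(m.group(1), 16), int(m.group(2), 16)
    key_name = SENSE_KEYS.get(key, "unknown sense key")
    asc_name = ASC_ASCQ.get((asc, ascq), "unknown ASC/ASCQ")
    return key_name + ": " + asc_name


log = """
kernel: sd 9:0:2:0: [sdf] tag#0 Sense Key : 0x3 [current]
kernel: sd 9:0:2:0: [sdf] tag#0 ASC=0x11 ASCQ=0x0
"""
print(describe_sense(log))
# MEDIUM ERROR: UNRECOVERED READ ERROR
```

Read this way, the second error is the drive reporting that it could not read its own media, whereas the first log excerpt in this thread (0x5 with ASC 0x21) decodes to an illegal request with a logical block address out of range.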

 

I did a short SMART test against the drive right before I started the rebuild (came back successful)...next steps?

proteus-diagnostics-20180905-2155.zip

  • Community Expert

Disk1 is failing and needs to be replaced.
