removing BAD disk from array

sacretagent · January 18, 2011

Hi guys,

I had to go out of town for work, and of course a disk failed while I was away.

I can connect to my system remotely through VPN, so no worries there, but I can't replace the disk with a new one until I'm back.

So I copied the contents of the failed disk to my main machine, and now I would like to remove the failed disk from the array.

I'd like to check with you guys whether the procedure below is correct:

1. Stop the array.
2. Unassign the bad disk.
3. Reboot the server.
4. Telnet in.
5. Run initconfig (to rebuild the array with only 9 disks, so that parity is correct).
6. Wait until parity has synced for the new 9-disk array.
7. Copy my content from disk 10 back to the array (I have enough space for that).
8. When I come home, preclear a new disk and add it to the array.

From reading the wiki I think this is the way to do it, but I would like to be sure.

I don't want to lose the 9 other data drives.
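Why the initconfig/parity-sync step is needed can be sketched with a toy model of single parity. This assumes unRAID's parity disk behaves like a bytewise XOR across all data disks. That is a simplification, and the disk contents below are made-up values, not anything from this array.

```python
from functools import reduce


def xor_parity(disks: list[bytes]) -> bytes:
    """Bytewise XOR across equal-length disk images (toy model of single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


# Hypothetical 10-disk array, 4 bytes per disk.
disks = [bytes([i, i * 2 % 256, i * 3 % 256, 255 - i]) for i in range(1, 11)]
parity_10 = xor_parity(disks)

# While parity covers all 10 disks, disk 10 can be rebuilt from parity + disks 1-9.
rebuilt = xor_parity(disks[:9] + [parity_10])
assert rebuilt == disks[9]

# Once disk 10 is dropped, the old parity no longer matches the remaining 9 disks,
# so a fresh parity has to be computed over the 9-disk array.
parity_9 = xor_parity(disks[:9])
print(parity_10 == parity_9)  # False
print(parity_9 == bytes(a ^ b for a, b in zip(parity_10, disks[9])))  # True
```

The last line shows the relationship in this model: dropping a disk changes parity by exactly that disk's contents. That is why the 9-disk array needs a new parity sync rather than reusing the old parity.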

Joe L. · January 18, 2011

Looks good to me, but you do not need to reboot the server after un-assigning the bad disk.

(It won't hurt, and it will clean out the syslog, but unless you are running out of memory, it is not needed.)

sacretagent · January 18, 2011

Thanks, Joe, for confirming.

Parity is rebuilding.

These were the error messages about disk 10 in the logs:

Jan 18 12:02:03 p5bplus unmenu[1658]: Unrecognized state, drive sdl, assuming not spinning: drive state is: unknown
Jan 18 12:02:03 p5bplus kernel: sd 11:0:0:0: [sdl] Unhandled error code
Jan 18 12:02:03 p5bplus kernel: sd 11:0:0:0: [sdl] Result: hostbyte=0x04 driverbyte=0x00
Jan 18 12:02:03 p5bplus kernel: sd 11:0:0:0: [sdl] CDB: cdb[0]=0x28: 28 00 00 00 00 00 00 00 20 00
Jan 18 12:02:03 p5bplus kernel: end_request: I/O error, dev sdl, sector 0
Jan 18 12:02:03 p5bplus kernel: Buffer I/O error on device sdl, logical block 0
Jan 18 12:02:03 p5bplus kernel: Buffer I/O error on device sdl, logical block 1
Jan 18 12:02:03 p5bplus kernel: Buffer I/O error on device sdl, logical block 2
Jan 18 12:02:03 p5bplus kernel: Buffer I/O error on device sdl, logical block 3
Jan 18 12:02:03 p5bplus kernel: sd 11:0:0:0: [sdl] Unhandled error code
Jan 18 12:02:03 p5bplus kernel: sd 11:0:0:0: [sdl] Result: hostbyte=0x04 driverbyte=0x00
Jan 18 12:02:03 p5bplus kernel: sd 11:0:0:0: [sdl] CDB: cdb[0]=0x28: 28 00 00 00 00 00 00 00 08 00
Jan 18 12:02:03 p5bplus kernel: end_request: I/O error, dev sdl, sector 0
Jan 18 12:02:03 p5bplus kernel: Buffer I/O error on device sdl, logical block 0

They are still spawning in my logs.

Is there anything I can do to tell the OS that the disk can be disabled?
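The `CDB:` lines above are SCSI command descriptor blocks. The sketch below decodes them, assuming the standard READ(10) layout: opcode 0x28, a 4-byte big-endian LBA in bytes 2-5, and a 2-byte transfer length in bytes 7-8. Decoded this way, both failing reads targeted LBA 0, the very start of the disk. In Linux's SCSI host-byte codes, `hostbyte=0x04` is `DID_BAD_TARGET`. That usually points to the device not being reachable at all, for example dropped or offlined, rather than a single bad sector. The host-byte table here is a partial subset only.

```python
# Partial table of Linux SCSI host-byte codes (only the first few values).
HOST_BYTE = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
}


def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, block_count) for a READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


print(decode_read10("28 00 00 00 00 00 00 00 20 00"))  # (0, 32)
print(decode_read10("28 00 00 00 00 00 00 00 08 00"))  # (0, 8)
print(HOST_BYTE[0x04])                                 # DID_BAD_TARGET
```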

sacretagent · January 18, 2011

OK, parity has been rebuilt on 9 drives. All is well.

I rebooted the server after the parity rebuild and ran smartctl on the bad disk, and it said PASSED (see the attached txt file).

So I thought I'd try a preclear, since I'm not near the computer anyway. Preclear is running now, and the syslog is filling up with these:

Jan 18 20:17:24 p5bplus kernel: ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Jan 18 20:17:24 p5bplus kernel: ata11.00: irq_stat 0x48000000
Jan 18 20:17:24 p5bplus kernel: ata11.00: failed command: READ FPDMA QUEUED
Jan 18 20:17:24 p5bplus kernel: ata11.00: cmd 60/00:00:00:b3:87/02:00:16:00:00/40 tag 0 ncq 262144 in
Jan 18 20:17:24 p5bplus kernel: res 41/40:00:ab:b3:87/54:00:16:00:00/40 Emask 0x409 (media error) <F>
Jan 18 20:17:24 p5bplus kernel: ata11.00: status: { DRDY ERR }
Jan 18 20:17:24 p5bplus kernel: ata11.00: error: { UNC }
Jan 18 20:17:24 p5bplus kernel: ata11.00: configured for UDMA/133
Jan 18 20:17:24 p5bplus kernel: sd 11:0:0:0: [sdl] Unhandled sense code
Jan 18 20:17:24 p5bplus kernel: sd 11:0:0:0: [sdl] Result: hostbyte=0x00 driverbyte=0x08
Jan 18 20:17:24 p5bplus kernel: sd 11:0:0:0: [sdl] Sense Key : 0x3 [current] [descriptor]
Jan 18 20:17:24 p5bplus kernel: Descriptor sense data with sense descriptors (in hex):
Jan 18 20:17:24 p5bplus kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jan 18 20:17:24 p5bplus kernel: 16 87 b3 ab
Jan 18 20:17:24 p5bplus kernel: sd 11:0:0:0: [sdl] ASC=0x11 ASCQ=0x4
Jan 18 20:17:24 p5bplus kernel: sd 11:0:0:0: [sdl] CDB: cdb[0]=0x28: 28 00 16 87 b3 00 00 02 00 00
Jan 18 20:17:24 p5bplus kernel: end_request: I/O error, dev sdl, sector 377992107
Jan 18 20:17:24 p5bplus kernel: Buffer I/O error on device sdl, logical block 47249013
Jan 18 20:17:24 p5bplus kernel: Buffer I/O error on device sdl, logical block 47249014
Jan 18 20:17:24 p5bplus kernel: Buffer I/O error on device sdl, logical block 47249015
Jan 18 20:17:24 p5bplus kernel: Buffer I/O error on device sdl, logical block 47249016
Jan 18 20:17:24 p5bplus kernel: Buffer I/O error on device sdl, logical block 47249017
Jan 18 20:17:24 p5bplus kernel: Buffer I/O error on device sdl, logical block 47249018
Jan 18 20:17:24 p5bplus kernel: Buffer I/O error on device sdl, logical block 47249019
Jan 18 20:17:24 p5bplus kernel: Buffer I/O error on device sdl, logical block 47249020
Jan 18 20:17:24 p5bplus kernel: Buffer I/O error on device sdl, logical block 47249021
Jan 18 20:17:24 p5bplus kernel: Buffer I/O error on device sdl, logical block 47249022
Jan 18 20:17:24 p5bplus kernel: ata11: EH complete
Jan 18 20:17:27 p5bplus kernel: ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Jan 18 20:17:27 p5bplus kernel: ata11.00: irq_stat 0x48000000
Jan 18 20:17:27 p5bplus kernel: ata11.00: failed command: READ FPDMA QUEUED
Jan 18 20:17:27 p5bplus kernel: ata11.00: cmd 60/08:00:a8:b3:87/00:00:16:00:00/40 tag 0 ncq 4096 in
Jan 18 20:17:27 p5bplus kernel: res 41/40:00:ab:b3:87/54:00:16:00:00/40 Emask 0x409 (media error) <F>
Jan 18 20:17:27 p5bplus kernel: ata11.00: status: { DRDY ERR }
Jan 18 20:17:27 p5bplus kernel: ata11.00: error: { UNC }

No clue what it means.

The disk is a WD EADS 1TB on the JMicron eSATA port, and the JMicron controller is set to AHCI.

I guess the disk is a goner??

SMART_status_Info_for_sdl.txt

syslog.txt
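The `cmd`/`res` lines in that log are libata's dump of the ATA taskfile registers. The sketch below decodes them under a few assumptions. It takes the register order to be command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. It assumes 512-byte sectors. It also assumes that READ FPDMA QUEUED (0x60) carries its sector count in the feature registers. Read this way, the `res` LBA is 0x1687B3AB = 377992107, the same sector named in the `end_request` line. The `Buffer I/O error ... logical block 47249013` line looks like the same location counted in 4096-byte blocks (377992107 // 8). The descriptor sense bytes decode to sense key 0x3 (MEDIUM ERROR) with ASC/ASCQ 0x11/0x04. The lookup tables are partial and cover only the codes seen in this log.

```python
import re

# Register order assumed from libata's "cmd"/"res" error-report lines.
FIELDS = ["command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah", "device"]
STATUS_BITS = {0x40: "DRDY", 0x01: "ERR"}  # partial: only the bits seen in this log
ERROR_BITS = {0x40: "UNC"}                 # UNC = uncorrectable data error
SENSE_KEYS = {0x3: "MEDIUM ERROR"}         # partial
ASC_ASCQ = {(0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED"}  # partial


def parse_taskfile(text: str) -> dict[str, int]:
    """Split a libata taskfile dump such as '60/00:00:...:00/40' into named registers."""
    values = re.split(r"[/:]", text.strip())
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return {name: int(v, 16) for name, v in zip(FIELDS, values)}


def lba48(tf: dict[str, int]) -> int:
    """Assemble the 48-bit LBA from the low and high-order-byte (hob) registers."""
    return (tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24
            | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"])


def fpdma_sectors(tf: dict[str, int]) -> int:
    """Sector count of an FPDMA command, taken from the feature registers."""
    return tf["hob_feature"] << 8 | tf["feature"]


def flags(value: int, table: dict[int, str]) -> list[str]:
    """Names of the bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def descriptor_sense(hex_bytes: str) -> tuple[str, str, int]:
    """Decode descriptor-format sense data whose first descriptor is 'information'."""
    s = bytes.fromhex(hex_bytes)
    if s[0] & 0x7F not in (0x72, 0x73) or s[8] != 0x00:
        raise ValueError("expected descriptor sense with an information descriptor first")
    key, asc, ascq = s[1] & 0x0F, s[2], s[3]
    info = int.from_bytes(s[12:20], "big")  # descriptor starts at byte 8; value at +4
    return SENSE_KEYS.get(key, hex(key)), ASC_ASCQ.get((asc, ascq), f"{asc:#x}/{ascq:#x}"), info


cmd = parse_taskfile("60/00:00:00:b3:87/02:00:16:00:00/40")
res = parse_taskfile("41/40:00:ab:b3:87/54:00:16:00:00/40")
print(hex(lba48(cmd)), fpdma_sectors(cmd) * 512)  # 0x1687b300 262144
print(flags(res["command"], STATUS_BITS), flags(res["feature"], ERROR_BITS))  # ['DRDY', 'ERR'] ['UNC']
print(lba48(res), lba48(res) // 8)  # 377992107 47249013
sense = descriptor_sense("72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 16 87 b3 ab")
print(sense)  # ('MEDIUM ERROR', 'UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED', 377992107)
```

Taken together, the two `cmd` lines look like a 512-sector read that failed, followed by a narrower 8-sector (4096-byte) read starting at LBA 0x1687B3A8. That chunk is 4K block 47249013 and contains the unreadable sector.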

Joe L. · January 18, 2011

The disk has 44 sectors pending re-allocation.

197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 44

The errors were reported when those sectors could not be read. They will be re-allocated when they are next written. Most disks have several thousand spare sectors to use when re-allocating bad sectors.
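The attribute line above can be read programmatically. The sketch below assumes the column layout of `smartctl -A` output: ID, name, flag, VALUE, WORST, THRESH, type, updated, WHEN_FAILED, RAW_VALUE. It also hints at how smartctl could report PASSED while the disk has unreadable sectors. The normalized VALUE (200) sits far above THRESH (000), so this attribute does not look failing by that comparison. The raw count of 44 pending sectors is the number that actually reflects the problem.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value seen
    thresh: int  # normalized failure threshold
    raw: str     # vendor raw value, kept as text


def parse_attribute(line: str) -> SmartAttribute:
    """Parse one row of the `smartctl -A` attribute table (column layout assumed)."""
    f = line.split()
    # f: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE...
    return SmartAttribute(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), " ".join(f[9:]))


attr = parse_attribute("197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 44")
normalized_failing = attr.value <= attr.thresh  # False: 200 is well above 0
print(attr.name, attr.raw, normalized_failing)  # Current_Pending_Sector 44 False
```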

Joe L.
