5b7 degraded array / still allows writes

MrD1234 · July 8, 2011

Two read errors and two write errors occurred on one disk. The statistics for that drive no longer update, but writes to it are still allowed to proceed through a Samba share on a user share.

Jul  6 23:26:13 Tower2 kernel: sd 1:0:2:0: [sdc] Device not ready
Jul  6 23:26:13 Tower2 kernel: sd 1:0:2:0: [sdc]  Result: hostbyte=0x00 driverbyte=0x08
Jul  6 23:26:13 Tower2 kernel: sd 1:0:2:0: [sdc]  Sense Key : 0x2 [current]
Jul  6 23:26:13 Tower2 kernel: sd 1:0:2:0: [sdc]  ASC=0x4 ASCQ=0x2
Jul  6 23:26:13 Tower2 kernel: sd 1:0:2:0: [sdc] CDB: cdb[0]=0x28: 28 00 00 04 00 40 00 00 08 00
Jul  6 23:26:13 Tower2 kernel: end_request: I/O error, dev sdc, sector 262208
Jul  6 23:26:13 Tower2 kernel: md: disk2 read error
Jul  6 23:26:13 Tower2 kernel: handle_stripe read error: 262144/2, count: 1
Jul  6 23:26:14 Tower2 kernel: sd 1:0:2:0: [sdc] Device not ready
Jul  6 23:26:14 Tower2 kernel: sd 1:0:2:0: [sdc]  Result: hostbyte=0x00 driverbyte=0x08
Jul  6 23:26:14 Tower2 kernel: sd 1:0:2:0: [sdc]  Sense Key : 0x2 [current]
Jul  6 23:26:14 Tower2 kernel: sd 1:0:2:0: [sdc]  ASC=0x4 ASCQ=0x2
Jul  6 23:26:14 Tower2 kernel: sd 1:0:2:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 00 04 00 40 00 00 08 00
Jul  6 23:26:14 Tower2 kernel: end_request: I/O error, dev sdc, sector 262208
Jul  6 23:26:14 Tower2 kernel: md: disk2 write error
Jul  6 23:26:14 Tower2 kernel: handle_stripe write error: 262144/2, count: 1
Jul  6 23:26:14 Tower2 kernel: md: recovery thread woken up ...
Jul  6 23:26:14 Tower2 kernel: md: recovery thread has nothing to resync
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc] Device not ready
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc]  Result: hostbyte=0x00 driverbyte=0x08
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc]  Sense Key : 0x2 [current]
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc]  ASC=0x4 ASCQ=0x2
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc] CDB: cdb[0]=0x28: 28 00 00 00 00 c0 00 00 08 00
Jul  6 23:26:15 Tower2 kernel: end_request: I/O error, dev sdc, sector 192
Jul  6 23:26:15 Tower2 kernel: md: disk2 read error
Jul  6 23:26:15 Tower2 kernel: handle_stripe read error: 128/2, count: 1
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc] Device not ready
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc]  Result: hostbyte=0x00 driverbyte=0x08
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc]  Sense Key : 0x2 [current]
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc]  ASC=0x4 ASCQ=0x2
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 00 00 00 c0 00 00 08 00
Jul  6 23:26:15 Tower2 kernel: end_request: I/O error, dev sdc, sector 192
Jul  6 23:26:15 Tower2 kernel: md: disk2 write error
Jul  6 23:26:15 Tower2 kernel: handle_stripe write error: 128/2, count: 1
Jul  6 23:37:01 Tower2 crond[1148]: ignoring /var/spool/cron/crontabs/root- (non-existent user)
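For reference, the CDB lines in the log can be decoded to confirm which requests failed. Below is a minimal Python sketch. It assumes the standard 10-byte READ(10)/WRITE(10) layout: opcode in byte 0, big-endian LBA in bytes 2-5, and transfer length in bytes 7-8. `decode_cdb10` is a hypothetical helper, not an unRAID tool.

```python
import re

# Opcodes that appear in this log: READ(10) and WRITE(10).
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

# Matches e.g. "CDB: cdb[0]=0x28: 28 00 00 04 00 40 00 00 08 00"
CDB_RE = re.compile(r"CDB: cdb\[0\]=0x[0-9a-f]+: ((?:[0-9a-f]{2} ?)+)")


def decode_cdb10(line):
    """Decode a 10-byte READ/WRITE CDB from a kernel log line.

    Returns (command name, starting LBA, transfer length in blocks).
    """
    m = CDB_RE.search(line)
    if m is None:
        raise ValueError("no CDB found in line")
    cdb = bytes(int(b, 16) for b in m.group(1).split())
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    op = OPCODES.get(cdb[0], hex(cdb[0]))
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return op, lba, length
```

Applied to the lines above, the failed read and the failed write both target LBA 262208 (and then LBA 192), 8 blocks each. These values match the sector numbers in the `end_request` lines.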

I've read that the system should switch to read-only mode. This feels like a bug.

Also, given that this is a test array, what is the best way to "clear" the error without replacing the drive?

EDIT: The parity check buttons are also gone.

lionelhutz · July 8, 2011

That's exactly how it should work. The disk is being simulated using all the other array disks. The writes are done by updating the parity only. The whole purpose of the parity is to allow any single data disk to be simulated or rebuilt without losing data.

The parity check button is gone because you need every data disk to be healthy before you can perform a successful parity build or parity check.
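The sketch below uses the same hypothetical XOR model to show why a parity check tells you nothing while a disk is missing. The missing disk can only be reconstructed from parity, so comparing parity against it always agrees, even when parity is wrong. The demo below deliberately uses bad parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def parity_check(disks, parity):
    """Return the byte offsets where stored parity disagrees with the data disks."""
    computed = xor_blocks(disks)
    return [i for i, (a, b) in enumerate(zip(computed, parity)) if a != b]


def simulate(disks, parity, missing):
    """Reconstruct the missing disk from parity and the surviving disks."""
    others = [d for i, d in enumerate(disks) if i != missing]
    return xor_blocks(others + [parity])


disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
bad_parity = b"\x00\x00"  # deliberately wrong parity

# All disks present: both mismatching bytes are detected.
print(parity_check(disks, bad_parity))  # [0, 1]

# Disk 1 missing: its simulated contents come from the bad parity itself,
# so the check finds nothing wrong.
simulated = [disks[0], simulate(disks, bad_parity, 1), disks[2]]
print(parity_check(simulated, bad_parity))  # []
```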

Not sure what you mean by "clearing" the error. You have to either replace the disk and let the data be rebuilt or abandon the disk and all the data on it by initializing the array without it.

Peter

MrD1234 · July 8, 2011

OK, that makes sense. By "clear" I mean simulating a drive replacement; I don't believe the drive is bad. I think it's an ESXi issue.

Joe L. · July 8, 2011

Use the same method as if it were actually defective, except that you need to fool unRAID into thinking the old disk has been replaced. To do that, you must get it to forget the model/serial number of the old disk:

1. Stop the array.
2. Un-assign the failed disk.
3. Start the array with it un-assigned. (This will cause unRAID to forget the model/serial number of the old disk.)
4. Stop the array.
5. Re-assign the old disk. unRAID will think it is a replacement.
6. Start the array. unRAID will reconstruct the contents of the old disk onto itself.
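The steps above can be read as a small state machine. Below is a toy Python model of the behaviour Joe describes. It is a hypothetical illustration, not unRAID's actual logic: each slot is assumed to remember the model/serial last seen in it, and a disabled slot is assumed to rebuild only onto a disk whose ID it does not remember. The serial "WD-123" is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    # Hypothetical model: the array remembers the model/serial last seen per slot.
    remembered_id: Optional[str] = None
    assigned_id: Optional[str] = None
    disabled: bool = False


def start_array(slot: Slot) -> str:
    """Describe what happens to this slot when the array starts (toy model)."""
    if slot.assigned_id is None:
        # Started with the slot empty: the disk is simulated, and the old ID is forgotten.
        slot.remembered_id = None
        return "simulated"
    if slot.disabled and slot.assigned_id == slot.remembered_id:
        # The same disk that was disabled: it stays disabled.
        return "still disabled"
    if slot.disabled:
        # An unrecognised disk: treat it as a replacement and rebuild onto it.
        slot.remembered_id = slot.assigned_id
        slot.disabled = False
        return "rebuild"
    return "normal"


slot = Slot(remembered_id="WD-123", assigned_id="WD-123", disabled=True)
print(start_array(slot))   # still disabled
slot.assigned_id = None    # steps 1-2: stop, un-assign
print(start_array(slot))   # step 3: simulated (ID forgotten)
slot.assigned_id = "WD-123"  # steps 4-5: stop, re-assign the same disk
print(start_array(slot))   # step 6: rebuild
```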

Joe L.

MrD1234 · July 8, 2011

Thanks!

There is definitely something weird going on. I added another disk to the LSI 9211 controller, added it as a raw virtual disk and mapped it into unRAID. The reconstruction completed successfully, but when I checked the array after I got home, I saw the same errors on a different physical disk.

I'm wondering whether the drive needs some spin-up delay that the controller/driver doesn't wait long enough for, so the driver treats it as a physical error.
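The sense data in the log is consistent with that theory. Every failure above reports sense key 0x2 (NOT READY) with ASC/ASCQ 0x04/0x02. The SCSI Primary Commands tables list that pair as "logical unit not ready, initializing command required", meaning an initializing command such as START STOP UNIT is expected first. An unreadable sector would typically report sense key 0x3 (MEDIUM ERROR) instead. Below is a minimal Python sketch that pairs each sense-key line with the following ASC/ASCQ line. The lookup tables cover only a few codes, and `classify` is a hypothetical helper.

```python
import re

# A subset of SCSI sense keys.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND",
}
# A few ASC/ASCQ pairs from the SPC tables (ASC 0x04 = logical unit not ready).
ASC_ASCQ = {
    (0x04, 0x00): "logical unit not ready, cause not reportable",
    (0x04, 0x01): "logical unit is in process of becoming ready",
    (0x04, 0x02): "logical unit not ready, initializing command required",
}

SK_RE = re.compile(r"Sense Key : 0x([0-9a-f]+)")
ASC_RE = re.compile(r"ASC=0x([0-9a-f]+) ASCQ=0x([0-9a-f]+)")


def classify(log_text):
    """Pair each sense-key line with the next ASC/ASCQ line and describe both."""
    results = []
    key = None
    for line in log_text.splitlines():
        m = SK_RE.search(line)
        if m:
            key = int(m.group(1), 16)
            continue
        m = ASC_RE.search(line)
        if m and key is not None:
            asc, ascq = int(m.group(1), 16), int(m.group(2), 16)
            results.append((
                SENSE_KEYS.get(key, hex(key)),
                ASC_ASCQ.get((asc, ascq), f"ASC 0x{asc:02x} ASCQ 0x{ascq:02x}"),
            ))
            key = None
    return results
```

Run over the log above, every entry comes back as NOT READY / "initializing command required", with no MEDIUM ERROR entries.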
