5b7 degraded array / still allows writes


I had two read errors and two write errors on one disk. The statistics for that drive do not update, but writes to it are still allowed to proceed through a Samba share on a user share.

Jul  6 23:26:13 Tower2 kernel: sd 1:0:2:0: [sdc] Device not ready
Jul  6 23:26:13 Tower2 kernel: sd 1:0:2:0: [sdc]  Result: hostbyte=0x00 driverbyte=0x08
Jul  6 23:26:13 Tower2 kernel: sd 1:0:2:0: [sdc]  Sense Key : 0x2 [current]
Jul  6 23:26:13 Tower2 kernel: sd 1:0:2:0: [sdc]  ASC=0x4 ASCQ=0x2
Jul  6 23:26:13 Tower2 kernel: sd 1:0:2:0: [sdc] CDB: cdb[0]=0x28: 28 00 00 04 00 40 00 00 08 00
Jul  6 23:26:13 Tower2 kernel: end_request: I/O error, dev sdc, sector 262208
Jul  6 23:26:13 Tower2 kernel: md: disk2 read error
Jul  6 23:26:13 Tower2 kernel: handle_stripe read error: 262144/2, count: 1
Jul  6 23:26:14 Tower2 kernel: sd 1:0:2:0: [sdc] Device not ready
Jul  6 23:26:14 Tower2 kernel: sd 1:0:2:0: [sdc]  Result: hostbyte=0x00 driverbyte=0x08
Jul  6 23:26:14 Tower2 kernel: sd 1:0:2:0: [sdc]  Sense Key : 0x2 [current]
Jul  6 23:26:14 Tower2 kernel: sd 1:0:2:0: [sdc]  ASC=0x4 ASCQ=0x2
Jul  6 23:26:14 Tower2 kernel: sd 1:0:2:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 00 04 00 40 00 00 08 00
Jul  6 23:26:14 Tower2 kernel: end_request: I/O error, dev sdc, sector 262208
Jul  6 23:26:14 Tower2 kernel: md: disk2 write error
Jul  6 23:26:14 Tower2 kernel: handle_stripe write error: 262144/2, count: 1
Jul  6 23:26:14 Tower2 kernel: md: recovery thread woken up ...
Jul  6 23:26:14 Tower2 kernel: md: recovery thread has nothing to resync
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc] Device not ready
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc]  Result: hostbyte=0x00 driverbyte=0x08
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc]  Sense Key : 0x2 [current]
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc]  ASC=0x4 ASCQ=0x2
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc] CDB: cdb[0]=0x28: 28 00 00 00 00 c0 00 00 08 00
Jul  6 23:26:15 Tower2 kernel: end_request: I/O error, dev sdc, sector 192
Jul  6 23:26:15 Tower2 kernel: md: disk2 read error
Jul  6 23:26:15 Tower2 kernel: handle_stripe read error: 128/2, count: 1
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc] Device not ready
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc]  Result: hostbyte=0x00 driverbyte=0x08
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc]  Sense Key : 0x2 [current]
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc]  ASC=0x4 ASCQ=0x2
Jul  6 23:26:15 Tower2 kernel: sd 1:0:2:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 00 00 00 c0 00 00 08 00
Jul  6 23:26:15 Tower2 kernel: end_request: I/O error, dev sdc, sector 192
Jul  6 23:26:15 Tower2 kernel: md: disk2 write error
Jul  6 23:26:15 Tower2 kernel: handle_stripe write error: 128/2, count: 1
Jul  6 23:37:01 Tower2 crond[1148]: ignoring /var/spool/cron/crontabs/root- (non-existent user)
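The CDB lines in the log above can be decoded to see exactly which requests failed. A minimal Python sketch (not part of unRAID; `decode_cdb10` and `SAMPLES` are illustrative names) reads the opcode from byte 0 (0x28 is READ(10), 0x2a is WRITE(10)), the big-endian LBA from bytes 2-5, and the transfer length from bytes 7-8. For the four failed commands, each failed read is followed by a failed write to the same 8-block range, the LBA matches the "sector" in the end_request line, and the sdc sector is always 64 higher than the md sector in the matching handle_stripe line. That constant offset is presumably where the data partition starts on sdc; this is an inference from the numbers, not something the log states.

```python
# Opcodes for the two 10-byte commands seen in the log (SCSI block commands).
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(hex_bytes):
    """Decode a 10-byte READ/WRITE CDB as printed after 'CDB: cdb[0]=0x..:'."""
    b = bytes.fromhex(hex_bytes)  # fromhex ignores the spaces between bytes
    if len(b) != 10:
        raise ValueError("expected a 10-byte CDB")
    op = OPCODES.get(b[0], hex(b[0]))
    lba = int.from_bytes(b[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(b[7:9], "big")  # bytes 7-8: transfer length in blocks
    return op, lba, blocks


# (CDB, md sector from the following "handle_stripe ... error: N/2" line), copied from the log.
SAMPLES = [
    ("28 00 00 04 00 40 00 00 08 00", 262144),
    ("2a 00 00 04 00 40 00 00 08 00", 262144),
    ("28 00 00 00 00 c0 00 00 08 00", 128),
    ("2a 00 00 00 00 c0 00 00 08 00", 128),
]

for cdb, md_sector in SAMPLES:
    op, lba, blocks = decode_cdb10(cdb)
    print(f"{op:9} sdc_lba={lba:<7} blocks={blocks} md_sector={md_sector} offset={lba - md_sector}")
```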

I've read that the system should be in read-only mode. This feels like a bug.

Also, given this is a test array, what is the best way to "clear" the error without replacing the drive?

EDIT: The parity check buttons are also gone.

That's exactly how it should work. The disk is being simulated using all the other array disks. The writes are done by updating the parity only. The whole purpose of the parity is to allow any single data disk to be simulated or rebuilt without losing data.
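To make "simulated" concrete, here is a toy Python model, assuming unRAID's single parity is a plain byte-wise XOR across the data disks; the disk contents are made up. A missing disk's data is the XOR of parity and every surviving disk, and a write aimed at the missing disk can be absorbed by updating parity alone.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy array: three data disks holding one two-byte block each (made-up contents).
disks = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(*disks)

# Disk 1 drops out: its contents are simulated from parity plus the surviving disks.
failed = 1
survivors = [d for i, d in enumerate(disks) if i != failed]
simulated = xor_blocks(parity, *survivors)
print(simulated == disks[failed])  # True

# A write aimed at the missing disk only touches parity:
# new_parity = old_parity ^ old_data ^ new_data, with old_data itself simulated.
new_data = b"\xde\xad"
parity = xor_blocks(parity, simulated, new_data)

# Reading the missing disk back now yields the data that was "written" to it.
print(xor_blocks(parity, *survivors) == new_data)  # True
```

In a real array the same arithmetic runs per sector across whole disks; the two-byte blocks only keep the example readable.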

The parity check button is gone because you need every data disk to be healthy before you can perform a successful parity build or parity check.

Not sure what you mean by "clearing" the error. You have to either replace the disk and let the data be rebuilt or abandon the disk and all the data on it by initializing the array without it.

Peter

OK, that makes sense. By "clear" I mean simulate replacing the drive :) I don't believe the drive is bad :D I think it's an ESXi issue.

Same method as if it were actually defective, except you need to fool unRAID into thinking the old disk has been replaced. To do that, you must get it to forget the model/serial number of the old disk:

1. Stop the array.
2. Un-assign the failed disk.
3. Start the array with the disk un-assigned. (This causes unRAID to forget the model/serial number of the old disk.)
4. Stop the array.
5. Re-assign the old disk. unRAID will think it is a replacement.
6. Start the array. unRAID will reconstruct the contents of the old disk onto itself.
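The trick relies on unRAID deciding whether a disk is a replacement by comparing its model/serial number with the one it remembers for that slot. A hypothetical Python model of that rule, with an invented `Slot` class and a made-up serial number; it mirrors only the behavior described above and is not unRAID's actual logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """Hypothetical model of one array slot; not unRAID code."""
    remembered_id: Optional[str]  # model/serial the array recorded for this slot
    assigned_id: Optional[str]    # model/serial of the disk currently assigned
    disabled: bool = False        # set after write errors, like disk2 in the log

    def start_array(self):
        if self.assigned_id is None:
            # Started with the slot empty: the old model/serial is forgotten.
            self.remembered_id = None
            return "emulated"
        if self.assigned_id != self.remembered_id:
            # Identity differs from the remembered one: treat as a replacement.
            self.remembered_id = self.assigned_id
            self.disabled = False
            return "rebuild"
        # Same disk as before: a disabled disk stays disabled and emulated.
        return "emulated" if self.disabled else "normal"


slot = Slot(remembered_id="SERIAL-123", assigned_id="SERIAL-123", disabled=True)
print(slot.start_array())        # emulated: same disk, still disabled
slot.assigned_id = None          # stop the array, un-assign the failed disk
print(slot.start_array())        # emulated: starting un-assigned forgets the serial
slot.assigned_id = "SERIAL-123"  # stop the array, re-assign the old disk
print(slot.start_array())        # rebuild: the old disk now looks like a replacement
```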

Joe L.

Thanks!

There is something weird going on for sure. I added another disk to the LSI 9211 controller, added it as a raw virtual disk, and mapped it into unRAID. The reconstruction completed successfully, but when I checked the array after I got home, I saw the same errors on a different physical disk.

I'm wondering whether there is some kind of spin-up delay that the controller/driver isn't waiting out, so the driver treats a drive that is still spinning up as a physical error.
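The sense data in the log is consistent with this theory. Every failure reports Sense Key 0x2 (NOT READY) with ASC/ASCQ 0x04/0x02 ("LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED"), not a medium error such as sense key 0x3 with ASC/ASCQ 0x11/0x00 (UNRECOVERED READ ERROR). That fits a drive that was spun down and not yet started when the command arrived better than it fits bad sectors. A small Python decoder for those log lines, using only a handful of entries from the SCSI (SPC) tables; the tables are deliberately incomplete.

```python
import re

# A few entries from the SCSI (SPC) sense key and ASC/ASCQ tables; deliberately incomplete.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x6: "UNIT ATTENTION",
}
ASC_ASCQ = {
    (0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
    (0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED",
    (0x11, 0x00): "UNRECOVERED READ ERROR",
}


def decode(lines):
    """Return (sense key, ASC/ASCQ) descriptions from the kernel's sense-data lines."""
    key = asc = ascq = None
    for line in lines:
        if m := re.search(r"Sense Key : 0x([0-9a-fA-F]+)", line):
            key = int(m.group(1), 16)
        if m := re.search(r"ASC=0x([0-9a-fA-F]+) ASCQ=0x([0-9a-fA-F]+)", line):
            asc, ascq = int(m.group(1), 16), int(m.group(2), 16)
    return SENSE_KEYS.get(key, "unknown"), ASC_ASCQ.get((asc, ascq), "unknown")


# Two of the sense-data lines from the log above.
LINES = [
    "sd 1:0:2:0: [sdc]  Sense Key : 0x2 [current]",
    "sd 1:0:2:0: [sdc]  ASC=0x4 ASCQ=0x2",
]
print(decode(LINES))
```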
