h1d3m3 Posted May 28, 2021
Sigh, brand new system. I have run preclear a few times on this drive previously with no problem. The system just reported the errors below. The array is still up (the single parity disk has taken over), but it looks like all disks are now pretty busy with writes. I ran a short SMART test on that drive and it passed. Ironically, I have a different replacement drive arriving in a few days which I was going to add for additional parity, but now I want to be cautious before I lose more. What should I do next?
May 28 11:58:07 secant kernel: sd 14:0:0:0: [sdi] tag#1936 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=2s
May 28 11:58:07 secant kernel: sd 14:0:0:0: [sdi] tag#1936 Sense Key : 0x2 [current]
May 28 11:58:07 secant kernel: sd 14:0:0:0: [sdi] tag#1936 ASC=0x4 ASCQ=0x0
May 28 11:58:07 secant kernel: sd 14:0:0:0: [sdi] tag#1936 CDB: opcode=0x88 88 00 00 00 00 00 d7 48 9c b0 00 00 01 00 00 00
May 28 11:58:07 secant kernel: blk_update_request: I/O error, dev sdi, sector 3611860144 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860080
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860088
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860096
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860104
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860112
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860120
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860128
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860136
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860144
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860152
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860160
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860168
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860176
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860184
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860192
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860200
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860208
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860216
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860224
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860232
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860240
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860248
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860256
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860264
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860272
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860280
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860288
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860296
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860304
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860312
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860320
May 28 11:58:07 secant kernel: md: disk2 read error, sector=3611860328
May 28 11:58:08 secant kernel: sd 14:0:0:0: [sdi] tag#1941 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
May 28 11:58:08 secant kernel: sd 14:0:0:0: [sdi] tag#1941 Sense Key : 0x2 [current]
May 28 11:58:08 secant kernel: sd 14:0:0:0: [sdi] tag#1941 ASC=0x4 ASCQ=0x0
May 28 11:58:08 secant kernel: sd 14:0:0:0: [sdi] tag#1941 CDB: opcode=0x8a 8a 00 00 00 00 00 d7 48 9c b0 00 00 01 00 00 00
May 28 11:58:08 secant kernel: blk_update_request: I/O error, dev sdi, sector 3611860144 op 0x1:(WRITE) flags 0x0 phys_seg 32 prio class 0
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860080
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860088
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860096
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860104
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860112
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860120
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860128
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860136
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860144
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860152
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860160
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860168
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860176
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860184
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860192
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860200
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860208
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860216
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860224
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860232
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860240
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860248
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860256
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860264
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860272
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860280
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860288
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860296
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860304
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860312
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860320
May 28 11:58:08 secant kernel: md: disk2 write error, sector=3611860328
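For anyone reading the log above, the `CDB:` lines can be decoded by hand. The following is a minimal Python sketch, not part of the original post, that assumes the standard 16-byte SCSI layout (opcode in byte 0, a big-endian 64-bit LBA in bytes 2 to 9, a big-endian 32-bit block count in bytes 10 to 13). The function and constant names are made up for illustration. In the same log, sense key 0x2 is NOT READY and ASC 0x04 / ASCQ 0x00 is "logical unit not ready, cause not reportable", which suggests the drive stopped responding rather than reporting a bad sector.

```python
def decode_cdb16(cdb_hex):
    """Decode a 16-byte READ(16)/WRITE(16) CDB as printed in the kernel log."""
    raw = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(raw) != 16:
        raise ValueError("expected a 16-byte CDB")
    opcodes = {0x88: "READ(16)", 0x8A: "WRITE(16)"}
    return {
        "op": opcodes.get(raw[0], hex(raw[0])),
        "lba": int.from_bytes(raw[2:10], "big"),      # starting sector on the device
        "blocks": int.from_bytes(raw[10:14], "big"),  # number of sectors requested
    }


# Values copied from the log above.
FIRST_MD_SECTOR = 3611860080  # first "md: disk2 read error" sector
MD_LINES_PER_COMMAND = 32     # md error lines logged per failed command
MD_STRIDE = 8                 # sector step between consecutive md lines

read_cmd = decode_cdb16("88 00 00 00 00 00 d7 48 9c b0 00 00 01 00 00 00")
write_cmd = decode_cdb16("8a 00 00 00 00 00 d7 48 9c b0 00 00 01 00 00 00")
offset = read_cmd["lba"] - FIRST_MD_SECTOR

print(read_cmd)
print(write_cmd)
print("device sector minus md sector:", offset)
```

Both commands decode to LBA 3611860144 with a length of 256 sectors, matching the `blk_update_request` lines, and the 32 md lines at a stride of 8 sectors cover the same 256 sectors. The constant difference of 64 between device and md sector numbers is presumably the partition start offset, though the log alone does not confirm that.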
JorgeB Posted May 28, 2021
Please post the diagnostics: Tools -> Diagnostics
h1d3m3 Posted May 28, 2021
secant-diagnostics-20210528-1301-2.zip
JorgeB Posted May 28, 2021
Disk looks healthy. If the emulated disk keeps mounting, I would recommend replacing/swapping the cables/slot to rule that out, then rebuilding on top. Also, the syslog is not complete, so I can't see the LSI firmware version. Check that it's using the latest one, 20.00.07.00.
h1d3m3 Posted May 28, 2021
Thanks for the feedback. A couple of things:
> LSI firmware version
For both HBAs:
May 19 21:33:35 secant kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
> syslog is not complete
I included more (hopefully helpful) boot details in the attached syslog.
> if the emulated disk keeps mounting
I guess I have some questions.
How do I know when the emulated drive is done mounting? (i.e. things are stable)
Is it ok to reboot at this point?
Should I do a full SMART test (or pre-clear?) of that disk before trying to bring the drive back into the array?
What are the steps to get that drive back into the array, if it seems safe to do so?
Is it safest to just do nothing and wait 4 more days when the additional parity drive will be here? (maybe bring that into the array first)
Thanks for the advice. Sucks to have to deal with this so soon into this process 😞
syslog1.zip
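The firmware check discussed above can also be done by scanning the syslog for `FWVersion(...)` lines. This is a rough Python sketch under the assumption that every HBA logs a line in the same shape as the one quoted in the post. The regular expression and the helper names are illustrative, not an Unraid or LSI tool.

```python
import re

# Matches lines such as:
#   mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), ...
FW_RE = re.compile(r"(mpt\dsas_cm\d+): \w+: FWVersion\(([\d.]+)\)")

RECOMMENDED = "20.00.07.00"  # version named earlier in the thread


def firmware_versions(syslog_text):
    """Return {controller_name: firmware_version} for every matching line."""
    return {m.group(1): m.group(2) for m in FW_RE.finditer(syslog_text)}


def version_tuple(version):
    """Turn '20.00.07.00' into (20, 0, 7, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


sample = (
    "May 19 21:33:35 secant kernel: mpt2sas_cm0: LSISAS2008: "
    "FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)"
)
found = firmware_versions(sample)
outdated = {
    name: ver
    for name, ver in found.items()
    if version_tuple(ver) < version_tuple(RECOMMENDED)
}
print(found)
print("outdated controllers:", outdated)
```

With a second HBA, a full syslog would be expected to contain another line for a differently numbered controller (for example `mpt2sas_cm1`), which the same function would pick up.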
JorgeB Posted May 28, 2021
> How do I know when the emulated drive is done mounting? (i.e. things are stable)
Stop and re-start the array (or reboot). If it still mounts after that, it should be fine.
> Is it ok to reboot at this point?
Yep.
> Should I do a full SMART test (or pre-clear?) of that disk before trying to bring the drive back into the array?
Not really needed IMHO, but it won't hurt, except for keeping the array degraded longer than needed.
> What are the steps to get that drive back into the array, if it seems safe to do so?
Stop array, unassign that disk, start array, stop array, re-assign disk, start array to begin rebuild.
> Is it safest to just do nothing and wait 4 more days when the additional parity drive will be here? (maybe bring that into the array first)
No, rebuild as soon as possible.
h1d3m3 Posted May 28, 2021
Awesome help. Thanks. Stopped, unassigned, started, stopped, assigned and started again without issues. Rebuild is in progress. I attached the drive using a different cable. We'll see if that had anything to do with the problem.
I am used to ZFS, where some of the minor issues around transient read/write errors or sector problems were handled without much intervention. In my experience, the disk pretty much has to be dead before ZFS requires any action. Is Unraid a bit more sensitive?
Your statement above:
> Disk looks healthy
Was it the error message or SMART diags or just experience that led you to say that? I realize the disk might still have issues, but I'd love to understand how you came to that conclusion. (I'm pretty technical, lay it on me 🙂 )
Cheers.
JorgeB Posted May 29, 2021
> Is Unraid a bit more sensitive?
Yes. Unraid can't auto heal a write error, in part because each disk is an independent filesystem. It can still auto heal a read error (if the re-writes are successful).
> Was it the error message or SMART diags or just experience that led you to say that?
Yes, the SMART report shows nothing of concern for now.
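To illustrate the read-error healing described above, here is a simplified Python model that assumes single parity is a plain XOR across the data disks. It is a sketch of the general idea only, not Unraid's actual code, and the block contents and names are invented. A block that cannot be read is recomputed from the other disks plus parity and would then be written back to the original disk. In the log from the first post, the write errors at the same sectors one second after the read errors look like that write-back failing, which would explain why the disk was dropped and emulated.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one small block.
data_disks = [b"\x10\x20\x30", b"\x0a\x0b\x0c", b"\xff\x00\x55"]
parity = xor_blocks(data_disks)

# Pretend the read from disk index 1 failed: rebuild it from the rest.
failed = 1
survivors = [blk for i, blk in enumerate(data_disks) if i != failed]
rebuilt = xor_blocks(survivors + [parity])

print("rebuilt block matches original:", rebuilt == data_disks[failed])
```

The same calculation is what lets the array keep serving an emulated disk while it is degraded, and it is also why a second failure before the rebuild finishes cannot be recovered with single parity.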