sexyj Posted March 30, 2022
I am having an issue where, after a few weeks, one of the drives in the array gets disabled. So far I have only been able to fix this by resetting the array config and letting Unraid rebuild the parity data. I have swapped the RAID card and the SATA cable that runs from the backplane to the controller. In the syslog I'm seeing the following for the disabled drive:
Mar 26 03:16:33 FS1 kernel: blk_update_request: critical target error, dev sdk, sector 1954442736 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1954442672
Mar 26 03:16:33 FS1 kernel: blk_update_request: critical target error, dev sdk, sector 3907028992 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Mar 26 03:16:33 FS1 kernel: Buffer I/O error on dev sdk, logical block 488378624, async page read
Mar 26 03:16:33 FS1 kernel: blk_update_request: critical target error, dev sdk, sector 1302466384 op 0x1:(WRITE) flags 0x0 phys_seg 12 prio class 0
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466320
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466328
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466336
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466344
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466352
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466360
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466368
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466376
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466384
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466392
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466400
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466408
No SMART errors on the drive. What is the likely culprit here?
Edited March 30, 2022 by sexyj
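For anyone reading the excerpt above, it can help to line up the device-level errors (`blk_update_request ... dev sdk`) with the array-level ones (`md: disk11`). The following is a minimal sketch that assumes the log lines look exactly like the ones quoted in the post; the function name `summarize` and the `sample` list are illustrative only, not part of Unraid. With the numbers from the post, the first failed device write is 64 sectors above the first md write error, which would be consistent with the data partition starting at sector 64, and the failed READ sector divided by 8 gives the logical block in the `Buffer I/O error` line, which is consistent with 4096-byte blocks over 512-byte sectors.

```python
import re
from collections import defaultdict

# Patterns for the two kernel message shapes quoted in the post.
BLK_RE = re.compile(
    r"critical target error, dev (\w+), sector (\d+) op 0x[0-9a-f]+:\((\w+)\)"
)
MD_RE = re.compile(r"md: (disk\d+) (?:read|write) error, sector=(\d+)")


def summarize(lines):
    """Collect device-level and md-level error sectors from syslog lines."""
    blk = defaultdict(list)  # device name -> [(operation, sector)]
    md = defaultdict(list)   # array slot  -> [sector]
    for line in lines:
        m = BLK_RE.search(line)
        if m:
            blk[m.group(1)].append((m.group(3), int(m.group(2))))
            continue
        m = MD_RE.search(line)
        if m:
            md[m.group(1)].append(int(m.group(2)))
    return blk, md


# Three lines copied from the post; a real run would read the syslog file.
sample = [
    "Mar 26 03:16:33 FS1 kernel: blk_update_request: critical target error, "
    "dev sdk, sector 1954442736 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0",
    "Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1954442672",
    "Mar 26 03:16:33 FS1 kernel: blk_update_request: critical target error, "
    "dev sdk, sector 3907028992 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0",
]

blk, md = summarize(sample)
first_write = next(sector for op, sector in blk["sdk"] if op == "WRITE")
offset = first_write - md["disk11"][0]
print("device sector minus md sector:", offset)
print("failed READ sector // 8:", blk["sdk"][1][1] // 8)
```

The same twelve consecutive md write errors in the post (step of 8 sectors each) match the single device write with `phys_seg 12`, so the excerpt describes a handful of failed requests rather than many independent failures.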
itimpi Posted March 30, 2022
Those error messages strongly suggest there is a physical problem with that drive. You are likely to get better informed feedback if you post your system's diagnostics zip file.
sexyj Posted March 31, 2022
I'm rebuilding the array with a new drive replacing the questionable drive. Will see in a few weeks if it happens again.
sexyj Posted May 3, 2022
So another drive (completely healthy according to SMART) just got disabled...
Disk 9 disabled * healthy
Going to replace it once more with another disk.
trurl Posted May 3, 2022
On 3/30/2022 at 3:09 PM, sexyj said: "This can only be fixed by resetting the array config and let unraid rebuild the parity data."
The usual way is to rebuild the disabled disk. The initial failed write, and any subsequent writes to the emulated disk, can be recovered if you rebuild the data disk. If you instead do a New Config and rebuild parity, all those writes are lost. Conceivably this could even result in filesystem corruption.
Attach diagnostics to your NEXT post in this thread.
sexyj Posted May 3, 2022
15 hours ago, trurl said: "The usual way is to rebuild the disabled disk. The initial failed write, and any subsequent writes to the emulated disk, can be recovered if you rebuild the data disk. If you New Config, rebuild parity instead all those writes are lost. Conceivably this could even result in filesystem corruption. Attach diagnostics to your NEXT post in this thread."
Rebuilt the array with a new drive and the drive slot is still disabled. Attached the diagnostics: fs1-diagnostics-20220503-1354.zip
Squid Posted May 3, 2022
SATA connections are by design and definition terrible. It doesn't take much for them to lose connection (or have an intermittent connection). Probably when you replaced disk 11 you slightly jarred disk 9's connectors. Reseat them (data and power), minimize the use of any power splitters, and do not tie-strap the cabling to make things pretty.
sexyj Posted May 30, 2022
- I bought 3 new disks to replace the ones that had SMART errors.
- I have 2 of the 3 new disks in slots 9 and 12.
- Created a new array under Tools -> New Config.
- When I go to start the array, I get the following in the logs:
May 30 10:32:58 FS1 kernel: md: disk12 read error, sector=18862216
After a hard reboot I get a read error on disk 9 instead. I'm fairly convinced that there is some sort of hardware issue with the controller. Any thoughts?
sexyj Posted May 30, 2022
So if I don't have both disk 9 and 12 in the array, no read error. Spoke too soon. Now disk 6 is having read errors.
May 30 10:59:08 FS1 kernel: md: disk6 read error, sector=37636464
May 30 10:59:08 FS1 kernel: sd 4:0:3:0: Power-on or device reset occurred
May 30 10:59:09 FS1 kernel: sd 4:0:3:0: Power-on or device reset occurred
May 30 10:59:32 FS1 kernel: sd 12:0:4:0: attempting task abort!scmd(0x00000000a15e74f8), outstanding for 30171 ms & timeout 30000 ms
May 30 10:59:32 FS1 kernel: sd 12:0:4:0: [sdp] tag#362 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 10:59:32 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 10:59:32 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 10:59:33 FS1 kernel: sd 12:0:4:0: task abort: SUCCESS scmd(0x00000000a15e74f8)
May 30 10:59:33 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred
May 30 11:00:03 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000a15e74f8)
May 30 11:00:03 FS1 kernel: sd 12:0:4:0: [sdp] tag#37 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 11:00:03 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:00:03 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:00:04 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000a15e74f8)
May 30 11:00:04 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000a15e74f8)
May 30 11:00:04 FS1 kernel: sd 12:0:4:0: [sdp] tag#37 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 11:00:04 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:00:04 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:00:04 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000a15e74f8)
May 30 11:00:04 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred
May 30 11:00:35 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000a15e74f8)
May 30 11:00:35 FS1 kernel: sd 12:0:4:0: [sdp] tag#251 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 11:00:35 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:00:35 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:00:35 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000a15e74f8)
May 30 11:00:35 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000a15e74f8)
May 30 11:00:35 FS1 kernel: sd 12:0:4:0: [sdp] tag#251 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 11:00:35 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:00:35 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:00:35 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000a15e74f8)
May 30 11:00:35 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred
May 30 11:01:06 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000a15e74f8)
May 30 11:01:06 FS1 kernel: sd 12:0:4:0: [sdp] tag#228 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 11:01:06 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:01:06 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:01:07 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000a15e74f8)
May 30 11:01:07 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000a15e74f8)
May 30 11:01:07 FS1 kernel: sd 12:0:4:0: [sdp] tag#228 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 11:01:07 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:01:07 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:01:07 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000a15e74f8)
May 30 11:01:07 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred
PSU issue?
Edited May 30, 2022 by sexyj
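With this much repetition it is easy to lose track of which device the resets actually belong to. Here is a minimal sketch that tallies the error-handler outcomes per SCSI target, assuming the message format quoted in the post; the names `EH_RE` and `tally_resets` are made up for illustration, and a real run would feed it the full syslog rather than the short `sample` list.

```python
import re
from collections import Counter

# Matches lines such as
#   "sd 12:0:4:0: device reset: FAILED scmd(...)"
#   "scsi target12:0:4: target reset: SUCCESS scmd(...)"
# and captures the host:channel:target part, the action, and the outcome.
EH_RE = re.compile(
    r"(?:sd |scsi target)(\d+:\d+:\d+)(?::\d+)?: "
    r"(task abort|device reset|target reset): (SUCCESS|FAILED)"
)


def tally_resets(lines):
    """Count (target, action, outcome) triples in kernel log lines."""
    counts = Counter()
    for line in lines:
        m = EH_RE.search(line)
        if m:
            counts[m.groups()] += 1
    return counts


# A few lines copied from the post above.
sample = [
    "May 30 10:59:08 FS1 kernel: sd 4:0:3:0: Power-on or device reset occurred",
    "May 30 10:59:33 FS1 kernel: sd 12:0:4:0: task abort: SUCCESS scmd(0x00000000a15e74f8)",
    "May 30 11:00:04 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000a15e74f8)",
    "May 30 11:00:04 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000a15e74f8)",
    "May 30 11:00:35 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000a15e74f8)",
]

counts = tally_resets(sample)
for (target, action, outcome), n in sorted(counts.items()):
    print(f"target {target}: {action} {outcome} x{n}")
```

In the excerpt above every abort and reset line refers to `12:0:4` (`sdp`), while the `disk6` read error arrives alongside resets on a different address, `4:0:3:0`, so the two may be separate symptoms.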
sexyj Posted May 30, 2022
Created a new array without disk 12 in it. When I tried to stop the array while parity was running, I got the following:
May 30 11:13:38 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:13:38 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:13:39 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000420e530a)
May 30 11:13:39 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000420e530a)
May 30 11:13:39 FS1 kernel: sd 12:0:4:0: [sdp] tag#1119 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
May 30 11:13:39 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:13:39 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:13:39 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000420e530a)
May 30 11:13:39 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred
May 30 11:14:10 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000420e530a)
May 30 11:14:10 FS1 kernel: sd 12:0:4:0: [sdp] tag#205 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
May 30 11:14:10 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:14:10 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:14:10 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000420e530a)
May 30 11:14:10 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000420e530a)
May 30 11:14:10 FS1 kernel: sd 12:0:4:0: [sdp] tag#205 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
May 30 11:14:10 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:14:10 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:14:11 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000420e530a)
May 30 11:14:11 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred
May 30 11:14:41 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000420e530a)
May 30 11:14:41 FS1 kernel: sd 12:0:4:0: [sdp] tag#32 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
May 30 11:14:41 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:14:41 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:14:42 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000420e530a)
May 30 11:14:42 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000420e530a)
May 30 11:14:42 FS1 kernel: sd 12:0:4:0: [sdp] tag#32 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
May 30 11:14:42 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:14:42 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:14:42 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000420e530a)
May 30 11:14:42 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred
May 30 11:15:12 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000420e530a)
May 30 11:15:12 FS1 kernel: sd 12:0:4:0: [sdp] tag#33 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
May 30 11:15:12 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:15:12 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1)
May 30 11:15:13 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000420e530a)
May 30 11:15:13 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000420e530a)
May 30 11:15:13 FS1 kernel: sd 12:0:4:0: [sdp] tag#33 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
sdp is a drive that failed SMART; it is connected to the server but not mounted.
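The `CDB:` bytes in these lines show which command keeps timing out. Opcode `0x88` is READ(16) in the SCSI block command set, where bytes 2 to 9 carry the starting logical block address and bytes 10 to 13 the transfer length, both big-endian. A minimal sketch of decoding it follows; the helper name `decode_read16` is hypothetical, and the input strings are copied from the log lines in this thread.

```python
def decode_read16(cdb_hex):
    """Return (lba, block_count) for a READ(16) CDB given as a hex string."""
    raw = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(raw) != 16 or raw[0] != 0x88:
        raise ValueError("expected a 16-byte CDB with opcode 0x88 (READ(16))")
    lba = int.from_bytes(raw[2:10], "big")      # bytes 2..9: starting LBA
    blocks = int.from_bytes(raw[10:14], "big")  # bytes 10..13: transfer length
    return lba, blocks


# CDB from the earlier post (resets around 11:00) and from this one.
earlier = decode_read16("88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00")
later = decode_read16("88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00")

print("earlier:", earlier)
print("later:  LBA", hex(later[0]), "blocks", later[1])
```

Both commands are 8-block reads against `sdp`: the earlier one at LBA 0, this one at LBA `0x1d1c0be00`. Since the same command is retried across every reset cycle, the loop looks like one unanswered read being retried rather than array traffic.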
sexyj Posted May 30, 2022
Lowered the array size from 30 to 15... no errors so far.