
Drive in array keeps getting disabled.



I am having an issue where one of the drives in the array gets disabled after a few weeks. This can only be fixed by resetting the array config and letting Unraid rebuild the parity data.

 

I have swapped the RAID card and the SATA cable that connects the backplane to the controller.

 

In the syslog I'm seeing the following for the disabled drive:


 

Mar 26 03:16:33 FS1 kernel: blk_update_request: critical target error, dev sdk, sector 1954442736 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1954442672

Mar 26 03:16:33 FS1 kernel: blk_update_request: critical target error, dev sdk, sector 3907028992 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Mar 26 03:16:33 FS1 kernel: Buffer I/O error on dev sdk, logical block 488378624, async page read

Mar 26 03:16:33 FS1 kernel: blk_update_request: critical target error, dev sdk, sector 1302466384 op 0x1:(WRITE) flags 0x0 phys_seg 12 prio class 0
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466320
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466328
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466336
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466344
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466352
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466360
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466368
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466376
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466384
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466392
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466400
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466408
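
The block-layer errors (`dev sdk, sector N`) and the md errors (`disk11 ... sector=M`) in the excerpt above refer to the same writes, just numbered differently. A minimal Python sketch, assuming the syslog format shown here, pairs each `blk_update_request` line with the md error that follows it and reports the difference between the two sector numbers. The function names `pair_offsets` and `count_md_errors` are made up for illustration.

```python
import re
from collections import Counter

# Sample lines copied from the syslog excerpt above.
SAMPLE = """\
Mar 26 03:16:33 FS1 kernel: blk_update_request: critical target error, dev sdk, sector 1954442736 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1954442672
Mar 26 03:16:33 FS1 kernel: blk_update_request: critical target error, dev sdk, sector 1302466384 op 0x1:(WRITE) flags 0x0 phys_seg 12 prio class 0
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466320
Mar 26 03:16:33 FS1 kernel: md: disk11 write error, sector=1302466328
""".splitlines()

BLK_RE = re.compile(r"blk_update_request: .*?dev (\w+), sector (\d+) op \S+:\((\w+)\)")
MD_RE = re.compile(r"md: (disk\d+) (read|write) error, sector=(\d+)")


def pair_offsets(lines):
    """Pair each block-layer error with the first md error that follows it.

    Returns a list of (device, md_disk, device_sector - md_sector).
    """
    pairs = []
    pending = None
    for line in lines:
        blk = BLK_RE.search(line)
        if blk:
            pending = (blk.group(1), int(blk.group(2)))
            continue
        md = MD_RE.search(line)
        if md and pending:
            dev, dev_sector = pending
            pairs.append((dev, md.group(1), dev_sector - int(md.group(3))))
            pending = None
    return pairs


def count_md_errors(lines):
    """Count md errors per (disk, operation)."""
    counts = Counter()
    for line in lines:
        md = MD_RE.search(line)
        if md:
            counts[(md.group(1), md.group(2))] += 1
    return counts


if __name__ == "__main__":
    for dev, disk, offset in pair_offsets(SAMPLE):
        print(f"{dev} -> {disk}: offset {offset} sectors")
    print(dict(count_md_errors(SAMPLE)))
```

For both WRITE errors in the excerpt the difference is 64 sectors, which would be consistent with the data partition starting 64 sectors into sdk; that reading is an inference from the numbers, not something the log states. Likewise, the `Buffer I/O error` at logical block 488378624 lines up with device sector 3907028992 if a logical block is taken to be 8 sectors (488378624 × 8 = 3907028992).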

 

No SMART errors on the drive.

 

What is the likely culprit here? 

Edited by sexyj
Link to comment
  • 1 month later...
On 3/30/2022 at 3:09 PM, sexyj said:

This can only be fixed by resetting the array config and letting Unraid rebuild the parity data.

The usual way is to rebuild the disabled disk. The initial failed write, and any subsequent writes to the emulated disk, can be recovered if you rebuild the data disk. If you do a New Config and rebuild parity instead, all those writes are lost. Conceivably this could even result in filesystem corruption.
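
To illustrate why the two options differ, here is a toy Python model that treats single parity as a plain byte-wise XOR across the data disks. The disk contents and the helper names `xor_parity` and `emulate` are invented for the example; it is a sketch of the idea, not of how Unraid is implemented.

```python
from functools import reduce


def xor_parity(disks):
    """Byte-wise XOR across equally sized disks (toy single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def emulate(parity, other_disks):
    """Reconstruct the missing disk from parity and the remaining disks."""
    return xor_parity([parity, *other_disks])


# Three toy data disks and their parity.
disk1, disk2, disk3 = b"\x01\x02", b"\x10\x20", b"\x0a\x0b"
parity = xor_parity([disk1, disk2, disk3])

# disk3 gets disabled and a later write targets it. The physical disk keeps
# its stale contents, but parity is updated as if the write had succeeded.
new_disk3 = b"\xff\xee"
parity = xor_parity([disk1, disk2, new_disk3])
stale_physical_disk3 = disk3

# Option 1: rebuild the data disk from parity. The write is recovered.
rebuilt = emulate(parity, [disk1, disk2])

# Option 2: New Config and rebuild parity from the stale physical disk.
# Parity now describes the old contents, so the write is gone.
parity_after_new_config = xor_parity([disk1, disk2, stale_physical_disk3])
lost = emulate(parity_after_new_config, [disk1, disk2])

if __name__ == "__main__":
    print("rebuilt data disk has the new write:", rebuilt == new_disk3)
    print("after New Config the array only knows the old data:", lost == disk3)
```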

 

Attach diagnostics to your NEXT post in this thread.

Link to comment
15 hours ago, trurl said:

The usual way is to rebuild the disabled disk. The initial failed write, and any subsequent writes to the emulated disk, can be recovered if you rebuild the data disk. If you do a New Config and rebuild parity instead, all those writes are lost. Conceivably this could even result in filesystem corruption.

 

Attach diagnostics to your NEXT post in this thread.

 

Rebuilt the array with a new drive and the drive slot is still disabled.

 

Diagnostics attached:

fs1-diagnostics-20220503-1354.zip

Link to comment

SATA connections are by design and definition terrible.  It doesn't take much for them to lose connection (or have an intermittent connection).

 

Probably when you replaced disk 11 you slightly jarred disk 9's connectors.

 

Reseat them (and power), minimize the use of any power splitters, and do not tie-strap cabling to make things pretty.

Link to comment
  • 4 weeks later...

- I bought 3 new disks to replace the ones that had SMART errors.

- I have 2 of the 3 new disks in slots 9 and 12.

- I created a new array under Tools -> New Config.

- When I go to start the array, I get the following in the logs:


 

Quote

 

May 30 10:32:58 FS1 kernel: md: disk12 read error, sector=18862216


 

and after a hard reboot I would get a disk 9 read error.

 

I'm fairly convinced that there is some sort of hardware issue with the controller?

 

Any thoughts?

Link to comment

So if I don't have both disk 9 and disk 12 in the array, there is no read error.

Spoke too soon. Now disk 6 is having read errors.

 

Quote

May 30 10:59:08 FS1 kernel: md: disk6 read error, sector=37636464
May 30 10:59:08 FS1 kernel: sd 4:0:3:0: Power-on or device reset occurred
May 30 10:59:09 FS1 kernel: sd 4:0:3:0: Power-on or device reset occurred
May 30 10:59:32 FS1 kernel: sd 12:0:4:0: attempting task abort!scmd(0x00000000a15e74f8), outstanding for 30171 ms & timeout 30000 ms
May 30 10:59:32 FS1 kernel: sd 12:0:4:0: [sdp] tag#362 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 10:59:32 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 10:59:32 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 10:59:33 FS1 kernel: sd 12:0:4:0: task abort: SUCCESS scmd(0x00000000a15e74f8)
May 30 10:59:33 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred

May 30 11:00:03 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000a15e74f8)
May 30 11:00:03 FS1 kernel: sd 12:0:4:0: [sdp] tag#37 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 11:00:03 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:00:03 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:00:04 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000a15e74f8)
May 30 11:00:04 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000a15e74f8)
May 30 11:00:04 FS1 kernel: sd 12:0:4:0: [sdp] tag#37 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 11:00:04 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:00:04 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:00:04 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000a15e74f8)
May 30 11:00:04 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred
May 30 11:00:35 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000a15e74f8)
May 30 11:00:35 FS1 kernel: sd 12:0:4:0: [sdp] tag#251 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 11:00:35 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:00:35 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:00:35 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000a15e74f8)
May 30 11:00:35 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000a15e74f8)
May 30 11:00:35 FS1 kernel: sd 12:0:4:0: [sdp] tag#251 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 11:00:35 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:00:35 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:00:35 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000a15e74f8)
May 30 11:00:35 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred
May 30 11:01:06 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000a15e74f8)
May 30 11:01:06 FS1 kernel: sd 12:0:4:0: [sdp] tag#228 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 11:01:06 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:01:06 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:01:07 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000a15e74f8)
May 30 11:01:07 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000a15e74f8)
May 30 11:01:07 FS1 kernel: sd 12:0:4:0: [sdp] tag#228 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
May 30 11:01:07 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:01:07 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:01:07 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000a15e74f8)
May 30 11:01:07 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred

 

PSU issue? 
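
The `CDB:` bytes in the log above can be decoded by hand. Opcode 0x88 is READ(16) in the SCSI command set: bytes 2 to 9 hold the starting LBA and bytes 10 to 13 the transfer length in blocks, both big-endian. A small Python sketch (the function name `decode_read16` is made up for illustration):

```python
def decode_read16(cdb_hex):
    """Decode a READ(16) CDB given as space-separated hex bytes.

    Layout: byte 0 opcode (0x88), bytes 2-9 starting LBA,
    bytes 10-13 transfer length in blocks, all big-endian.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        "lba": int.from_bytes(cdb[2:10], "big"),
        "blocks": int.from_bytes(cdb[10:14], "big"),
    }


if __name__ == "__main__":
    # CDB copied from the "[sdp] tag#362 CDB:" line in the log above.
    print(decode_read16("88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00"))
```

For the CDB repeated in this excerpt the result is LBA 0 with a length of 8 blocks, so the command that keeps timing out is a small read at the very start of sdp.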

Edited by sexyj
Link to comment

I created a new array without disk 12 in it.

Then, when I try to stop the array during the parity sync, I get the following:

 

Quote

May 30 11:13:38 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:13:38 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:13:39 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000420e530a)
May 30 11:13:39 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000420e530a)
May 30 11:13:39 FS1 kernel: sd 12:0:4:0: [sdp] tag#1119 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
May 30 11:13:39 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:13:39 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:13:39 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000420e530a)
May 30 11:13:39 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred
May 30 11:14:10 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000420e530a)
May 30 11:14:10 FS1 kernel: sd 12:0:4:0: [sdp] tag#205 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
May 30 11:14:10 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:14:10 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:14:10 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000420e530a)
May 30 11:14:10 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000420e530a)
May 30 11:14:10 FS1 kernel: sd 12:0:4:0: [sdp] tag#205 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
May 30 11:14:10 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:14:10 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:14:11 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000420e530a)
May 30 11:14:11 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred
May 30 11:14:41 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000420e530a)
May 30 11:14:41 FS1 kernel: sd 12:0:4:0: [sdp] tag#32 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
May 30 11:14:41 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:14:41 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:14:42 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000420e530a)
May 30 11:14:42 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000420e530a)
May 30 11:14:42 FS1 kernel: sd 12:0:4:0: [sdp] tag#32 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
May 30 11:14:42 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:14:42 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:14:42 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000420e530a)
May 30 11:14:42 FS1 kernel: sd 12:0:4:0: Power-on or device reset occurred
May 30 11:15:12 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000420e530a)
May 30 11:15:12 FS1 kernel: sd 12:0:4:0: [sdp] tag#33 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
May 30 11:15:12 FS1 kernel: scsi target12:0:4: handle(0x000a), sas_address(0x4433221106000000), phy(6)
May 30 11:15:12 FS1 kernel: scsi target12:0:4: enclosure logical id(0x500605b004c40d70), slot(1) 
May 30 11:15:13 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000420e530a)
May 30 11:15:13 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000420e530a)
May 30 11:15:13 FS1 kernel: sd 12:0:4:0: [sdp] tag#33 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00

 

sdp is a SMART-failed drive that is connected to the server but not mounted.
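
The excerpt above repeats one pattern: a device reset on `12:0:4:0` fails, the target reset that follows succeeds, and roughly half a minute later the same command (same `scmd` pointer) is retried. A rough Python sketch for summarising that from a saved syslog, assuming the line format shown in this thread; the names `reset_outcomes` and `retry_intervals` are hypothetical helpers.

```python
import re
from collections import Counter
from datetime import datetime

# Sample lines copied from the log excerpt above.
SAMPLE = """\
May 30 11:13:39 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000420e530a)
May 30 11:13:39 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000420e530a)
May 30 11:14:10 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000420e530a)
May 30 11:14:10 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000420e530a)
May 30 11:14:10 FS1 kernel: scsi target12:0:4: attempting target reset! scmd(0x00000000420e530a)
May 30 11:14:11 FS1 kernel: scsi target12:0:4: target reset: SUCCESS scmd(0x00000000420e530a)
May 30 11:14:41 FS1 kernel: sd 12:0:4:0: attempting device reset! scmd(0x00000000420e530a)
May 30 11:14:42 FS1 kernel: sd 12:0:4:0: device reset: FAILED scmd(0x00000000420e530a)
""".splitlines()

OUTCOME_RE = re.compile(r"(task abort|device reset|target reset): (SUCCESS|FAILED)")
ATTEMPT_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*attempting device reset!")


def reset_outcomes(lines):
    """Count (action, result) pairs such as ('device reset', 'FAILED')."""
    counts = Counter()
    for line in lines:
        m = OUTCOME_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


def retry_intervals(lines):
    """Seconds between consecutive 'attempting device reset!' lines."""
    times = []
    for line in lines:
        m = ATTEMPT_RE.match(line)
        if m:
            # Syslog timestamps carry no year; only differences are used here.
            times.append(datetime.strptime(m.group(1), "%b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(dict(reset_outcomes(SAMPLE)))
    print(retry_intervals(SAMPLE))
```

If every failed device reset in the full syslog belongs to the same SCSI address, that would point at the one drive behind it rather than at the whole controller, but that is only a heuristic and the diagnostics would need to confirm it.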

Link to comment
