Failed disk - options



Somewhat annoyingly, just a few weeks before I planned on removing it (and three others), I have a failed drive. I swear they must be mind readers.

 

I have space on other drives, that will not be removed, to hold this data temporarily.

 

What are my options at this point? I'm a bit confused after reading the docs.

 

  • Can I get back to a stable state without this drive? Parity is currently providing the data.
    • If so, can I migrate the data and then follow the shrink procedure without the drive working?
  • Should I just order some or all of the new drives a bit early?
    • How do I go about adding them in this situation: pre-clear all, replace the failed one, then add the others?
    • Or a different order?
    • New drives will be much bigger
  • Something else

 

I've attached diagnostics.

 

Thanks in advance.

unraid-diagnostics-20181104-1547.zip


The disabled disk looks fine, and there were read errors on all disks at the same time:

Nov  4 13:31:13 unraid kernel: md: disk0 read error, sector=13820128
Nov  4 13:31:13 unraid kernel: md: disk1 read error, sector=13820136
Nov  4 13:31:13 unraid kernel: md: disk2 read error, sector=13820136
Nov  4 13:31:13 unraid kernel: md: disk3 read error, sector=13820136
Nov  4 13:31:13 unraid kernel: md: disk4 read error, sector=13820136
Nov  4 13:31:13 unraid kernel: md: disk5 read error, sector=13820136
Nov  4 13:31:13 unraid kernel: md: disk6 read error, sector=13820136

So this is most likely a power, controller, cable, or similar problem.
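The reasoning above (every disk reporting a read error in the same second points at something shared, not at one disk) can be checked mechanically. This is only a rough sketch: the function names and the `min_disks` threshold are my own, and the regex assumes the `md: diskN read error` line format shown in the log above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Nov  4 13:31:13 unraid kernel: md: disk0 read error, sector=13820128
READ_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+md:\s+"
    r"disk(?P<disk>\d+)\s+read error,\s+sector=(?P<sector>\d+)"
)


def group_read_errors(lines):
    """Map each timestamp to the set of disk numbers that reported a read error."""
    by_stamp = defaultdict(set)
    for line in lines:
        match = READ_ERROR.match(line.strip())
        if match:
            by_stamp[match.group("stamp")].add(int(match.group("disk")))
    return dict(by_stamp)


def simultaneous_failures(lines, min_disks=3):
    """Return timestamps at which at least min_disks different disks failed.

    min_disks is an arbitrary threshold chosen for this sketch.
    """
    return {
        stamp: sorted(disks)
        for stamp, disks in group_read_errors(lines).items()
        if len(disks) >= min_disks
    }


sample = [
    "Nov  4 13:31:13 unraid kernel: md: disk0 read error, sector=13820128",
    "Nov  4 13:31:13 unraid kernel: md: disk1 read error, sector=13820136",
    "Nov  4 13:31:13 unraid kernel: md: disk2 read error, sector=13820136",
    # Made-up line for illustration: a lone error on one disk is filtered out.
    "Nov  4 14:02:07 unraid kernel: md: disk3 read error, sector=99",
]

print(simultaneous_failures(sample))
```

A timestamp that shows up here with most or all disks listed is a hint to look at power, cabling, or the controller before blaming any single drive.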


Thanks, I was able to rebuild onto the same disk.

 

I've also bought a couple of Seagate IronWolf 3TB drives to replace my four 500GB WD RE3s, as they are now over 8 years old anyway.

After seeing errors again today, however, these are staying in their wrappers for now.

 

Unfortunately I did not capture the full syslog, due to my own stupidity, but it appears my LSI card is resetting in some way and re-detecting all the disks. They then appear as new /dev/sd[a-z] devices. The array seems to find them, but then I start seeing read errors.

 

Quote

Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: removing handle(0x000d), sas_addr(0x4433221105000000)
Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: enclosure logical id(0x500605b005fd5690), slot(6)
Nov  5 18:20:49 unraid kernel: sd 7:0:5:0: [sdg] Synchronizing SCSI cache
Nov  5 18:20:49 unraid kernel: sd 7:0:5:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: removing handle(0x000e), sas_addr(0x4433221106000000)
Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: enclosure logical id(0x500605b005fd5690), slot(5)
Nov  5 18:20:49 unraid kernel: sd 7:0:6:0: [sdh] Synchronizing SCSI cache
Nov  5 18:20:49 unraid kernel: sd 7:0:6:0: [sdh] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Nov  5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov  5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov  5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov  5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: removing handle(0x000f), sas_addr(0x4433221107000000)
Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: enclosure logical id(0x500605b005fd5690), slot(4)
Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: sending message unit reset !!
Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: message unit reset: SUCCESS
Nov  5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov  5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP received, forcing refresh of disks info.
Nov  5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov  5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov  5 18:20:49 unraid kernel: vfio-pci 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (32983920 kB)
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: CurrentHostPageSize is 0: Setting default host page size to 4k
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: MSI-X vectors supported: 1, no of cores: 8, max_msix_vectors: -1
Nov  5 18:20:50 unraid kernel: mpt2sas1-msix0: PCI-MSI-X enabled: IRQ 32
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: iomem(0x00000000f7700000), mapped(0x000000004f07acf1), size(16384)
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: ioport(0x000000000000d000), size(256)
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: CurrentHostPageSize is 0: Setting default host page size to 4k
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: Allocated physical memory: size(1687 kB)
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: Current Controller Queue Depth(3364),Max Controller Queue Depth(3432)
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: Scatter Gather Elements per IO(128)
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(07.39.02.00)
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: Protocol=(
Nov  5 18:20:50 unraid kernel: Initiator
Nov  5 18:20:50 unraid kernel: ,Target
Nov  5 18:20:50 unraid kernel: ),
Nov  5 18:20:50 unraid kernel: Capabilities=(
Nov  5 18:20:50 unraid kernel: TLR
Nov  5 18:20:50 unraid kernel: ,EEDP
Nov  5 18:20:50 unraid kernel: ,Snapshot Buffer
Nov  5 18:20:50 unraid kernel: ,Diag Trace Buffer
Nov  5 18:20:50 unraid kernel: ,Task Set Full
Nov  5 18:20:50 unraid kernel: ,NCQ
Nov  5 18:20:50 unraid kernel: )
Nov  5 18:20:50 unraid kernel: scsi host8: Fusion MPT SAS Host
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: sending port enable !!
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: host_add: handle(0x0001), sas_addr(0x500605b005fd5690), phys(8)
Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: port enable: SUCCESS
Nov  5 18:20:50 unraid kernel: scsi 8:0:0:0: Direct-Access     ATA      WDC WD5002ABYS-0 3B03 PQ: 0 ANSI: 6
Nov  5 18:20:50 unraid kernel: scsi 8:0:0:0: SATA: handle(0x000c), sas_addr(0x4433221103000000), phy(3), device_name(0x50014ee2adeb8d57)
Nov  5 18:20:50 unraid kernel: scsi 8:0:0:0: enclosure logical id (0x500605b005fd5690), slot(0)
Nov  5 18:20:50 unraid kernel: scsi 8:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov  5 18:20:50 unraid kernel: sd 8:0:0:0: Attached scsi generic sg1 type 0
Nov  5 18:20:50 unraid kernel: scsi 8:0:1:0: Direct-Access     ATA      WDC WD5003ABYX-0 1S01 PQ: 0 ANSI: 6
Nov  5 18:20:50 unraid kernel: scsi 8:0:1:0: SATA: handle(0x0009), sas_addr(0x4433221100000000), phy(0), device_name(0x50014ee05848212f)
Nov  5 18:20:50 unraid kernel: scsi 8:0:1:0: enclosure logical id (0x500605b005fd5690), slot(3)
Nov  5 18:20:50 unraid kernel: scsi 8:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov  5 18:20:50 unraid kernel: sd 8:0:0:0: [sdk] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Nov  5 18:20:50 unraid kernel: sd 8:0:1:0: [sdl] Spinning up disk...
Nov  5 18:20:50 unraid kernel: sd 8:0:1:0: Attached scsi generic sg2 type 0
Nov  5 18:20:50 unraid kernel: scsi 8:0:2:0: Direct-Access     ATA      WDC WD5002ABYS-0 3B03 PQ: 0 ANSI: 6
Nov  5 18:20:50 unraid kernel: scsi 8:0:2:0: SATA: handle(0x000a), sas_addr(0x4433221101000000), phy(1), device_name(0x50014ee2adeb2498)
Nov  5 18:20:50 unraid kernel: scsi 8:0:2:0: enclosure logical id (0x500605b005fd5690), slot(2)
Nov  5 18:20:50 unraid kernel: scsi 8:0:2:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov  5 18:20:50 unraid kernel: sd 8:0:2:0: Attached scsi generic sg3 type 0
Nov  5 18:20:50 unraid kernel: sd 8:0:2:0: [sdm] Spinning up disk...
Nov  5 18:20:50 unraid kernel: scsi 8:0:3:0: Direct-Access     ATA      WDC WD5002ABYS-0 3B03 PQ: 0 ANSI: 6
Nov  5 18:20:50 unraid kernel: scsi 8:0:3:0: SATA: handle(0x000b), sas_addr(0x4433221102000000), phy(2), device_name(0x50014ee20340308a)
Nov  5 18:20:50 unraid kernel: scsi 8:0:3:0: enclosure logical id (0x500605b005fd5690), slot(1)
Nov  5 18:20:50 unraid kernel: scsi 8:0:3:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov  5 18:20:50 unraid kernel: sd 8:0:0:0: [sdk] Write Protect is off
Nov  5 18:20:50 unraid kernel: sd 8:0:0:0: [sdk] Mode Sense: 7f 00 10 08
Nov  5 18:20:50 unraid kernel: sd 8:0:3:0: [sdn] Spinning up disk...
Nov  5 18:20:50 unraid kernel: sd 8:0:3:0: Attached scsi generic sg4 type 0
Nov  5 18:20:50 unraid kernel: sd 8:0:0:0: [sdk] Write cache: enabled, read cache: enabled, supports DPO and FUA
Nov  5 18:20:50 unraid kernel: scsi 8:0:4:0: Direct-Access     ATA      WDC WD30EFRX-68E 0A82 PQ: 0 ANSI: 6
Nov  5 18:20:50 unraid kernel: scsi 8:0:4:0: SATA: handle(0x000d), sas_addr(0x4433221105000000), phy(5), device_name(0x50014ee2b87d1542)
Nov  5 18:20:50 unraid kernel: scsi 8:0:4:0: enclosure logical id (0x500605b005fd5690), slot(6)
Nov  5 18:20:50 unraid kernel: scsi 8:0:4:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov  5 18:20:50 unraid kernel: sd 8:0:4:0: Attached scsi generic sg5 type 0
Nov  5 18:20:50 unraid kernel: scsi 8:0:5:0: Direct-Access     ATA      WDC WD30EFRX-68A 0A80 PQ: 0 ANSI: 6
Nov  5 18:20:50 unraid kernel: scsi 8:0:5:0: SATA: handle(0x000e), sas_addr(0x4433221106000000), phy(6), device_name(0x50014ee60325df2b)
Nov  5 18:20:50 unraid kernel: scsi 8:0:5:0: enclosure logical id (0x500605b005fd5690), slot(5)

 

If this happens again I will capture it fully, but I wanted to shut the system down quickly, as the drives were unresponsive and I didn't want to end up in a rebuild again.
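For anyone wanting to spot the same pattern in a saved syslog, a small script along these lines might help. It is a sketch only: the marker names are made up, and the patterns are taken from the `mpt2sas_cm` lines quoted above, so they may not match other driver versions.

```python
import re

# Patterns copied from the quoted log; the dictionary keys are my own labels.
RESET_MARKERS = {
    "device_removed": re.compile(r"mpt2sas_cm\d+: removing handle\((0x[0-9a-f]+)\)"),
    "unit_reset": re.compile(r"mpt2sas_cm\d+: sending message unit reset"),
    "port_enable": re.compile(r"mpt2sas_cm\d+: port enable: SUCCESS"),
}


def summarize_hba_events(lines):
    """Count how many lines match each controller reset marker."""
    counts = {name: 0 for name in RESET_MARKERS}
    for line in lines:
        for name, pattern in RESET_MARKERS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


sample = [
    "Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: removing handle(0x000d), sas_addr(0x4433221105000000)",
    "Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: removing handle(0x000e), sas_addr(0x4433221106000000)",
    "Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: removing handle(0x000f), sas_addr(0x4433221107000000)",
    "Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: sending message unit reset !!",
    "Nov  5 18:20:49 unraid kernel: mpt2sas_cm0: message unit reset: SUCCESS",
    "Nov  5 18:20:50 unraid kernel: mpt2sas_cm1: port enable: SUCCESS",
]

print(summarize_hba_events(sample))
```

To run it against a real log, pass an open file instead of `sample`, for example `summarize_hba_events(open("/var/log/syslog"))`. That path is an assumption about where the log lives on the system.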

 

I am wondering if this is related to the recent major changes to the mpt3sas driver, so I may go back to 6.5 if it happens again to see if that fixes things.

 

Does anyone have any other ideas on this?

 

 

 


I finally worked this out after it happened again. 

 

I was trying to pass through a USB controller to a VM and didn't notice that the HBA, rather than the USB card, was listed as the only device in the passthrough list.

 

As I was only expecting to see one device, the USB controller, I had just been selecting the single item in the list without reading it, which then obviously causes these issues when the VM is started.

 

I feel somewhat stupid, but I think there is a bug somewhere: the 'Other PCI devices' list is now empty and no longer shows the HBA, and presumably it never should have.

 

So I'm left wondering how it ever got into the list in the first place.
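One way to double-check which kernel driver currently owns a PCI device, before starting a VM, is to read the `driver` symlink under sysfs. The following is a minimal sketch, assuming the standard Linux layout `/sys/bus/pci/devices/<address>/driver`. The function names and the PCI address in the example are hypothetical, not taken from my system.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None if unbound.

    The device directory contains a 'driver' symlink whose target's last
    path component is the driver name (for example mpt3sas or vfio-pci).
    """
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def is_claimed_for_passthrough(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """True if the device is bound to vfio-pci, i.e. handed over for a VM."""
    return bound_driver(pci_address, sysfs_root) == "vfio-pci"


# Hypothetical address; substitute the HBA's real address from the device list.
print(bound_driver("0000:02:00.0"))
```

If the HBA reports `vfio-pci` here instead of its storage driver, the array disks behind it would not be reachable by the host, which would be consistent with the resets and read errors described earlier in the thread.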
