dsmith44 Posted November 4, 2018

Somewhat annoyingly, just a few weeks before I planned on removing it (and three others), I have a failed drive. I swear they must be mind readers. I have space on other drives, which will not be removed, to hold this data temporarily.

What are my options at this point? I'm a bit confused having read the docs.

Can I get back to stable without this drive? Parity is giving me the data currently, so can I migrate it and then follow the shrink procedure without the drive working?
Should I just order some/all of the new drives a bit early? How do I go about adding them in this situation: pre-clear all, replace the failed drive, then add the others? A different order? The new drives will be much bigger.
Something else?

I've attached diagnostics. Thanks in advance.

unraid-diagnostics-20181104-1547.zip
JorgeB Posted November 4, 2018

The disabled disk looks fine, and there were read errors on all disks at the same time:

Nov 4 13:31:13 unraid kernel: md: disk0 read error, sector=13820128
Nov 4 13:31:13 unraid kernel: md: disk1 read error, sector=13820136
Nov 4 13:31:13 unraid kernel: md: disk2 read error, sector=13820136
Nov 4 13:31:13 unraid kernel: md: disk3 read error, sector=13820136
Nov 4 13:31:13 unraid kernel: md: disk4 read error, sector=13820136
Nov 4 13:31:13 unraid kernel: md: disk5 read error, sector=13820136
Nov 4 13:31:13 unraid kernel: md: disk6 read error, sector=13820136

So this is most likely a power, controller, cable, or similar problem.
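The reasoning above (every disk erroring in the same second points at shared hardware rather than one bad drive) can be checked mechanically. Below is a minimal Python sketch, assuming syslog lines in the format quoted in this post; the function names and the `min_disks` threshold are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "Nov 4 13:31:13 unraid kernel: md: disk0 read error, sector=13820128"
READ_ERROR = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+"
    r"md:\s+disk(?P<disk>\d+)\s+read error,\s+sector=(?P<sector>\d+)"
)


def group_read_errors(lines):
    """Group md read errors as {timestamp: {disk number: [sectors]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for line in lines:
        m = READ_ERROR.match(line.strip())
        if m:
            grouped[m["ts"]][int(m["disk"])].append(int(m["sector"]))
    return grouped


def simultaneous_failures(lines, min_disks=3):
    """Return timestamps at which at least `min_disks` distinct disks erred.

    `min_disks` is an arbitrary illustrative threshold, not an Unraid setting.
    """
    grouped = group_read_errors(lines)
    return {
        ts: sorted(disks)
        for ts, disks in grouped.items()
        if len(disks) >= min_disks
    }


# The seven lines quoted in the post above.
sample = [
    "Nov 4 13:31:13 unraid kernel: md: disk0 read error, sector=13820128",
    "Nov 4 13:31:13 unraid kernel: md: disk1 read error, sector=13820136",
    "Nov 4 13:31:13 unraid kernel: md: disk2 read error, sector=13820136",
    "Nov 4 13:31:13 unraid kernel: md: disk3 read error, sector=13820136",
    "Nov 4 13:31:13 unraid kernel: md: disk4 read error, sector=13820136",
    "Nov 4 13:31:13 unraid kernel: md: disk5 read error, sector=13820136",
    "Nov 4 13:31:13 unraid kernel: md: disk6 read error, sector=13820136",
]

print(simultaneous_failures(sample))
# -> {'Nov 4 13:31:13': [0, 1, 2, 3, 4, 5, 6]}
```

A single timestamp listing every array member is the pattern to look for; errors confined to one disk number over time would point back at that drive instead.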
dsmith44 (Author) Posted November 5, 2018

Thanks, I was able to rebuild onto the same disk.

I've also bought a couple of Seagate IronWolf 3TB drives to replace my four 500GB WD RE3s, as they are now over 8 years old anyway. After seeing errors again today, however, these are staying in their wrappers for now.

Unfortunately I have not captured all of the syslog for this, due to my own stupidity, but it appears my LSI card is resetting in some way and re-detecting all the disks. They then appear as new /dev/sd[a-z] devices. The array seems to find them, but then I start seeing read errors.

Nov 5 18:20:49 unraid kernel: mpt2sas_cm0: removing handle(0x000d), sas_addr(0x4433221105000000)
Nov 5 18:20:49 unraid kernel: mpt2sas_cm0: enclosure logical id(0x500605b005fd5690), slot(6)
Nov 5 18:20:49 unraid kernel: sd 7:0:5:0: [sdg] Synchronizing SCSI cache
Nov 5 18:20:49 unraid kernel: sd 7:0:5:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Nov 5 18:20:49 unraid kernel: mpt2sas_cm0: removing handle(0x000e), sas_addr(0x4433221106000000)
Nov 5 18:20:49 unraid kernel: mpt2sas_cm0: enclosure logical id(0x500605b005fd5690), slot(5)
Nov 5 18:20:49 unraid kernel: sd 7:0:6:0: [sdh] Synchronizing SCSI cache
Nov 5 18:20:49 unraid kernel: sd 7:0:6:0: [sdh] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Nov 5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov 5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov 5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov 5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov 5 18:20:49 unraid kernel: mpt2sas_cm0: removing handle(0x000f), sas_addr(0x4433221107000000)
Nov 5 18:20:49 unraid kernel: mpt2sas_cm0: enclosure logical id(0x500605b005fd5690), slot(4)
Nov 5 18:20:49 unraid kernel: mpt2sas_cm0: sending message unit reset !!
Nov 5 18:20:49 unraid kernel: mpt2sas_cm0: message unit reset: SUCCESS
Nov 5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov 5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP received, forcing refresh of disks info.
Nov 5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov 5 18:20:49 unraid rc.diskinfo[10248]: SIGHUP ignored - already refreshing disk info.
Nov 5 18:20:49 unraid kernel: vfio-pci 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (32983920 kB)
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: CurrentHostPageSize is 0: Setting default host page size to 4k
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: MSI-X vectors supported: 1, no of cores: 8, max_msix_vectors: -1
Nov 5 18:20:50 unraid kernel: mpt2sas1-msix0: PCI-MSI-X enabled: IRQ 32
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: iomem(0x00000000f7700000), mapped(0x000000004f07acf1), size(16384)
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: ioport(0x000000000000d000), size(256)
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: CurrentHostPageSize is 0: Setting default host page size to 4k
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: Allocated physical memory: size(1687 kB)
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: Current Controller Queue Depth(3364),Max Controller Queue Depth(3432)
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: Scatter Gather Elements per IO(128)
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(07.39.02.00)
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: Protocol=(
Nov 5 18:20:50 unraid kernel: Initiator
Nov 5 18:20:50 unraid kernel: ,Target
Nov 5 18:20:50 unraid kernel: ),
Nov 5 18:20:50 unraid kernel: Capabilities=(
Nov 5 18:20:50 unraid kernel: TLR
Nov 5 18:20:50 unraid kernel: ,EEDP
Nov 5 18:20:50 unraid kernel: ,Snapshot Buffer
Nov 5 18:20:50 unraid kernel: ,Diag Trace Buffer
Nov 5 18:20:50 unraid kernel: ,Task Set Full
Nov 5 18:20:50 unraid kernel: ,NCQ
Nov 5 18:20:50 unraid kernel: )
Nov 5 18:20:50 unraid kernel: scsi host8: Fusion MPT SAS Host
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: sending port enable !!
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: host_add: handle(0x0001), sas_addr(0x500605b005fd5690), phys(8)
Nov 5 18:20:50 unraid kernel: mpt2sas_cm1: port enable: SUCCESS
Nov 5 18:20:50 unraid kernel: scsi 8:0:0:0: Direct-Access ATA WDC WD5002ABYS-0 3B03 PQ: 0 ANSI: 6
Nov 5 18:20:50 unraid kernel: scsi 8:0:0:0: SATA: handle(0x000c), sas_addr(0x4433221103000000), phy(3), device_name(0x50014ee2adeb8d57)
Nov 5 18:20:50 unraid kernel: scsi 8:0:0:0: enclosure logical id (0x500605b005fd5690), slot(0)
Nov 5 18:20:50 unraid kernel: scsi 8:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov 5 18:20:50 unraid kernel: sd 8:0:0:0: Attached scsi generic sg1 type 0
Nov 5 18:20:50 unraid kernel: scsi 8:0:1:0: Direct-Access ATA WDC WD5003ABYX-0 1S01 PQ: 0 ANSI: 6
Nov 5 18:20:50 unraid kernel: scsi 8:0:1:0: SATA: handle(0x0009), sas_addr(0x4433221100000000), phy(0), device_name(0x50014ee05848212f)
Nov 5 18:20:50 unraid kernel: scsi 8:0:1:0: enclosure logical id (0x500605b005fd5690), slot(3)
Nov 5 18:20:50 unraid kernel: scsi 8:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov 5 18:20:50 unraid kernel: sd 8:0:0:0: [sdk] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Nov 5 18:20:50 unraid kernel: sd 8:0:1:0: [sdl] Spinning up disk...
Nov 5 18:20:50 unraid kernel: sd 8:0:1:0: Attached scsi generic sg2 type 0
Nov 5 18:20:50 unraid kernel: scsi 8:0:2:0: Direct-Access ATA WDC WD5002ABYS-0 3B03 PQ: 0 ANSI: 6
Nov 5 18:20:50 unraid kernel: scsi 8:0:2:0: SATA: handle(0x000a), sas_addr(0x4433221101000000), phy(1), device_name(0x50014ee2adeb2498)
Nov 5 18:20:50 unraid kernel: scsi 8:0:2:0: enclosure logical id (0x500605b005fd5690), slot(2)
Nov 5 18:20:50 unraid kernel: scsi 8:0:2:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov 5 18:20:50 unraid kernel: sd 8:0:2:0: Attached scsi generic sg3 type 0
Nov 5 18:20:50 unraid kernel: sd 8:0:2:0: [sdm] Spinning up disk...
Nov 5 18:20:50 unraid kernel: scsi 8:0:3:0: Direct-Access ATA WDC WD5002ABYS-0 3B03 PQ: 0 ANSI: 6
Nov 5 18:20:50 unraid kernel: scsi 8:0:3:0: SATA: handle(0x000b), sas_addr(0x4433221102000000), phy(2), device_name(0x50014ee20340308a)
Nov 5 18:20:50 unraid kernel: scsi 8:0:3:0: enclosure logical id (0x500605b005fd5690), slot(1)
Nov 5 18:20:50 unraid kernel: scsi 8:0:3:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov 5 18:20:50 unraid kernel: sd 8:0:0:0: [sdk] Write Protect is off
Nov 5 18:20:50 unraid kernel: sd 8:0:0:0: [sdk] Mode Sense: 7f 00 10 08
Nov 5 18:20:50 unraid kernel: sd 8:0:3:0: [sdn] Spinning up disk...
Nov 5 18:20:50 unraid kernel: sd 8:0:3:0: Attached scsi generic sg4 type 0
Nov 5 18:20:50 unraid kernel: sd 8:0:0:0: [sdk] Write cache: enabled, read cache: enabled, supports DPO and FUA
Nov 5 18:20:50 unraid kernel: scsi 8:0:4:0: Direct-Access ATA WDC WD30EFRX-68E 0A82 PQ: 0 ANSI: 6
Nov 5 18:20:50 unraid kernel: scsi 8:0:4:0: SATA: handle(0x000d), sas_addr(0x4433221105000000), phy(5), device_name(0x50014ee2b87d1542)
Nov 5 18:20:50 unraid kernel: sd 8:0:1:0: [sdl] Spinning up disk...
Nov 5 18:20:50 unraid kernel: sd 8:0:1:0: Attached scsi generic sg2 type 0
Nov 5 18:20:50 unraid kernel: scsi 8:0:2:0: Direct-Access ATA WDC WD5002ABYS-0 3B03 PQ: 0 ANSI: 6
Nov 5 18:20:50 unraid kernel: scsi 8:0:2:0: SATA: handle(0x000a), sas_addr(0x4433221101000000), phy(1), device_name(0x50014ee2adeb2498)
Nov 5 18:20:50 unraid kernel: scsi 8:0:2:0: enclosure logical id (0x500605b005fd5690), slot(2)
Nov 5 18:20:50 unraid kernel: scsi 8:0:2:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov 5 18:20:50 unraid kernel: sd 8:0:2:0: Attached scsi generic sg3 type 0
Nov 5 18:20:50 unraid kernel: sd 8:0:2:0: [sdm] Spinning up disk...
Nov 5 18:20:50 unraid kernel: scsi 8:0:3:0: Direct-Access ATA WDC WD5002ABYS-0 3B03 PQ: 0 ANSI: 6
Nov 5 18:20:50 unraid kernel: scsi 8:0:3:0: SATA: handle(0x000b), sas_addr(0x4433221102000000), phy(2), device_name(0x50014ee20340308a)
Nov 5 18:20:50 unraid kernel: scsi 8:0:3:0: enclosure logical id (0x500605b005fd5690), slot(1)
Nov 5 18:20:50 unraid kernel: scsi 8:0:3:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov 5 18:20:50 unraid kernel: scsi 8:0:4:0: Direct-Access ATA WDC WD30EFRX-68E 0A82 PQ: 0 ANSI: 6
Nov 5 18:20:50 unraid kernel: scsi 8:0:4:0: SATA: handle(0x000d), sas_addr(0x4433221105000000), phy(5), device_name(0x50014ee2b87d1542)
Nov 5 18:20:50 unraid kernel: scsi 8:0:4:0: enclosure logical id (0x500605b005fd5690), slot(6)
Nov 5 18:20:50 unraid kernel: scsi 8:0:4:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Nov 5 18:20:50 unraid kernel: sd 8:0:4:0: Attached scsi generic sg5 type 0
Nov 5 18:20:50 unraid kernel: scsi 8:0:5:0: Direct-Access ATA WDC WD30EFRX-68A 0A80 PQ: 0 ANSI: 6
Nov 5 18:20:50 unraid kernel: scsi 8:0:5:0: SATA: handle(0x000e), sas_addr(0x4433221106000000), phy(6), device_name(0x50014ee60325df2b)
Nov 5 18:20:50 unraid kernel: scsi 8:0:5:0: enclosure logical id (0x500605b005fd5690), slot(5)

If this happens again I will capture this fully, but I wanted to shut the system down quickly, as the drives were unresponsive and I didn't want to be in a rebuild again.

I am wondering if this is related to the recent major changes to the mpt3sas driver, so I may go back to 6.5 if it happens again to see if that fixes things.

Does anyone have any other ideas on this?
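For anyone chasing a similar symptom, the controller-reset pattern described in this post can be pulled out of a syslog with a short script. This is only a sketch in Python, assuming the message formats quoted above; the regular expressions and the `summarize_hba_events` helper are hypothetical and cover just the three message types shown here.

```python
import re

# Patterns based on the kernel messages quoted in this thread.
RESET = re.compile(r"kernel:\s+(mpt[23]sas_cm\d+): sending message unit reset")
REMOVED = re.compile(
    r"kernel:\s+(mpt[23]sas_cm\d+): removing handle\((0x[0-9a-f]+)\)"
)
SD_DEVICE = re.compile(r"kernel:\s+sd \S+ \[(sd[a-z]+)\]")


def summarize_hba_events(lines):
    """Collect controller resets, removed handles, and sd device names."""
    summary = {"resets": [], "removed_handles": [], "devices_seen": []}
    for line in lines:
        m = RESET.search(line)
        if m:
            summary["resets"].append(m.group(1))
            continue
        m = REMOVED.search(line)
        if m:
            summary["removed_handles"].append((m.group(1), m.group(2)))
            continue
        m = SD_DEVICE.search(line)
        if m and m.group(1) not in summary["devices_seen"]:
            summary["devices_seen"].append(m.group(1))
    return summary


# A few lines taken from the log excerpt in this post.
sample = [
    "Nov 5 18:20:49 unraid kernel: mpt2sas_cm0: removing handle(0x000d), sas_addr(0x4433221105000000)",
    "Nov 5 18:20:49 unraid kernel: sd 7:0:5:0: [sdg] Synchronizing SCSI cache",
    "Nov 5 18:20:49 unraid kernel: mpt2sas_cm0: sending message unit reset !!",
    "Nov 5 18:20:50 unraid kernel: sd 8:0:0:0: [sdk] 976773168 512-byte logical blocks: (500 GB/466 GiB)",
]

print(summarize_hba_events(sample))
```

If the device names seen after a reset differ from those before it (here `sdg` before and `sdk` after), the disks were re-enumerated, which matches the behaviour described in the post.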
John_M Posted November 9, 2018

Disks disappearing and then reappearing as different devices suggests a possible power problem.
dsmith44 (Author) Posted November 9, 2018

I finally worked this out after it happened again.

I was trying to pass through a USB controller to a VM and didn't notice that the HBA, rather than the USB card, was listed as the only device in the passthrough list. As I was only expecting to see one device, the USB controller, I had just been selecting the single item in the list without reading it, which then obviously causes these issues when the VM is started.

I feel somewhat stupid, but I think there is a bug somewhere, as the 'Other PCI devices' list is now empty. It no longer shows the HBA, and presumably never should have. So I'm left wondering how it ever got into the list in the first place.
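One way to catch this kind of mistake is to check which PCI devices are currently bound to the `vfio-pci` driver, since a storage controller showing up there is being handed to a VM. The Python sketch below relies on the standard Linux sysfs layout, where each driver directory under `/sys/bus/pci/drivers` contains one entry per bound device named by its PCI address; the `devices_bound_to` helper is hypothetical.

```python
import os
import re

# PCI addresses look like "0000:01:00.0" (domain:bus:device.function).
PCI_ADDRESS = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def devices_bound_to(driver, sysfs_root="/sys/bus/pci/drivers"):
    """Return the sorted PCI addresses bound to `driver`.

    `sysfs_root` is a parameter only so the function can be exercised
    against a test directory; on a real system the default applies.
    """
    driver_dir = os.path.join(sysfs_root, driver)
    if not os.path.isdir(driver_dir):
        return []
    # Driver directories also hold control files (bind, unbind, new_id),
    # so keep only the entries that look like PCI addresses.
    return sorted(
        entry for entry in os.listdir(driver_dir) if PCI_ADDRESS.match(entry)
    )


if __name__ == "__main__":
    for address in devices_bound_to("vfio-pci"):
        print(address)
```

Each address printed can then be compared with the `lspci` listing to confirm it is the intended device (here the USB controller) and not the HBA. Note that the syslog excerpt earlier in the thread contains a `vfio-pci 0000:01:00.0` line at the moment of the reset.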
John_M Posted November 9, 2018 (edited)

Your syslinux config has this:

BOOT_IMAGE=/bzimage pcie_acs_override=downstream pci-stub.ids=1b73:1009 initrd=/bzroot kvm_intel.nested=1

Is 1b73:1009 the correct ID for the device you want to pass through?

EDIT: Hmmm. I see it's a USB 3 controller.

Edited November 9, 2018 by John_M: It's a USB controller
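The check suggested here (does the stubbed ID really belong to the device you mean to pass through?) starts with extracting the vendor:device pairs from the boot line. A minimal Python sketch follows; the `parse_stub_ids` helper is hypothetical, and it also looks at `vfio-pci.ids` on the assumption that a given config might use that parameter instead of `pci-stub.ids`.

```python
def parse_stub_ids(cmdline, params=("pci-stub.ids", "vfio-pci.ids")):
    """Return (vendor, device) hex ID pairs reserved on a kernel command line."""
    ids = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if not sep or key not in params:
            continue
        # Several IDs may be given as a comma-separated list.
        for pair in value.split(","):
            vendor, _, device = pair.partition(":")
            ids.append((vendor.lower(), device.lower()))
    return ids


# The boot line quoted in this post.
cmdline = (
    "BOOT_IMAGE=/bzimage pcie_acs_override=downstream "
    "pci-stub.ids=1b73:1009 initrd=/bzroot kvm_intel.nested=1"
)

print(parse_stub_ids(cmdline))
# -> [('1b73', '1009')]
```

The resulting pairs can be compared with the numeric IDs that `lspci -n` reports for each device, to confirm that only the intended controller is being reserved.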