unRAID loses disk (sporadic) in VM running ESXi

October 29, 2012

I decided to post my issue as a new thread to get more eyes on it.

Oct 28 15:21:20 Clara-Belle kernel: sas: ata1: end_device-0:0: dev error handler
Oct 28 15:21:20 Clara-Belle kernel: sas: ata2: end_device-0:1: dev error handler
Oct 28 15:21:20 Clara-Belle kernel: sas: ata3: end_device-0:2: dev error handler
Oct 28 15:21:20 Clara-Belle kernel: sas: ata4: end_device-0:3: dev error handler
Oct 28 15:21:20 Clara-Belle kernel: ata4.00: ATA-8: ST31000528AS, CC38, max UDMA/133
Oct 28 15:21:20 Clara-Belle kernel: ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Oct 28 15:21:20 Clara-Belle kernel: ata4.00: qc timeout (cmd 0xef)
Oct 28 15:21:20 Clara-Belle kernel: ata4.00: failed to set xfermode (err_mask=0x4)
Oct 28 15:21:20 Clara-Belle kernel: drivers/scsi/mvsas/mv_sas.c 1522:mvs_I_T_nexus_reset for device[3]:rc= 0
Oct 28 15:21:20 Clara-Belle kernel: ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
Oct 28 15:21:20 Clara-Belle kernel: ata4.00: revalidation failed (errno=-5)
Oct 28 15:21:20 Clara-Belle kernel: ata4.00: qc timeout (cmd 0xec)
Oct 28 15:21:20 Clara-Belle kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 28 15:21:20 Clara-Belle kernel: ata4.00: revalidation failed (errno=-5)
Oct 28 15:21:20 Clara-Belle kernel: ata4.00: disabled
Oct 28 15:21:20 Clara-Belle kernel: ata4: hard resetting link
Oct 28 15:21:20 Clara-Belle kernel: mvsas 0000:0b:00.0: Phy3 : No sig fis
Oct 28 15:21:20 Clara-Belle kernel: drivers/scsi/mvsas/mv_sas.c 1522:mvs_I_T_nexus_reset for device[3]:rc= 0
Oct 28 15:21:20 Clara-Belle kernel: ata4: EH complete
Oct 28 15:21:20 Clara-Belle kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0

On the next boot it was fine. This is not the first time this has happened. Could it be IRQ-related? I ask because on shutdown I noticed the following:

Disabling IRQ #16

http://i.imgur.com/1rhiU.png

When it boots up fine (detects all disks), I do not see this.
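Since the drop is sporadic, it may help to count how often it actually happens across saved syslogs. Below is a minimal Python sketch, assuming the log line formats quoted above; the regular expressions and the scan_syslog name are my own, not anything from unRAID. Note that the "Disabling IRQ #16" message was seen on the console at shutdown and may never make it into a saved syslog.

```python
import re
from collections import Counter

# Patterns based on the log lines quoted in this post; other kernel
# versions may word these messages differently.
ATA_DISABLED = re.compile(r"kernel: (ata\d+)\.\d+: disabled")
IRQ_DISABLED = re.compile(r"Disabling IRQ #(\d+)")


def scan_syslog(lines):
    """Count 'ataN.NN: disabled' events per ATA port and collect 'Disabling IRQ #N' numbers."""
    dropped = Counter()
    irqs = []
    for line in lines:
        m = ATA_DISABLED.search(line)
        if m:
            dropped[m.group(1)] += 1
        m = IRQ_DISABLED.search(line)
        if m:
            irqs.append(int(m.group(1)))
    return dropped, irqs
```

Usage, assuming the log lives at /var/log/syslog: `with open("/var/log/syslog") as f: dropped, irqs = scan_syslog(f)`. Running it over several saved logs would show whether it is always the same port (ata4 here) or whether different ports drop each time.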

This is what my VM config looks like: http://i.imgur.com/VzX0N.png

1) Possible cause:

I am passing through both an M1015 and an MV8. This disk seems to be on the MV8, but is it? I can't tell at this point... If it is on the MV8, these are the cables connected to it, which I got off eBay:

http://www.ebay.com/itm/110931840838?ssPageName=STRK:MEWNX:IT&_trksid=p3984.m1439.l2649

I will purchase these to test: http://www.newegg.com/Product/Product.aspx?Item=N82E16816133033
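One way to settle which controller the disk sits on is to follow its sysfs path: /sys/block/<disk> is a symlink into the device tree, and the last PCI address on that path should be the HBA the disk hangs off. A minimal Python sketch, assuming the standard Linux sysfs layout; controller_for_path and controller_for_disk are hypothetical helper names. In the log above, mvsas reports itself at 0000:0b:00.0, so seeing that address (with driver mvsas) for sdn would point to the MV8.

```python
import os
import re

# PCI addresses in sysfs look like 0000:0b:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_for_path(sysfs_path):
    """Return the last PCI address in a resolved sysfs path, i.e. the controller nearest the disk."""
    matches = PCI_ADDR.findall(sysfs_path)
    return matches[-1] if matches else None


def controller_for_disk(disk):
    """Return (pci_address, driver_name) for a block device such as 'sdn', or (None, None)."""
    addr = controller_for_path(os.path.realpath(f"/sys/block/{disk}"))
    if addr is None:
        return None, None
    # The 'driver' symlink only exists while a driver is bound to the device.
    driver_link = f"/sys/bus/pci/devices/{addr}/driver"
    driver = os.path.basename(os.path.realpath(driver_link)) if os.path.exists(driver_link) else None
    return addr, driver
```

Usage: `controller_for_disk("sdn")` inside the unRAID VM; a result like `("0000:0b:00.0", "mvsas")` would mean the MV8, while `mpt2sas` would mean the M1015.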

2) Possible cause:

Another user claims that irqpoll helps resolve similar issues; I'm not sure whether this is relevant for the kernels in recent releases (v5rc8a).

append initrd=bzroot irqpoll

http://lime-technology.com/forum/index.php?topic=918.msg6193#msg6193

I'm not sure what this does.

EDIT: This did not solve it; it happened again with the parameter in syslinux.cfg.

I will try "noirqdebug", as per this thread: http://lime-technology.com/forum/index.php?topic=19593.msg175182#msg175182, and report back.

EDIT: "noirqdebug" did not solve it either; it happened again with the parameter in syslinux.cfg.
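Since both parameters were added by editing syslinux.cfg, it is worth confirming that the running kernel actually received them; /proc/cmdline shows the command line the kernel was booted with. A minimal Python sketch; missing_params and check_running_kernel are hypothetical helper names.

```python
def missing_params(cmdline, wanted):
    """Return the entries of `wanted` that are not whole tokens of `cmdline`."""
    present = set(cmdline.split())
    return [p for p in wanted if p not in present]


def check_running_kernel(wanted=("irqpoll", "noirqdebug")):
    """Read the running kernel's command line and report which wanted parameters are absent."""
    with open("/proc/cmdline") as f:
        return missing_params(f.read(), wanted)
```

Since the two parameters were tried one at a time, pass only the one under test, e.g. `check_running_kernel(("noirqdebug",))`; an empty list means the kernel saw it.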

Here is /proc/interrupts:

           CPU0       CPU1       CPU2       CPU3
  0:         23          0          0          0   IO-APIC-edge     timer
  1:          9          0          0          0   IO-APIC-edge     i8042
  6:          0          3          0          0   IO-APIC-edge     floppy
  7:          0          0          0          0   IO-APIC-edge     parport0
  9:          0          0          0          0   IO-APIC-fasteoi  acpi
 12:          4          0          0          0   IO-APIC-edge     i8042
 14:         43          0          0          0   IO-APIC-edge     ide0
 15:          0          0          0          0   IO-APIC-edge     ide1
 16:       8822          0          0          0   IO-APIC-fasteoi  ehci_hcd:usb1
 17:       6382       7547       7397       7241   IO-APIC-fasteoi  ioc0
 18:     434044          0          0          0   IO-APIC-fasteoi  uhci_hcd:usb2
 19:    1726732          0          0          0   IO-APIC-fasteoi  mvsas
 40:          0          0          0          0   PCI-MSI-edge     PCIe PME
 41:          0          0          0          0   PCI-MSI-edge     PCIe PME
 42:          0          0          0          0   PCI-MSI-edge     PCIe PME
 43:          0          0          0          0   PCI-MSI-edge     PCIe PME
 44:          0          0          0          0   PCI-MSI-edge     PCIe PME
 45:          0          0          0          0   PCI-MSI-edge     PCIe PME
 46:          0          0          0          0   PCI-MSI-edge     PCIe PME
 47:          0          0          0          0   PCI-MSI-edge     PCIe PME
 48:          0          0          0          0   PCI-MSI-edge     PCIe PME
 49:          0          0          0          0   PCI-MSI-edge     PCIe PME
 50:          0          0          0          0   PCI-MSI-edge     PCIe PME
 51:          0          0          0          0   PCI-MSI-edge     PCIe PME
 52:          0          0          0          0   PCI-MSI-edge     PCIe PME
 53:          0          0          0          0   PCI-MSI-edge     PCIe PME
 54:          0          0          0          0   PCI-MSI-edge     PCIe PME
 55:          0          0          0          0   PCI-MSI-edge     PCIe PME
 56:          0          0          0          0   PCI-MSI-edge     PCIe PME
 57:          0          0          0          0   PCI-MSI-edge     PCIe PME
 58:          0          0          0          0   PCI-MSI-edge     PCIe PME
 59:          0          0          0          0   PCI-MSI-edge     PCIe PME
 60:          0          0          0          0   PCI-MSI-edge     PCIe PME
 61:          0          0          0          0   PCI-MSI-edge     PCIe PME
 62:          0          0          0          0   PCI-MSI-edge     PCIe PME
 63:          0          0          0          0   PCI-MSI-edge     PCIe PME
 64:          0          0          0          0   PCI-MSI-edge     PCIe PME
 65:          0          0          0          0   PCI-MSI-edge     PCIe PME
 66:          0          0          0          0   PCI-MSI-edge     PCIe PME
 67:          0          0          0          0   PCI-MSI-edge     PCIe PME
 68:          0          0          0          0   PCI-MSI-edge     PCIe PME
 69:          0          0          0          0   PCI-MSI-edge     PCIe PME
 70:          0          0          0          0   PCI-MSI-edge     PCIe PME
 71:          0          0          0          0   PCI-MSI-edge     PCIe PME
 72:      27511    6575494      40006      18756   PCI-MSI-edge     eth0-rxtx-0
 73:      14498      24692    6116212      16706   PCI-MSI-edge     eth0-rxtx-1
 74:      15964      23463      20818    4689849   PCI-MSI-edge     eth0-rxtx-2
 75:    5796124      23256      15030      20082   PCI-MSI-edge     eth0-rxtx-3
 76:          0          0          0          0   PCI-MSI-edge     eth0-event-4
 77:    1462937     191977     217932     213623   PCI-MSI-edge     mpt2sas0-msix0
 78:          0          0          0          0   PCI-MSI-edge     vmci
 79:          0          0          0          0   PCI-MSI-edge     vmci
NMI:          0          0          0          0   Non-maskable interrupts
LOC:   22483901   22483964   22483853   22483873   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
IWI:          0          0          0          0   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:   10282630   12919166   10952486   11598716   Rescheduling interrupts
CAL:      69033      50148     571038     597827   Function call interrupts
TLB:     532814     415962     428039     359889   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:        750        750        750        750   Machine check polls
ERR:          0
MIS:          0
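To see which device owns a given IRQ (for example the IRQ 16 from the shutdown message) and whether each controller uses a legacy IO-APIC line or MSI, the listing can be parsed. A minimal sketch, assuming the column layout shown above; newer kernels print the interrupt chip and trigger type as separate columns, which would shift the fields. In this particular listing, IRQ 16 is ehci_hcd:usb1, the MV8's mvsas handler sits on IO-APIC line 19, and the M1015's mpt2sas0 uses MSI-X (IRQ 77). IRQ numbering can change between boots, so this is suggestive only.

```python
def parse_interrupts(text):
    """Map each numeric IRQ in /proc/interrupts to (total count, chip/type, device names)."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():
            continue  # skip blank lines and summary rows such as NMI, LOC, ERR
        fields = rest.split()
        counts = [int(x) for x in fields[:ncpu]]
        chip = fields[ncpu] if len(fields) > ncpu else ""
        devices = " ".join(fields[ncpu + 1:])
        table[int(label)] = (sum(counts), chip, devices)
    return table


def legacy_irqs(table):
    """Return IRQs that use IO-APIC (INTx) lines rather than MSI/MSI-X."""
    return sorted(irq for irq, (_, chip, _) in table.items() if chip.startswith("IO-APIC"))
```

Usage: `with open("/proc/interrupts") as f: table = parse_interrupts(f.read())`, then `table.get(16)` shows who owns IRQ 16 on that boot.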

3) Possible cause:

PSU-related?

The case (SC933T-R760B) has a triple-redundant PSU, but only one is plugged in at this point.

http://www.servethehome.com/supermicro-sc933t-r760b-3u-15x-35-sassata-storage-chassis-review/

Not very likely.

4) Possible cause:

Drive-related?

The output of "smartctl -a -d ata /dev/sdn" is attached. This doesn't look good, or does it? Reallocated_Sector_Ct is 0, so the drive should be OK. In any case, I pulled it from the array and I'm currently running SpinRite on it. Will report back...

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       144973736
195 Hardware_ECC_Recovered  0x001a   035   020   000    Old_age   Always       -       144973736
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
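As a rough cross-check of the SMART output, the attribute rows can be parsed and only the counters usually treated as the clearest warning signs flagged. This sketch assumes the smartctl attribute-table layout shown above. Watching attributes 5, 197 and 198 is a common rule of thumb, not an official pass/fail criterion, and Seagate's raw values for attributes 1 and 195 are widely reported to be vendor-encoded rather than plain error counts, so they are deliberately not judged by raw value. It does not replace a long self-test or the SpinRite pass.

```python
import re

# One attribute row: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+\S+\s+\S+\s+(\S+)\s+(\d+)"
)

# Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable (rule of thumb).
WATCH = {5, 197, 198}


def parse_attributes(lines):
    """Parse smartctl attribute rows into {id: {...}}; non-matching lines are ignored."""
    attrs = {}
    for line in lines:
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = {
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "when_failed": m.group(6),
                "raw": int(m.group(7)),  # leading integer of RAW_VALUE only
            }
    return attrs


def flag_attributes(attrs):
    """Flag failing/at-threshold attributes, plus nonzero raw counts on the watched IDs."""
    out = []
    for attr_id, a in sorted(attrs.items()):
        if a["when_failed"] != "-" or (a["thresh"] > 0 and a["value"] <= a["thresh"]):
            out.append(f"{attr_id} {a['name']}: at or below threshold (WHEN_FAILED={a['when_failed']})")
        elif attr_id in WATCH and a["raw"] > 0:
            out.append(f"{attr_id} {a['name']}: raw count {a['raw']}")
    return out
```

Usage: feed it the attached smart_output_sdn.txt line by line; with the three rows quoted above it returns an empty list.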

EDIT: I now think this is more likely drive-related. I shut down the unRAID VM and the LED on the drive bay has yet to go off; it remains solid. Very weird.

5) Possible cause:

Kernel bug or ESXi incompatibility?

Any thoughts would be really appreciated.

ESXi 5.1

unRAID v5rc8a

Motherboard: Supermicro X8SIL

smart_output_sdn.txt

syslog.txt.zip


February 25, 2013

Did you ever figure this out? It sounds like the issues you are having are similar to mine.


February 25, 2013

Author

Hi jesseasi,

I just noticed your reply.

I tried just about everything to solve this issue. Ultimately, I solved it by swapping the MV8 for another M1015. I really wish I hadn't had to, as I had other plans for that M1015, but alas, I couldn't live with the MV8 in passthrough mode. It was just unreliable, although I know it works well for some.

As you can see, I tried swapping out cables, confirmed that the drive in question was good, and tried various kernel parameters. The MV8 worked flawlessly on bare metal. To me, it seemed to be a low-level/driver problem that was beyond me. As a consequence, I now have a spare MV8...


March 21, 2013

I have this same issue, with random drives not showing up to be assigned in unRAID. I also use MV8s, and it happens to drives on each of them.

Unless someone comes up with a better idea than buying the LSI card, I'll probably just restart the unRAID VM twice to get them to show up.

Luckily I don't restart unraid that often.
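As a stopgap for drives that only appear after a restart, a quick check before starting the array can list which expected drives are missing. A minimal sketch, assuming udev populates /dev/disk/by-id (as on most Linux systems) and that you supply your own list of drive serial numbers; missing_from and missing_disks are hypothetical helper names.

```python
import os


def missing_from(names, expected_serials):
    """Return the expected serials that appear in none of the given device names."""
    return [s for s in expected_serials if not any(s in n for n in names)]


def missing_disks(expected_serials, by_id_dir="/dev/disk/by-id"):
    """Check udev's by-id symlinks (e.g. ata-<model>_<serial>) for each expected serial."""
    try:
        names = os.listdir(by_id_dir)
    except FileNotFoundError:
        names = []  # no by-id directory at all: treat every drive as missing
    return missing_from(names, expected_serials)
```

Usage: `missing_disks(["SERIAL_A", "SERIAL_B"])` with your own serials; a non-empty result means it is time for another VM restart before assigning drives.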
