Posts posted by Msh100

  1. Sorry to bring up such an old thread, but I never got this solved and recently I've had some time on my hands!

     

    Basically I am getting the same symptoms still. In the original post I mentioned VMs, but that isn't the whole story; it happens regardless. I have replaced the SAS card and that has not helped.

     

    I have also upgraded to the latest unraid as of today. To trigger this issue, I simply change one disk from "unassigned" to an actual drive. When I do that, all the disks become unassigned and similar syslog messages appear.

     

    I have attached the diagnostics which you asked for before. Any help would be greatly appreciated! Failing drives are of course a possibility, though I just wouldn't expect this outcome.

    homeserver-diagnostics-20221231-1721.zip

  2. I have disabled IOMMU, so it's unrelated to that. However, I have also noticed the following in the logs:

     

    Jul 31 12:44:10 HomeServer kernel: mpt2sas_cm0: SAS host is non-operational !!!!
    Jul 31 12:44:11 HomeServer kernel: mpt2sas_cm0: SAS host is non-operational !!!!
    Jul 31 12:44:12 HomeServer kernel: mpt2sas_cm0: SAS host is non-operational !!!!
    Jul 31 12:44:13 HomeServer kernel: mpt2sas_cm0: SAS host is non-operational !!!!
    Jul 31 12:44:14 HomeServer kernel: mpt2sas_cm0: SAS host is non-operational !!!!
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: SAS host is non-operational !!!!
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: _base_fault_reset_work: Running mpt3sas_dead_ioc thread success !!!!
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221103000000)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: removing handle(0x000a), sas_addr(0x4433221103000000)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: enclosure logical id(0x590b11c01210fd00), slot(0)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221101000000)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: removing handle(0x0009), sas_addr(0x4433221101000000)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: enclosure logical id(0x590b11c01210fd00), slot(2)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221104000000)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: removing handle(0x000b), sas_addr(0x4433221104000000)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: enclosure logical id(0x590b11c01210fd00), slot(7)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221106000000)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: removing handle(0x000c), sas_addr(0x4433221106000000)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: enclosure logical id(0x590b11c01210fd00), slot(5)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221105000000)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: removing handle(0x000d), sas_addr(0x4433221105000000)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: enclosure logical id(0x590b11c01210fd00), slot(6)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221107000000)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: removing handle(0x000e), sas_addr(0x4433221107000000)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: enclosure logical id(0x590b11c01210fd00), slot(4)
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: unexpected doorbell active!
    Jul 31 12:44:15 HomeServer kernel: mpt2sas_cm0: sending diag reset !!
    Jul 31 12:44:16 HomeServer kernel: mpt2sas_cm0: Invalid host diagnostic register value
    Jul 31 12:44:16 HomeServer kernel: mpt2sas_cm0: System Register set:
    Jul 31 12:44:16 HomeServer kernel: mpt2sas_cm0: diag reset: FAILED

     

    I guess the best lead right now is that something is up with the SAS controller?
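
     For anyone following along, this is roughly how I've been checking on the controller after a fault. These are just standard `lspci`/`dmesg` invocations; 06:00.0 is my HBA's PCI address from the logs above, so adjust it for your own hardware:

```shell
# Confirm the HBA is still enumerated on the PCI bus and show its status/link state.
lspci -s 06:00.0 -vv | grep -E 'LnkSta:|Status:'

# Pull just the mpt2sas/mpt3sas driver messages out of the kernel log.
dmesg | grep -E 'mpt[23]sas' | tail -n 50
```

     If the first command returns nothing at all, the controller has dropped off the bus entirely rather than just faulting, which would point more firmly at hardware.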

  3. [   97.777213] tun: Universal TUN/TAP device driver, 1.6
    [   97.824344] mdcmd (36): check
    [   97.824353] md: recovery thread: recon D1 D3 ...
    [   98.034791] mpt3sas 0000:06:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
    [  201.059073] br0: port 2(vnet0) entered blocking state
    [  201.059077] br0: port 2(vnet0) entered disabled state
    [  201.059107] device vnet0 entered promiscuous mode
    [  201.059189] br0: port 2(vnet0) entered blocking state
    [  201.059190] br0: port 2(vnet0) entered forwarding state
    [  221.602074] mpt2sas_cm0: SAS host is non-operational !!!!
    [  222.627071] mpt2sas_cm0: SAS host is non-operational !!!!
    [  223.650069] mpt2sas_cm0: SAS host is non-operational !!!!
    [  224.674075] mpt2sas_cm0: SAS host is non-operational !!!!
    [  225.699076] mpt2sas_cm0: SAS host is non-operational !!!!
    [  226.722072] mpt2sas_cm0: SAS host is non-operational !!!!
    [  226.722176] mpt2sas_cm0: _base_fault_reset_work: Running mpt3sas_dead_ioc thread success !!!!
    [  226.727073] blk_update_request: I/O error, dev sdd, sector 41710208 op 0x0:(READ) flags 0x0 phys_seg 72 prio class 0
    [  226.727080] md: disk0 read error, sector=41710144
    [  226.727082] md: disk0 read error, sector=41710152
    --- many of the same message ---
    [  226.727147] md: disk0 read error, sector=41710704
    [  226.727148] md: disk0 read error, sector=41710712
    [  226.730101] sd 7:0:1:0: [sde] tag#2947 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=DRIVER_OK cmd_age=5s
    [  226.730105] sd 7:0:1:0: [sde] tag#2947 CDB: opcode=0x88 88 00 00 00 00 00 02 7c 72 80 00 00 02 40 00 00
    [  226.730106] blk_update_request: I/O error, dev sde, sector 41710208 op 0x0:(READ) flags 0x0 phys_seg 72 prio class 0
    [  226.730110] md: disk29 read error, sector=41710144
    [  226.730111] md: disk29 read error, sector=41710152
    --- many of the same message ---
    [  226.730166] md: disk29 read error, sector=41710704
    [  226.730167] md: disk29 read error, sector=41710712
    [  226.730182] sd 7:0:3:0: [sdg] tag#2948 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=DRIVER_OK cmd_age=5s
    [  226.730184] sd 7:0:3:0: [sdg] tag#2948 CDB: opcode=0x88 88 00 00 00 00 00 02 7c 72 80 00 00 02 40 00 00
    [  226.730185] blk_update_request: I/O error, dev sdg, sector 41710208 op 0x0:(READ) flags 0x0 phys_seg 72 prio class 0
    [  226.730187] md: disk4 read error, sector=41710144
    [  226.730188] md: disk4 read error, sector=41710152

    So it's clear that at 201 seconds I am starting the VM.

     

    It's a lot of error messages, mostly nothing too interesting, but one line jumped out:

    [  227.026975] pci 0000:06:00.0: Removing from iommu group 14
    

     

    IOMMU Group 14 is indeed the disk controller. So why is this happening? How can I trace how this is tied to the VM? I can't connect the dots.
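
     In case it's useful, this is the small script I used to confirm what sits in each IOMMU group. It only relies on the standard sysfs layout, so it should work on any recent kernel; in my case group 14 shows 0000:06:00.0, the mpt3sas controller:

```shell
#!/bin/sh
# Walk sysfs and print every PCI device per IOMMU group.
# Each path looks like /sys/kernel/iommu_groups/<group>/devices/<pci-addr>.
for dev in /sys/kernel/iommu_groups/*/devices/*; do
    group=$(basename "$(dirname "$(dirname "$dev")")")
    addr=$(basename "$dev")
    printf 'group %s: %s\n' "$group" "$addr"
done | sort -V
```

     If any other device shares group 14 with the HBA, passing that device to a VM would explain the controller being detached.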

  4. A couple of days ago I started to get errors on one of my drives. I thought nothing of it and carried on as normal, until I decided to simply try to rebuild that drive. During the rebuild process, a second drive encountered the same issue.

     

    Something is probably off here. I decided to rebuild again and it ran for many hours without error, until I started a VM and almost instantly began to get errors on all the drives.

     

    I can replicate this problem consistently: I boot unraid, the rebuild starts and runs for however long, until I start a VM, at which point errors come from all directions.

     

    I have attempted to Google this and a couple of suggestions pointed to passthrough, but as far as I am aware I am not passing through anything other than the unraid mount itself (which I have also tried removing, with no luck). I have also changed the PCIe ACS override setting from multi-function to disabled.
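
     For completeness, I double-checked that the override is really off by looking at the kernel command line, since (as far as I understand) the setting is applied as the `pcie_acs_override` boot parameter:

```shell
# The ACS override is passed as a kernel boot parameter, so if it is
# disabled there should be no pcie_acs_override option on the cmdline.
if grep -q 'pcie_acs_override' /proc/cmdline; then
    echo "ACS override still active"
else
    echo "ACS override disabled"
fi
```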

     

    As far as I am aware, I am not doing anything out of the ordinary, and this setup had been running for months prior to the first errors without incident. I won't dismiss the possibility that something is failing, but the fact that it is perfectly reproducible simply by starting a VM makes me think there's something else at play. Any advice would be greatly appreciated!

  5. I am trying to set up an Ubuntu 20.04 VM with limited success.

     

    I'm not entirely sure where to start debugging this one. The VM setup is nothing special, here's the configuration:

    [Screenshots of the VM configuration: 8ZJNiZn.png, HNA9mOy.png]

     

    The only "funky" thing I have done is set the primary drive to be a partition on an unmanaged volume. I have changed this to Auto but the results are the same.

     

    When I boot, the Ubuntu installer simply does the following:

     

    [Screenshot of the installer output: nG7d4ZN.png]

     

    Googling this only turned up cases of it happening on VirtualBox.

     

    Any tips would be appreciated. If any more debugging information is needed, please let me know what to send!

     

    Thanks
