Pillendreher

Report Comments posted by Pillendreher

  1. Well, I have AdGuard Home running as a Docker container, but I don't have my Unraid server connected to it as a DNS server.

     

    Interestingly enough, the response from the server in my W10 VM running on the server is instantaneous; the same goes for my smartphone. So maybe it's my laptop... I'll try to figure it out :)

  2. Anybody else suddenly getting PCI Express errors on an NVMe drive since upgrading to RC4?

     

    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Multiple Corrected error received: 0000:00:1b.4
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a3eb] error status/mask=00000041/00002000
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:    [ 6] BadTLP                
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a3eb] error status/mask=00000001/00002000
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:03:00.0
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:   device [2646:2263] error status/mask=00000001/0000e000
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a3eb] error status/mask=00000001/00002000
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:03:00.0
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:   device [2646:2263] error status/mask=00000001/0000e000
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a3eb] error status/mask=00000001/00002000
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:03:00.0
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:   device [2646:2263] error status/mask=00000001/0000e000
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a3eb] error status/mask=00000001/00002000
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:03:00.0
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:   device [2646:2263] error status/mask=00000001/0000e000
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a3eb] error status/mask=00000001/00002000
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:03:00.0
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:   device [2646:2263] error status/mask=00000001/0000e000
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a3eb] error status/mask=00000001/00002000
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:03:00.0
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:   device [2646:2263] error status/mask=00000001/0000e000
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a3eb] error status/mask=00000001/00002000
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:03:00.0
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:   device [2646:2263] error status/mask=00000001/0000e000
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a3eb] error status/mask=00000001/00002000
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:03:00.0
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:   device [2646:2263] error status/mask=00000001/0000e000
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a3eb] error status/mask=00000001/00002000
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:03:00.0
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:   device [2646:2263] error status/mask=00000001/0000e000
    Apr 18 18:34:03 Tower kernel: nvme 0000:03:00.0:    [ 0] RxErr                 
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
    Apr 18 18:34:03 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)

     

    Sometimes when this happens, the drive somehow "gets dropped": it still shows up in the drive overview in the UI, yet I can no longer access it and it is missing from /mnt.
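To see which device the corrected errors are being reported against, the log lines above can be tallied with a small shell function. This is only a sketch: `aer_summary` is a made-up name, and it assumes the kernel message format shown in the excerpt above, where the device ID is the last field of each "Corrected error received" line.

```shell
# Count "AER: ... Corrected error received" lines per reported device ID.
# Reads the files given as arguments, or stdin if there are none.
# Assumption: the device ID is the last whitespace-separated field.
aer_summary() {
    awk '/AER: (Multiple )?Corrected error received:/ { count[$NF]++ }
         END { for (dev in count) print dev, count[dev] }' "$@" | sort
}
```

It could be run as `aer_summary /var/log/syslog` (the path is an assumption about where the syslog lives). It only counts messages; it does not tell you whether the drive, the slot or the kernel is at fault.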

  3. On 1/23/2021 at 1:53 PM, cholzer said:

     

    My HBA is 7 months old, and it started to fail shortly after I upgraded to RC2. So just because your hardware is new does not mean that it is okay. ;)
    The only way to make sure that RC2 is to blame is to downgrade.

     

    Went back to Beta35 - not a single unwarranted spin-up in two full days. On RC2, by contrast, not a day went by without multiple random spin-ups of my drives.

  4. 18 hours ago, cholzer said:

    If the issue goes away when you downgrade to the latest stable, then yeah - it would appear that RC2 is the problem.

    But just because you don't use the same HBA as I do does not mean that it is software related. In my case the HBA began to die, that is what caused my issue. Your onboard SATA controller can malfunction just as my HBA did.

    So unless downgrading Unraid to the latest stable fixes your issue, you cannot rule out a hardware fault. In my case the issue started after the upgrade, which is why I also thought at first that RC2 was to blame, when it was actually the HBA.

     

    Well sure, faulty hardware is always a possibility. But my hardware is barely more than a month old, the issues started the minute I upgraded to RC2, and multiple posters have mentioned this issue over the last couple of weeks, so I'm still fairly certain that this is a software issue. Going from Beta35 to RC2, something changed in the way hard drives are handled.

  5. 4 hours ago, mdoom said:

    The interesting thing is that if I force a spin-down of all disks, it's always shortly afterwards that they all wake up with that READ SMART message.

     

    Yep. Although I have to say it doesn't happen for all disks. Sometimes it's disk 1, sometimes (like today) it's disk 4. And the worst thing about this is that the spin-down routine does not work with this kind of spin-up, i.e. the drive keeps running until I spin it down manually.

  6. 3 hours ago, alturismo said:

    Not a bug, but maybe a general question about the SMART actions lately.

    I wonder a little why there are these SMART actions without activity ... and how long it takes to spin down ....

     

    As an example:

    ...

    Dec 30 05:54:24 AlsServer emhttpd: read SMART /dev/sdb

    Dec 30 05:54:42 AlsServer emhttpd: read SMART /dev/sdd

    Dec 30 05:54:54 AlsServer emhttpd: read SMART /dev/sdc

    Dec 30 08:23:59 AlsServer emhttpd: read SMART /dev/sdc

    Dec 30 23:00:49 AlsServer emhttpd: read SMART /dev/sdd <<- here I really did access a file for 30 minutes

    --- spin-down 7 hours later ... manually triggered by me this morning ....

    Dec 31 05:51:23 AlsServer emhttpd: spinning down /dev/sdd

    --- now spun up again and still running although there should be no activity ...

    Dec 31 07:37:20 AlsServer emhttpd: read SMART /dev/sdc

    Dec 31 07:37:35 AlsServer emhttpd: read SMART /dev/sdb

     

    When I spin them down, all is good until the next SMART trigger comes ... is this on purpose now? I know we need access to the drive to get SMART values, but spinning up the disks just for that? Can't it be done while the disk is active anyway?

     

    If the disk were really in use, let's say Plex accessing a file for playback, then triggering the spin-down wouldn't help, as the disk would immediately spin up again because Plex needs access. That's why I wonder why a manual spin-down always works when I see no activity, while the auto spin-down sometimes just doesn't come. Even better would be to avoid these spin-ups with no activity.

     

    I have now triggered the diagnostics in case they are of interest; this also spins up the disks due to SMART ;) but I see that this is on demand ...

    save system variables. + SMART reports

    alsserver-diagnostics-20201231-0807.zip

    Same here. I reckon something went wrong when the way SMART information is gathered was modified (if I remember correctly, I read something regarding SMART in the changelog).

  7. With RC2, my drives keep waking up, apparently because Unraid is pulling SMART data:

     

    Dec 29 00:03:47 Tower emhttpd: spinning down /dev/sdd
    
    Dec 29 00:07:22 Tower emhttpd: read SMART /dev/sdd

     

    Is there any way to keep this from happening? I didn't have this problem with Beta 35.

     

    EDIT:

     

    Oh, and another thing: Twice now my Docker image seemed to be broken (I got a weird error message that looked like something about the webgui was broken: "Docker service wont start"), but instead of having to rebuild the image, a restart fixed it. That didn't happen with Beta 35 either.

     

    EDIT2:
     

    Just to make sure, I disabled Telegraf - my disks still wake up due to SMART reads according to the log.
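To check how often this pattern shows up in a longer log, the "spinning down" and "read SMART" lines can be paired per device. A rough sketch, assuming the emhttpd line format from the excerpt above (device path in the eighth field); `smart_after_spindown` is a hypothetical helper name, not something Unraid ships.

```shell
# For each device, report the first "read SMART" line that follows a
# "spinning down" line for the same device.
# Assumed format: "Mon DD HH:MM:SS host emhttpd: <action> <action> /dev/sdX"
smart_after_spindown() {
    awk '
        $5 == "emhttpd:" && $6 == "spinning" && $7 == "down" {
            down[$8] = $1 " " $2 " " $3     # remember when the disk was spun down
            next
        }
        $5 == "emhttpd:" && $6 == "read" && $7 == "SMART" && ($8 in down) {
            print $8 ": spun down " down[$8] ", SMART read " $1 " " $2 " " $3
            delete down[$8]                 # only report the first read after each spin-down
        }' "$@"
}
```

Fed the two lines above, it prints one line for /dev/sdd with both timestamps. It only shows that a SMART read followed a spin-down; it cannot tell what triggered the read.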

     

  8. On 11/30/2020 at 2:46 AM, Zonediver said:

    Use this under Settings, Sleep Settings, "Custom commands after wake-up":

    
    sleep 120;
    hdparm -y $(ls /dev/sd*|grep "[a-z]$") >/dev/null 2>&1

    120 seconds after wake-up, this command puts all disks to sleep.

    FYI: After every reboot of the server, you must put all disks to sleep by hand once.

    Thanks, just got around to enabling it - works flawlessly.
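The disk list in that command comes from parsing `ls` output. A variant that uses a shell glob instead might look like the sketch below; `list_whole_disks` and its optional directory argument are made up for illustration, and it has not been tested in the Sleep plugin's custom-command field.

```shell
# Print whole-disk device paths (sda, sdb, ... sdaa) found in a directory,
# skipping partitions such as sda1. The directory argument is a hypothetical
# hook for trying this outside /dev; it defaults to /dev.
list_whole_disks() {
    dir=${1:-/dev}
    for path in "$dir"/sd*; do
        name=${path##*/}
        case $name in
            sd*[!a-z]*) continue ;;   # has a non-letter: partition or unexpanded glob
        esac
        if [ -e "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```

The quoted command could then become `sleep 120; hdparm -y $(list_whole_disks) >/dev/null 2>&1`, with `hdparm -y` being the same standby call as in the original.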