
Posts posted by bmrowe

  1. An update - I changed several things:

    -I changed my DHCP server to hand out only IPs in the .200-250 range, thinking there might be an IP address conflict even though IPs are reserved for most clients, including unraid.

    -I have had issues with dockers talking to each other and had added a script to check whether the unraid shim was in place and, if not, add it. I disabled this script.

    -I disabled netdata, cadvisor, and the prometheus exporter for adguard. I did this primarily because there was so much network chatter that it was hard to troubleshoot with wireshark.

     

    None of these changes solved the problem. However, I noticed CPU use was pretty high and the top two processes were firefox-bin being launched from the /tmp/ directory. I had no clue what this process was, so I decided to restart.

    Since restarting, this issue has not recurred once. I'm not sure which change, if any, fixed the problem, but I wanted to share the latest.

  2. 2 hours ago, dlandon said:

    Start by setting up a DNS server for your Unraid server.  This is normally your router.  You don't have a DNS server defined.

    Thanks for chiming in. I might be missing a setting, but I have this configured in network:

    image.thumb.jpeg.cec4e44f87cb675e00a54b47c37936fb.jpeg

     

    I'll do a basic rundown of my network too:

    192.168.1.1 is my router
    192.168.1.3 is an adguard home docker running as DNS
    192.168.1.24 is a pi hole docker running as secondary DNS
    192.168.1.13 is an unbound docker running as upstream DNS for both adguard and the pi hole.

    Here is the config in the router:

    image.thumb.jpeg.cbc366c91830f30eed0ce2a6cfa01967.jpeg

     

    Finally, I will also add that when unraid is unreachable from one PC, it may still be reachable from another.
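
One quick sanity check on a layout like this is whether any statically assigned address falls inside the DHCP pool, since that is one way to end up with two devices claiming the same IP. The sketch below is only an illustration: it assumes the LAN is 192.168.1.0/24 and uses the .200-.250 pool mentioned in the later update, and the host labels and function names are made up for the example.

```python
import ipaddress

# Assumption: the DHCP pool is 192.168.1.200-192.168.1.250 on a /24 LAN.
POOL_START = ipaddress.ip_address("192.168.1.200")
POOL_END = ipaddress.ip_address("192.168.1.250")

# Addresses taken from the network rundown; the labels are illustrative.
STATIC_HOSTS = {
    "router": "192.168.1.1",
    "adguard-home": "192.168.1.3",
    "pi-hole": "192.168.1.24",
    "unbound": "192.168.1.13",
}


def hosts_inside_pool(hosts, start, end):
    """Return the names of hosts whose address lies inside [start, end]."""
    return sorted(
        name
        for name, addr in hosts.items()
        if start <= ipaddress.ip_address(addr) <= end
    )


def duplicate_addresses(hosts):
    """Return {address: [names]} for addresses used by more than one host."""
    seen = {}
    for name, addr in hosts.items():
        seen.setdefault(addr, []).append(name)
    return {addr: sorted(names) for addr, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    print("inside DHCP pool:", hosts_inside_pool(STATIC_HOSTS, POOL_START, POOL_END))
    print("duplicates:", duplicate_addresses(STATIC_HOSTS))
```

With the four addresses listed above, neither check reports anything, so this only rules out the simplest kind of overlap; it says nothing about what a container or a shim interface is announcing on the wire.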

  3. I have a super weird issue where my unraid server will frequently become inaccessible from any computer on the network. The web gui is not reachable and pings fail. All dockers are still accessible; only unraid itself is impacted. The only things that resolve it are removing the entry for the unraid server's IP address from the client's arp table, or turning wifi off and back on for that client. I have also verified that the router cannot ping unraid during these outages.

     

    I've made a few changes lately - adding nginx as an internal proxy to avoid memorizing ip/port combos, and sonarr/radarr/prowlarr/overseerr. I don't see how either would be impacting this.

     

    Thoughts?

    Attaching diagnostics.

    unraid-diagnostics-20231107-1714.zip
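
Because deleting the ARP entry restores access, it may help to compare what the client has cached for the server's IP during an outage with what it has when things work. The sketch below is a rough illustration that assumes a Linux client, where the table can be read from /proc/net/arp; other operating systems expose it differently (for example via `arp -a`, whose output this parser does not handle). The sample addresses and MACs are placeholders, not values from the diagnostics.

```python
def parse_proc_net_arp(text):
    """Map IP address -> MAC address from text in /proc/net/arp format."""
    table = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 6:
            ip, _hw_type, _flags, mac, _mask, _device = fields[:6]
            table[ip] = mac.lower()
    return table


def changed_macs(before, after):
    """Return {ip: (old_mac, new_mac)} for IPs whose MAC differs."""
    return {
        ip: (before[ip], after[ip])
        for ip in before.keys() & after.keys()
        if before[ip] != after[ip]
    }


# Hypothetical snapshots: one taken while the server is reachable,
# one taken during an outage. All values are placeholders.
WORKING = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         aa:aa:aa:aa:aa:01     *        wlan0
192.168.1.50     0x1         0x2         aa:aa:aa:aa:aa:50     *        wlan0
"""

OUTAGE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         aa:aa:aa:aa:aa:01     *        wlan0
192.168.1.50     0x1         0x2         02:42:c0:a8:01:32     *        wlan0
"""

if __name__ == "__main__":
    diff = changed_macs(parse_proc_net_arp(WORKING), parse_proc_net_arp(OUTAGE))
    for ip, (old, new) in diff.items():
        print(f"{ip}: {old} -> {new}")
```

If the MAC for the server's IP differs between the two snapshots, something else on the network is answering ARP for that address, which would narrow down where to look.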

  4. I am having a weird issue with what has been a working Adguard Home install on unraid. Computers that use adguard home as their DNS provider will occasionally lose access to every app hosted on docker within unraid, and to unraid itself, until I turn their wifi off and back on. While a device has no access, I can ping adguard home, but get no response when I ping unraid. I also see some weird connectivity issues within unraid:

     

    -I cannot access the console of adguard home docker from the web ui. I get this error:

    Nov  8 11:45:29 unraid nginx: 2023/11/08 11:45:29 [error] 7285#7285: *696028 connect() to unix:/var/tmp/AdGuard-Home.sock failed (111: Connection refused) while connecting to upstream, client: 192.168.1.184, server: , request: "GET /logterminal/AdGuard-Home/ws HTTP/1.1", upstream: "http://unix:/var/tmp/AdGuard-Home.sock:/ws", host: "192.168.1.198"

    -Adguard-exporter will work for hours and then randomly fail with:

    2023/11/08 10:05:10 An error has occurred during login to AdguardGet "http://192.168.1.3:80/control/status": dial tcp 192.168.1.3:80: connect: no route to host

     

    I have made a couple changes to the system recently. Primarily adding nginx as an internal reverse proxy (basically to give friendlier names to things that you had to memorize the port on). I also added radarr/sonarr/prowlarr/overseerr. Those all are working fine.

    Edit: Some updates on testing - if I remove the entry for the unraid server ip from the arp table on the client, things go back to normal. I'm way beyond my depth with networking at this point - what could be causing this to be required?

    Attaching diags as well. Appreciate the help in advance.

    unraid-diagnostics-20231108-0952.zip
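
The two log lines above report different failures: "Connection refused" (errno 111 on Linux) means something answered but nothing was listening, while "no route to host" is the message for EHOSTUNREACH, which on a local subnet often points at address resolution failing. A small probe that separates these cases can make an outage easier to describe. This is a sketch only; `probe` and `classify` are illustrative names, and the host and port in the usage example are taken from the exporter log line.

```python
import errno
import socket


def classify(exc):
    """Turn a socket-level exception into a coarse failure category."""
    if isinstance(exc, socket.timeout):
        return "timeout"
    code = getattr(exc, "errno", None)
    if code == errno.ECONNREFUSED:
        return "refused"  # host reachable, but no listener on that port
    if code in (errno.EHOSTUNREACH, errno.ENETUNREACH):
        return "unreachable"  # no usable path to the host
    return "other"


def probe(host, port, timeout=2.0):
    """Attempt a TCP connection and report 'open' or a failure category."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except OSError as exc:
        return classify(exc)


if __name__ == "__main__":
    # Address and port as they appear in the adguard-exporter error.
    print(probe("192.168.1.3", 80))
```

Running the probe from a client and from the unraid host at the same moment would show whether both sides see the same category during an outage.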

  5. Reporting back. Nothing changed here. I was fixing up some VMs and thought to retry connecting. The media share showed up. Super weird. The only thing I can think of is that there were file not found errors being thrown because I lost a couple .iso files that were stored on the array. I replaced those files as part of fixing up a couple VMs. I don't see how that could have fixed it, but sharing for potential future searches.

  6. I replaced a hard drive and now only two of my three shared SMB folders (all three are user shares) are showing up. 'media' is the missing share. I see appdata and isos. I have done the following so far:

     

    • Double checked global share settings:
      image.thumb.jpeg.05c08d028ee6fd39d152d69fbe059d7e.jpeg

     
     

    • Double checked user share setting to make sure export is on for the media share:
      image.thumb.jpeg.c494f2facd8506df6bca5288359816d9.jpeg

     

    • Despite this, I only see appdata and isos but not media:
      image.jpeg.49f97d93ed91085410128dbfb567aed5.jpeg

     

    I thought maybe permissions related, but I'm a bit out of my depth there. Things seem (?) fine:
    image.jpeg.17ad0f6a2a49ac9f34597091e0ea8424.jpeg

  7. Following up on this. Replaced motherboard - same issue. However, I tried the new power supply and it worked! I'm super confused how a power supply could fail at the exact time I was replacing other parts. Has to somehow be connected, right? I thought maybe cable related and a cable got pinched while doing the install, but I used the old cables with the new power supply.

  8. I upgraded a working system to an ASUS TUF B760M D4 and an Intel 13500 (from an 11500 and a B560M motherboard). Things look fine in the BIOS, and I can leave the machine in the BIOS forever.

    However once I boot to unraid, the system will restart after exactly 15 seconds and do this over and over. I think I've eliminated most potential hardware issues - I've simplified the system down to just the motherboard and CPU. I've tried GUI mode and safe mode, and all restart in 15s. The memtest sends it back to the bios splash screen. 

    I tried a random distro of linux on a different USB. Same issue occurred once I booted into it. I also tried a fresh download of unraid, same problem.

    Any ideas on things to try or what might be happening? 

  9. I don't think so. This is the manual: https://images-na.ssl-images-amazon.com/images/I/A1kRDk8X1iL.pdf It mentions: "USB port of your PC must support power-off function so that the device would go to sleeping mode. Setting up motherboard’s (power management ) in S3 is strongly recommended. For more details, please refer to user guide of motherboard BIOS setting."

     

    I'm not sure if that is referencing the functionality it has called 'sync' where it sleeps when the pc sleeps. I have that turned off.

  10. I ordered a new 10TB WD drive (a shucked WD Elements external) and replaced my existing 8TB parity drive with it. I also moved the old parity drive into the array as another data drive. However, building parity on the new drive has failed twice and has not completed yet. The first time it made it to around 20%, and the second time close to 95%. Both times, I got warnings about read errors on the other drives, and then the parity drive became disabled and I couldn't bring it back. The drives are in a Mediasonic hf2-su31c enclosure connected via USB-C.

     

    The log for the drive right before failure shows:

    Jan 27 15:42:13 Tower kernel: usb 2-4.1: Failed to set U1 timeout to 0x0,error code -19
    Jan 27 15:42:13 Tower kernel: usb 2-4.1: Set SEL for device-initiated U1 failed.
    Jan 27 15:42:13 Tower kernel: usb 2-4.1: Set SEL for device-initiated U2 failed.
    Jan 27 15:42:13 Tower kernel: usb 2-4.1: usb_reset_and_verify_device Failed to disable LPM
    Jan 27 15:42:13 Tower kernel: sd 1:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s
    Jan 27 15:42:13 Tower kernel: sd 1:0:0:0: [sdb] tag#0 CDB: opcode=0x8a 8a 00 00 00 00 04 50 f6 f3 48 00 00 08 00 00 00
    Jan 27 15:42:13 Tower kernel: blk_update_request: I/O error, dev sdb, sector 18538230600 op 0x1:(WRITE) flags 0x4000 phys_seg 256 prio class 0
    Jan 27 15:42:13 Tower kernel: md: disk0 write error, sector=18538230536
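
As a cross-check on the log above, the CDB bytes can be decoded by hand. The sketch below assumes opcode 0x8a is a SCSI WRITE(16) command, with an 8-byte big-endian logical block address in bytes 2-9 and a 4-byte transfer length in bytes 10-13; the function name is made up for illustration.

```python
def decode_rw16_cdb(hex_bytes):
    """Decode a 16-byte READ(16)/WRITE(16)-style CDB given as a hex string."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return {
        "opcode": raw[0],
        "lba": int.from_bytes(raw[2:10], "big"),      # starting block
        "blocks": int.from_bytes(raw[10:14], "big"),  # transfer length
    }


# CDB bytes exactly as printed in the kernel log line.
CDB_FROM_LOG = "8a 00 00 00 00 04 50 f6 f3 48 00 00 08 00 00 00"

if __name__ == "__main__":
    cdb = decode_rw16_cdb(CDB_FROM_LOG)
    print(f"opcode=0x{cdb['opcode']:02x} lba={cdb['lba']} blocks={cdb['blocks']}")
```

The decoded LBA is 18538230600, the same sector the blk_update_request line reports, and the md error is 64 sectors lower (18538230536), which would be consistent with the array partition starting at sector 64. Assuming 512-byte sectors, that is roughly 9.49 TB into the drive, in line with the run that failed near 95%.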

     

    After the first time this happened, I reseated the drive thinking that perhaps that was the issue. And while it did get further, it did not complete.

     

    Diagnostics attached. Ideas on what to try next?

    tower-diagnostics-20220127-1751.zip

  11. Hey all,

     

    I upgraded the nvme drive in an HP Prodesk 400 G6 mini pc to a 1TB XPG S70. Since doing so, the drive is fine for a bit, but then starts cluttering the logs with errors and becomes unavailable. Upon stopping the array, the drive vanishes. A reboot "fixes" things until the errors start again. I've tried reseating the drive once, thinking maybe that was the problem. I've also done a BIOS update, hoping that would help. Thoughts?

    Jan 16 13:24:42 Tower kernel: nvme nvme0: I/O 115 QID 2 timeout, aborting
    Jan 16 13:24:50 Tower kernel: nvme nvme0: I/O 480 QID 1 timeout, aborting
    Jan 16 13:24:50 Tower kernel: nvme nvme0: I/O 481 QID 1 timeout, aborting
    Jan 16 13:24:50 Tower kernel: nvme nvme0: I/O 482 QID 1 timeout, aborting
    Jan 16 13:25:12 Tower kernel: nvme nvme0: I/O 115 QID 2 timeout, reset controller
    Jan 16 13:25:42 Tower kernel: nvme nvme0: I/O 2 QID 0 timeout, reset controller
    Jan 16 13:27:05 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
    Jan 16 13:27:05 Tower kernel: nvme nvme0: Abort status: 0x371
    ### [PREVIOUS LINE REPEATED 3 TIMES] ###
    Jan 16 13:27:55 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
    Jan 16 13:27:55 Tower kernel: nvme nvme0: Removing after probe failure status: -19
    Jan 16 13:28:46 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
    Jan 16 13:28:46 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 1001944288 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 0
    Jan 16 13:28:46 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 1002150864 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
    Jan 16 13:28:46 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 1001530688 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 0
    Jan 16 13:28:46 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 1001196896 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 0
    Jan 16 13:28:46 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 25776 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
    Jan 16 13:28:46 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 1000205688 op 0x1:(WRITE) flags 0x1000 phys_seg 2 prio class 0
    Jan 16 13:28:46 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 1000205666 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
    Jan 16 13:28:46 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 500657648 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
    Jan 16 13:28:46 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 500818448 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 0
    Jan 16 13:28:46 Tower kernel: blk_update_request: I/O error, dev nvme0n1, sector 25768 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
    Jan 16 13:28:46 Tower kernel: nvme0n1p1: writeback error on inode 1075631154, offset 1212416, sector 1002150872
    Jan 16 13:28:46 Tower kernel: XFS (nvme0n1p1): metadata I/O error in "xfs_buf_ioend+0x12d/0x284 [xfs]" at daddr 0x3bb86ce0 len 32 error 5
    Jan 16 13:28:46 Tower kernel: nvme0n1p1: writeback error on inode 537586352, offset 0, sector 500788960
    Jan 16 13:28:46 Tower kernel: nvme0n1p1: writeback error on inode 537403597, offset 0, sector 500657656
    Jan 16 13:28:46 Tower kernel: nvme0n1p1: writeback error on inode 537586346, offset 0, sector 500832176
    Jan 16 13:28:46 Tower kernel: nvme0n1p1: writeback error on inode 537586348, offset 0, sector 500788960
    Jan 16 13:28:46 Tower kernel: nvme0n1p1: writeback error on inode 1075631154, offset 1212416, sector 1002150864
    Jan 16 13:28:46 Tower kernel: nvme0n1p1: writeback error on inode 1611570215, offset 0, sector 1501284352
    Jan 16 13:28:46 Tower kernel: nvme0n1p1: writeback error on inode 1611570215, offset 16384, sector 1501284392
    Jan 16 13:28:46 Tower kernel: nvme0n1p1: writeback error on inode 537586350, offset 0, sector 500786856
    Jan 16 13:28:46 Tower kernel: nvme0n1p1: writeback error on inode 232, offset 761856, sector 25760
    Jan 16 13:28:46 Tower kernel: XFS (nvme0n1p1): log I/O error -5
    Jan 16 13:28:46 Tower kernel: XFS (nvme0n1p1): xfs_do_force_shutdown(0x2) called from line 1196 of file fs/xfs/xfs_log.c. Return address = 00000000b5b54af3
    Jan 16 13:28:46 Tower kernel: XFS (nvme0n1p1): Log I/O Error Detected. Shutting down filesystem
    Jan 16 13:28:46 Tower kernel: XFS (nvme0n1p1): Please unmount the filesystem and rectify the problem(s)
    Jan 16 13:28:46 Tower kernel: XFS (nvme0n1p1): log I/O error -5
    ### [PREVIOUS LINE REPEATED 1 TIMES] ###
    Jan 16 13:28:46 Tower kernel: nvme nvme0: failed to set APST feature (-19)

     

    tower-diagnostics-20220116-1600.zip
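
When a log is this noisy, a short summary of which queues timed out and how the driver reacted can be easier to read than the raw lines. The sketch below is an illustration only: the regular expression is written against the exact message wording shown above, and the helper names are made up. It also names the negative error numbers that appear in the log (-19 and -5), which are standard errno values.

```python
import errno
import re
from collections import Counter

TIMEOUT_RE = re.compile(
    r"nvme (\w+): I/O (\d+) QID (\d+) timeout, (aborting|reset controller)"
)


def summarize_timeouts(log_text):
    """Count NVMe I/O timeouts per (queue id, driver action)."""
    counts = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        _device, _tag, qid, action = match.groups()
        counts[(int(qid), action)] += 1
    return counts


def errno_name(code):
    """Return the symbolic name for a (possibly negative) errno value."""
    return errno.errorcode.get(abs(code), "unknown")


# The first six lines of the log above, copied verbatim.
SAMPLE = """\
Jan 16 13:24:42 Tower kernel: nvme nvme0: I/O 115 QID 2 timeout, aborting
Jan 16 13:24:50 Tower kernel: nvme nvme0: I/O 480 QID 1 timeout, aborting
Jan 16 13:24:50 Tower kernel: nvme nvme0: I/O 481 QID 1 timeout, aborting
Jan 16 13:24:50 Tower kernel: nvme nvme0: I/O 482 QID 1 timeout, aborting
Jan 16 13:25:12 Tower kernel: nvme nvme0: I/O 115 QID 2 timeout, reset controller
Jan 16 13:25:42 Tower kernel: nvme nvme0: I/O 2 QID 0 timeout, reset controller
"""

if __name__ == "__main__":
    for (qid, action), n in sorted(summarize_timeouts(SAMPLE).items()):
        print(f"QID {qid}: {n} x {action}")
    print("-19 is", errno_name(-19), "and -5 is", errno_name(-5))
```

In this excerpt the timeouts begin on the I/O queues (QID 1 and 2) and then reach QID 0, the admin queue, shortly before the controller is removed. The codes decode to ENODEV (-19) and EIO (-5), meaning the device went away and the later writes failed as ordinary I/O errors.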

  12. 3 hours ago, itimpi said:

    It looks as if the serial number information being reported for that disk has changed :(  How is it connected?

    It's plugged into a mediasonic array. That hasn't been touched or changed. Should I just try clearing and restoring from parity? It's a brand new shucked drive that has been in the array for maybe 3 weeks at this point.

     

  13. Hey all,

     

    New to the unraid ecosystem and loving it. I was cleaning up a series of empty directories and after doing so, I couldn't start any of the docker images (403 error). At the same time, I also had my Plex storage moving to cache accidentally (had set Cache to prefer), so I set Cache to 'yes' and ran the mover. That seemed to work just fine, but wanted to mention it as well.

     

    I thought a reboot might solve it, but after the reboot, one of the disks is showing up unassigned. When I try to assign it, it warns that the disk is 'wrong' and that all content will be erased if I assign it. Curious if that is my only path forward, or if the diagnostics tell you guys any easier steps.

     

     

     
