Everything posted by spalmisano

  1. Thank you very much for the review and suggestions. I had rebooted UR1 during the day and left UR2 alone. Once I realized UR2 was also due for a reboot as part of the resolution, rebooting it did resolve the NFS connectivity issue. No other changes were needed. Once the parity checks are done on both devices, I'll upgrade each to 6.12.4. Marked as solved.
  2. I have two Unraid 6.12.3 machines (UR1 and UR2) that serve as NFS servers and clients to each other. Today UR1 started giving NFS protocol errors and is unable to mount shares or otherwise 'see' UR2 via NFS. The full error is unassigned.devices: NFS mount failed: 'mount.nfs: requested NFS version or transport protocol is not supported '. The only recent changes: UR1 had its parity drive upgraded and is running a parity check, and UR2 had an array disk upgraded and is also running a parity check. UR1 cannot mount anything NFS on UR2, throwing the protocol error; UR2 has no problem mounting NFS on UR1. Both machines have the Unassigned Devices plugin installed at the latest version. I've tried stopping and starting the NFS service on both machines, without impact. I've changed the NFS setting on the individual shares to No and then back to Yes, also without impact. Diagnostics from UR2 are attached, and I'm not sure where else to look to determine what settings differ between UR2 and UR1 (an NFS version-check sketch follows after this list). What am I missing? Any help is appreciated. utility-diagnostics-20230905-1947.zip
  3. That's a good thought, but there's no VPN on this system. I do use WireGuard for remote access, but it's configured on a separate UnRAID server, and the server with the network issue doesn't connect to the WireGuard one. I double-checked that WireGuard isn't enabled on this server, and there aren't any other VPN services/apps either. I am using Pi-hole as a Docker container on this server, and also have a secondary Pi-hole on the WireGuard server. Both UnRAID servers have hardcoded public DNS as their DNS servers and, to my knowledge (and the logs), don't hit the Pi-hole containers.
  4. Update/additional detail: this doesn't seem to impact incoming traffic to the same machine, just the outgoing traffic initiated by a high-volume request. Plex users' activity is unaffected, but any time I make a request from any of the Arrs, the outbound network traffic ceases for several minutes. This is a Dell r720 that's been in (my) service for years without issue. Would a bad network cable cause this kind of inconsistent behavior? A bad driver? Anything suspicious in the diagnostics? Not sure where else to look; anyone with a thought?
  5. I had a USB failure recently and, due to lack of a viable backup (since fixed), needed to start with a fresh configuration. I had the drives labeled and didn't lose any of the array data, and the parity check finished successfully. What I'm experiencing now is a temporary, complete loss of outgoing network connectivity when initiating any sort of mass request. Example: while browsing Radarr's Discover section I'll add several selections for grabbing. Radarr processes them as expected, but after one or two I start getting DNS errors in the logs for 'resource temporarily unavailable' trying to hit api.radarr.video (or anything else). The same goes for Sonarr, NZBGet, NZBHydra, and even the console on UnRAID itself. Doing an nslookup/ping/et cetera on anything will fail. What this feels like is DNS saying 'too much too fast, wait a few before trying again…'. Sonarr/Radarr will keep retrying their requests, and NZBHydra will keep failing and eventually block all of the indexers. After a few minutes the whole thing clears itself up and traffic resumes, but all of the requests made during the blackout are lost. I've tried both Cloudflare's and Google's DNS, and am currently on Google's 8.8.8.8 and 8.8.4.4 as DNS 1 and 2 for UnRAID. What else can I provide that will help point in the right direction for what's going on? Diagnostics from when this happened just now are attached. Any suggestions are welcomed (a DNS-probe sketch follows after this list). media-diagnostics-20220720-2021.zip
  6. I re-assigned all of the data drives and all but one were mountable. Stopped the array, added the parity drives, and restarted to kick off the parity rebuild. I'm working on getting everything back from the appdata backup now. Thanks again for the help.
  7. My USB stick failed recently and I don't have a valid backup of it or its contents or config (I know, I know). There are 28 disks in the array and I have them all physically numbered, so I know which is which. After booting from a new USB I'm obviously back at a fresh UnRAID install. What are my options for re-adding the disks and starting the array without losing anything on them? I'm OK with re-doing the UnRAID settings/containers/et cetera, but I definitely don't want to lose anything from the array. Thanks.
  8. I should have been more clear; that's what I was referring to. Thanks for the clarification. I'll replace the drive with two additional ones, and it looks like I need to replace the USB as well. Appreciate the help.
  9. Does a cache pool provide any redundancy? If one drive in the pool fails, will it still operate as a cache of one, or is the second drive simply additional storage?
  10. I was able to reformat the cache drive and, while the tar from the appdata backup was also corrupted (it was on the array, and I'm guessing it was just bad luck that it too wasn't viable), I re-added the containers and reconfigured everything on the cache drive. The system had halted sometime overnight, and the part of the trace I saw read 'sdb1 corruption' (the cache is sdb) and said to run an xfs repair on it (a repair sketch follows after this list). While typing this I'm also seeing the 'your flash drive is corrupted or offline' warning, and the contents of /boot is null. So now I need to replace the USB drive, and also the cache SSD? Is it worth doing a cache pool if it'll make SSD failure/corruption more easily weatherable? Diagnostics attached. Anything in here to suggest corruption on both devices so close together is more than coincidence? media-diagnostics-20220713-0936.zip
  11. This morning the cache drive started throwing 'read-only file system' errors. It wasn't full, and research pointed to a possibly corrupted docker.img as the culprit. I backed up that file and stopped the docker service in hopes of recreating it. Since the cache drive was read-only the file wouldn't delete, and although the docker service did show as restarting in the UI, clicking the docker tab gives the 'Docker service failed to re-start.' error. I've attached the latest diagnostics, can still SSH in and read/copy the cache drive, and I do have a CA backup of appdata. Is this just a matter of reformatting (switching to xfs?), restoring the appdata backup, and starting again? Anything else? Is it better to replace the drive instead? Thanks for the help. media-diagnostics-20220711-1303.zip
  12. Ran into this issue on my Dell r720, but not the r710. Disabling virtualization allows the system to boot, and everything appears normal, but VMs refuse to start. I'm assuming we need to wait until there's an UnRAID fix and it's not as simple as a configuration change? Regardless, thanks for all the help this forum affords.
  13. Did the boot arguments not work for you? I’m able to boot on both an R710 and R720 for the 6.10.x versions after adding them.
  14. Mod/Dev team: Sounds like there are several of us with similar gear and experiences with the RC. If you want us to supply diagnostics or try something that would help narrow this down, let us know.
  15. I tried each of the ports on my r710 and had the same checksum error. Only when I moved back to 6.9.2 did it start working.
  16. I did, and also tried creating an RC install on the new stick and then copying over the delta files from the old stick. Each time resulted in the same checksum error. I'm guessing there was something incorrect or incomplete with how I was hacking at things.
  17. I was able to find a link to the zip and extract it over the existing files, with the same result. The same checksum failure also occurred when creating a fresh install of the RC using the Windows USB creator and a new USB stick. I created another fresh install with the stable branch and it seems to be working now. I was able to transfer my license key to the new stick, and most of the Docker/VM configs were still present on boot. It rebooted successfully without intervention a few times with the new stick and stable build, so I'm keeping it there instead of moving to the RC for now. If details about the hardware or other diagnostic information would help in diagnosing a potential RC issue, let me know and I'll collect whatever is needed. Marking as solved.
  18. I set up a second unRAID server recently and today shut it down to add an SSD as a cache drive. On restart I get to the unRAID boot screen, get through /bzroot...ok, and then end up (eventually) at an error message reading that bzimage fails its checksum, with a prompt to press any key to restart. There are several pages of messages leading up to that, but the scroll is too fast to parse anything. I installed originally as 6.9.2 and then updated to the latest release candidate. I've tried to boot from the USB stick in multiple ports on that server with the same issue, and it's completely readable from another computer; I've copied its contents off just in case. There doesn't seem to be a log or other diagnostic information on the boot USB. If the bzroot executable was corrupted somehow, can I simply unpack one from the RC build and copy that over (a checksum-verification sketch follows after this list)? Any ideas are welcomed, and let me know what diagnostic info I can provide. This server isn't doing anything critical yet, but I do want it back in action. Thanks.
  19. This was obviously the issue. Simply choosing another port in ident.cfg for SSL allowed the web UI to start (a config sketch follows after this list). Editing that file and rebooting had things back to normal. A simple review of syslog on my part as a first step likely would have pointed me in the right direction. Won't forget that next time. Thanks again for all of the help.
  20. That makes sense given I just enabled SSL for the web UI. Thanks for the detective work. A few of my Plex users have started up for the night, but I’ll change this and then reboot when I can. Once I’ve confirmed this is fixed I’ll post notes and mark as solved.
  21. Yes, see the previous post. I removed those two from the array and they're now in unassigned devices. I'll remove them from the shelf as well.
  22. With the extra syntax commented out, isn't it now the same as stock?
  23. Go is now:

          #!/bin/bash
          # Start the Management Utility
          /usr/local/sbin/emhttp
          # cp /boot/config/.bash_profile /root/
          # install sg3 utility to control SA120 fan speed
          # installpkg /boot/packages/sg3_utils-1.42-x86_64-1.txz

      with no change in behavior after a reboot. Aside from addressing the certificate error recently, the only other maintenance/changes have been: 'fixing' the DNS rebinding issue by adding *****.unraid.net entries in PiHole and Unbound (those entries only have 10.10.0.2 and not the port number (81), since you can't specify a port in PiHole for custom DNS; removing them doesn't fix this UI access issue), and adding a disk shelf with six more drives, one 10TB and five 4TBs. Two of the 4TBs showed up but were never assigned to the array; I assumed Unraid saw a fault with them and wouldn't allow them to be added. I've since removed both and they show up in unassigned devices. I know troubleshooting remotely is a pain, but I appreciate your willingness. If there's anything detail-wise I can provide, let me know.
  24. Successfully rebooted into GUI mode. Everything comes up (services, containers, VMs, et cetera) as before. I can log into the server directly but still cannot access the web UI; Firefox reports it cannot establish a connection. Normal browsing, both internal and external, works as expected, including other sites on the 10.10.0.* subnet. What other information can I give or try on the server?
  25. Ok thanks. I’ll do a reboot in a bit and report back. Appreciate the help.
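
For the NFS protocol error in post 2, a minimal diagnostic sketch: check which NFS versions and transports UR2 actually advertises, then try an explicit-version mount from UR1. This is illustrative only; the share path and mount point are placeholders, and the commands assume shell access on UR1.

```bash
# Run from UR1 against UR2 (share path and mount point are placeholders).

# List the RPC services UR2 registers; the 'nfs' lines show which versions
# (e.g. 3 and 4) and transports (tcp/udp) the server is offering.
rpcinfo -p UR2

# Show the exports UR2 is publishing.
showmount -e UR2

# Attempt a mount with an explicit NFS version to see which one is rejected.
mkdir -p /mnt/remotes/ur2test
mount -t nfs -o vers=4 UR2:/mnt/user/share /mnt/remotes/ur2test \
  || mount -t nfs -o vers=3 UR2:/mnt/user/share /mnt/remotes/ur2test
```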
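
For the intermittent outbound DNS failures in post 5, a small probe loop (a sketch; the domain, resolver, and log path are arbitrary choices) timestamps every failed lookup so the start and duration of a blackout can be lined up against Radarr/Sonarr activity in the diagnostics.

```bash
#!/bin/bash
# Log a timestamp for every failed lookup; run this in the background while
# triggering a batch of Radarr/Sonarr requests, then compare against app logs.
while true; do
    if ! dig @8.8.8.8 api.radarr.video +short +time=2 +tries=1 >/dev/null 2>&1; then
        echo "$(date '+%F %T') DNS lookup failed" >> /var/log/dns-probe.log
    fi
    sleep 1
done
```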
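
Post 10 mentions a console trace pointing at 'sdb1 corruption' and an XFS repair. A hedged sketch of the usual sequence on an XFS-formatted cache device, assuming the filesystem is unmounted first (array stopped or in maintenance mode); the device name is taken from the post.

```bash
# Dry run: report what xfs_repair would fix without writing anything.
xfs_repair -n /dev/sdb1

# If the dry run looks reasonable, run the actual repair.
xfs_repair /dev/sdb1

# Last resort only: -L zeroes a dirty log and can discard the most recent
# metadata updates, so use it only if xfs_repair refuses to run otherwise.
# xfs_repair -L /dev/sdb1
```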
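
For the bzimage checksum failure in post 18, the boot files on the flash can be compared against their published checksums from any other Linux machine before rebuilding the stick. This sketch assumes the flash is mounted at /mnt/usb and that matching *.sha256 files sit next to the boot files (recent Unraid releases include them; otherwise compute the sums from the files in the release zip and compare by hand).

```bash
# Placeholder mount point for the Unraid flash drive on another Linux machine.
cd /mnt/usb

# Print computed vs. published checksums for each boot file that has a .sha256.
for f in bzimage bzroot bzroot-gui bzmodules bzfirmware; do
    if [ -f "$f" ] && [ -f "$f.sha256" ]; then
        echo "$f computed: $(sha256sum "$f" | awk '{print $1}')"
        echo "$f expected: $(cat "$f.sha256")"
    fi
done
```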
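
Posts 19/20 fixed the web UI by moving the SSL port in ident.cfg off a port that was already in use. A sketch of the check and the edit follows; the exact variable names (USE_SSL/PORT/PORTSSL) and the port values are my assumptions, so confirm them against the actual file before changing anything.

```bash
# See what is already listening on the port the web UI's SSL listener wants
# (443 used here only as an example).
ss -tlnp | grep ':443 '

# /boot/config/ident.cfg fragment -- key names assumed, values illustrative.
# Pick an SSL port nothing else uses, then reboot (or restart the web server).
USE_SSL="yes"
PORT="80"
PORTSSL="8443"
```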