aeleos

Members
  • Posts

    29

Everything posted by aeleos

  1. After downgrading, it's been over 3 days with no crashes, so I think this was the issue. I can't believe an auto-update could trigger a bug like this. Thanks so much for your help!
  2. Interesting, you may just be right: one of my torrent containers was running libtorrent >2. I'm downgrading and will mark this as solved if it resolves the issue.
  3. I have been battling random lockups on my usually stable server. The only change I made recently was to add some docker containers that have relatively high CPU and memory usage. I was able to capture some logs using a remote syslog server, since as soon as the lockup happened I wouldn't get any logs written. I noticed some logs with 'shfs invoked oom-killer: ' errors, but I was seeing them both before and during the lockup. The logs from the time period of the crash show the following, with a series of OOM errors afterwards and eventually the full lockup around 45 minutes later. I can post the full remote syslog if someone thinks it would help, but I would need to anonymize it. I have run out of ideas other than maybe replacing the RAM with 64GB instead of 32, or getting a new CPU/MB, as I can only suspect it's a hardware issue. Jan 24 09:46:51 Tower kernel: BUG: kernel NULL pointer dereference, address: 0000000000000116 Jan 24 09:46:51 Tower kernel: #PF: supervisor read access in kernel mode Jan 24 09:46:51 Tower kernel: #PF: error_code(0x0000) - not-present page Jan 24 09:46:51 Tower kernel: PGD 164e69067 P4D 164e69067 PUD 59bea3067 PMD 0 Jan 24 09:46:51 Tower kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI Jan 24 09:46:51 Tower kernel: CPU: 10 PID: 27565 Comm: traefik Tainted: G O 5.19.17-Unraid #2 Jan 24 09:46:51 Tower kernel: Hardware name: Gigabyte Technology Co., Ltd.
X470 AORUS GAMING 5 WIFI/X470 AORUS GAMING 5 WIFI-CF, BIOS F63a 02/17/2022 Jan 24 09:46:51 Tower kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21 Jan 24 09:46:51 Tower kernel: Code: e8 9d fd 67 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 c1 35 69 00 48 81 c4 88 00 00 00 5b e9 ef 59 a6 00 <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb Jan 24 09:46:51 Tower kernel: RSP: 0000:ffffc90001dd7cc0 EFLAGS: 00010246 Jan 24 09:46:51 Tower kernel: RAX: 00000000000000e2 RBX: 00000000000000e2 RCX: 00000000000000e2 Jan 24 09:46:51 Tower kernel: RDX: 0000000000000001 RSI: ffff88830490afe8 RDI: 00000000000000e2 Jan 24 09:46:51 Tower kernel: RBP: 0000000000000000 R08: 000000000000003c R09: ffffc90001dd7cd0 Jan 24 09:46:51 Tower kernel: R10: ffffc90001dd7cd0 R11: ffffc90001dd7d48 R12: 0000000000000000 Jan 24 09:46:51 Tower kernel: R13: ffff888186926f38 R14: 0000000000004dfe R15: ffff888186926f40 Jan 24 09:46:51 Tower kernel: FS: 000000c000570090(0000) GS:ffff88881ea80000(0000) knlGS:0000000000000000 Jan 24 09:46:51 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 24 09:46:51 Tower kernel: CR2: 0000000000000116 CR3: 00000002fb6ce000 CR4: 00000000003506e0 Jan 24 09:46:51 Tower kernel: Call Trace: Jan 24 09:46:51 Tower kernel: <TASK> Jan 24 09:46:51 Tower kernel: __filemap_get_folio+0x98/0x1ff Jan 24 09:46:51 Tower kernel: filemap_fault+0x6e/0x524 Jan 24 09:46:51 Tower kernel: __do_fault+0x30/0x6e Jan 24 09:46:51 Tower kernel: __handle_mm_fault+0x9a5/0xc7d Jan 24 09:46:51 Tower kernel: ? 
__fget_light+0x3d/0x4c Jan 24 09:46:51 Tower kernel: handle_mm_fault+0x113/0x1d7 Jan 24 09:46:51 Tower kernel: do_user_addr_fault+0x36a/0x514 Jan 24 09:46:51 Tower kernel: exc_page_fault+0xfc/0x11e Jan 24 09:46:51 Tower kernel: asm_exc_page_fault+0x22/0x30 Jan 24 09:46:51 Tower kernel: RIP: 0033:0x45f173 Jan 24 09:46:51 Tower kernel: Code: 94 24 08 01 00 00 48 39 c6 0f 8e d8 0b 00 00 4c 89 9c 24 00 01 00 00 4d 89 e0 4c 8b a4 24 08 03 00 00 4c 8b 9c 24 10 03 00 00 <41> 83 7c 24 14 00 0f 84 b1 0b 00 00 4c 89 9c 24 f0 02 00 00 4c 89 Jan 24 09:46:51 Tower kernel: RSP: 002b:000000c00058d8e0 EFLAGS: 00010206 Jan 24 09:46:51 Tower kernel: RAX: 0000000000000005 RBX: 0000000000000000 RCX: 000000c000bfa4e0 Jan 24 09:46:51 Tower kernel: RDX: 0000000000c08500 RSI: 000000007fffffff RDI: 0000000000000000 Jan 24 09:46:51 Tower kernel: RBP: 000000c00058dc40 R08: 0000000000000000 R09: 000000000043ce36 Jan 24 09:46:51 Tower kernel: R10: 000000c000c08498 R11: 0000000005d7be80 R12: 00000000051fe940 Jan 24 09:46:51 Tower kernel: R13: 0000000000000000 R14: 000000c000561380 R15: 0000000000000000 Jan 24 09:46:51 Tower kernel: </TASK> Jan 24 09:46:51 Tower kernel: Modules linked in: vhost_net tun vhost tap kvm_amd ccp kvm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs dm_crypt dm_mod dax md_mod it87 hwmon_vid efivarfs iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls mpt3sas igb btusb btrtl btbcm raid_class gigabyte_wmi wmi_bmof mxm_wmi edac_mce_amd edac_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl 
btintel k10temp bluetooth nvme i2c_algo_bit i2c_piix4 apex(O) scsi_transport_sas gasket(O) i2c_core ahci nvme_core ecdh_generic ecc libahci thermal Jan 24 09:46:51 Tower kernel: tpm_crb tpm_tis tpm_tis_core tpm wmi button unix [last unloaded: tun] Jan 24 09:46:51 Tower kernel: CR2: 0000000000000116 Jan 24 09:46:51 Tower kernel: ---[ end trace 0000000000000000 ]--- Jan 24 09:46:51 Tower kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21 Jan 24 09:46:51 Tower kernel: Code: e8 9d fd 67 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 c1 35 69 00 48 81 c4 88 00 00 00 5b e9 ef 59 a6 00 <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb Jan 24 09:46:51 Tower kernel: RSP: 0000:ffffc90001dd7cc0 EFLAGS: 00010246 Jan 24 09:46:51 Tower kernel: RAX: 00000000000000e2 RBX: 00000000000000e2 RCX: 00000000000000e2 Jan 24 09:46:51 Tower kernel: RDX: 0000000000000001 RSI: ffff88830490afe8 RDI: 00000000000000e2 Jan 24 09:46:51 Tower kernel: RBP: 0000000000000000 R08: 000000000000003c R09: ffffc90001dd7cd0 Jan 24 09:46:51 Tower kernel: R10: ffffc90001dd7cd0 R11: ffffc90001dd7d48 R12: 0000000000000000 Jan 24 09:46:51 Tower kernel: R13: ffff888186926f38 R14: 0000000000004dfe R15: ffff888186926f40 Jan 24 09:46:51 Tower kernel: FS: 000000c000570090(0000) GS:ffff88881ea80000(0000) knlGS:0000000000000000 Jan 24 09:46:51 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 24 09:46:51 Tower kernel: CR2: 0000000000000116 CR3: 00000002fb6ce000 CR4: 00000000003506e0 tower-diagnostics-20230124-1604.zip
  4. If you are willing to look at the official docs, you can modify the docker container to run with only a token in the extra parameters, and the configuration can then be done on the CF website. This should solve any permission issues. As for 6.10, my understanding is that the permission issues have nothing to do with the container or unraid itself, but with incorrect permissions that unraid wasn't respecting before. It's possible you may need to force the container to use the user ID you want, which can be done with --user 99:100 (for nobody:users) in extra parameters.
  5. I was able to fix this by adding --user 99:100 to extra parameters. You can also fix it by setting the ownership of the grafana appdata folders to 472:root, which is the user/group the grafana container tries to use (and which causes these permission issues).
  6. So I purchased an LSI controller and everything is working great so far; however, now I am getting this error: fstrim: /mnt/cache: FITRIM ioctl failed: Remote I/O error. Based on some other posts, it looks like this is related to the LSI card not supporting fstrim. Should I move my cache drive back onto the onboard SATA ports, since I moved it to the controller as part of this? Or is that likely to give me more issues with the SATA controller? I could also try experimenting with a different firmware version, but that isn't ideal.
  7. Unfortunately I have already tried upgrading the BIOS to the latest version. Is there anything else I can do besides buying a PCI card?
  8. I am running into an issue where, after a period of uptime, my server fails with a bunch of read errors. Here are the logs from one instance. May 10 12:35:18 Tower kernel: ahci 0000:01:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0x95e7e000 flags=0x0000] May 10 12:35:19 Tower kernel: ata4.00: exception Emask 0x10 SAct 0x400045f SErr 0x0 action 0x6 frozen May 10 12:35:19 Tower kernel: ata4.00: irq_stat 0x08000000, interface fatal error May 10 12:35:19 Tower kernel: ata4.00: failed command: WRITE FPDMA QUEUED May 10 12:35:19 Tower kernel: ata4.00: cmd 61/08:00:b8:ce:27/00:00:2d:00:00/40 tag 0 ncq dma 4096 out May 10 12:35:19 Tower kernel: res 40/00:00:61:d6:27/00:00:2d:00:00/40 Emask 0x10 (ATA bus error) May 10 12:35:19 Tower kernel: ata4.00: status: { DRDY } May 10 12:35:19 Tower kernel: ata4.00: failed command: WRITE FPDMA QUEUED May 10 12:35:19 Tower kernel: ata4.00: cmd 61/08:08:48:d1:27/00:00:2d:00:00/40 tag 1 ncq dma 4096 out May 10 12:35:19 Tower kernel: res 40/00:00:61:d6:27/00:00:2d:00:00/40 Emask 0x10 (ATA bus error) May 10 12:35:19 Tower kernel: ata4.00: status: { DRDY } May 10 12:35:19 Tower kernel: ata4.00: failed command: WRITE FPDMA QUEUED May 10 12:35:19 Tower kernel: ata4.00: cmd 61/08:10:f8:d2:27/00:00:2d:00:00/40 tag 2 ncq dma 4096 out May 10 12:35:19 Tower kernel: res 40/00:00:61:d6:27/00:00:2d:00:00/40 Emask 0x10 (ATA bus error) May 10 12:35:19 Tower kernel: ata4.00: status: { DRDY } May 10 12:35:19 Tower kernel: ata4.00: failed command: WRITE FPDMA QUEUED May 10 12:35:19 Tower kernel: ata4.00: cmd 61/08:18:f0:d3:27/00:00:2d:00:00/40 tag 3 ncq dma 4096 out May 10 12:35:19 Tower kernel: res 40/00:00:61:d6:27/00:00:2d:00:00/40 Emask 0x10 (ATA bus error) May 10 12:35:19 Tower kernel: ata4.00: status: { DRDY } May 10 12:35:19 Tower kernel: ata4.00: failed command: WRITE FPDMA QUEUED May 10 12:35:19 Tower kernel: ata4.00: cmd 61/08:20:c0:d4:27/00:00:2d:00:00/40 tag 4 ncq dma 4096 out May 10 12:35:19 Tower kernel: res
40/00:00:61:d6:27/00:00:2d:00:00/40 Emask 0x10 (ATA bus error) May 10 12:35:19 Tower kernel: ata4.00: status: { DRDY } May 10 12:35:19 Tower kernel: ata4.00: failed command: WRITE FPDMA QUEUED May 10 12:35:19 Tower kernel: ata4.00: cmd 61/08:30:a0:d5:27/00:00:2d:00:00/40 tag 6 ncq dma 4096 out May 10 12:35:19 Tower kernel: res 40/00:00:61:d6:27/00:00:2d:00:00/40 Emask 0x10 (ATA bus error) May 10 12:35:19 Tower kernel: ata4.00: status: { DRDY } May 10 12:35:19 Tower kernel: ata4.00: failed command: WRITE FPDMA QUEUED May 10 12:35:19 Tower kernel: ata4.00: cmd 61/08:50:f8:d5:27/00:00:2d:00:00/40 tag 10 ncq dma 4096 out May 10 12:35:19 Tower kernel: res 40/00:00:61:d6:27/00:00:2d:00:00/40 Emask 0x10 (ATA bus error) May 10 12:35:19 Tower kernel: ata4.00: status: { DRDY } May 10 12:35:19 Tower kernel: ata4.00: failed command: WRITE FPDMA QUEUED May 10 12:35:19 Tower kernel: ata4.00: cmd 61/08:d0:38:ce:27/00:00:2d:00:00/40 tag 26 ncq dma 4096 out May 10 12:35:19 Tower kernel: res 40/00:00:61:d6:27/00:00:2d:00:00/40 Emask 0x10 (ATA bus error) May 10 12:35:19 Tower kernel: ata4.00: status: { DRDY } May 10 12:35:19 Tower kernel: ata4: hard resetting link May 10 12:35:29 Tower kernel: ata4: softreset failed (1st FIS failed) May 10 12:35:29 Tower kernel: ata4: hard resetting link May 10 12:35:39 Tower kernel: ata4: softreset failed (1st FIS failed) May 10 12:35:39 Tower kernel: ata4: hard resetting link May 10 12:35:49 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x600000 SErr 0x0 action 0x6 frozen May 10 12:35:49 Tower kernel: ata3.00: failed command: READ FPDMA QUEUED May 10 12:35:49 Tower kernel: ata3.00: cmd 60/80:a8:18:eb:2f/00:00:53:00:00/40 tag 21 ncq dma 65536 in May 10 12:35:49 Tower kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Hardware Info: Model: Custom M/B: Gigabyte Technology Co., Ltd. X470 AORUS GAMING 5 WIFI-CF Version Default string - s/n: Default string BIOS: American Megatrends International, LLC. Version F63a. 
Dated: 02/17/2022 CPU: AMD Ryzen 7 2700X Eight-Core @ 3700 MHz HVM: Enabled IOMMU: Enabled Cache: 768 KiB, 4 MB, 16 MB Memory: 32 GiB DDR4 (max. installable capacity 128 GiB) Network: bond0: fault-tolerance (active-backup), mtu 1500 eth0: 1000 Mbps, full duplex, mtu 1500 Kernel: Linux 5.10.28-Unraid x86_64 OpenSSL: 1.1.1j Uptime: 0 days, 03:25:40 // after restart RAM is running at 2133 MHz, 4 x 8GB sticks
  9. You can use the GUI easily by taking the run token the tunnel gives you and modifying the command in the docker template. Something like Post Arguments: tunnel run --token <Your Tunnel Token from GUI>
  10. 1) I handle local access using PiHole on a raspberry pi, with Local DNS entries for each of my subdomains to point them back to the internal unraid IP address. I have a dd-wrt router that points DHCP clients to the pihole for DNS requests. I have a reverse proxy (previously SWAG but now Traefik) on port 443 on unraid, so that the redirected requests look the same from inside and outside the network (same subdomain, https, etc). 2) I'm not exactly sure what you mean here, but you should be able to set up pihole and nextcloud, as I have them working with this. In general, you can either specify a different origin IP address for a specific subdomain in the cloudflare configuration itself, or you can have everything go to a reverse proxy and have the proxy point to a different IP address. 3) I'm also not exactly sure what you mean here, but if you are talking about running the CF tunnel's connection through a proxy, it should be possible but would likely increase the latency a fair bit.
  11. My understanding based on the TOS is that there is no XXGB traffic limit listed, and that as long as you comply with the restrictions around what traffic you serve, you are good to go. In reality, you can likely get away with some amount of video streaming, as traffic isn't closely monitored. However, if you run a video streaming service on your free tier of tunnels, you will likely hit some sort of internal limit (50+ GB per month) and get your account terminated, or moved to a higher-tier plan with a cost per GB. You may be conflating smart route traffic and regular tunnel traffic? For smart routing, there is a free tier limit and you will get charged for additional traffic. However, this is something you have to enable manually. Feel free to correct me if I am wrong, but this is my understanding.
  12. Nice, I'm glad you were able to get it working!
  13. You will need to register for an account and add a credit card to sign up for the free tier (I'm not 100% sure on this, but this is what I had to do), but there is no cost for bandwidth. The terms of service only allow for regular website traffic (not video streaming like plex), so you aren't supposed to use a lot of bandwidth. If you do, it will likely trigger something in their system and you may get taken off the free plan.
  14. You likely have your UUID for the tunnel slightly miswritten or misconfigured, maybe a leading or trailing space.
  15. Are you actually having any issues, or are you just seeing those errors appear in the logs? If the issue is that your cloudflared container is stopping, you will want to add "--restart unless-stopped" to your extra parameters in the advanced view. Additionally, you might want to try an older version of cloudflared like 2021.8.2 or a newer one like 2022.3.1, although the container may update itself anyway.
  16. @Profezor can you provide any more info on what proxy manager you are passing the cloudflare traffic to and how it's configured? That is likely the source of the issue: the error message indicates that cloudflared doesn't like the certificate your proxy manager is providing. Can you also post a redacted version of your cloudflared config?
  17. @LeoRX I'm glad I was able to help out with the instructions. There is something very elegant about the tunnel setup, so I was happy to be able to get the information out to more people. Traefik felt the exact same way to me. Ibracorp's video does a great job of breaking it down and showing how to use it. It's a little bit of a jump from SWAG and NPM, but sometimes better tools have a bigger learning curve.
  18. @portonalga Hmm, that is strange, I would expect that certificate to work. Other options include trying a cloudflare certificate rather than letsencrypt, but that takes a fair bit of manual work. Also, it's possible to locate the logs that show where the actual 502 error is coming from. In NPM you should be able to find a folder for each service where the logs are kept, and any 502 errors should show up there. That might help tell you where the error is actually coming from and why. As much fun as it is to get everything working the right way, I wouldn't get too hung up on it. Sometimes in the end it's better just to have it working, although doing things like this is a big part of the learning process. I would recommend Ibracorp's video on SWAG if you plan on using it. However, if you are dead set on doing it the right way, Traefik (Ibracorp also has a great video) is much more of a purpose-built tool for this. SWAG and NPM are very much applications built around other applications to create a manageable reverse proxy setup. Traefik also makes debugging much more manageable, as it actually shows you the traffic path, where errors are happening, etc. Debugging with NPM is almost impossible; SWAG is somewhat manageable but not ideal.
  19. Also @kakmoster and @portonalga just a note about the subdomain.mydomain vs root domain question. You "should" be able to get it to work with the root domain, and that is the way it's intended to be set up. The subdomain trick is likely only needed because your SSL certificates aren't what CF is expecting to get for all of the different domain traffic it is receiving. The way to get it working is that in NPM, you can create one certificate with multiple domains: one entry for your root domain and one for a subdomain wildcard. These should be part of one certificate, and this should allow you to use your root domain as the origin server domain. From my memory, if you use traefik or SWAG you won't run into this issue because it creates that certificate automatically.
  20. Hey @portonalga no problem. Don't feel like you wasted our time. It takes a lot of effort to get to the point where you have set something up enough that you can ask in-depth troubleshooting questions, so it's always worth helping someone out regardless of whether the problem is actually what they think it is. But yes, looking back at your post, that is likely the issue you were seeing. You could likely have found the information you were looking for, but NPM hides the individual NGINX logs for each service fairly deep in the filesystem, behind folders that don't tell you which service they are for. That's one of the reasons I personally switched away from it: it makes it very hard to debug why one service is not working when others are. SWAG and traefik (which really wasn't as bad to set up as I expected) do this much better. I'm glad you were able to get it working, and hopefully your family are now able to experience the SLA uptime that they need.
  21. Could you try switching your originServerName from nc.my-domain.com to just my-domain.com? It's definitely possible that this is an issue with NPM and not CF, and related to the certificates that are being returned. The issue may be that on the non-nc subdomain the certificate that is being returned is upsetting CF somehow. In general, for CF and NPM I recommend making one SSL certificate with *.your-domain.com and your-domain.com. You would then attach this to all of your subdomains and the root domain under CF. It may be that on the non-NC subdomain CF is expecting the NC certificate (because of the originServerName) but gets a different one and rejects it. However, I'm not really sure why noTLSVerify wouldn't have fixed it, unless it still verifies the origin regardless. You may also want to try a test setup with another proxy manager (SWAG), which will automatically generate the certs. If you are able to get SWAG working with CF, then it's likely an NPM certificate issue.
  22. Overview: Support for Cloudflare Tunnels using the cloudflared docker image
      Application: Cloudflared - https://github.com/cloudflare/cloudflared
      Docker Hub: https://hub.docker.com/r/cloudflare/cloudflared/
      GitHub: https://github.com/aeleos/cloudflared
      Documentation: https://github.com/aeleos/cloudflared
  23. I have been a user for a couple years and something about the amount of control unraid gives me while also being user friendly has made me love it since the moment I started.
  24. That fixed it, can't believe I never knew that you can just manually force drive formats on a disk by disk basis. Thanks for the help.
  25. I have been having a weird issue where, when I add a new drive to my encrypted array, it doesn't get formatted to an encrypted fs. See below for a screenshot of my drives; for whatever reason Disk 3 doesn't get encrypted. When adding the new drive I remove any partitions and add it to the array, and it gets formatted, but not to encrypted xfs. After first having this issue I noticed that in my Disk Settings the default disk format had somehow reverted to regular xfs, but changing it to xfs - encrypted didn't fix the issue. Also, a slightly unrelated question: is it possible to have my cache drive be anything other than encrypted btrfs? I would like to try it with xfs, or just remove the encryption from the cache drive only. Is this at all possible, or is the cache drive locked into btrfs? Thanks for any help.
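
Posts 19 and 21 in the list above recommend a single certificate that names both the root domain and a wildcard for its subdomains. The likely reason is that a wildcard entry such as *.your-domain.com stands for exactly one extra label, so it does not cover your-domain.com itself. Below is a minimal Python sketch of that matching rule. The function names and the example.com hostnames are hypothetical, and this simplified check is only an illustration, not how cloudflared or NPM actually validate certificates.

```python
def name_matches(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name covers the hostname.

    Simplified rule: a leading "*." stands for exactly one label,
    so "*.example.com" covers "nc.example.com" but not "example.com".
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. ".example.com"
        if not hostname.endswith(suffix):
            return False
        first_label = hostname[: -len(suffix)]
        # The wildcard must match one non-empty label with no dots in it.
        return bool(first_label) and "." not in first_label
    return pattern == hostname


def uncovered_hosts(cert_names, hostnames):
    """List the hostnames that no name on the certificate covers."""
    return [h for h in hostnames
            if not any(name_matches(n, h) for n in cert_names)]


if __name__ == "__main__":
    # Hypothetical hostnames served through the tunnel.
    hosts = ["example.com", "nc.example.com", "grafana.example.com"]
    print(uncovered_hosts(["*.example.com"], hosts))                  # ['example.com']
    print(uncovered_hosts(["*.example.com", "example.com"], hosts))   # []
```

With only the wildcard on the certificate, the root domain is left uncovered, which fits the root-domain trouble described in those posts. Adding the root domain as a second name closes the gap.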
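
Posts 3 and 8 in the list above both rely on reading a captured syslog for the kernel messages that precede a lockup. A rough Python sketch of pulling those lines out of a saved capture is shown here. The marker strings are copied from the log excerpts quoted in those posts. The function name, the category labels, and the overall approach are assumptions made for illustration, not a tool mentioned in the posts.

```python
import re
from collections import defaultdict

# Marker substrings taken from the log excerpts quoted above;
# the category labels on the left are made up for this sketch.
MARKERS = {
    "oom": "invoked oom-killer",
    "null_deref": "BUG: kernel NULL pointer dereference",
    "oops": "Oops:",
    "ata_bus_error": "ATA bus error",
    "ata_reset": "hard resetting link",
}

# Matches the "Jan 24 09:46:51 Tower kernel: " prefix used in the excerpts.
PREFIX = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: (.*)$")


def scan(lines):
    """Group (timestamp, message) pairs by the marker they contain."""
    hits = defaultdict(list)
    for line in lines:
        m = PREFIX.match(line.strip())
        if not m:
            continue  # not a kernel line in the expected format
        timestamp, _host, message = m.groups()
        for label, needle in MARKERS.items():
            if needle in message:
                hits[label].append((timestamp, message))
    return dict(hits)


if __name__ == "__main__":
    sample = [
        "Jan 24 09:46:51 Tower kernel: BUG: kernel NULL pointer dereference, address: 0000000000000116",
        "Jan 24 09:46:51 Tower kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI",
        "May 10 12:35:19 Tower kernel: ata4: hard resetting link",
    ]
    for label, entries in scan(sample).items():
        print(label, len(entries), entries[0][0])
```

Because scan() accepts any iterable of lines, an open file object for a saved syslog capture can be passed in directly.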