TurkeyPerson

Members
  • Posts

    25

Everything posted by TurkeyPerson

  1. So it's my syslog that's filling up that directory - this is what keeps repeating:

     Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: shpool alloc failed
     Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: nchan: Out of shared memory while allocating message of size 14047. Increase nchan_max_reserved_memory.
     Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: *4418775 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devices?buffer_length=1 HTTP/1.>
     Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: MEMSTORE:00: can't create shared message for channel /devices
     Jan 11 04:40:59 Tower nginx: 2024/01/11 04:40:59 [crit] 7029#7029: ngx_slab_alloc() failed: no memory

     My usage habits have not changed, so I'm not leaving more or fewer GUI windows open than before. In fact, I've been making a greater effort to close them, but this has consistently been an issue since it started. Is that really the only possibility? Fresh diagnostics, taken without a reset, are attached.

     tower-diagnostics-20240112-0849.zip

     Edit: After doing some brief reading, I think this may be caused by NetData. Going to disable it and see if this comes back, unless someone has another idea.
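To see which of the repeating nginx messages in the excerpt above dominates the syslog, the lines can be tallied by severity and message. This is only a minimal sketch: the function name `tally_nginx_errors` is hypothetical, and the regular expression assumes the syslog prefix and nginx timestamp layout shown in the post.

```python
import re
from collections import Counter

# Assumed layout (taken from the excerpt in the post):
# "Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: <message>"
NGINX_LINE = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s\S+\snginx:\s"
    r"\d{4}/\d{2}/\d{2}\s[\d:]{8}\s\[(?P<level>\w+)\]\s\d+#\d+:\s(?P<msg>.*)$"
)


def tally_nginx_errors(lines):
    """Count nginx syslog messages, grouped by (severity, normalized message)."""
    counts = Counter()
    for line in lines:
        match = NGINX_LINE.match(line.strip())
        if not match:
            continue  # not an nginx line in the assumed format
        # Drop the per-connection id ("*4418775 ") so repeats group together.
        msg = re.sub(r"^\*\d+\s", "", match.group("msg"))
        # Collapse every number (sizes, ids) to "N" for the same reason.
        msg = re.sub(r"\d+", "N", msg)
        counts[(match.group("level"), msg)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: shpool alloc failed",
        "Jan 11 04:40:59 Tower nginx: 2024/01/11 04:40:59 [crit] 7029#7029: ngx_slab_alloc() failed: no memory",
    ]
    for (level, msg), n in tally_nginx_errors(sample).most_common():
        print(n, level, msg)
```

Feeding it the lines of a saved syslog file (for example via `open(path)`) would give a quick count of how often each message recurs before the log directory fills up.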
  2. I already rebooted the server so not sure that output is helpful but good to know if/when it happens again. Thanks. Yeah - I'll do this and see if I run into other issues. Thanks.
  3. Woke up to this error. The common issues advice suggests posting diagnostics online and restarting as a temporary fix:

     error: Compressing program wrote following message to stderr when compressing log /var/log/nginx/error.log.1:
     gzip: stdout: No space left on device
     error: failed to compress log /var/log/nginx/error.log.1

     Diagnostics are uploaded.

     tower-diagnostics-20231221-0742.zip
  4. This solution appears to be working so far. Thank you!
  5. New diagnostics after changing cables and rebooting: tower-diagnostics-20230506-1519.zip
  6. Hi all, I'm seeing this in my logs until they fill up. It's happened twice now since I got new cache disks. I've tried adjusting the cable and am about to try replacing it. Could it be something other than a hardware issue? I have attached diagnostics below; here's an excerpt:

     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077984, rd 49379, flush 459477, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077985, rd 49379, flush 459477, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#11 CDB: opcode=0x2a 2a 00 00 62 7c 40 00 00 60 00
     May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 6454336 op 0x1:(WRITE) flags 0x1800 phys_seg 12 prio class 2
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#12 CDB: opcode=0x2a 2a 00 00 62 7c e0 00 00 40 00
     May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 6454496 op 0x1:(WRITE) flags 0x1800 phys_seg 8 prio class 2
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077986, rd 49379, flush 459477, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077987, rd 49379, flush 459477, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077988, rd 49379, flush 459477, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#22 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
     May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077988, rd 49379, flush 459478, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#6 CDB: opcode=0x2a 2a 00 00 00 08 80 00 00 08 00
     May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 2176 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 2
     May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 2176 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 2
     May 6 08:49:20 Tower kernel: btrfs_end_super_write: 17 callbacks suppressed
     May 6 08:49:20 Tower kernel: BTRFS warning (device sdg1): lost page write due to IO error on /dev/sdg1 (-5)
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077989, rd 49379, flush 459478, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): error writing primary super block to device 1
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#7 CDB: opcode=0x2a 2a 00 97 cd a6 a8 00 00 08 00
     May 6 08:49:20 Tower kernel: BTRFS warning (device sdg1): lost page write due to IO error on /dev/sdg1 (-5)
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): error writing primary super block to device 1

     I think the issues begin with this error, which may suggest that my PCI SATA controller is the underlying culprit?

     May 5 16:18:04 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
     May 5 16:18:04 Tower kernel: ata6.00: failed command: DATA SET MANAGEMENT
     May 5 16:18:04 Tower kernel: ata6.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 21 dma 512 out
     May 5 16:18:04 Tower kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     May 5 16:18:04 Tower kernel: ata6.00: status: { DRDY }
     May 5 16:18:04 Tower kernel: ata6: hard resetting link
     May 5 16:18:10 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
     May 5 16:18:14 Tower kernel: ata6: COMRESET failed (errno=-16)
     May 5 16:18:14 Tower kernel: ata6: hard resetting link
     May 5 16:18:20 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
     May 5 16:18:24 Tower kernel: ata6: COMRESET failed (errno=-16)
     May 5 16:18:24 Tower kernel: ata6: hard resetting link
     May 5 16:18:30 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
     May 5 16:18:59 Tower kernel: ata6: COMRESET failed (errno=-16)
     May 5 16:18:59 Tower kernel: ata6: limiting SATA link speed to 3.0 Gbps
     May 5 16:18:59 Tower kernel: ata6: hard resetting link
     May 5 16:19:04 Tower kernel: ata6: COMRESET failed (errno=-16)
     May 5 16:19:04 Tower kernel: ata6: reset failed, giving up
     May 5 16:19:04 Tower kernel: ata6.00: disable device
     May 5 16:19:04 Tower kernel: ata6: EH complete
     May 5 16:19:04 Tower kernel: sd 6:0:0:0: [sdg] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=89s
     May 5 16:19:04 Tower kernel: sd 6:0:0:0: [sdg] tag#5 CDB: opcode=0x28 28 00 2d a9 29 40 00 00 08 00
     May 5 16:19:04 Tower kernel: I/O error, dev sdg, sector 766060864 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 2
     May 5 16:19:04 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
     May 5 16:19:04 Tower kernel: sd 6:0:0:0: [sdg] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s

     Update: I added iommu=pt in case it is my Marvell controller causing issues. Prior to doing that, I switched cables (SATA and SATA power adaptor) and rebooted. The logs for each cache drive are littered with errors, so I'm going to keep the array offline until I have a clue what's happening, as I'm concerned about data corruption/loss. Thanks!

     tower-diagnostics-20230506-0836.zip
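The "errs:" lines in the excerpt above carry running per-device counters (wr, rd, flush, corrupt, gen), so the last such line per device summarizes how far things have gone. The sketch below pulls those counters out of kernel log lines; the function name `latest_btrfs_counters` is hypothetical, and the pattern assumes the exact message layout quoted in the post. On a live system, `btrfs device stats` reports the same kind of counters.

```python
import re

# Assumed message layout, copied from the kernel lines in the post.
BTRFS_ERRS = re.compile(
    r"BTRFS error \(device (?P<dev>\w+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def latest_btrfs_counters(lines):
    """Return {bdev: {counter: value}} from the last matching line per device."""
    latest = {}
    for line in lines:
        match = BTRFS_ERRS.search(line)
        if match:
            latest[match.group("bdev")] = {
                name: int(match.group(name)) for name in COUNTERS
            }
    return latest


if __name__ == "__main__":
    sample = [
        "May 5 16:19:04 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0",
        "May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077989, rd 49379, flush 459478, corrupt 0, gen 0",
    ]
    print(latest_btrfs_counters(sample))
```

In this excerpt the write, read, and flush counters climb while corrupt and gen stay at 0, which is at least consistent with the device dropping off the bus rather than returning bad data.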
  7. Did you get anywhere? Getting similar errors but only managed to get a screenshot of the log. Upgrading off stable and will see if it reappears in the coming days.
  8. Reseated the cables yesterday and ordered new ones to try replacing (will test this evening). Seems odd that a bad cache cable would mess with the array parity, no?
  9. Yesterday my parity drive became corrupted or went out of sync (fixed). Now my VMs are dead. I'm seeing errors that appear related to one of my cache drives and intend to try changing the SATA cable (I reseated everything yesterday). Anyone got any ideas? Will upload diagnostics. tower-diagnostics-20230426-0733.zip
  10. Hi, As it turns out, I got many errors within the first hour of running memtest. Second RAM issue in the last year (the other one was a different PC) - never had bad RAM prior, perhaps I got hit with some savage solar radiation or something. Anyway, going to RMA and, to avoid downtime, take the opportunity to upgrade! Quick follow-up: what should I do after? I believe Unraid has a method of checking the arrays for corruption. I suppose I should do that, recreate my VMs, and run diskcheck (or whatever else as the case may be) from inside each VM? Is there a better approach? Thanks again!
  11. Thank you, I'll get on that this week and report back. Where did you see the errors BTW? Would be useful for me to check more regularly I think. The dashboard and VMs stopped responding so I restarted. I checked syslog after and found this:

     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.4: device [8086:a294] error status/mask=00000001/00002000
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.4: [ 0] RxErr
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:09:00.0: hcc params 0x0200ef80 hci version 0x110 quirks 0x0000000000800010
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:09:00.0: xHCI Host Controller
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:09:00.0: new USB bus registered, assigned bus number 6
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:09:00.0: Host supports USB 3.1 Enhanced SuperSpeed
     Jan 22 18:04:19 Tower kernel: hub 5-0:1.0: USB hub found
     Jan 22 18:04:19 Tower kernel: hub 5-0:1.0: 2 ports detected
     Jan 22 18:04:19 Tower kernel: usb usb6: We don't know the algorithms for LPM for this host, disabling LPM.
     Jan 22 18:04:19 Tower kernel: hub 6-0:1.0: USB hub found
     Jan 22 18:04:19 Tower kernel: hub 6-0:1.0: 2 ports detected
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:0a:00.0: xHCI Host Controller
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:0a:00.0: new USB bus registered, assigned bus number 7
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.6: AER: Corrected error received: 0000:00:1c.6
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.6: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.6: device [8086:a296] error status/mask=00000001/00002000
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.6: [ 0] RxErr
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:0a:00.0: hcc params 0x0200ef80 hci version 0x110 quirks 0x0000000000800010
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:0a:00.0: xHCI Host Controller
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:0a:00.0: new USB bus registered, assigned bus number 8
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:0a:00.0: Host supports USB 3.1 Enhanced SuperSpeed
  12. Not clear what you are trying to do. I thought this was to set up WG to vpn into your server. If you're using an external VPN to go to another network then you won't be able to access the UI.
  13. Hi all, My VM tab has been loading very, very slowly and I can't seem to figure out why. Furthermore, and I'm not actually sure if the two issues are related, I noticed the following error constantly repeating in my logs:

     virNetSocketReadWire:1791 : End of file while reading data: Input/output error

     I did some research and found that a few people have posted about this error over the last three or so years, but they either never get a response or mysteriously stop responding when asked to post their diagnostics. I've attached my diagnostics to break the cycle! In the interest of providing a maximum of information, I'll mention a few seemingly unrelated issues I've run into in the last two years or so:

     - Corrupted USB (has happened 3 times -- I actually switched USBs and it happened again)
     - Some SMART errors -- I think these were related to a loose cable
     - Corruption in my dockers -- deleted and recreated

     In fact, writing all that out makes me feel like I should probably do some tests on my RAM -- thoughts? (Sorry for the tangent!) Your help is greatly appreciated!

     tower-diagnostics-20230121-1753.zip
  14. I didn't realize it was passed through. I'll check that out. I guess I probably did it in the XML and then changed the slots. Very helpful information, thank you!
  15. So I've been having 2 drives basically die together constantly, and both happen to be attached to a SATA PCIe card. The two drives suddenly get a billion reads/writes and errors, but as far as I can tell the SMART reports look fine. I think it might be getting messed up again each time I load a VM - but I'm really unsure. I tried switching the PCIe slot but it happened anyway. Could the controller be dead (it's only ~2 years old, I think)? Should I try some different ports? To make matters worse, at some point I tried to revert to an older Unraid version using the UnraidDVB plugin and had to restore everything by just keeping my config folder - I'm not sure if that could be part of the issue, so I'm mentioning it. Thoughts? (For clarity, the screenshot below was taken when this happened again while I was rebuilding parity after the first time it happened. Shutting down the server till I know what's going on.) EDIT: OH! And I checked for loose cables and replaced one of them just in case - did not help. tower-diagnostics-20200730-2309.zip
  16. Also think this is a great idea - I think a webgui option would be sweet though. By the way, is there some type of sound virtualization that's possible ?
  17. Looking for a sound card that plays nicely and has SPDIF. Cheaper is better. The CM8888 chipset does NOT work (see post history). Anyone running something like this successfully?
  18. Getting this same error on a sound card: [13f6:5011] 04:00.0 Audio device: C-Media Electronics Inc CM8888 [Oxygen Express]
  19. Also have the same error with the same card in 6.8.3 (perhaps a different packaging though): https://www.amazon.ca/gp/product/B00X7B2NY0/ Any updates on this? Anyone got any thoughts? I could return it - but I feel like I'm constantly struggling to get a functioning VM (and I need SPDIF because I have an old receiver). For a while, I had onboard audio passed through, but it was flaky and eventually just stopped working altogether. tower-diagnostics-20200507-0955.zip
  20. Oh! I see. Okay I'll try to boot the server with a windows 10 drive, thank you.
  21. When I run "wmic logicaldisk get caption" from the CLI in the Windows install disk troubleshooter, I get three drives: D, E, and X. It seems like D is the install disk (UDF?); E is "RAW" and I can't run chkdsk on it; and X, which I assume is just a live disk of some kind, is write protected. So, E is the drive and the partition is gone? If that's the case, I assume I need to use some kind of recovery software? Does that mean building a new VM with something like EaseUS partition recovery and adding the drive to that VM as a secondary? EDIT: diskpart > list disk tells me that there are no fixed disks to show. EDIT2: Alright, so list volume shows me VirtIO and what I assume is the Windows install disk. So I guess it's not even seeing the drive. I just didn't know any better. Thank you, I'll look into that. Thank you for both replies!
  22. Hey! Thanks for the reply! Sorry about that, I didn't see the option in the original submission. Here you go. tower-diagnostics-20200507-0955.zip
  23. Hey all, Basically, I restarted my server yesterday and when I tried to boot my Windows 10 VM, I got the 0xc000000f error from Windows. Attempts to repair indicate it can't find a Windows installation. The drive it's running on is an NVMe drive set up as an unassigned device (not passed through). Looking at it, I saw the following errors in the log, repeating every time I load the VM:

     Tower kernel: print_req_error: critical medium error, dev nvme0n1, sector 30345600
     Tower kernel: Buffer I/O error on dev nvme0n1p1, logical block 3792944, async page read

     I did some research, and that led me to believe maybe the SSD was dying - seemed unusual, as it's not that old - but possible (it's a Force MP500). I did some testing, copied a file onto it and off of it - everything seemed fine. When I tried to copy the ISO using Dolphin, it failed because it "cannot read" it. I assume this is some kind of file corruption? Any tips/fixes? Even how to avoid this in the future. I know I should have taken backups - so I'll be doing that going forward, but I'm really hoping not to lose everything. Note: I'm currently on 6.8 because I downgraded to test whether it could help - it didn't. I was on 6.8.2 to begin with. Thanks for reading!

     tower-diagnostics-20200507-0955.zip
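The two kernel lines in the last post (sector 30345600 on nvme0n1 and logical block 3792944 on nvme0n1p1) appear to describe the same spot on the drive, which would point to one bad location rather than two. The arithmetic can be checked with a short sketch. Two values here are assumptions, not taken from the diagnostics: a 4096-byte block size for the buffer I/O message, and a partition that starts at sector 2048. The helper name `block_to_disk_sector` is hypothetical.

```python
SECTOR_SIZE = 512        # bytes; Linux block-layer sectors are 512-byte units
BLOCK_SIZE = 4096        # ASSUMPTION: block size behind "logical block" on nvme0n1p1
PARTITION_START = 2048   # ASSUMPTION: first sector of nvme0n1p1 (1 MiB alignment)


def block_to_disk_sector(logical_block,
                         block_size=BLOCK_SIZE,
                         partition_start=PARTITION_START):
    """Convert a partition-relative logical block to a whole-disk sector."""
    sectors_per_block = block_size // SECTOR_SIZE
    return partition_start + logical_block * sectors_per_block


if __name__ == "__main__":
    # Logical block from the "Buffer I/O error" line in the post.
    print(block_to_disk_sector(3792944))  # prints 30345600
```

Under those assumptions the result equals the sector in the "critical medium error" line, so both messages would be reporting a single unreadable region.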