TurkeyPerson

Members
  • Posts

    25
TurkeyPerson's Achievements

Noob (1/14)

Reputation: 2

  1. So it's my syslog that's filling up that directory - this is what keeps repeating:

     Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: shpool alloc failed
     Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: nchan: Out of shared memory while allocating message of size 14047. Increase nchan_max_reserved_memory.
     Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: *4418775 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devices?buffer_length=1 HTTP/1.>
     Jan 11 04:40:58 Tower nginx: 2024/01/11 04:40:58 [error] 7029#7029: MEMSTORE:00: can't create shared message for channel /devices
     Jan 11 04:40:59 Tower nginx: 2024/01/11 04:40:59 [crit] 7029#7029: ngx_slab_alloc() failed: no memory

     My usage habits have not changed so I'm not leaving more/less GUI windows than before. In fact, trying to make a greater effort to close them but this has consistently been an issue since it started. Is that really the only possibility? Fresh diagnostics taken without a reset attached. tower-diagnostics-20240112-0849.zip

     Edit: After doing some brief reading, I think this may be caused by NetData. Going to disable it and see if this comes back unless someone has another idea.
  2. I already rebooted the server so not sure that output is helpful but good to know if/when it happens again. Thanks. Yeah - I'll do this and see if I run into other issues. Thanks.
  3. Woke up to this error; common issues suggests posting diagnostics online and restarting as a temporary fix:

     error: Compressing program wrote following message to stderr when compressing log /var/log/nginx/error.log.1:
     gzip: stdout: No space left on device
     error: failed to compress log /var/log/nginx/error.log.1

     Diagnostics are uploaded. tower-diagnostics-20231221-0742.zip
  4. This solution appears to be working so far. Thank you!
  5. New diagnostics after changing cables and rebooting: tower-diagnostics-20230506-1519.zip
  6. Hi all, I'm seeing this in my logs until they fill up. It's happened twice now since I got new cache disks. I've tried adjusting the cable and am about to try replacing it. Could it be something other than a hardware issue? I have attached diagnostics below, here's an excerpt:

     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077984, rd 49379, flush 459477, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077985, rd 49379, flush 459477, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#11 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#11 CDB: opcode=0x2a 2a 00 00 62 7c 40 00 00 60 00
     May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 6454336 op 0x1:(WRITE) flags 0x1800 phys_seg 12 prio class 2
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#12 CDB: opcode=0x2a 2a 00 00 62 7c e0 00 00 40 00
     May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 6454496 op 0x1:(WRITE) flags 0x1800 phys_seg 8 prio class 2
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077986, rd 49379, flush 459477, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077987, rd 49379, flush 459477, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077988, rd 49379, flush 459477, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#22 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
     May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077988, rd 49379, flush 459478, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#6 CDB: opcode=0x2a 2a 00 00 00 08 80 00 00 08 00
     May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 2176 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 2
     May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 2176 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 2
     May 6 08:49:20 Tower kernel: btrfs_end_super_write: 17 callbacks suppressed
     May 6 08:49:20 Tower kernel: BTRFS warning (device sdg1): lost page write due to IO error on /dev/sdg1 (-5)
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077989, rd 49379, flush 459478, corrupt 0, gen 0
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): error writing primary super block to device 1
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
     May 6 08:49:20 Tower kernel: sd 6:0:0:0: [sdg] tag#7 CDB: opcode=0x2a 2a 00 97 cd a6 a8 00 00 08 00
     May 6 08:49:20 Tower kernel: BTRFS warning (device sdg1): lost page write due to IO error on /dev/sdg1 (-5)
     May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): error writing primary super block to device 1

     I think the issues begin with this error, which may suggest that my PCI SATA controller is the underlying culprit?

     May 5 16:18:04 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
     May 5 16:18:04 Tower kernel: ata6.00: failed command: DATA SET MANAGEMENT
     May 5 16:18:04 Tower kernel: ata6.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 21 dma 512 out
     May 5 16:18:04 Tower kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     May 5 16:18:04 Tower kernel: ata6.00: status: { DRDY }
     May 5 16:18:04 Tower kernel: ata6: hard resetting link
     May 5 16:18:10 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
     May 5 16:18:14 Tower kernel: ata6: COMRESET failed (errno=-16)
     May 5 16:18:14 Tower kernel: ata6: hard resetting link
     May 5 16:18:20 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
     May 5 16:18:24 Tower kernel: ata6: COMRESET failed (errno=-16)
     May 5 16:18:24 Tower kernel: ata6: hard resetting link
     May 5 16:18:30 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
     May 5 16:18:59 Tower kernel: ata6: COMRESET failed (errno=-16)
     May 5 16:18:59 Tower kernel: ata6: limiting SATA link speed to 3.0 Gbps
     May 5 16:18:59 Tower kernel: ata6: hard resetting link
     May 5 16:19:04 Tower kernel: ata6: COMRESET failed (errno=-16)
     May 5 16:19:04 Tower kernel: ata6: reset failed, giving up
     May 5 16:19:04 Tower kernel: ata6.00: disable device
     May 5 16:19:04 Tower kernel: ata6: EH complete
     May 5 16:19:04 Tower kernel: sd 6:0:0:0: [sdg] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=89s
     May 5 16:19:04 Tower kernel: sd 6:0:0:0: [sdg] tag#5 CDB: opcode=0x28 28 00 2d a9 29 40 00 00 08 00
     May 5 16:19:04 Tower kernel: I/O error, dev sdg, sector 766060864 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 2
     May 5 16:19:04 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
     May 5 16:19:04 Tower kernel: sd 6:0:0:0: [sdg] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s

     Update: I added iommu=pt in case it is my Marvell controller causing issues. Prior to doing that, I switched cables (SATA and SATA power adaptor) and rebooted. The logs for each cache drive are littered with errors, so I'm going to keep the array offline till I have a clue what's happening, as I'm concerned about data corruption/loss. Thanks! tower-diagnostics-20230506-0836.zip
  7. Did you get anywhere? Getting similar errors but only managed to get a screenshot of the log. Upgrading off stable and will see if it reappears in the coming days.
  8. Reseated the cables yesterday and ordered new ones to try replacing (will test this evening). Seems odd that a bad cache cable would mess with the array parity, no?
  9. tower-diagnostics-20230426-0733.zip Yesterday my parity drive became corrupt or out of sync (fixed). Now my VMs are dead. I'm seeing errors that appear related to one of my cache drives and intend to try changing the SATA cable (reseated everything yesterday). Anyone got any ideas? Will upload diagnostics.
  10. Hi, As it turns out, I got many errors within the first hour of running memtest. Second RAM issue in the last year (the other one was a different PC) - never had bad RAM prior, perhaps I got hit with some savage solar radiation or something. Anyway, going to RMA and, to avoid downtime, take the opportunity to upgrade! Quick follow up: what should I do after? I believe Unraid has a method of checking the arrays for corruption. I suppose I should do that, recreate my VMs, and run diskcheck (or whatever else as the case may be) from inside each VM? Is there a better approach? Thanks again!
  11. Thank you, I'll get on that this week and report back. Where did you see the errors BTW? Would be useful for me to check more regularly I think. The dashboard and VMs stopped responding so I restarted. I checked syslog after and found this:

     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.4: AER: Corrected error received: 0000:00:1c.4
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.4: device [8086:a294] error status/mask=00000001/00002000
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.4: [ 0] RxErr
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:09:00.0: hcc params 0x0200ef80 hci version 0x110 quirks 0x0000000000800010
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:09:00.0: xHCI Host Controller
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:09:00.0: new USB bus registered, assigned bus number 6
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:09:00.0: Host supports USB 3.1 Enhanced SuperSpeed
     Jan 22 18:04:19 Tower kernel: hub 5-0:1.0: USB hub found
     Jan 22 18:04:19 Tower kernel: hub 5-0:1.0: 2 ports detected
     Jan 22 18:04:19 Tower kernel: usb usb6: We don't know the algorithms for LPM for this host, disabling LPM.
     Jan 22 18:04:19 Tower kernel: hub 6-0:1.0: USB hub found
     Jan 22 18:04:19 Tower kernel: hub 6-0:1.0: 2 ports detected
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:0a:00.0: xHCI Host Controller
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:0a:00.0: new USB bus registered, assigned bus number 7
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.6: AER: Corrected error received: 0000:00:1c.6
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.6: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.6: device [8086:a296] error status/mask=00000001/00002000
     Jan 22 18:04:19 Tower kernel: pcieport 0000:00:1c.6: [ 0] RxErr
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:0a:00.0: hcc params 0x0200ef80 hci version 0x110 quirks 0x0000000000800010
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:0a:00.0: xHCI Host Controller
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:0a:00.0: new USB bus registered, assigned bus number 8
     Jan 22 18:04:19 Tower kernel: xhci_hcd 0000:0a:00.0: Host supports USB 3.1 Enhanced SuperSpeed
  12. Not clear what you are trying to do. I thought this was to set up WG to vpn into your server. If you're using an external VPN to go to another network then you won't be able to access the UI.
  13. Hi all, My VM tab has been loading very very slowly and I can't seem to figure out why. Furthermore, and I'm not actually sure if the two issues are related, I noticed the following error constantly repeating in my logs:

     virNetSocketReadWire:1791 : End of file while reading data: Input/output error

     I did some research and I found that a few people have posted about this error over the last three or so years, but they either never get a response, or mysteriously stop responding when asked to post their diagnostics. I've attached my diagnostics to break the cycle! In the interest of providing a maximum of information I'll mention a few seemingly unrelated issues I've run into in the last two years or so:

     - Corrupted USB (has happened 3 times -- I actually switched USBs and it happened again)
     - Some SMART errors -- I think these were related to a loose cable
     - Corruption in my dockers -- deleted and recreated

     In fact, writing all that out makes me feel like I should probably do some tests on my RAM -- thoughts? (sorry for the tangent!) Your help is greatly appreciated! tower-diagnostics-20230121-1753.zip
  14. I didn't realize it was passed through. I'll check that out. I guess I probably did it in the XML and then changed the slots. Very helpful information, thank you!
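
A note on the BTRFS excerpt in post 6: the `errs: wr ..., rd ..., flush ..., corrupt ..., gen ...` lines carry running per-device counters, so comparing the first and last values in an excerpt shows which kinds of errors are still climbing. The following is only a minimal sketch of that idea in Python, assuming the line format matches the excerpt quoted in the post; `ERRS_RE` and `btrfs_error_growth` are hypothetical names made up for illustration, not part of Unraid or btrfs-progs.

```python
import re

# Matches the counter summary as it appears in the quoted syslog lines, e.g.
# "bdev /dev/sdg1 errs: wr 8077984, rd 49379, flush 459477, corrupt 0, gen 0"
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)
FIELDS = ("wr", "rd", "flush", "corrupt", "gen")


def btrfs_error_growth(lines):
    """Return {device: {field: (first_seen, last_seen)}} for each counter."""
    first, last = {}, {}
    for line in lines:
        match = ERRS_RE.search(line)
        if match is None:
            continue  # not a counter line (I/O error, CDB dump, etc.)
        dev = match["dev"]
        counts = {field: int(match[field]) for field in FIELDS}
        first.setdefault(dev, counts)  # keep only the earliest reading
        last[dev] = counts             # overwrite with the latest reading
    return {
        dev: {field: (first[dev][field], last[dev][field]) for field in FIELDS}
        for dev in first
    }


# Three lines copied from the May 6 excerpt in post 6.
sample = [
    "May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077984, rd 49379, flush 459477, corrupt 0, gen 0",
    "May 6 08:49:20 Tower kernel: I/O error, dev sdg, sector 6454336 op 0x1:(WRITE) flags 0x1800 phys_seg 12 prio class 2",
    "May 6 08:49:20 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 8077989, rd 49379, flush 459478, corrupt 0, gen 0",
]

growth = btrfs_error_growth(sample)
for dev, fields in growth.items():
    for field, (start, end) in fields.items():
        print(f"{dev} {field}: {start} -> {end} (+{end - start})")
```

On the sample lines, only the write and flush counters move while read, corrupt and gen stay flat. That pattern would be consistent with a device that stopped accepting writes, though it cannot by itself distinguish a cable, controller or drive fault.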