shaunsund

  1. As the attached diagnostics should show, every now and then /var/log fills up and I have to reboot. I can't narrow it down to a particular Docker image or plugin. I couldn't catch it filling up until I made a user script to check the space. I did notice that the Dashboard won't show CPU usage and the log viewer page's close button is labeled 'undefined'. I can change the script to grab specific info if anyone has suggestions. fractal-diagnostics-20210413-1257.zip
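For the /var/log check mentioned in this post, a minimal sketch of what such a script could collect is shown here. It is not the original user script: the function names are hypothetical, and it only reports how full the filesystem is and which files under the directory are largest.

```python
import os
import shutil


def usage_percent(path="/var/log"):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(root="/var/log", count=5):
    """Return up to `count` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                # File rotated or removed while walking; skip it.
                continue
    return sorted(sizes, reverse=True)[:count]
```

Logging the output of `largest_files()` each time `usage_percent()` crosses some chosen threshold would at least show which log file is growing before the reboot becomes necessary.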
  2. Well, if a drive disappeared then the calculation for disk free would change, although I would expect that to cause a warning of its own before a disk usage one. I did check while writing the first post: every disk's utilization warning and critical thresholds are equal to the global ones. Also, even if they were different, why did they alarm for 2 disks and report OK for 6 disks a minute later? Like JorgeB mentioned, it seems to be "an elusive issue and difficult to replicate".
  3. So this is odd. I have 8 data disks and all except #1 are nearly full, so I set my global warning at 93% and critical at 97% usage so that everything is OK utilization-wise. Today was the second time I got the below messages: and also one for disk 8. Yet a minute later I got the following: also for disks 3, 4, 5, 7, and 8. I got the Warning at 5:19pm and then the Notice emails at 5:20pm. Disks are practically idle during this time. I don't know why they would hiccup these utilization errors. If I had a disk or cache drive drop out then maybe the totals might change, but there is no indication in syslog as to a cause for these warnings. fractal-diagnostics-20201104-1804.zip
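To make the threshold logic in this post concrete, here is a small sketch of how a used-space percentage could be compared against the 93% warning and 97% critical settings. How Unraid actually computes and compares these values is not stated in the post, so the `>=` comparison and the function names are assumptions.

```python
def used_percent(used_bytes, total_bytes):
    """Used space as a percentage of the total size."""
    return 100.0 * used_bytes / total_bytes


def classify_utilization(percent, warning=93.0, critical=97.0):
    """Map a used-space percentage to 'ok', 'warning' or 'critical'.

    Assumption: a value equal to a threshold already triggers it.
    """
    if percent >= critical:
        return "critical"
    if percent >= warning:
        return "warning"
    return "ok"
```

With this kind of check, a disk sitting just under 93% only flips state if the used or total figure it is fed changes, which is why a momentary bad reading is one plausible explanation for a warning followed by a notice a minute later.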
  4. Has anyone else been getting out-of-memory errors with qBittorrent? I've had the max memory set to 2G for the longest time, but within the last few weeks I always get an OOM error from Unraid. Even upping the memory to 2.5G still gives OOM. Seems like there is now a memory leak.
     Oct 29 21:10:58 fractal kernel: Memory cgroup out of memory: Kill process 24455 (qbittorrent-nox) score 984 or sacrifice child
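To track how often the container hits its limit, the OOM line quoted in this post can be pulled out of syslog with a short parser. This is a sketch only: the pattern is tied to the exact wording of that one line, other kernel versions may word the message differently, and the function name is hypothetical.

```python
import re

# Pattern based solely on the kernel line quoted in the post.
OOM_RE = re.compile(
    r"Memory cgroup out of memory: Kill process (\d+) \((.+?)\) score (\d+)"
)


def parse_cgroup_oom(line):
    """Return pid, process name and score from a cgroup OOM line, or None."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group(1)),
        "process": match.group(2),
        "score": int(match.group(3)),
    }
```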
  5. I've got 2 other drives on the same controller in addition to the SSDs. No problems with those. Why the other 2 drives didn't also complain is curious. Actually, I didn't get enough sleep (~4hrs) and confused the free space with the FS size. SMART reports are OK on each drive. btrfs reports OK. It must be the motherboard, which, seeing how it's out of warranty by 3 months, seems even more likely.
  6. Last night I noticed many errors from my cache drives: the FS went RO, Docker was not responsive, and the logs started showing:
     Oct 12 23:46:00 fractal kernel: ata8.00: exception Emask 0x0 SAct 0xfffff03f SErr 0x0 action 0x6 frozen
     Oct 12 23:46:00 fractal kernel: ata8.00: failed command: WRITE FPDMA QUEUED
     Oct 12 23:46:00 fractal kernel: ata8.00: cmd 61/18:00:a8:16:ff/00:00:0b:00:00/40 tag 0 ncq dma 12288 out
     Oct 12 23:46:00 fractal kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     Oct 12 23:46:00 fractal kernel: ata8.00: status: { DRDY }
     Oct 12 23:46:00 fractal kernel: ata8.00: failed command: WRITE FPDMA QUEUED
     Oct 12 23:46:00 fractal kernel: ata8.00: cmd 61/20:08:c0:16:ff/00:00:0b:00:00/40 tag 1 ncq dma 16384 out
     Oct 12 23:46:00 fractal kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
     and then
     Oct 13 00:27:29 fractal rsyslogd: action 'action-3-builtin:omfile' (module 'builtin:omfile') message lost, could not be processed. Check for additional error messages before this one. [v8.1908.0 try https://www.rsyslog.com/e/2027 ]
     Oct 13 00:27:29 fractal rsyslogd: file '/mnt/user/meta/syslog-10.10.10.10.log'[2] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: Read-only file system [v8.1908.0 try https://www.rsyslog.com/e/2027 ]
     Oct 13 00:27:29 fractal rsyslogd: action 'action-3-builtin:omfile' (module 'builtin:omfile') message lost, could not be processed. Check for additional error messages before this one. [v8.1908.0 try https://www.rsyslog.com/e/2027 ]
     Oct 13 00:27:29 fractal kernel: scsi_io_completion_action: 127 callbacks suppressed
     Oct 13 00:27:29 fractal kernel: sd 7:0:0:0: [sdd] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
     Oct 13 00:27:29 fractal kernel: sd 7:0:0:0: [sdd] tag#5 CDB: opcode=0x28 28 00 03 61 18 e0 00 00 20 00
     Oct 13 00:27:29 fractal kernel: print_req_error: 133 callbacks suppressed
     Oct 13 00:27:29 fractal kernel: print_req_error: I/O error, dev sdd, sector 56695008
     Oct 13 00:27:29 fractal kernel: btrfs_dev_stat_print_on_error: 127 callbacks suppressed
     Oct 13 00:27:29 fractal kernel: BTRFS error (device dm-8): bdev /dev/mapper/sdd1 errs: wr 74, rd 10858, flush 0, corrupt 0, gen 0
     Oct 13 00:27:29 fractal kernel: sd 8:0:0:0: [sde] tag#25 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
     Oct 13 00:27:29 fractal kernel: sd 8:0:0:0: [sde] tag#25 CDB: opcode=0x28 28 00 0e 20 18 e0 00 00 20 00
     Oct 13 00:27:29 fractal kernel: print_req_error: I/O error, dev sde, sector 236984544
     Oct 13 00:27:29 fractal kernel: BTRFS error (device dm-8): bdev /dev/mapper/sde1 errs: wr 417, rd 9685, flush 0, corrupt 0, gen 0
     Oct 13 00:27:29 fractal kernel: sd 7:0:0:0: [sdd] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
     Oct 13 00:27:29 fractal kernel: sd 7:0:0:0: [sdd] tag#3 CDB: opcode=0x28 28 00 11 86 70 70 00 00 08 00
     I was able to shut down after grabbing diagnostics and went to bed. Woke up this morning to a working array, but the cache drive was missing 173G: from 500G to 327G. Does anyone know what happened to my drives? I fear that I am a victim of the excessive cache write bug and it has killed my SSDs, although I would have expected them just to die rather than lose space. I've included the diags from last night and this morning after a successful boot. I'm going to look into new SSDs (any suggestions?). My motherboard can also support 2x M.2 drives. Thanks! fractal-diagnostics-20201013-0027.zip fractal-diagnostics-20201013-0658.zip
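The per-device counters in the BTRFS error lines of this post (wr, rd, flush, corrupt, gen) are easier to compare across drives once extracted. The sketch below parses one such syslog line; the regular expression is derived only from the lines quoted here, and the function name is hypothetical.

```python
import re

# Matches e.g. "bdev /dev/mapper/sdd1 errs: wr 74, rd 10858, flush 0, corrupt 0, gen 0"
BTRFS_ERRS_RE = re.compile(
    r"bdev (\S+) errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)


def parse_btrfs_errs(line):
    """Return the device path and its error counters, or None if no match."""
    match = BTRFS_ERRS_RE.search(line)
    if match is None:
        return None
    keys = ("wr", "rd", "flush", "corrupt", "gen")
    counters = {key: int(value) for key, value in zip(keys, match.groups()[1:])}
    return {"device": match.group(1), **counters}
```

Keeping only the last match per device gives the most recent counters seen in the log for each member of the pool.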
  7. I have been trying to add a single data drive and a single cache drive with encryption. After entering the password for the encryption, it would show both drives as needing formatting, but after formatting them both, it then says the cache drive needs to be formatted. Allowing it to format the cache drive a second time results in a non-encrypted cache drive. After going through this several times with different disk settings (XFS or Btrfs encryption), I did the format with encryption and then reverted to 6.8.3 without the second format. The cache tab shows it formatted with odd partitioning (can't recall the exact wording) and a correcting format results in an encrypted cache drive. Screenshots and diags included. beta29-always formatting cache.zip
  8. I've had /var/log hit 100% twice in the last week. I was able to get some error and syslog files off, and after I ran sudo /etc/rc.d/rc.nginx restart I was able to pull diagnostics. In the logs I found thousands of lines like:
     Aug 17 04:40:15 fractal nginx: 2020/08/17 04:40:15 [error] 31737#31737: *955988 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
     Aug 17 04:40:15 fractal nginx: 2020/08/17 04:40:15 [crit] 31737#31737: ngx_slab_alloc() failed: no memory
     Aug 17 04:40:15 fractal nginx: 2020/08/17 04:40:15 [error] 31737#31737: shpool alloc failed
     Aug 17 04:40:15 fractal nginx: 2020/08/17 04:40:15 [error] 31737#31737: nchan: Out of shared memory while allocating channel /disks. Increase nchan_max_reserved_memory.
     Aug 17 04:40:15 fractal nginx: 2020/08/17 04:40:15 [error] 31737#31737: *955989 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
     Aug 17 04:40:16 fractal nginx: 2020/08/17 04:40:16 [crit] 31737#31737: ngx_slab_alloc() failed: no memory
     Aug 17 04:40:16 fractal nginx: 2020/08/17 04:40:16 [error] 31737#31737: shpool alloc failed
     Aug 17 04:40:16 fractal nginx: 2020/08/17 04:40:16 [error] 31737#31737: nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.
     Aug 17 04:40:16 fractal nginx: 2020/08/17 04:40:16 [error] 31737#31737: *955990 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
     Aug 17 04:40:16 fractal nginx: 2020/08/17 04:40:16 [crit] 31737#31737: ngx_slab_alloc() failed: no memory
     Aug 17 04:40:16 fractal nginx: 2020/08/17 04:40:16 [error] 31737#31737: shpool alloc failed
     Aug 17 04:40:16 fractal nginx: 2020/08/17 04:40:16 [error] 31737#31737: nchan: Out of shared memory while allocating channel /var. Increase nchan_max_reserved_memory.
     Aug 17 04:40:16 fractal nginx: 2020/08/17 04:40:16 [alert] 31737#31737: *955991 header already sent while keepalive, client: 10.10.10.14, server: 0.0.0.0:443
     Aug 17 04:40:16 fractal kernel: nginx[31737]: segfault at 0 ip 0000000000000000 sp 00007ffea6b931f8 error 14 in nginx[400000+21000]
     Aug 17 04:40:16 fractal kernel: Code: Bad RIP value.
     Aug 17 04:40:16 fractal nginx: 2020/08/17 04:40:16 [alert] 17122#17122: worker process 31737 exited on signal 11
     Another thing I noticed was that on the Dashboard page, the bars for the cores weren't showing activity. From what I have read, this can be caused by 'other' browsers. I use Opera and Chromium. fractal-diagnostics-20200817-1654-PLUS.zip
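With thousands of these nginx lines in syslog, a per-channel tally shows quickly which nchan channels run out of shared memory most often. The sketch below counts the "Out of shared memory while allocating channel" messages; the pattern comes only from the lines quoted in this post and the function name is hypothetical.

```python
import re
from collections import Counter

# Captures the channel name, e.g. "/disks" or "/cpuload", without the trailing period.
CHANNEL_RE = re.compile(r"Out of shared memory while allocating channel (/[^\s.]+)")


def count_failed_channels(lines):
    """Count out-of-shared-memory errors per nchan channel."""
    counts = Counter()
    for line in lines:
        match = CHANNEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open syslog file object as `lines` works because the function only iterates over it line by line.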
  9. Loving this plugin, but did notice something odd: an image named '0' with the wrong icon for the image name. It can't be an orphaned image; I have a User Script that removes those.
  10. Eureka moment! I have a raspberry pi I can use. Thanks!
  11. I'll post this in case someone has experienced this or can come up with something. I was copying files over NFS to another Unraid server. After about 4+ hours the GUI wasn't loading and my load in htop was in the high 30s. I was able to log in via SSH and kill some Docker containers, but the load never got better than 25. The diagnostics command wasn't responsive. Had to power off the hard way. Brought it back up and of course a parity check started, but I did get a message on startup about one of my cache drives:
     Warning [FRACTAL] - Cache pool BTRFS missing device(s) Samsung_SSD_850_EVO_500GB_S3PTNB0JC12576E (sdh)
     But as it checks parity all seems to be working fine. I find it hard to troubleshoot instances when the system is unresponsive to the point where gathering diagnostics is impossible. Does anyone have tips to get some information or 'refresh' the GUI so that a sluggish system can be recovered without a power reset? The attached diag is from after the reboot. fractal-smart-20200625-2333.zip
  12. Apr 6 01:58:25 fractal nginx: 2020/04/06 01:58:25 [alert] 16754#16754: worker process 27639 exited on signal 6
     Apr 6 01:58:27 fractal nginx: 2020/04/06 01:58:27 [alert] 16754#16754: worker process 27676 exited on signal 6
     Apr 6 01:58:28 fractal nginx: 2020/04/06 01:58:28 [crit] 27766#27766: ngx_slab_alloc() failed: no memory
     Apr 6 01:58:28 fractal nginx: 2020/04/06 01:58:28 [error] 27766#27766: shpool alloc failed
     Apr 6 01:58:28 fractal nginx: 2020/04/06 01:58:28 [error] 27766#27766: nchan: Out of shared memory while allocating message of size 6320. Increase nchan_max_reserved_memory.
     Apr 6 01:58:28 fractal nginx: 2020/04/06 01:58:28 [error] 27766#27766: *3034108 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
     Apr 6 01:58:28 fractal nginx: 2020/04/06 01:58:28 [error] 27766#27766: MEMSTORE:00: can't create shared message for channel /disks
     Apr 6 01:58:29 fractal nginx: 2020/04/06 01:58:29 [alert] 16754#16754: worker process 27766 exited on signal 6
     Apr 6 01:58:30 fractal nginx: 2020/04/06 01:58:30 [crit] 27792#27792: ngx_slab_alloc() failed: no memory
     Submitting for info/solution. I woke up today and saw 30%+ of log space used overnight. Looks like the UI had a hiccup, with the logs filling up with the above. An initial search made it sound like the iOS Safari browser was to blame, but I still saw errors after closing that tab. All other browsers are Firefox. If the UI log in syslog included IP addresses, I could narrow the culprit down. fractal-diagnostics-20200406-0821.zip
  13. Saw the above link, which covers my current NIC, so I guess this is common. I don't know why I have only experienced it with 6.8.2. Just so future searches find this, here's what my onboard NIC identifies as: Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V. I have ordered an Intel Ethernet Server Adapter I340-T4. I don't need the 4 ports, but a 2-port wasn't available for 2-day shipping.