jsmontague

Group: Members
Posts: 60
Achievements: Rookie (2/14)
Reputation: 3

  1. I've been waiting for the next issue to pop up, but we've been stable since disabling flash backup and Connect. Thanks for all the help, JorgeB!
  2. I have removed it for now. Could that have led to the SMB issue?
  3. I knew I shouldn't have posted. Every minute my logs show the UpdateFlashBackup update entry. I tried git show and git log in /boot to find the commits and see what is being saved, but there were no updates today, even with these entries. I also noticed that SMB wasn't working (I went to look at the syslog to see if I could find what triggered it). I tried to restart SMB and now have permission errors, and the flash backup has started again with no changes. Attachments: alexandria-diagnostics-20231113-1330.zip, syslog
  4. The rebuilt server has no ongoing errors, and I accidentally left Safari browsers open for the last couple of days without triggering the same bug (I was previously using Edge).
  5. I tried reseating the RAM, and whatever was on its way out has finally given up the ghost: the server no longer turns on or POSTs. At this point I think it's time for a new motherboard/CPU/RAM.
  6. That's what I thought I had originally (a RAM issue), so I swapped my 128GB out for 32GB from my lab box. All of it is enterprise ECC RAM, and I've never had a problem with RAM in any system before. Could I really have two different sets of RAM with issues, or is it possibly CPU/board related?
  7. So far so good, but I saw the following in the log:

        Nov 2 15:59:17 Alexandria kernel: mce: [Hardware Error]: Machine check events logged
        Nov 2 15:59:17 Alexandria kernel: mce: [Hardware Error]: Machine check events logged

        mcelog: failed to prefill DIMM database from DMI data
        Kernel does not support page offline interface
        mcelog: Cannot read sysfs field /sys/kernel/security/lockdown: No such file or directory
        Kernel in lockdown. Cannot enable DIMM error location reporting
        Fallback Socket memory error count 163 exceeded threshold: 163 in 24h
        Location SOCKET:1 CHANNEL:? DIMM:? []
        Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)

        Hardware event. This is not a software error.
        MCE 0 not finished?
        CPU 8 BANK 7 TSC 1e0037342d89a
        MISC 152561e86 ADDR 52d7e2940
        TIME 1698789886 Tue Oct 31 17:04:46 2023
        MCG status:
        MCi status:
        Error overflow
        Corrected error
        MCi_MISC register valid
        MCi_ADDR register valid
        MCA: MEMORY CONTROLLER RD_CHANNEL2_ERR
        Transaction: Memory read error
        STATUS cc00290000010092 MCGSTATUS 0
        MCGCAP 1000c19 APICID 20 SOCKETID 1
        MICROCODE 42e
        CPUID Vendor Intel Family 6 Model 62 Step 4

        Hardware event. This is not a software error.
        MCE 0
        CPU 8 BANK 7 TSC 189d3a0f42916
        MISC 30684286 ADDR 4ab0d1940
        TIME 1698958757 Thu Nov 2 15:59:17 2023
        MCG status:
        MCi status:
        Error overflow
        Corrected error
        MCi_MISC register valid
        MCi_ADDR register valid
        MCA: MEMORY CONTROLLER RD_CHANNEL2_ERR
        Transaction: Memory read error
        STATUS cc00040000010092 MCGSTATUS 0
        MCGCAP 1000c19 APICID 20 SOCKETID 1
        MICROCODE 42e
        CPUID Vendor Intel Family 6 Model 62 Step 4

        Hardware event. This is not a software error.
        MCE 1
        CPU 8 BANK 11 TSC 189d3a0f42916
        MISC 90000000000208c ADDR 742c52000
        TIME 1698958757 Thu Nov 2 15:59:17 2023
        MCG status:
        MCi status:
        Corrected error
        MCi_MISC register valid
        MCi_ADDR register valid
        MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR
        Transaction: Memory scrubbing error
        MemCtrl: Corrected patrol scrub error
        STATUS 8c000050000800c2 MCGSTATUS 0
        MCGCAP 1000c19 APICID 20 SOCKETID 1
        MICROCODE 42e
        CPUID Vendor Intel Family 6 Model 62 Step 4

        Hardware event. This is not a software error.
        MCE 2
        CPU 8 BANK 7 TSC 189d3a0f501de
        MISC 202ebe86 ADDR 4d4647700
        TIME 1698958757 Thu Nov 2 15:59:17 2023
        MCG status:
        MCi status:
        Error overflow
        Corrected error
        MCi_MISC register valid
        MCi_ADDR register valid
        MCA: MEMORY CONTROLLER RD_CHANNEL2_ERR
        Transaction: Memory read error
        STATUS cc00044000010092 MCGSTATUS 0
        MCGCAP 1000c19 APICID 20 SOCKETID 1
        MICROCODE 42e
        CPUID Vendor Intel Family 6 Model 62 Step 4

        Hardware event. This is not a software error.
        MCE 3
        CPU 8 BANK 7 TSC 189d3a0f571d2
        MISC 2076e086 ADDR 4ab0d5540
        TIME 1698958757 Thu Nov 2 15:59:17 2023
        MCG status:
        MCi status:
        Error overflow
        Corrected error
        MCi_MISC register valid
        MCi_ADDR register valid
        MCA: MEMORY CONTROLLER RD_CHANNEL2_ERR
        Transaction: Memory read error
        STATUS cc00030000010092 MCGSTATUS 0
        MCGCAP 1000c19 APICID 20 SOCKETID 1
        MICROCODE 42e
        CPUID Vendor Intel Family 6 Model 62 Step 4

        Hardware event. This is not a software error.
        MCE 4
        CPU 8 BANK 7 TSC 189d3a0f5e316
        MISC 205a9686 ADDR 4ab0d5d40
        TIME 1698958757 Thu Nov 2 15:59:17 2023
        MCG status:
        MCi status:
        Error overflow
        Corrected error
        MCi_MISC register valid
        MCi_ADDR register valid
        MCA: MEMORY CONTROLLER RD_CHANNEL2_ERR
        Transaction: Memory read error
        STATUS cc0000c000010092 MCGSTATUS 0
        MCGCAP 1000c19 APICID 20 SOCKETID 1
        MICROCODE 42e
        CPUID Vendor Intel Family 6 Model 62 Step 4
  8. Yup, synching is trying to rebuild its database; I just found it. Thanks, Jorge! It still feels like the system is not performing as it has in the past, but I can't sort out why, so every little thing has me running down rabbit holes. Appreciate your help!
  9. The server started a parity check and is running at 1-5 MB/s, when a parity check typically runs at around 150 MB/s. I don't see anything in the logs apart from the lines below. I'm going to cancel the parity check and restart the server into safe mode today.

        Nov 1 22:00:01 Alexandria kernel: mdcmd (39): check NOCORRECT
        Nov 1 22:00:01 Alexandria kernel:
        Nov 1 22:00:01 Alexandria kernel: md: recovery thread: check P Q ...

     Attachment: alexandria-diagnostics-20231102-0641.zip
  10. I did not know that! Next lockup I'll put it in safe mode to rule out plugins. I confirmed with netstat that I have no open HTTPS session to the box, and I'll report back in a couple of days if we're still up. Thanks, Jorge!
  11. I have my server boot into WEB mode but leave the connected monitor, keyboard, and mouse off. Could it be that web session?
  12. I can't boot into safe mode because I need the services to be working, but I did shut down my browser before I put my laptop to sleep. Is there anything else that could cause that? Is this a bug? I've never had to close a browser on any system before to keep it from crashing. The server kept running for hours after those log entries, so I dismissed them as the root cause. I'll see if I can identify any other system that might have a browser open to Unraid.
  13. Looking for other ideas to pinpoint my issue: the server locked up yesterday evening a little after 5:00 pm local time, but there was nothing in the syslog beforehand, and I was not able to use the onboard keyboard/mouse or SSH in to pull diagnostics. I had to hard power-cycle it to get it back up. Attachments: alexandria-diagnostics-20231101-0830.zip, syslog-192.168.36.150.log
  14. My logs are full of nginx out-of-memory errors. New diagnostics attached. Is this tied to a page being left open in the browser of a device that sleeps overnight, or something else? I can't believe that's a thing, but it's what a few of the Google results suggested when I searched for the log messages.

        Oct 30 20:00:33 Alexandria kernel: clocksource: timekeeping watchdog on CPU6: hpet wd-wd read-back delay of 54266ns
        Oct 30 20:00:33 Alexandria kernel: clocksource: wd-tsc-wd read-back delay of 106577ns, clock-skew test skipped!
        Oct 30 20:24:51 Alexandria nginx: 2023/10/30 20:24:51 [crit] 12347#12347: ngx_slab_alloc() failed: no memory
        Oct 30 20:24:51 Alexandria nginx: 2023/10/30 20:24:51 [error] 12347#12347: shpool alloc failed
        Oct 30 20:24:51 Alexandria nginx: 2023/10/30 20:24:51 [error] 12347#12347: nchan: Out of shared memory while allocating message of size 27101. Increase nchan_max_reserved_memory.
        Oct 30 20:24:51 Alexandria nginx: 2023/10/30 20:24:51 [error] 12347#12347: *1306818 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devices?buffer_length=1 HTTP/1.1", host: "localhost"
        Oct 30 20:24:51 Alexandria nginx: 2023/10/30 20:24:51 [error] 12347#12347: MEMSTORE:00: can't create shared message for channel /devices
        Oct 30 20:24:52 Alexandria nginx: 2023/10/30 20:24:52 [crit] 12347#12347: ngx_slab_alloc() failed: no memory
        Oct 30 20:24:52 Alexandria nginx: 2023/10/30 20:24:52 [error] 12347#12347: shpool alloc failed
        Oct 30 20:24:52 Alexandria nginx: 2023/10/30 20:24:52 [error] 12347#12347: nchan: Out of shared memory while allocating message of size 27101. Increase nchan_max_reserved_memory.
        Oct 30 20:24:52 Alexandria nginx: 2023/10/30 20:24:52 [error] 12347#12347: *1306826 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devices?buffer_length=1 HTTP/1.1", host: "localhost"
        Oct 30 20:24:52 Alexandria nginx: 2023/10/30 20:24:52 [error] 12347#12347: MEMSTORE:00: can't create shared message for channel /devices
        Oct 30 20:24:56 Alexandria nginx: 2023/10/30 20:24:56 [crit] 12347#12347: ngx_slab_alloc() failed: no memory
        Oct 30 20:24:56 Alexandria nginx: 2023/10/30 20:24:56 [error] 12347#12347: shpool alloc failed
        Oct 30 20:24:56 Alexandria nginx: 2023/10/30 20:24:56 [error] 12347#12347: nchan: Out of shared memory while allocating message of size 27106. Increase nchan_max_reserved_memory.
        Oct 30 20:24:56 Alexandria nginx: 2023/10/30 20:24:56 [error] 12347#12347: *1306868 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devices?buffer_length=1 HTTP/1.1", host: "localhost"
        Oct 30 20:24:56 Alexandria nginx: 2023/10/30 20:24:56 [error] 12347#12347: MEMSTORE:00: can't create shared message for channel /devices
        Oct 30 20:24:57 Alexandria nginx: 2023/10/30 20:24:57 [crit] 12347#12347: ngx_slab_alloc() failed: no memory
        Oct 30 20:24:57 Alexandria nginx: 2023/10/30 20:24:57 [error] 12347#12347: shpool alloc failed
        Oct 30 20:24:57 Alexandria nginx: 2023/10/30 20:24:57 [error] 12347#12347: nchan: Out of shared memory while allocating message of size 27111. Increase nchan_max_reserved_memory.
        Oct 30 20:24:57 Alexandria nginx: 2023/10/30 20:24:57 [error] 12347#12347: *1306881 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devices?buffer_length=1 HTTP/1.1", host: "localhost"
        Oct 30 20:24:57 Alexandria nginx: 2023/10/30 20:24:57 [error] 12347#12347: MEMSTORE:00: can't create shared message for channel /devices
        Oct 30 20:24:58 Alexandria nginx: 2023/10/30 20:24:58 [crit] 12347#12347: ngx_slab_alloc() failed: no memory
        Oct 30 20:24:58 Alexandria nginx: 2023/10/30 20:24:58 [error] 12347#12347: shpool alloc failed
        Oct 30 20:24:58 Alexandria nginx: 2023/10/30 20:24:58 [error] 12347#12347: nchan: Out of shared memory while allocating message of size 27111. Increase nchan_max_reserved_memory.
        Oct 30 20:24:58 Alexandria nginx: 2023/10/30 20:24:58 [error] 12347#12347: *1306889 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devices?buffer_length=1 HTTP/1.1", host: "localhost"
        Oct 30 20:24:58 Alexandria nginx: 2023/10/30 20:24:58 [error] 12347#12347: MEMSTORE:00: can't create shared message for channel /devices

      Attachment: alexandria-diagnostics-20231031-0844.zip
  15. Perfect, I'll add recreating the Docker image to the list! Appreciate it.
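Post 3 describes an UpdateFlashBackup entry showing up every minute. A minimal Python sketch for confirming that cadence from a saved syslog: it buckets every line containing the string `UpdateFlashBackup` by the minute in its syslog timestamp. The exact wording of the flash backup log line is an assumption here, so `MARKER` may need adjusting to match what actually appears in the log.

```python
import re
import sys
from collections import Counter

# Assumption: flash backup entries contain this literal string.
MARKER = "UpdateFlashBackup"

# BSD-syslog prefix such as "Nov 13 13:30:01 Alexandria ..." (day may be space-padded).
PREFIX = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}):(\d{2}):\d{2} ")


def per_minute_counts(lines):
    """Count MARKER lines per (month, day, hour, minute), in log order."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        m = PREFIX.match(line)
        if m:
            month, day, hour, minute = m.groups()
            counts[(month, int(day), int(hour), int(minute))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        counts = per_minute_counts(fh)
    print(f"{len(counts)} distinct minutes with {MARKER} entries")
```

Run it as `python3 flash_cadence.py syslog` (the script name is hypothetical); a count close to the number of minutes covered by the log would match the "every minute" observation.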
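The mcelog dump in post 7 is already decoded, but it can help to see where labels such as "Error overflow", "Corrected error" and "RD_CHANNEL2_ERR" come from. The sketch below decodes only the architectural fields of an IA32_MCi_STATUS value (flag bits 63-57 and the memory-controller compound error code `000F 0000 1MMM CCCC` in bits 15:0, as described in the Intel SDM Vol. 3B); model-specific bits are deliberately ignored.

```python
# Architectural flag bits of IA32_MCi_STATUS (Intel SDM Vol. 3B).
VAL, OVER, UC, EN, MISCV, ADDRV, PCC = 63, 62, 61, 60, 59, 58, 57

# MMM field of the memory-controller compound error code.
MEM_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def bit(value, n):
    return bool((value >> n) & 1)


def decode_status(status):
    """Decode the architectural fields of an MCi_STATUS value."""
    mca_code = status & 0xFFFF
    info = {
        "valid": bit(status, VAL),
        "overflow": bit(status, OVER),
        "uncorrected": bit(status, UC),
        "misc_valid": bit(status, MISCV),
        "addr_valid": bit(status, ADDRV),
        "mca_code": mca_code,
    }
    # Memory-controller format: bit 7 set, bits 15:13 and 11:8 clear
    # (bit 12 is the correction-report filtering flag, so it is masked out).
    if (mca_code & 0xEF80) == 0x0080:
        info["transaction"] = MEM_TRANSACTIONS.get((mca_code >> 4) & 0b111, "reserved")
        channel = mca_code & 0xF
        info["channel"] = None if channel == 0xF else channel  # 0xF = unspecified
    return info
```

For example, `decode_status(0xcc00290000010092)` reports a valid, overflowed, corrected memory read error on channel 2, and `0x8c000050000800c2` decodes to a corrected scrubbing error on channel 2, matching mcelog's text. Every event in the excerpt also reports SOCKETID 1 and channel 2.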
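To put the parity check speeds in post 9 into perspective: a parity check reads roughly the full size of the parity disk, so its duration is about size divided by throughput. The 12 TB size below is a hypothetical example, not the actual disk in this server.

```python
def parity_check_hours(disk_tb, rate_mb_s):
    """Hours to read disk_tb terabytes (10**12 bytes) at rate_mb_s MB/s (10**6 bytes/s)."""
    return disk_tb * 10**12 / (rate_mb_s * 10**6) / 3600


# Hypothetical 12 TB parity disk at the normal speed and at the degraded speeds.
for rate in (150, 5, 1):
    print(f"{rate:>3} MB/s: {parity_check_hours(12, rate):8.1f} h")
```

At 150 MB/s that works out to about 22 hours, while 5 MB/s stretches it to roughly 28 days, which is why letting a check run at 1-5 MB/s is not practical.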
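Post 10 mentions checking with netstat for open HTTPS sessions. A small sketch that filters `netstat -tn` output down to established connections whose local port is the web UI's; it assumes the default ports 80 and 443, so adjust `WEBUI_PORTS` if the web UI was moved.

```python
# Assumption: the web UI listens on the default HTTP/HTTPS ports.
WEBUI_PORTS = {80, 443}


def open_webui_sessions(netstat_output, ports=WEBUI_PORTS):
    """Return (local, remote) pairs for ESTABLISHED TCP connections to the given local ports."""
    sessions = []
    for line in netstat_output.splitlines():
        parts = line.split()
        # Skip headers and anything that is not a tcp/tcp6 row.
        if len(parts) < 6 or not parts[0].startswith("tcp"):
            continue
        local, remote, state = parts[3], parts[4], parts[5]
        port = local.rsplit(":", 1)[-1]  # rsplit also handles IPv6 addresses
        if state == "ESTABLISHED" and port.isdigit() and int(port) in ports:
            sessions.append((local, remote))
    return sessions
```

The output can be fed in with `subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout`; each remote address returned is a device that currently has the web UI open.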
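For a lockup like the one in post 13, where nothing obvious appears in the syslog, it can still help to see exactly when logging stopped. A minimal sketch that reports silences longer than a threshold in a remote syslog file and the last line before each one; syslog timestamps carry no year, so the year is passed in, and year rollovers are not handled.

```python
import re
from datetime import datetime, timedelta

# BSD-syslog prefix, e.g. "Oct 31 17:05:12 " or "Nov  1 08:30:00 ".
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2}) ")


def find_gaps(lines, year, min_gap=timedelta(minutes=10)):
    """Yield (last_line_before_gap, gap) for silences longer than min_gap."""
    prev_time = prev_line = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # continuation or malformed line
        mon, day, hms = m.groups()
        t = datetime.strptime(f"{year} {mon} {day} {hms}", "%Y %b %d %H:%M:%S")
        if prev_time is not None and t - prev_time > min_gap:
            yield prev_line.rstrip("\n"), t - prev_time
        prev_time, prev_line = t, line
```

Running it over the remote log (e.g. `find_gaps(open("syslog-192.168.36.150.log"), 2023)`) would show the last entry before the hard power cycle, although a hard hang may leave nothing useful behind.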
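The nginx excerpt in post 14 repeats the same five-message pattern every second or so. A small sketch that tallies the nchan shared-memory failures, the largest message size that failed, and which channels were affected; the regular expressions match the message text exactly as quoted in that log, and nothing else about the nginx or nchan setup is assumed.

```python
import re
from collections import Counter

OUT_OF_SHM = re.compile(r"nchan: Out of shared memory while allocating message of size (\d+)")
CHANNEL = re.compile(r"can't create shared message for channel (\S+)")


def summarize_nchan(lines):
    """Return (failure count, largest failed message size, Counter of channels)."""
    failures = 0
    max_size = 0
    channels = Counter()
    for line in lines:
        m = OUT_OF_SHM.search(line)
        if m:
            failures += 1
            max_size = max(max_size, int(m.group(1)))
        c = CHANNEL.search(line)
        if c:
            channels[c.group(1)] += 1
    return failures, max_size, channels
```

On the excerpt above, every failure is on `/devices` with messages of about 27 KB, which is the kind of detail worth including alongside the diagnostics; the error text itself points at the `nchan_max_reserved_memory` setting.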