
New server keeps crashing



I built an Unraid server about 6 weeks ago, and it worked flawlessly until two days ago.

Starting two days ago, the server becomes unresponsive: I cannot access network shares, SSH to the server, or open the GUI. The LAN lights are still flashing and the switch port it is plugged into is still lit. The only fix is to power the server off and turn it back on.

Because of where the server is located, I have not been able to connect a console or monitor directly when it crashes to see whether there is any output on screen.

Here are the only warning/error messages in the syslog:

Apr 1 21:03:58 NAS kernel: ACPI: Early table checksum verification disabled
Apr 1 21:03:58 NAS kernel: floppy0: no floppy controllers found
Apr 1 21:03:58 NAS kernel: i915 0000:00:02.0: [drm] failed to retrieve link info, disabling eDP
Apr 1 21:04:02 NAS mcelog: failed to prefill DIMM database from DMI data
Apr 1 21:04:14 NAS rpc.statd[2134]: Failed to read /var/lib/nfs/state: Success
Apr 1 21:04:28 NAS kernel: nvme 0000:05:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
Apr 1 21:04:28 NAS kernel: nvme 0000:05:00.0: failed VPD read at offset 1
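For anyone who wants to pull a similar list out of their own log, here is a rough sketch of a keyword filter. The function name `syslog_problems` and the keyword list are my own assumptions, not what was actually run to produce the excerpt above, and a simple filter like this will miss lines that lack those keywords (the floppy message, for example).

```shell
# Print syslog lines that look like warnings or errors.
# The keyword list is a guess; extend it to suit your own logs.
syslog_problems() {
    log="${1:-/var/log/syslog}"   # defaults to the live syslog
    # -i: ignore case, -E: extended regex so "|" means "or"
    grep -iE 'fail|error|warn|disabled' "$log"
}
```

Run it as `syslog_problems /var/log/syslog`, or pass the path of a saved copy of the log.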

 

 

Edited by liquidrt
Link to comment

Thanks @JorgeB. I did set up syslog mirroring to the flash. For some reason, when I run:

 

tail -f -n 500000 /var/log/syslog



I am still only seeing messages from directly after the reboot. Nothing from before the crash or reboot has persisted. Maybe I am not looking in the correct spot.

Since the logs are not persisting, I am going to keep watching the syslog while the server is running and hopefully catch the condition before the next crash.
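One likely explanation is that `/var/log/syslog` is the live log, which starts over on every boot, while the mirrored copy is a separate file on the flash drive. Below is a small sketch for reading that copy. The path `/boot/logs/syslog` is an assumption on my part (it is not stated anywhere in this thread), and `show_log_tail` is just an illustrative helper name.

```shell
# ASSUMPTION: the flash mirror is written to /boot/logs/syslog.
# Override MIRROR if your copy lives somewhere else.
MIRROR="${MIRROR:-/boot/logs/syslog}"

# Show the last N lines of a log file (default 200),
# with a clear message if the file is missing or unreadable.
show_log_tail() {
    file="$1"
    lines="${2:-200}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    tail -n "$lines" "$file"
}
```

For example, `show_log_tail "$MIRROR" 500` prints the last 500 lines of the mirrored log, which should include entries from before the most recent reboot if mirroring is working.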

 

unraid.png

Edited by liquidrt
Link to comment

Thanks @JorgeB, that showed up!


I am not seeing anything concerning logged directly before the crashes. Please see below:

 

Apr  2 08:55:22 GNAS root: Reloading Nginx configuration...
Apr  2 21:02:31 GNAS kernel: eth0: renamed from veth69d7509
Apr  2 21:02:34 GNAS kernel: eth0: renamed from veth92c1b6c
Apr  2 22:06:14 GNAS kernel: microcode: microcode updated early to revision 0xf0, date = 2021-11-15
Apr  2 22:06:14 GNAS kernel: Linux version 5.19.17-Unraid (root@Develop) (gcc (GCC) 12.2.0, GNU ld version 2.39-slack151) #2 SMP PREEMPT_DYNAMIC Wed Nov 2 11:54:15 PDT 2022
Apr  2 22:06:14 GNAS kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot




Apr  3 07:25:20 GNAS kernel: eth0: renamed from veth278b16c
Apr  3 08:23:21 GNAS kernel: md: sync done. time=36997sec
Apr  3 08:23:21 GNAS kernel: md: recovery thread: exit status: 0
Apr  3 11:00:24 GNAS kernel: microcode: microcode updated early to revision 0xf0, date = 2021-11-15
Apr  3 11:00:24 GNAS kernel: Linux version 5.19.17-Unraid (root@Develop) (gcc (GCC) 12.2.0, GNU ld version 2.39-slack151) #2 SMP PREEMPT_DYNAMIC Wed Nov 2 11:54:15 PDT 2022
Apr  3 11:00:24 GNAS kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
Apr  3 11:00:24 GNAS kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
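To pull out the same "last lines before each reboot" view without scrolling, something like the sketch below may help. It treats the `kernel: Linux version` line seen in the excerpts above as the marker for a fresh boot. The function name `lines_before_boot` is made up for illustration.

```shell
# Print the N lines (default 5) logged just before each kernel boot banner,
# followed by the banner line itself.
lines_before_boot() {
    file="$1"
    n="${2:-5}"
    # -B N: also print the N lines preceding each match
    grep -B "$n" 'kernel: Linux version' "$file"
}
```

Usage would be `lines_before_boot /path/to/saved/syslog 10`. In the excerpts above the microcode line is logged just before the banner, so one of the context lines will usually belong to the new boot rather than to the crash.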



I do see some panic messages from earlier on, as well as some out-of-memory messages:

 

Apr  3 01:19:19 GNAS  smbd[2800]: [2023/04/03 01:19:19.395299,  0] ../../source3/smbd/close.c:312(close_remove_share_mode)
Apr  3 01:19:19 GNAS  smbd[2800]:   close_remove_share_mode: Could not get share mode lock for file 2023/phone/rob/PXL_20230402_174218674.jpg.tacitpart
Apr  3 01:19:19 GNAS  smbd[2800]: [2023/04/03 01:19:19.395377,  0] ../../source3/smbd/fd_handle.c:39(fd_handle_destructor)
Apr  3 01:19:19 GNAS  smbd[2800]:   PANIC: assert failed at ../../source3/smbd/fd_handle.c(39): (fh->fd == -1) || (fh->fd == AT_FDCWD)
Apr  3 01:19:19 GNAS  smbd[2800]: [2023/04/03 01:19:19.395387,  0] ../../lib/util/fault.c:173(smb_panic_log)
Apr  3 01:19:19 GNAS  smbd[2800]:   ===============================================================
Apr  3 01:19:19 GNAS  smbd[2800]: [2023/04/03 01:19:19.395399,  0] ../../lib/util/fault.c:174(smb_panic_log)
Apr  3 01:19:19 GNAS  smbd[2800]:   INTERNAL ERROR: assert failed: (fh->fd == -1) || (fh->fd == AT_FDCWD) in pid 2800 (4.17.3)
Apr  3 01:19:19 GNAS  smbd[2800]: [2023/04/03 01:19:19.395406,  0] ../../lib/util/fault.c:178(smb_panic_log)
Apr  3 01:19:19 GNAS  smbd[2800]:   If you are running a recent Samba version, and if you think this problem is not yet fixed in the latest versions, please consider reporting this bug, see https://wiki.samba.org/index.php/Bug_Reporting
Apr  3 01:19:19 GNAS  smbd[2800]: [2023/04/03 01:19:19.395414,  0] ../../lib/util/fault.c:183(smb_panic_log)
Apr  3 01:19:19 GNAS  smbd[2800]:   ===============================================================
Apr  3 01:19:19 GNAS  smbd[2800]: [2023/04/03 01:19:19.395426,  0] ../../lib/util/fault.c:184(smb_panic_log)
Apr  3 01:19:19 GNAS  smbd[2800]:   PANIC (pid 2800): assert failed: (fh->fd == -1) || (fh->fd == AT_FDCWD) in 4.17.3
Apr  3 01:19:19 GNAS  smbd[2800]: [2023/04/03 01:19:19.395669,  0] ../../lib/util/fault.c:292(log_stack_trace)
Apr  3 01:19:19 GNAS  smbd[2800]:   BACKTRACE: 32 stack frames:



Apr  3 00:51:19 GNAS nginx: 2023/04/03 00:51:19 [crit] 2237#2237: ngx_slab_alloc() failed: no memory
Apr  3 00:51:19 GNAS nginx: 2023/04/03 00:51:19 [error] 2237#2237: shpool alloc failed
Apr  3 00:51:19 GNAS nginx: 2023/04/03 00:51:19 [error] 2237#2237: nchan: Out of shared memory while allocating message of size 387. Increase nchan_max_reserved_memory.
Apr  3 00:51:19 GNAS nginx: 2023/04/03 00:51:19 [error] 2237#2237: *82179 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
Apr  3 00:51:19 GNAS nginx: 2023/04/03 00:51:19 [error] 2237#2237: MEMSTORE:00: can't create shared message for channel /cpuload
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [crit] 2237#2237: ngx_slab_alloc() failed: no memory
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: shpool alloc failed
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: nchan: Out of shared memory while allocating message of size 11163. Increase nchan_max_reserved_memory.
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: *82180 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devices?buffer_length=1 HTTP/1.1", host: "localhost"
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: MEMSTORE:00: can't create shared message for channel /devices
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [crit] 2237#2237: ngx_slab_alloc() failed: no memory
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: shpool alloc failed
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: nchan: Out of shared memory while allocating message of size 234. Increase nchan_max_reserved_memory.
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: *82181 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/arraymonitor?buffer_length=1 HTTP/1.1", host: "localhost"
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: MEMSTORE:00: can't create shared message for channel /arraymonitor
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [crit] 2237#2237: ngx_slab_alloc() failed: no memory
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: shpool alloc failed
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: nchan: Out of shared memory while allocating message of size 311. Increase nchan_max_reserved_memory.
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: *82182 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/parity?buffer_length=1 HTTP/1.1", host: "localhost"
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: MEMSTORE:00: can't create shared message for channel /parity
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [crit] 2237#2237: ngx_slab_alloc() failed: no memory
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: shpool alloc failed
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: nchan: Out of shared memory while allocating message of size 492. Increase nchan_max_reserved_memory.
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: *82183 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/diskload?buffer_length=1 HTTP/1.1", host: "localhost"
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: MEMSTORE:00: can't create shared message for channel /diskload
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [crit] 2237#2237: ngx_slab_alloc() failed: no memory
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: shpool alloc failed
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: nchan: Out of shared memory while allocating message of size 3596. Increase nchan_max_reserved_memory.
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: *82184 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/var?buffer_length=1 HTTP/1.1", host: "localhost"
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [error] 2237#2237: MEMSTORE:00: can't create shared message for channel /var
Apr  3 00:51:20 GNAS nginx: 2023/04/03 00:51:20 [crit] 2237#2237: ngx_slab_alloc() failed: no memory
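Since the nginx block repeats the same handful of messages, a per-channel count can be easier to read than the raw lines. This is only a sketch: `nchan_failures_by_channel` is a hypothetical helper, and it relies on the channel name being the last field of each `can't create shared message` line, as it is in the excerpt above.

```shell
# Count nchan "can't create shared message" errors per channel,
# printing the most frequent channel first.
nchan_failures_by_channel() {
    file="$1"
    # "can.t" matches the apostrophe without breaking the shell quoting;
    # $NF is the last whitespace-separated field, i.e. the channel name.
    awk '/can.t create shared message for channel/ { count[$NF]++ }
         END { for (c in count) print count[c], c }' "$file" | sort -rn
}
```

Call it as `nchan_failures_by_channel /path/to/saved/syslog`. Each output line is a count followed by a channel name.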

 

 

Edited by liquidrt
Link to comment

Nothing relevant is logged before the crashes, which usually suggests a hardware problem. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

Link to comment

Thanks! I assume the usual culprit for hardware problems is RAM?

This used to be a desktop computer that ran fine for a year and a half or so. The only things added to this server were a generic PCI-to-SATA card from Amazon and some third-party RAM. I will start there and see what I can come up with, thank you!

Link to comment