Frostbite2600 Posted April 7, 2020

I had this issue (among other weirdness with Unraid) on a previous version of Unraid, so I decided to start fresh: I installed a new version of Unraid and only imported my data. I rebuilt my Docker containers (I did not reuse docker.img) but did reuse my libvirt.img, so my VMs came across with no issues. After everything was said and done, everything was working great for a couple of days. This morning, though, I'm once again seeing these errors in the syslog, which I was also seeing before I did the reinstall:

Apr 7 06:55:06 Unraid-Host nginx: 2020/04/07 06:55:06 [alert] 25415#25415: worker process 4423 exited on signal 11
Apr 7 06:55:07 Unraid-Host kernel: nginx[4424]: segfault at 38 ip 00000000004dc37e sp 00007ffd526fdad0 error 6 in nginx[421000+105000]
Apr 7 06:55:07 Unraid-Host kernel: Code: 5c 41 5d 41 5e c3 66 0f 1f 44 00 00 48 8b 7c 24 08 e8 c6 d4 f4 ff 49 89 c6 48 85 c0 0f 88 aa 01 00 00 48 89 ef e8 32 c0 02 00 <4c> 89 70 08 66 0f 1f 44 00 00 b8 01 00 00 00 48 83 c4 10 5b 5d 41
Apr 7 06:55:07 Unraid-Host nginx: 2020/04/07 06:55:07 [alert] 25415#25415: worker process 4424 exited on signal 11
Apr 7 06:55:07 Unraid-Host kernel: nginx[4428]: segfault at 38 ip 00000000004dc37e sp 00007ffd526fdad0 error 6 in nginx[421000+105000]
Apr 7 06:55:07 Unraid-Host kernel: Code: 5c 41 5d 41 5e c3 66 0f 1f 44 00 00 48 8b 7c 24 08 e8 c6 d4 f4 ff 49 89 c6 48 85 c0 0f 88 aa 01 00 00 48 89 ef e8 32 c0 02 00 <4c> 89 70 08 66 0f 1f 44 00 00 b8 01 00 00 00 48 83 c4 10 5b 5d 41
Apr 7 06:55:07 Unraid-Host nginx: 2020/04/07 06:55:07 [alert] 25415#25415: worker process 4428 exited on signal 11
Apr 7 06:55:07 Unraid-Host kernel: nginx[4430]: segfault at 38 ip 00000000004dc37e sp 00007ffd526fdad0 error 6 in nginx[421000+105000]
Apr 7 06:55:07 Unraid-Host kernel: Code: 5c 41 5d 41 5e c3 66 0f 1f 44 00 00 48 8b 7c 24 08 e8 c6 d4 f4 ff 49 89 c6 48 85 c0 0f 88 aa 01 00 00 48 89 ef e8 32 c0 02 00 <4c> 89 70 08 66 0f 1f 44 00 00 b8 01 00 00 00 48 83 c4 10 5b 5d 41
Apr 7 06:55:07 Unraid-Host emhttpd: error: publish, 244: Connection reset by peer (104): read
Apr 7 06:55:07 Unraid-Host nginx: 2020/04/07 06:55:07 [alert] 25415#25415: worker process 4430 exited on signal 11
Apr 7 06:55:07 Unraid-Host kernel: nginx[4431]: segfault at 38 ip 00000000004dc37e sp 00007ffd526fdad0 error 6 in nginx[421000+105000]
Apr 7 06:55:07 Unraid-Host kernel: Code: 5c 41 5d 41 5e c3 66 0f 1f 44 00 00 48 8b 7c 24 08 e8 c6 d4 f4 ff 49 89 c6 48 85 c0 0f 88 aa 01 00 00 48 89 ef e8 32 c0 02 00 <4c> 89 70 08 66 0f 1f 44 00 00 b8 01 00 00 00 48 83 c4 10 5b 5d 41

When I see these logs, I'm not able to get a consistent connection to the UI. It hangs and I have to reload the page multiple times. I then restart Nginx, after which I see this in the logs indefinitely:

Apr 7 06:56:35 Unraid-Host nginx: 2020/04/07 06:56:35 [alert] 5073#5073: worker process 6418 exited on signal 6
Apr 7 06:56:37 Unraid-Host nginx: 2020/04/07 06:56:37 [alert] 5073#5073: worker process 6481 exited on signal 6
Apr 7 06:56:39 Unraid-Host nginx: 2020/04/07 06:56:39 [alert] 5073#5073: worker process 6506 exited on signal 6
Apr 7 06:56:41 Unraid-Host nginx: 2020/04/07 06:56:41 [alert] 5073#5073: worker process 6520 exited on signal 6
Apr 7 06:56:43 Unraid-Host nginx: 2020/04/07 06:56:43 [alert] 5073#5073: worker process 6594 exited on signal 6
Apr 7 06:56:45 Unraid-Host nginx: 2020/04/07 06:56:45 [alert] 5073#5073: worker process 6599 exited on signal 6
Apr 7 06:56:47 Unraid-Host nginx: 2020/04/07 06:56:47 [alert] 5073#5073: worker process 6696 exited on signal 6
Apr 7 06:56:49 Unraid-Host nginx: 2020/04/07 06:56:49 [alert] 5073#5073: worker process 6775 exited on signal 6
Apr 7 06:56:51 Unraid-Host nginx: 2020/04/07 06:56:51 [alert] 5073#5073: worker process 6780 exited on signal 6

This causes my /var/log to fill up very quickly and the UI of my Unraid server to become sluggish. Can anyone point me in the right direction?

Running version 6.8.3. Thanks
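To see how fast the log is growing, and which signal the nginx workers are dying from, a small shell function can tally the alerts. This is a minimal sketch, not an Unraid tool: the name count_worker_exits is hypothetical, and it assumes the syslog lives at /var/log/syslog (the path used later in this thread).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count nginx "worker process ... exited on signal N" alerts
# in a syslog-style file, grouped by signal number.
# On Linux, signal 11 is SIGSEGV (the segfaults above) and 6 is SIGABRT.
count_worker_exits() {
  local logfile="${1:-/var/log/syslog}"
  grep -o 'exited on signal [0-9][0-9]*' "$logfile" \
    | awk '{ count[$4]++ } END { for (sig in count) print "signal " sig ": " count[sig] }' \
    | LC_ALL=C sort
}
```

Running count_worker_exits /var/log/syslog together with df -h /var/log shows both the crash rate and how close the log filesystem is to full. On Unraid, /var/log is normally a small RAM-backed filesystem, which is why this kind of spam fills it so quickly.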
trurl Posted April 7, 2020

Go to Tools > Diagnostics and attach the complete diagnostics zip file to your NEXT post. Have you run memtest?
Frostbite2600 (Author) Posted April 21, 2020

It took me a while to take the server offline to run Memtest, but I ran it twice with 0 errors reported. I waited 2 days before uploading the diagnostics because the problem seems to appear after about 24-48 hours of uptime, and then I have to restart Nginx to get the UI to load properly. Thanks!

unraid-host-diagnostics-20200420-1950.zip
Frostbite2600 (Author) Posted June 9, 2020

So I decided to build a new Unraid host since I kept having issues with this one: a 3700X, 32GB of RAM, and a new motherboard. I moved the USB drive over as-is, since it was a fresh install from when I rebuilt it 3 months ago, along with obviously the same disks (3x8TB data, 2x500GB SSD cache). After 48 hours I'm getting the same errors in the logs, Nginx is crashing, and the UI is extremely sluggish. These are the two logs that are filling up /var/log within hours. Sometimes restarting Nginx fixes everything; other times it requires a full reboot of the host:

root@Unraid-Host:~# tail -n 5 /var/log/nginx/error.log
2020/06/09 09:21:46 [alert] 30084#30084: worker process 26338 exited on signal 6
ker process: ./nchan-1.2.6/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
2020/06/09 09:21:47 [alert] 30084#30084: worker process 26341 exited on signal 6
ker process: ./nchan-1.2.6/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
2020/06/09 09:21:49 [alert] 30084#30084: worker process 26346 exited on signal 6
root@Unraid-Host:~# tail -n 5 /var/log/syslog
Jun 9 09:21:49 Unraid-Host nginx: 2020/06/09 09:21:49 [alert] 30084#30084: worker process 26346 exited on signal 6
Jun 9 09:21:50 Unraid-Host nginx: 2020/06/09 09:21:50 [alert] 30084#30084: worker process 26490 exited on signal 6
Jun 9 09:21:51 Unraid-Host nginx: 2020/06/09 09:21:51 [alert] 30084#30084: worker process 26494 exited on signal 6
Jun 9 09:21:52 Unraid-Host nginx: 2020/06/09 09:21:52 [alert] 30084#30084: worker process 26496 exited on signal 6
Jun 9 09:21:53 Unraid-Host nginx: 2020/06/09 09:21:53 [alert] 30084#30084: worker process 26504 exited on signal 6
root@Unraid-Host:~#

I've also been trying to collect a new diagnostics file, but I'm unable to do so right now because the UI just hangs, even after restarting Nginx. I'll try to collect new ones tonight when I reboot it.

Edited June 9, 2020 by Frostbite2600: added info at the end
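The error.log excerpt above points at an assertion inside the nchan module built into nginx (nchan-1.2.6, spool.c line 479). When the log is large, collapsing the repeated lines into distinct messages with counts makes it easier to see whether anything besides that one assertion is firing. A minimal sketch, assuming the /var/log/nginx/error.log path used above; summarize_assertions is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list each distinct C assertion failure in an nginx
# error log, prefixed by how many times it occurred.
summarize_assertions() {
  local logfile="${1:-/var/log/nginx/error.log}"
  # Match "<file>:<line>: <function>: Assertion `<expr>' failed"
  grep -o "[^ ]*: [A-Za-z0-9_]*: Assertion \`[^']*' failed" "$logfile" \
    | LC_ALL=C sort \
    | uniq -c
}
```

A single line of output (as expected from the excerpt above) would suggest one code path in nchan is failing repeatedly rather than several unrelated problems.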
Frostbite2600 (Author) Posted June 15, 2020

I started getting a couple of SMART errors on one of my 512GB SSD cache drives, so I went ahead and got 2x1TB SSDs and invoked the mover to move all files off the cache; once I had replaced the drives, the mover moved everything back to the cache. After 48 hours the errors are back and the UI is locking up. I'm unable to get diagnostics at this time since the UI won't let me download them while these errors are occurring, but the same errors as above are spamming the logs. Does anyone have any suggestions?
JorgeB Posted June 15, 2020

You are overclocking the RAM, considering the CPU and the number of DIMMs used (see here); this is known to cause stability issues with Ryzen. Since 2 DIMMs are single rank and the others are dual rank, I'm not sure which speed is correct, 1866 or 2133, but probably 1866, and that's where I would start.
Frostbite2600 (Author) Posted June 15, 2020

25 minutes ago, johnnie.black said: "You are overclocking the RAM considering the CPU and number of DIMMs used..."

According to the link you showed, 3rd gen Ryzen should be able to handle DDR4-3200 when 2 of 4 slots are populated, which is my setup, but mine is running at the rated speed for the specific RAM that I have (2666):

root@Unraid-Host:~# dmidecode --type memory | grep -A 5 "Manufacturer: Kingston" | grep -v Serial
Manufacturer: Kingston
Asset Tag: Not Specified
Part Number: KHX2666C16/16G
Rank: 2
Configured Memory Speed: 2667 MT/s
--
Manufacturer: Kingston
Asset Tag: Not Specified
Part Number: KHX2666C16/16G
Rank: 2
Configured Memory Speed: 2667 MT/s
root@Unraid-Host:~#

Are you saying that even though I'm well within the acceptable range for the RAM frequency, I should try clocking it slower to see if that has any effect?
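The same dmidecode information can be condensed to one line per installed DIMM, which makes it quicker to compare rank and configured speed against what the CPU supports. A sketch under assumptions: dimm_summary is a hypothetical helper that only parses the text dmidecode prints (as in the output above), and running dmidecode itself requires root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dmidecode --type memory` output on stdin and print
# one line per populated DIMM: part number, rank, and configured speed.
# Older dmidecode releases label the speed line "Configured Clock Speed".
dimm_summary() {
  awk -F': ' '
    /^Memory Device/                        { part = ""; rank = "" }
    /^[ \t]*Part Number:/                   { part = $2 }
    /^[ \t]*Rank:/                          { rank = $2 }
    /^[ \t]*Configured (Memory|Clock) Speed:/ && $2 != "Unknown" {
      print part " | rank " rank " | " $2
    }
  '
}
```

Usage: dmidecode --type memory | dimm_summary. Empty slots report an "Unknown" speed and are skipped.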
JorgeB Posted June 15, 2020

16 minutes ago, Frostbite2600 said: "According to the link you showed, 3rd gen Ryzen should be able to handle DDR4-3200"

Then you posted the wrong diags:

Apr 17 18:10:48 Unraid-Host kernel: smpboot: CPU0: AMD Ryzen 7 1800X Eight-Core Processor (family: 0x17, model: 0x1, stepping: 0x1)
Frostbite2600 (Author) Posted June 15, 2020

Yeah, those are the original diags. I can't post new ones because the UI won't let me download them. After removing everything else from the equation, I went out and bought a 3700X and a new motherboard, and the issues are still happening after 2-3 days.

On 6/9/2020 at 11:22 AM, Frostbite2600 said: "So I decided to build a new Unraid host since I kept having issues with this one, built a new host with a 3700x, 32GB of RAM and a new motherboard. [...] After 48 hours I'm getting the same errors in the logs and Nginx is crashing and the UI is extremely sluggish."
JorgeB Posted June 15, 2020

2 minutes ago, Frostbite2600 said: "After removing everything else from the equation I went out and bought a 3700x and a new motherboard, and the issues are still happening after 2-3 days."

Sorry, missed that part.
trurl Posted June 15, 2020

3 hours ago, Frostbite2600 said: "Yeah that's the original diags, I can't get new ones to post because the UI won't allow me to download them."

Can you get diagnostics immediately after booting? That would be better than nothing. Also, you might try setting up Syslog Server so you can capture the syslog from before the hang: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=781601
Frostbite2600 (Author) Posted June 16, 2020

5 hours ago, trurl said: "Can you get diagnostics immediately after booting? That would be better than nothing. Also you might try setting up Syslog Server so you can get syslog from before hang..."

Attached! As for the syslog, the Unraid server never stops responding: I can still SSH into it, and all my VMs and containers are running as expected. It's just the UI that's sluggish and won't let me download the diagnostics bundle, even after restarting Nginx. I'm still able to access all the logs on the server, though, and can manually grab and tar them if needed. Thanks!

unraid-host-diagnostics-20200615-2113.zip
JonathanM Posted June 16, 2020

9 hours ago, Frostbite2600 said: "But I'm still able to access all logs on the Unraid server and can manually grab them and tar them if needed."

That shouldn't be necessary. Next time it acts up, type diagnostics at the command line and it should collect the diagnostics zip file.
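After the diagnostics command finishes, the zip still has to be copied off the server, which matters here because the web UI cannot serve the download. The sketch below picks out the most recent archive. It assumes the command saves to /boot/logs on the flash drive (check the path the command reports on your version), and latest_diagnostics is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest "<host>-diagnostics-<date>-<time>.zip"
# in a directory (assumed default: /boot/logs). Returns 1 if none is found.
latest_diagnostics() {
  local dir="${1:-/boot/logs}" newest="" f
  for f in "$dir"/*-diagnostics-*.zip; do
    [ -e "$f" ] || continue            # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"                      # keep the most recently modified file
    fi
  done
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}
```

The printed path can then be pulled to another machine with scp, e.g. scp root@Unraid-Host:<path> . (substituting the path the function printed).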
Frostbite2600 (Author) Posted June 17, 2020

I was able to get the diagnostics from the CLI as directed after it started acting up. Attached.

unraid-host-diagnostics-20200617-1317.zip
jonp Posted June 19, 2020

This is a tricky one. I think we need to go step by step in testing to verify the source of the issue. First, disable your plugins, your VMs, and your Docker containers, and reboot. Let the system idle with the array started for a while and see if the server remains responsive. Then turn on your Docker containers. If you run into issues, reboot with Docker disabled and start turning containers on one by one. If you can get all containers running, we move on to VMs. There is nothing glaring in the logs immediately before the error messages start showing up, so we need to resort to isolating the apps and VMs to figure it out.
Frostbite2600 (Author) Posted June 27, 2020

So I built another temporary Unraid host to migrate my VMs to. On the existing Unraid server I stopped everything, disabled VMs, disabled Docker, removed all plugins, and rebooted, so it's literally just a NAS doing nothing else. The errors came back after a couple of days like clockwork. Diagnostics are attached for that.

The temporary host is my old 1800X with a fresh installation of Unraid. I SCP'd each VM's disk image over and created new VMs pointing to those images, so that I wasn't bringing a corrupt libvirt or anything over. I also brought a single Docker container over (UniFi, for my network) by SCP'ing the appdata folder for the UniFi container and installing it. After a couple of days, this host too started throwing the exact same errors. Interestingly, they stopped after a day or so, but I have a feeling they're going to come back.

Both diagnostics are attached. "Unraid-Host" is my permanent Unraid host; "Tower" is the temporary one.

unraid-host-diagnostics-20200627-1139.zip
tower-diagnostics-20200627-1141.zip
Frostbite2600 (Author) Posted June 27, 2020

I just realized I still had one plugin installed on both of them: Dynamix System Temp. Not sure if it's related, but it is installed on both. I just removed it from both and will wait a few more days to see if anything comes of it.
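To double-check that no plugins are left behind before the next round of testing, the plugin files on the flash drive can be listed from the command line. A minimal sketch assuming the usual Unraid layout, where each installed plugin keeps a .plg file directly under /boot/config/plugins; list_installed_plugins is a hypothetical name, and empty output only means no such files were found in that directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every *.plg file in a plugin directory
# (assumed default: /boot/config/plugins), one per line, without the extension.
list_installed_plugins() {
  local dir="${1:-/boot/config/plugins}" f
  for f in "$dir"/*.plg; do
    [ -e "$f" ] || continue   # no .plg files at all
    basename "$f" .plg
  done
}
```

Running it on both hosts before and after removing a plugin gives a quick confirmation that the removal actually took effect.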
Frostbite2600 (Author) Posted July 2, 2020

Update: it's been 5 days of uptime and the errors haven't come back since disabling the Dynamix System Temp plugin. I re-enabled the Docker containers and virtual machines on the primary Unraid server this morning, and so far so good. I'm not confident enough to call this "solved" yet, as I'd like to go a week or so without the errors, but so far it looks promising. Thanks
jungle Posted December 7, 2020

How has it gone? I've got nginx spamming my logs too. After some reading, I saw lots of people saying they'd had the Unraid UI open, and I've been doing the same. As soon as I closed it, the messages stopped... I will be monitoring.

Edited December 7, 2020 by jungle