Up for 40+ days then froze twice in a row - going on vacation tomorrow


Solved by JorgeB

Hello,

 

Unraid Version 6.12.4

 

I woke up this morning to a completely frozen server after it had been up and stable for over 40 days. I am going on vacation tomorrow for a week and was hoping the server would survive without me being there to get it out of jams like this. I have collected the syslog and diagnostics and would greatly appreciate any insights.

 

It looks like this is the time when I had to reboot from the second freeze:

Quote

Jan  9 08:33:52 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 12857 exited on signal 9 (SIGKILL) after 199.092351 seconds from start
Jan  9 08:33:53 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 14272 exited on signal 9 (SIGKILL) after 181.265295 seconds from start
Jan  9 08:33:54 HaynesTower kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:00:1a.0
Jan  9 08:33:54 HaynesTower kernel: pcieport 0000:00:1a.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Jan  9 08:33:54 HaynesTower kernel: pcieport 0000:00:1a.0:   device [8086:7a48] error status/mask=00000001/00002000
Jan  9 08:33:54 HaynesTower kernel: pcieport 0000:00:1a.0:    [ 0] RxErr                 
Jan  9 08:34:13 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 27609 exited on signal 9 (SIGKILL) after 16.440253 seconds from start
Jan  9 08:35:05 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 27655 exited on signal 9 (SIGKILL) after 21.825540 seconds from start
Jan  9 08:36:07 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 28095 exited on signal 9 (SIGKILL) after 72.155603 seconds from start
Jan  9 08:36:15 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 28422 exited on signal 9 (SIGKILL) after 66.596950 seconds from start
Jan  9 08:36:22 HaynesTower kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:00:1a.0
Jan  9 08:36:22 HaynesTower kernel: pcieport 0000:00:1a.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Jan  9 08:36:22 HaynesTower kernel: pcieport 0000:00:1a.0:   device [8086:7a48] error status/mask=00000001/00002000
Jan  9 08:36:22 HaynesTower kernel: pcieport 0000:00:1a.0:    [ 0] RxErr                 
Jan  9 08:47:21 HaynesTower kernel: microcode: microcode updated early to revision 0x26, date = 2022-09-19
Jan  9 08:47:21 HaynesTower kernel: Linux version 6.1.49-Unraid (root@Develop-612) (gcc (GCC) 12.2.0, GNU ld version 2.40-slack151) #1 SMP PREEMPT_DYNAMIC Wed Aug 30 09:42:35 PDT 2023
Jan  9 08:47:21 HaynesTower kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot,/bzroot-gui

 

Update - the errors start with this:

 

Quote

Jan  9 07:53:42 HaynesTower kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:00:1a.0
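
To find where the errors begin without scrolling through the whole file, the syslog can be scanned with a short script. The following is only a minimal sketch, assuming the log lines look like the excerpts quoted in this post; the function name `summarize` and the regular expressions are illustrative and not part of Unraid.

```python
import re
from collections import Counter

# Matches lines such as:
# "Jan  9 07:53:42 HaynesTower kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:00:1a.0"
AER_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"pcieport\s+(?P<port>[0-9a-f:.]+):\s+AER:\s+Corrected error received"
)

# Matches the php-fpm warnings about children killed with SIGKILL.
SIGKILL_RE = re.compile(r"php-fpm\[\d+\]: \[WARNING\] .* exited on signal 9 \(SIGKILL\)")


def summarize(lines):
    """Return the first AER timestamp, AER counts per port, and the php-fpm SIGKILL count."""
    first_aer = None
    aer_by_port = Counter()
    sigkills = 0
    for line in lines:
        match = AER_RE.match(line)
        if match:
            aer_by_port[match.group("port")] += 1
            if first_aer is None:
                first_aer = match.group("ts")
        elif SIGKILL_RE.search(line):
            sigkills += 1
    return {
        "first_aer": first_aer,
        "aer_by_port": dict(aer_by_port),
        "php_fpm_sigkills": sigkills,
    }


# Example use (the file name "syslog" is an assumption):
# with open("syslog", errors="replace") as f:
#     print(summarize(f))
```

The timestamps in the syslog carry no year, so the script reports the first match in file order instead of trying to sort by date.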

 

I have posted a screenshot with the details of that PCI bridge, but I can't really tell what it is.

 

Syslog from flash and diagnostics attached (diagnostics are from right after the second freeze):

syslog haynestower-diagnostics-20240109-0856.zip

image.png

Edited by manofoz
  • Solution
On 1/9/2024 at 2:01 PM, manofoz said:

pcieport 0000:00:1a.0: AER: Corrected error received: 0000:00:1a.0

For these first try this:

https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/?do=findComment&comment=1165009

 

On 1/9/2024 at 2:01 PM, manofoz said:

Jan  9 08:33:52 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 12857 exited on signal 9 (SIGKILL) after 199.092351 seconds from start

For these, try booting in safe mode and/or closing any browser windows open to the GUI; only open one when you need to use it, then close it again.

 

18 hours ago, JorgeB said:

For these first try this:

https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/?do=findComment&comment=1165009

 

For these, try booting in safe mode and/or closing any browser windows open to the GUI; only open one when you need to use it, then close it again.

 

 

Thanks for the tip. I've added what that thread mentions to the "Unraid OS" section of the Syslinux configuration. I didn't see any mention of adding it to the other sections, but since you mentioned running in safe mode, I'm going to add it everywhere. See the attached picture of the config.

 

As for safe mode, I assume you mean always running in safe mode and never opening a Chrome, Firefox, etc. tab to the server unless I absolutely need to. I usually keep one pinned in Chrome, so it is open a lot, but Chrome usually offloads it, so it isn't active unless I click on it. I like having it available to check the dashboards, but if that is what causes the freezes, I can build other dashboards on things I host off the server.

 

Will reboot now and let the "Unraid OS" config change take effect. I see there is an option to automatically come back in safe mode so I don't need a monitor / keyboard.
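
After the reboot, one way to confirm the change took effect is to check the running kernel's command line, which the syslog also records in its "Command line:" entry. The following is only a rough sketch: `/proc/cmdline` is the standard Linux location, while the helper names and the parameter `example_param=off` are hypothetical placeholders for whatever the linked thread recommends.

```python
def cmdline_has_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token in a kernel command line."""
    return param in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path, encoding="ascii", errors="replace") as f:
        return f.read().strip()


# Example use on the server itself; "example_param=off" is a placeholder,
# not the actual parameter from the linked thread:
# print(cmdline_has_param(read_cmdline(), "example_param=off"))
```

If the parameter is missing after the reboot, the server may have booted from a different menu entry than the one that was edited.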

 

Thanks!

 

image.png

Edited by manofoz