manofoz Posted January 9 (edited)

Hello,

Unraid Version 6.12.4

Woke up this morning to a totally frozen server after it had been up and stable for over 40 days. I will be going on vacation tomorrow for a week and was hopeful that the server would survive without me being there to get it out of jams like this. I have collected syslog and diagnostics and would greatly appreciate any insights.

It looks like this is the time when I had to reboot from the second freeze:

Quote
    Jan 9 08:33:52 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 12857 exited on signal 9 (SIGKILL) after 199.092351 seconds from start
    Jan 9 08:33:53 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 14272 exited on signal 9 (SIGKILL) after 181.265295 seconds from start
    Jan 9 08:33:54 HaynesTower kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:00:1a.0
    Jan 9 08:33:54 HaynesTower kernel: pcieport 0000:00:1a.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Jan 9 08:33:54 HaynesTower kernel: pcieport 0000:00:1a.0: device [8086:7a48] error status/mask=00000001/00002000
    Jan 9 08:33:54 HaynesTower kernel: pcieport 0000:00:1a.0: [ 0] RxErr
    Jan 9 08:34:13 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 27609 exited on signal 9 (SIGKILL) after 16.440253 seconds from start
    Jan 9 08:35:05 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 27655 exited on signal 9 (SIGKILL) after 21.825540 seconds from start
    Jan 9 08:36:07 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 28095 exited on signal 9 (SIGKILL) after 72.155603 seconds from start
    Jan 9 08:36:15 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 28422 exited on signal 9 (SIGKILL) after 66.596950 seconds from start
    Jan 9 08:36:22 HaynesTower kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:00:1a.0
    Jan 9 08:36:22 HaynesTower kernel: pcieport 0000:00:1a.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Jan 9 08:36:22 HaynesTower kernel: pcieport 0000:00:1a.0: device [8086:7a48] error status/mask=00000001/00002000
    Jan 9 08:36:22 HaynesTower kernel: pcieport 0000:00:1a.0: [ 0] RxErr
    Jan 9 08:47:21 HaynesTower kernel: microcode: microcode updated early to revision 0x26, date = 2022-09-19
    Jan 9 08:47:21 HaynesTower kernel: Linux version 6.1.49-Unraid (root@Develop-612) (gcc (GCC) 12.2.0, GNU ld version 2.40-slack151) #1 SMP PREEMPT_DYNAMIC Wed Aug 30 09:42:35 PDT 2023
    Jan 9 08:47:21 HaynesTower kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot,/bzroot-gui

Update - the errors start with this:

Quote
    Jan 9 07:53:42 HaynesTower kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:00:1a.0

I have posted a screenshot with the details of that PCI bridge but can't really tell what it is.

Syslog from flash and diagnostics attached (diagnostics are from right after the second freeze):

syslog
haynestower-diagnostics-20240109-0856.zip

Edited January 9 by manofoz
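For anyone reading a long syslog like this, a small script can help show how often each message type appears and when the AER errors first started. The following is only a rough sketch in Python: the regular expressions are written against the two message shapes quoted in the post above, and the function name `summarize` and the output keys are made up for illustration, not part of any Unraid tool.

```python
import re
import sys
from collections import Counter

# Patterns written against the message shapes quoted in the post above.
AER_RE = re.compile(r"kernel: pcieport (\S+): AER: Corrected error received")
SIGKILL_RE = re.compile(r"php-fpm\[\d+\]: \[WARNING\] \[pool \w+\] child \d+ exited on signal 9 \(SIGKILL\)")
TIME_RE = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) ")


def summarize(lines):
    """Count corrected AER reports per PCIe port and php-fpm SIGKILL warnings."""
    aer_by_port = Counter()
    sigkills = 0
    first_aer = None  # timestamp text of the earliest AER line seen
    for line in lines:
        aer = AER_RE.search(line)
        if aer:
            aer_by_port[aer.group(1)] += 1
            if first_aer is None:
                stamp = TIME_RE.match(line)
                first_aer = stamp.group(1) if stamp else None
            continue
        if SIGKILL_RE.search(line):
            sigkills += 1
    return {
        "aer_by_port": dict(aer_by_port),
        "php_fpm_sigkills": sigkills,
        "first_aer": first_aer,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_syslog.py /path/to/syslog
    with open(sys.argv[1], errors="replace") as handle:
        print(summarize(handle))
```

The counts alone do not say what caused the freeze; they only make it easier to see whether the two kinds of messages cluster around the same times.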
manofoz (Author) Posted January 13

Survived while I was gone, but I'd really appreciate any help diagnosing the freezes.
JorgeB Posted January 13 (Solution)

On 1/9/2024 at 2:01 PM, manofoz said:
    pcieport 0000:00:1a.0: AER: Corrected error received: 0000:00:1a.0

For these first try this: https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/?do=findComment&comment=1165009

On 1/9/2024 at 2:01 PM, manofoz said:
    Jan 9 08:33:52 HaynesTower php-fpm[13589]: [WARNING] [pool www] child 12857 exited on signal 9 (SIGKILL) after 199.092351 seconds from start

For these try booting in safe mode and/or closing any browser windows open to the GUI, only open when you need to use it then close again.
manofoz (Author) Posted January 14 (edited)

18 hours ago, JorgeB said:
    For these first try this: https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/?do=findComment&comment=1165009
    For these try booting in safe mode and/or closing any browser windows open to the GUI, only open when you need to use it then close again.

Thanks for the tip. I've added what they mention in that thread to the "Unraid OS" section of the system configuration. I didn't see mention of adding it to the other sections, but since you mentioned running in safe mode I'm going to add it everywhere. See the attached picture of the config.

As for safe mode, I assume you just mean always run in safe mode and never open a Chrome / Firefox etc. tab to the server unless I absolutely need to. I usually have one pinned in Chrome, so I have one open a lot, but Chrome usually offloads it so it's not active unless I click on it. I like having it available to check out the dashboards, but if that causes it to freeze I can just make some other dashboards on things I host off of it.

Will reboot now and let the "Unraid OS" config change take effect. I see there is an option to automatically come back in safe mode so I don't need a monitor / keyboard.

Thanks!

Edited January 14 by manofoz
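After a reboot, one way to confirm that a boot-option change like the one described above actually took effect is to look at the kernel command line, which the kernel prints in the syslog ("Command line: ...") and which Linux also exposes at `/proc/cmdline`. Below is a minimal Python sketch of that check. The helper names are invented here, and `example_param=value` is a placeholder, not the option from the linked thread; substitute whatever was actually added.

```python
def has_boot_param(cmdline: str, param: str) -> bool:
    """Return True if `param` is present as a whole whitespace-separated token."""
    return param in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's command line (standard Linux procfs path)."""
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    # Placeholder only: replace with the option that was added to the boot config.
    wanted = "example_param=value"
    try:
        current = read_cmdline()
    except OSError:
        print("could not read /proc/cmdline on this system")
    else:
        print(f"{wanted!r} active: {has_boot_param(current, wanted)}")
```

Matching whole tokens rather than substrings avoids a false positive when one option name happens to be contained in another.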