netpro2k Posted January 7, 2018 (edited)

I've been running an unRAID server for a few weeks, and it has randomly locked up several times. It usually happens every few days, but it has happened in as little as 24 hours, and I can't detect a pattern. When the server locks up I cannot reach the web interface or SSH over the network. Locally I can see the cursor blinking at the login prompt, but plugging in a keyboard and attempting to log in has no effect. The system does not shut down via the power button, so my only option is a hard reset.

I don't see anything odd in the logs, memory usage seems fine, and I ran memtest for 3 passes with no errors found (though when I hit Esc to exit, the system did not reboot on its own; not sure if this is normal). I also recently upgraded from the stable branch to 6.4.0_rc19b, since some users on the forums mentioned this might help with a similar issue another user reported, but the lockups have continued.

System specs:
ASRock Rack Motherboard C236 WSI (new)
Intel BX80662I36100T Core i3 6100T (new)
SeaSonic Platinum Series SS-400FL2 Active PFC F3 400W (new)
4 x HGST DeskStar NAS 3.5" 4TB 7200 RPM 128MB Cache (new)
2 x 8GB non-ECC RAM (recycled from an Alienware desktop after a RAM upgrade, a bit over a year old)
1 x 256GB OCZ Vertex 4 (recycled drive from an old system, probably a few years old; I plan to use it as a cache drive, but currently it is just set as an unassigned device)

Running a few SMB shares, no VMs, and a few dockers (nzbget, radarr, sonarr, plex, and openvpn-as, all by linuxserver).

I somewhat suspect the RAM since it is the oldest part of the system (apart from the currently unused SSD). I want to buy some ECC memory anyway, but I was avoiding it for now because I want to rule out other issues before tossing the old RAM. I have just over a week left on my unRAID trial and would love to have an actual working system before I purchase a license.

Attachments: FCPsyslog_tail.txt, panoptes-diagnostics-20180107-0740.zip
Edited January 7, 2018 by netpro2k
HellDiverUK Posted January 7, 2018

AsRock. If this is still in your return window, swap it for the similar MSI C236A (or C236M if you prefer uATX). Or perhaps Supermicro. I've seen people posting about that AsRock board before, and they're never posts of praise.
netpro2k (Author) Posted January 8, 2018

Still under the return window for a bit, but I would need a non-mini-ITX case for the mobos you suggested (I got the current case for free from a friend), so I'd also like to prove the board is the culprit before going through all the cost and effort of buying a new case and fully transplanting the system. I have generally read favorable things about that board (and ASRock in general), so do you mean specifically with regard to unRAID compatibility? Any tips for how I can debug this issue further?
netpro2k (Author) Posted January 11, 2018

Hmm, I ran another memtest overnight and found that the system had locked up during the run, though memtest did not output any errors. I've ordered some new ECC RAM to see if that fixes the issue.
netpro2k (Author) Posted January 19, 2018

Swapped the RAM for a new 16GB stick of ECC RAM. I thought things were fixed after a good 5 days of uptime, but it looks like the system just locked up again... Any suggestions on further debug steps?
netpro2k (Author) Posted January 20, 2018

Upgraded to 6.4.0 from 6.4.0_rc19b and registered unRAID (my trial just expired)... Monitoring again...
Salandor Posted January 28, 2018

Any update on your situation? I am having the exact same problem. I'm 80% through my first memtest pass with no errors, and I don't expect there to be any. I don't see any similarities in our builds other than the same OS version. Let us know if the rc19b fixed your problem.

My specs:
X9DRI-F Motherboard
2x E5-2670 v2 2.5GHz 10-Core
LSI 9207-8i Controller
16x 4GB PC3-10600R
2x 1200W PSU
USB boot drive is a 32GB Samsung MUF-32BB/AM
Silverstone PCIe adapter running the NVMe cache drive
Roswell PCIe to USB 3 expansion card
ashman70 Posted January 28, 2018

If you install the Fix Common Problems plugin and put it in troubleshooting mode, it can save logs so that they are not lost when you have to do a hard reset.

Salandor, I would suggest removing the Roswell card just to see if that might have something to do with it. Also, both of you should check that your CPU fans are in good working order and monitor CPU temps.

Netpro2k, how old is your power supply?
Salandor Posted January 28, 2018 (edited)

I'll try removing the USB expansion card. I'm running a Supermicro CSE-846A-R1200B chassis, so my CPU radiators and memory are cooled passively by the two 80mm PWM fans and an air shroud. The freezing generally happens when I leave the server alone while migrating data to the array. I have tried both local transfer over USB and network transfer. CPU temps at the time of the freeze are likely the same as when I walked away; they never get above 37C while just writing to the array.

Edited January 28, 2018 by Salandor
SiNtEnEl Posted January 28, 2018

The AsRock Rack was also on my list since it's the only available mini-ITX board, but I have read some complaints about it on other forums as well. FreeNAS users have also experienced lockups with the Rack board, which is one of the reasons I picked up a micro-ATX Supermicro board instead. unRAID is rock solid on that motherboard, which also runs a C236 chipset, and I have seen all the 6.4 RC releases since December. This still feels like a memory issue to me, but I have also seen reports on FreeNAS about CPU resets. Have you tried the 2.3 BIOS? Maybe that resolves some issues.
netpro2k (Author) Posted January 28, 2018

It's been running for 8 days since upgrading to the 6.4.0 release version, but I can't help but feel that is a coincidence. I also put the system on a UPS, but again I would be pretty surprised if that helped stability given the way it was crashing (the login screen with the blinking cursor was always visible on the onboard video, just unresponsive). If it does lock up again, the Fix Common Problems log has been running since boot, so there should be quite a nice log.

@ashman70 The power supply is new, purchased at the same time as the mobo and CPU.
Salandor Posted January 28, 2018

Well, I crashed again during a file copy. I've attached the logs from troubleshooting. I have no idea what I'm looking for in these logs, maybe someone here does.

Attachments: syslog.txt, FCPsyslog_tail.txt
Frank1940 Posted January 29, 2018

It looks like these files are from after the reboot. The files that are needed are the ones from before the crash. If they are not on the flash drive, then after the next crash pull the flash drive before rebooting, plug it into a PC, and copy the files off.
Salandor Posted January 29, 2018

Well, I don't want to get too excited yet, but I changed my CPU frequency scaling from power saving to performance and I have not frozen in over 5 hours while migrating data.
NewDisplayName Posted January 29, 2018 (edited)

Could it be that your RAM is full? If your RAM fills up, unRAID won't respond.

Edited January 29, 2018 by nuhll
Salandor Posted January 29, 2018 (edited)

Doubtful, RAM has yet to go over 5% usage while copying files. I have all dockers except Krusader turned off; the server was literally doing nothing but writing data to the array.

Well, it's been all night now and it's still copying. I'm becoming more confident that it has something to do with the CPU scaling. I read that Haswell and Ivy Bridge Xeons had a problem with the Intel P-state driver and properly scaling in and out of power save.

Edited January 29, 2018 by Salandor
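For anyone who wants to confirm which governor and scaling driver are actually active after changing the setting, here is a minimal sketch. It assumes the standard Linux cpufreq sysfs layout under /sys/devices/system/cpu; the function names are just illustrative, and whether unRAID exposes these files on a given board is an assumption to verify on your own system.

```python
from pathlib import Path

# Default location of the Linux cpufreq sysfs tree (assumed to exist on the server).
DEFAULT_ROOT = "/sys/devices/system/cpu"


def read_governors(sysfs_root=DEFAULT_ROOT):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq in sysfs."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def read_scaling_driver(sysfs_root=DEFAULT_ROOT):
    """Return cpu0's scaling driver (e.g. "intel_pstate"), or None if not exposed."""
    driver_file = Path(sysfs_root) / "cpu0" / "cpufreq" / "scaling_driver"
    if not driver_file.exists():
        return None
    return driver_file.read_text().strip()


if __name__ == "__main__":
    print("scaling driver:", read_scaling_driver())
    for cpu, governor in read_governors().items():
        print(cpu, governor)
```

Running it before and after switching between power save and performance should show the governor change on every core. It only reads the current state; it does not prove that the governor is what caused the freezes.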
Frank1940 Posted January 29, 2018

2 hours ago, Salandor said: I read that haswell and ivy bridge xeons had a problem with the intel Pstate driver and properly scaling in and out of power save.

You might want to read through the first page of the 6.4.1-rc1 release thread. There are some posts that discuss the Intel Meltdown/Spectre patch in 6.4.0 that caused problems with certain Intel processors. One of the changes in 6.4.1-rc1 was the removal of that patch. There is no guarantee that this would address your problem, but it is another factor in the bigger picture.
netpro2k (Author) Posted February 10, 2018 (edited)

Had several weeks of uptime with no issue, but it looks like the system has locked up again. I was running FCP troubleshooting mode, so I have a very long log of the whole time. I don't notice anything at the end of the log or in the last zip, though I do see some interesting kernel messages appear a few times which I hadn't seen in logs from previous crashes... Is this some sort of memory issue?

If I look at the memory.txt from the zip captured about 10 minutes before this log message, I see that there is quite a small amount of free memory:

              total        used        free      shared  buff/cache   available
Mem:            15G        2.3G        823M        611M         12G         11G
Swap:            0B          0B          0B
Total:          15G        2.3G        823M

and in the zip after:

              total        used        free      shared  buff/cache   available
Mem:            15G        2.2G        3.2G        611M         10G         12G
Swap:            0B          0B          0B
Total:          15G        2.2G        3.2G

It's almost entirely cache, which I assume can be evicted immediately if needed. Is this an issue of memory reserved by cache not being cleared fast enough and then having no swap to spill over into? If so, is there something I have misconfigured? I assume 16GB of RAM should be enough to run the few docker containers I am running (I have never seen the non-cache RAM usage go above about 3GB). I suspect this might be a red herring, though, since the last zip before the crash shows over 9GB free even after cache.

Snippet of the log here; the full log is attached, as well as the zips from before and after this message and the last one before the crash:

Jan 26 04:28:30 Panoptes kernel: emhttpd: page allocation stalls for 14959ms, order:1, mode:0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null)
Jan 26 04:28:30 Panoptes kernel: emhttpd cpuset=/ mems_allowed=0
Jan 26 04:28:30 Panoptes kernel: CPU: 0 PID: 12457 Comm: emhttpd Not tainted 4.14.13-unRAID #1
Jan 26 04:28:30 Panoptes kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C236 WSI, BIOS P1.00 10/21/2015
Jan 26 04:28:30 Panoptes kernel: Call Trace:
Jan 26 04:28:30 Panoptes kernel: dump_stack+0x5d/0x79
Jan 26 04:28:30 Panoptes kernel: warn_alloc+0xdf/0x160
Jan 26 04:28:30 Panoptes kernel: ? wakeup_kswapd+0x2c/0xb2
Jan 26 04:28:30 Panoptes kernel: __alloc_pages_nodemask+0x578/0xb03
Jan 26 04:28:30 Panoptes kernel: ? get_mem_cgroup_from_mm+0x82/0x88
Jan 26 04:28:30 Panoptes kernel: ? memcg_kmem_get_cache+0x55/0x16a
Jan 26 04:28:30 Panoptes kernel: __get_free_pages+0x5/0x32
Jan 26 04:28:30 Panoptes kernel: pgd_alloc+0x14/0xf5
Jan 26 04:28:30 Panoptes kernel: mm_init+0x168/0x213
Jan 26 04:28:30 Panoptes kernel: copy_process.part.4+0xa4f/0x1767
Jan 26 04:28:30 Panoptes kernel: _do_fork+0xaf/0x290
Jan 26 04:28:30 Panoptes kernel: ? __set_current_blocked+0x38/0x50
Jan 26 04:28:30 Panoptes kernel: do_syscall_64+0x5b/0xf8
Jan 26 04:28:30 Panoptes kernel: entry_SYSCALL64_slow_path+0x25/0x25
Jan 26 04:28:30 Panoptes kernel: RIP: 0033:0x1524df2a57cb
Jan 26 04:28:30 Panoptes kernel: RSP: 002b:00001524de640c10 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Jan 26 04:28:30 Panoptes kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00001524df2a57cb
Jan 26 04:28:30 Panoptes kernel: RDX: 00001524de640c2c RSI: 0000000000000000 RDI: 0000000000100011
Jan 26 04:28:30 Panoptes kernel: RBP: 00001524de640c70 R08: 00001524df63a5c0 R09: 00001524c8006760
Jan 26 04:28:30 Panoptes kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 00001524c8006760
Jan 26 04:28:30 Panoptes kernel: R13: 00007ffd4b819e10 R14: 0000000000000000 R15: 0000000000000000
Jan 26 04:28:30 Panoptes kernel: Mem-Info:
Jan 26 04:28:30 Panoptes kernel: active_anon:635451 inactive_anon:24193 isolated_anon:0
Jan 26 04:28:30 Panoptes kernel: active_file:112792 inactive_file:3067951 isolated_file:0
Jan 26 04:28:30 Panoptes kernel: unevictable:0 dirty:290750 writeback:1536 unstable:0
Jan 26 04:28:30 Panoptes kernel: slab_reclaimable:95081 slab_unreclaimable:25153
Jan 26 04:28:30 Panoptes kernel: mapped:46503 shmem:156522 pagetables:5276 bounce:0
Jan 26 04:28:30 Panoptes kernel: free:52317 free_pcp:4 free_cma:0
Jan 26 04:28:30 Panoptes kernel: Node 0 active_anon:2541804kB inactive_anon:96772kB active_file:451168kB inactive_file:12271804kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:186012kB dirty:1163000kB writeback:6144kB shmem:626088kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1398784kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jan 26 04:28:30 Panoptes kernel: Node 0 DMA free:15892kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB managed:15892kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 26 04:28:30 Panoptes kernel: lowmem_reserve[]: 0 1730 15628 15628
Jan 26 04:28:30 Panoptes kernel: Node 0 DMA32 free:70184kB min:14944kB low:18680kB high:22416kB active_anon:279908kB inactive_anon:16kB active_file:16656kB inactive_file:1482552kB unevictable:0kB writepending:129416kB present:1934320kB managed:1920952kB mlocked:0kB kernel_stack:80kB pagetables:204kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 26 04:28:30 Panoptes kernel: lowmem_reserve[]: 0 0 13898 13898
Jan 26 04:28:30 Panoptes kernel: Node 0 Normal free:123192kB min:120084kB low:150104kB high:180124kB active_anon:2261896kB inactive_anon:96756kB active_file:434512kB inactive_file:10789252kB unevictable:0kB writepending:1039728kB present:14491648kB managed:14232784kB mlocked:0kB kernel_stack:8944kB pagetables:20900kB bounce:0kB free_pcp:16kB local_pcp:0kB free_cma:0kB
Jan 26 04:28:30 Panoptes kernel: lowmem_reserve[]: 0 0 0 0
Jan 26 04:28:30 Panoptes kernel: Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15892kB
Jan 26 04:28:30 Panoptes kernel: Node 0 DMA32: 525*4kB (UME) 348*8kB (UME) 359*16kB (UME) 255*32kB (UME) 133*64kB (UME) 88*128kB (UE) 12*256kB (U) 10*512kB (UM) 15*1024kB (UM) 4*2048kB (M) 0*4096kB = 70308kB
Jan 26 04:28:30 Panoptes kernel: Node 0 Normal: 3878*4kB (UME) 2068*8kB (UMH) 2193*16kB (UMEH) 744*32kB (UME) 228*64kB (UME) 26*128kB (UM) 3*256kB (UM) 1*512kB (U) 11*1024kB (UME) 1*2048kB (H) 0*4096kB = 123464kB
Jan 26 04:28:30 Panoptes kernel: 3337265 total pagecache pages
Jan 26 04:28:30 Panoptes kernel: 0 pages in swap cache
Jan 26 04:28:30 Panoptes kernel: Swap cache stats: add 0, delete 0, find 0/0
Jan 26 04:28:30 Panoptes kernel: Free swap = 0kB
Jan 26 04:28:30 Panoptes kernel: Total swap = 0kB
Jan 26 04:28:30 Panoptes kernel: 4110486 pages RAM
Jan 26 04:28:30 Panoptes kernel: 0 pages HighMem/MovableOnly
Jan 26 04:28:30 Panoptes kernel: 68079 pages reserved
Jan 26 04:28:30 Panoptes kernel: 0 pages cma reserved
Jan 26 04:28:46 Panoptes kernel: diskload: page allocation stalls for 26370ms, order:1, mode:0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null)
Jan 26 04:28:46 Panoptes kernel: diskload cpuset=/ mems_allowed=0
Jan 26 04:28:46 Panoptes kernel: CPU: 2 PID: 8707 Comm: diskload Not tainted 4.14.13-unRAID #1
Jan 26 04:28:46 Panoptes kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C236 WSI, BIOS P1.00 10/21/2015
Jan 26 04:28:46 Panoptes kernel: Call Trace:
Jan 26 04:28:46 Panoptes kernel: dump_stack+0x5d/0x79
Jan 26 04:28:46 Panoptes kernel: warn_alloc+0xdf/0x160
Jan 26 04:28:46 Panoptes kernel: ? wakeup_kswapd+0x2c/0xb2
Jan 26 04:28:46 Panoptes kernel: __alloc_pages_nodemask+0x578/0xb03
Jan 26 04:28:46 Panoptes kernel: ? get_mem_cgroup_from_mm+0x82/0x88
Jan 26 04:28:46 Panoptes kernel: ? memcg_kmem_get_cache+0x55/0x16a
Jan 26 04:28:46 Panoptes kernel: __get_free_pages+0x5/0x32
Jan 26 04:28:46 Panoptes kernel: pgd_alloc+0x14/0xf5
Jan 26 04:28:46 Panoptes kernel: mm_init+0x168/0x213
Jan 26 04:28:46 Panoptes kernel: copy_process.part.4+0xa4f/0x1767
Jan 26 04:28:46 Panoptes kernel: ? kmem_cache_alloc+0xde/0xea
Jan 26 04:28:46 Panoptes kernel: ? get_empty_filp+0x9f/0x157
Jan 26 04:28:46 Panoptes kernel: _do_fork+0xaf/0x290
Jan 26 04:28:46 Panoptes kernel: ? __set_current_blocked+0x38/0x50
Jan 26 04:28:46 Panoptes kernel: do_syscall_64+0x5b/0xf8
Jan 26 04:28:46 Panoptes kernel: entry_SYSCALL64_slow_path+0x25/0x25
Jan 26 04:28:46 Panoptes kernel: RIP: 0033:0x14a1e424d39c
Jan 26 04:28:46 Panoptes kernel: RSP: 002b:00007ffdf3033100 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Jan 26 04:28:46 Panoptes kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014a1e424d39c
Jan 26 04:28:46 Panoptes kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Jan 26 04:28:46 Panoptes kernel: RBP: 00007ffdf3033140 R08: 000014a1e4b94740 R09: 00007ffdf3033170
Jan 26 04:28:46 Panoptes kernel: R10: 000014a1e4b94a10 R11: 0000000000000246 R12: 0000000000000000
Jan 26 04:28:46 Panoptes kernel: R13: 00007ffdf30331f0 R14: 0000000000000000 R15: 00007ffdf30334f4
Jan 26 04:28:46 Panoptes kernel: Mem-Info:
Jan 26 04:28:46 Panoptes kernel: active_anon:633831 inactive_anon:24192 isolated_anon:0
Jan 26 04:28:46 Panoptes kernel: active_file:112808 inactive_file:3070305 isolated_file:0
Jan 26 04:28:46 Panoptes kernel: unevictable:0 dirty:229128 writeback:4330 unstable:0
Jan 26 04:28:46 Panoptes kernel: slab_reclaimable:95009 slab_unreclaimable:25179
Jan 26 04:28:46 Panoptes kernel: mapped:46468 shmem:156522 pagetables:5239 bounce:0
Jan 26 04:28:46 Panoptes kernel: free:51826 free_pcp:0 free_cma:0
Jan 26 04:28:46 Panoptes kernel: Node 0 active_anon:2535324kB inactive_anon:96768kB active_file:451232kB inactive_file:12281220kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:185872kB dirty:916512kB writeback:17320kB shmem:626088kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1396736kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jan 26 04:28:46 Panoptes kernel: Node 0 DMA free:15892kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB managed:15892kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 26 04:28:46 Panoptes kernel: lowmem_reserve[]: 0 1730 15628 15628
Jan 26 04:28:46 Panoptes kernel: Node 0 DMA32 free:69940kB min:14944kB low:18680kB high:22416kB active_anon:277764kB inactive_anon:16kB active_file:16656kB inactive_file:1485080kB unevictable:0kB writepending:121148kB present:1934320kB managed:1920952kB mlocked:0kB kernel_stack:80kB pagetables:204kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 26 04:28:46 Panoptes kernel: lowmem_reserve[]: 0 0 13898 13898
Jan 26 04:28:46 Panoptes kernel: Node 0 Normal free:121472kB min:120084kB low:150104kB high:180124kB active_anon:2257560kB inactive_anon:96752kB active_file:434576kB inactive_file:10796140kB unevictable:0kB writepending:812684kB present:14491648kB managed:14232784kB mlocked:0kB kernel_stack:8880kB pagetables:20752kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 26 04:28:46 Panoptes kernel: lowmem_reserve[]: 0 0 0 0
Jan 26 04:28:46 Panoptes kernel: Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15892kB
Jan 26 04:28:46 Panoptes kernel: Node 0 DMA32: 531*4kB (UME) 351*8kB (UME) 359*16kB (UME) 232*32kB (UME) 126*64kB (UME) 88*128kB (UE) 13*256kB (UM) 7*512kB (UM) 15*1024kB (UM) 5*2048kB (M) 0*4096kB = 69940kB
Jan 26 04:28:46 Panoptes kernel: Node 0 Normal: 3861*4kB (UME) 2071*8kB (UMEH) 2069*16kB (UMEH) 752*32kB (UME) 228*64kB (UME) 27*128kB (UMH) 4*256kB (UMH) 2*512kB (UH) 12*1024kB (UMEH) 0*2048kB 0*4096kB = 121564kB
Jan 26 04:28:46 Panoptes kernel: 3339639 total pagecache pages
Jan 26 04:28:46 Panoptes kernel: 0 pages in swap cache
Jan 26 04:28:46 Panoptes kernel: Swap cache stats: add 0, delete 0, find 0/0
Jan 26 04:28:46 Panoptes kernel: Free swap = 0kB
Jan 26 04:28:46 Panoptes kernel: Total swap = 0kB
Jan 26 04:28:46 Panoptes kernel: 4110486 pages RAM
Jan 26 04:28:46 Panoptes kernel: 0 pages HighMem/MovableOnly
Jan 26 04:28:46 Panoptes kernel: 68079 pages reserved
Jan 26 04:28:46 Panoptes kernel: 0 pages cma reserved
Jan 26 04:28:46 Panoptes kernel: diskload: page allocation stalls for 26371ms, order:1, mode:0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null)
Jan 26 04:28:46 Panoptes kernel: diskload cpuset=/ mems_allowed=0
Jan 26 04:28:46 Panoptes kernel: CPU: 2 PID: 8707 Comm: diskload Not tainted 4.14.13-unRAID #1
Jan 26 04:28:46 Panoptes kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C236 WSI, BIOS P1.00 10/21/2015
Jan 26 04:28:46 Panoptes kernel: Call Trace:
Jan 26 04:28:46 Panoptes kernel: dump_stack+0x5d/0x79
Jan 26 04:28:46 Panoptes kernel: warn_alloc+0xdf/0x160
Jan 26 04:28:46 Panoptes kernel: ? wakeup_kswapd+0x2c/0xb2
Jan 26 04:28:46 Panoptes kernel: __alloc_pages_nodemask+0x578/0xb03
Jan 26 04:28:46 Panoptes kernel: ? get_mem_cgroup_from_mm+0x82/0x88
Jan 26 04:28:46 Panoptes kernel: ? memcg_kmem_get_cache+0x55/0x16a
Jan 26 04:28:46 Panoptes kernel: __get_free_pages+0x5/0x32
Jan 26 04:28:46 Panoptes kernel: pgd_alloc+0x14/0xf5
Jan 26 04:28:46 Panoptes kernel: mm_init+0x168/0x213
Jan 26 04:28:46 Panoptes kernel: copy_process.part.4+0xa4f/0x1767
Jan 26 04:28:46 Panoptes kernel: ? kmem_cache_alloc+0xde/0xea
Jan 26 04:28:46 Panoptes kernel: ? get_empty_filp+0x9f/0x157
Jan 26 04:28:46 Panoptes kernel: _do_fork+0xaf/0x290
Jan 26 04:28:46 Panoptes kernel: ? __set_current_blocked+0x38/0x50
Jan 26 04:28:46 Panoptes kernel: do_syscall_64+0x5b/0xf8
Jan 26 04:28:46 Panoptes kernel: entry_SYSCALL64_slow_path+0x25/0x25
Jan 26 04:28:46 Panoptes kernel: RIP: 0033:0x14a1e424d39c
Jan 26 04:28:46 Panoptes kernel: RSP: 002b:00007ffdf3033100 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Jan 26 04:28:46 Panoptes kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014a1e424d39c
Jan 26 04:28:46 Panoptes kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Jan 26 04:28:46 Panoptes kernel: RBP: 00007ffdf3033140 R08: 000014a1e4b94740 R09: 00007ffdf3033170
Jan 26 04:28:46 Panoptes kernel: R10: 000014a1e4b94a10 R11: 0000000000000246 R12: 0000000000000000
Jan 26 04:28:46 Panoptes kernel: R13: 00007ffdf30331f0 R14: 0000000000000000 R15: 00007ffdf30334f4
Jan 26 04:28:46 Panoptes kernel: cache_dirs: page allocation stalls for 25156ms, order:1, mode:0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null)
Jan 26 04:28:46 Panoptes kernel: cache_dirs cpuset=/ mems_allowed=0
Jan 26 04:28:46 Panoptes kernel: CPU: 2 PID: 8884 Comm: cache_dirs Not tainted 4.14.13-unRAID #1
Jan 26 04:28:46 Panoptes kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C236 WSI, BIOS P1.00 10/21/2015
Jan 26 04:28:46 Panoptes kernel: Call Trace:
Jan 26 04:28:46 Panoptes kernel: dump_stack+0x5d/0x79
Jan 26 04:28:46 Panoptes kernel: warn_alloc+0xdf/0x160
Jan 26 04:28:46 Panoptes kernel: ? wakeup_kswapd+0x2c/0xb2
Jan 26 04:28:46 Panoptes kernel: __alloc_pages_nodemask+0x578/0xb03
Jan 26 04:28:46 Panoptes kernel: ? get_mem_cgroup_from_mm+0x82/0x88
Jan 26 04:28:46 Panoptes kernel: ? memcg_kmem_get_cache+0x55/0x16a
Jan 26 04:28:46 Panoptes kernel: __get_free_pages+0x5/0x32
Jan 26 04:28:46 Panoptes kernel: pgd_alloc+0x14/0xf5
Jan 26 04:28:46 Panoptes kernel: mm_init+0x168/0x213
Jan 26 04:28:46 Panoptes kernel: copy_process.part.4+0xa4f/0x1767
Jan 26 04:28:46 Panoptes kernel: ? kmem_cache_alloc+0xde/0xea
Jan 26 04:28:46 Panoptes kernel: ? get_empty_filp+0x9f/0x157
Jan 26 04:28:46 Panoptes kernel: _do_fork+0xaf/0x290
Jan 26 04:28:46 Panoptes kernel: ? __set_current_blocked+0x38/0x50
Jan 26 04:28:46 Panoptes kernel: do_syscall_64+0x5b/0xf8
Jan 26 04:28:46 Panoptes kernel: entry_SYSCALL64_slow_path+0x25/0x25
Jan 26 04:28:46 Panoptes kernel: RIP: 0033:0x153b9970a39c
Jan 26 04:28:46 Panoptes kernel: RSP: 002b:00007ffc21b41330 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Jan 26 04:28:46 Panoptes kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000153b9970a39c
Jan 26 04:28:46 Panoptes kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Jan 26 04:28:46 Panoptes kernel: RBP: 00007ffc21b41370 R08: 0000153b9a051740 R09: 00007ffc21b413a0
Jan 26 04:28:46 Panoptes kernel: R10: 0000153b9a051a10 R11: 0000000000000246 R12: 0000000000000000
Jan 26 04:28:46 Panoptes kernel: R13: 00007ffc21b41420 R14: 0000000000000000 R15: 00007ffc21b41724

Attachments: FCPsyslog_tail.txt, panoptes-diagnostics-20180126-0415.zip, panoptes-diagnostics-20180126-0447.zip, panoptes-diagnostics-20180208-0843.zip
Edited February 10, 2018 by netpro2k
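Since the full log is long, a small script can pull out the two things that matter in dumps like the one above: which processes stalled and for how long, and which memory zones had less free memory than their "low" watermark (the level at which the kernel wakes kswapd to reclaim pages). The following is a minimal sketch, not a diagnosis tool. The regular expressions are written against the message format shown in this post, the function names are illustrative, and the zone lines in the sample are truncated after the watermark fields.

```python
import re

# "<process>: page allocation stalls for <N>ms, order:<k>" as printed by the 4.14 kernel above.
STALL_RE = re.compile(r"kernel: (\S+): page allocation stalls for (\d+)ms, order:(\d+)")
# Per-zone summary line, e.g. "Node 0 Normal free:123192kB min:120084kB low:150104kB".
ZONE_RE = re.compile(r"Node \d+ (\w+) free:(\d+)kB min:(\d+)kB low:(\d+)kB")


def find_stalls(text):
    """Return a list of (process, stall_ms, order) for each allocation-stall warning."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in STALL_RE.finditer(text)]


def zones_below_low(text):
    """Return {zone: (free_kB, low_kB)} for zones whose free memory is under 'low'.

    If a zone appears in several dumps, the last dump below the watermark wins.
    """
    below = {}
    for m in ZONE_RE.finditer(text):
        zone, free_kb, low_kb = m.group(1), int(m.group(2)), int(m.group(4))
        if free_kb < low_kb:
            below[zone] = (free_kb, low_kb)
    return below


# Excerpt from the log in this post (zone lines truncated after "high:").
SAMPLE = """\
Jan 26 04:28:30 Panoptes kernel: emhttpd: page allocation stalls for 14959ms, order:1, mode:0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null)
Jan 26 04:28:30 Panoptes kernel: Node 0 DMA free:15892kB min:132kB low:164kB high:196kB
Jan 26 04:28:30 Panoptes kernel: Node 0 DMA32 free:70184kB min:14944kB low:18680kB high:22416kB
Jan 26 04:28:30 Panoptes kernel: Node 0 Normal free:123192kB min:120084kB low:150104kB high:180124kB
Jan 26 04:28:46 Panoptes kernel: diskload: page allocation stalls for 26370ms, order:1, mode:0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null)
"""

if __name__ == "__main__":
    for process, stall_ms, order in find_stalls(SAMPLE):
        print(f"{process}: stalled {stall_ms / 1000:.1f}s on an order-{order} allocation")
    for zone, (free_kb, low_kb) in zones_below_low(SAMPLE).items():
        print(f"zone {zone}: free {free_kb} kB is below low watermark {low_kb} kB")
```

On this excerpt it reports the emhttpd and diskload stalls and flags only the Normal zone, whose free memory (123192 kB) sits between its min (120084 kB) and low (150104 kB) watermarks. Together with roughly 1.1 GB of dirty pages in the same dump, that looks more like reclaim and writeback pressure than outright memory exhaustion, but that reading is an assumption and the log alone does not show it caused the lockup.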
Jessie Posted February 10, 2018 (edited)

Not sure whether this is relevant, but I built a few servers using a Gigabyte X170 motherboard and had the same stability problem, especially in one that was running a VM with Small Business Server. It would go for a week, then nothing. Lights were on but nobody home. All hardware was new and tested OK. The problem was a bug in the E3 Xeon processors, and the fix was a firmware update for the motherboard. After the fix, rock solid. I lost functionality in a couple of the PCIe slots, though.

Edited February 10, 2018 by Jessie
Salandor Posted February 11, 2018

Somehow I'm back on stable version 6.4.1, which includes the Spectre patch, and now I just froze again. It seems obvious now that it's that patch making my server freeze. Since the Update OS plugin won't let me install a "next" release candidate now, does anyone have any suggestions?
PeteB Posted February 11, 2018

I've had exactly the same symptoms, but only once (I'm on 6.4.1). I have troubleshooting mode turned on now, but it's been stable for a while. Never had any issues with 6.3.5.
Salandor Posted February 11, 2018 (edited)

Crashed again, I attached my FCPsyslog. Can someone provide some insight here?

Attachments: FCPsyslog_tail.txt
Edited February 11, 2018 by Salandor
netpro2k (Author) Posted February 16, 2018

Seems to be back to locking up every few days now... Guess that 2 week uptime was a fluke...
ljm42 Posted February 17, 2018

On 2/11/2018 at 11:46 AM, Salandor said: Crashed again, I attached my FCPsyslog. Can someone provide some insight here? FCPsyslog_tail.txt

Your syslog is full of errors related to DIMM 0 and DIMM 1:

Feb 11 14:27:35 Maximus kernel: EDAC MC0: 465 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x16d6c6 offset:0x40 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:2 rank:0)
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: cc007fc000010090
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: TSC 0
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: ADDR 16e608040
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: MISC 2050204486
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1518377256 SOCKET 0 APIC 0
Feb 11 14:27:36 Maximus kernel: EDAC MC0: 511 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#1 (channel:1 slot:1 page:0x16e608 offset:0x40 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:2 rank:4)
Feb 11 14:27:38 Maximus kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Feb 11 14:27:38 Maximus kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: cc00ee0000010090
Feb 11 14:27:38 Maximus kernel: EDAC sbridge MC0: TSC 0
Feb 11 14:27:38 Maximus kernel: EDAC sbridge MC0: ADDR 16dff0200
Feb 11 14:27:38 Maximus kernel: EDAC sbridge MC0: MISC 2040681a86
Feb 11 14:27:38 Maximus kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1518377258 SOCKET 0 APIC 0

Looks like a hardware problem to me. I'd start by removing and reseating the DIMMs; maybe you'll get lucky. The next step is probably to replace them, unless someone else has ideas.
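To see at a glance which DIMMs the errors point at, the corrected-error (CE) lines can be tallied per DIMM label. This is a minimal sketch written against the EDAC message format quoted above; the function name is illustrative, and it assumes the number printed before "CE" is the count of corrected errors reported for that event.

```python
import re
from collections import Counter

# Matches e.g. "EDAC MC0: 465 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 (...)".
# "EDAC sbridge MC0: ..." detail lines are intentionally not matched.
CE_RE = re.compile(r"EDAC MC(\d+): (\d+) CE .*? on (\S+)")


def count_corrected_errors(text):
    """Return a Counter mapping DIMM label to the summed corrected-error count."""
    totals = Counter()
    for m in CE_RE.finditer(text):
        totals[m.group(3)] += int(m.group(2))
    return totals


# The two CE lines quoted in this post, plus one sbridge line that should be ignored.
SAMPLE = """\
Feb 11 14:27:35 Maximus kernel: EDAC MC0: 465 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x16d6c6 offset:0x40 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:2 rank:0)
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Feb 11 14:27:36 Maximus kernel: EDAC MC0: 511 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#1 (channel:1 slot:1 page:0x16e608 offset:0x40 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:2 rank:4)
"""

if __name__ == "__main__":
    for dimm, count in count_corrected_errors(SAMPLE).most_common():
        print(f"{dimm}: {count} corrected errors")
```

Both labels in the excerpt share socket 0 and channel 1, so a tally like this over the whole syslog can help show whether the errors stay on one channel of one CPU or are spread across the board. That would fit a seating or socket contact problem as well as bad DIMMs, though the script cannot tell those apart.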
Salandor Posted February 17, 2018

Memtest with 5 passes shows the DIMMs are fine. I re-seated CPU 1 and the errors are all gone.