
Unraid randomly locking up



I've been running an unRAID server for a few weeks and it has randomly locked up several times, usually every few days, though once after as little as 24 hours. I can't detect a pattern to the lockups. When the server locks up I cannot reach the web interface or SSH over the network. Locally I can see the cursor blinking at the login prompt, but plugging in a keyboard and trying to log in has no effect. The system does not shut down via the power button, so my only option is a hard reset.

 

I don't see anything odd in the logs, memory usage seems fine, and I ran memtest for 3 passes with no errors found (though when I hit Esc to exit, the system did not reboot on its own; not sure if this is normal). I also recently upgraded from the stable branch to 6.4.0_rc19b, since some users on the forums mentioned this might help with a similar issue another user reported, but the lockups have continued.

 

System specs:

ASRock Rack Motherboard C236 WSI (new)

Intel BX80662I36100T Core i3 6100T (new)

SeaSonic Platinum Series SS-400FL2 Active PFC F3 400W (new)

4 x HGST DeskStar NAS 3.5" 4TB 7200 RPM 128MB Cache (new)

2 x 8GB non-ECC RAM (recycled from my Alienware desktop after a RAM upgrade, a bit over a year old)

1 x 256GB OCZ Vertex 4 (recycled drive from an old system, probably a few years old; I plan to use it as a cache drive, but currently it is just set as an unassigned device)

 

Running a few SMB shares, no VMs, and a few dockers (nzbget, radarr, sonarr, plex, and openvpn-as, all by linuxserver).

 

I somewhat suspect the RAM since it is the oldest part of the system (apart from the currently unused SSD). I want to buy ECC memory anyway but was putting that off for now, and I want to rule out other issues before tossing the current RAM.

 

I have just over a week left on my unRAID trial and would love to have an actual working system before I purchase a license.

FCPsyslog_tail.txt

panoptes-diagnostics-20180107-0740.zip

Edited by netpro2k

I'm still within the return window for a bit, but I would need a non-mini-ITX case for the mobos you suggested (I got the current case for free from a friend), so I'd also like to prove the board is the culprit before going through all the cost and effort of buying a new case and fully transplanting the system. I have generally read favorable things about that board (and ASRock in general), so do you mean specifically with regard to unRAID compatibility? Any tips for how I can debug this issue further?

  • 2 weeks later...

Any update on your situation? I am having the exact same problem. I'm 80% through my first memtest pass with no errors, and I don't expect there to be any. I don't see any similarities in our builds other than the same OS version. Let us know if rc19b fixed your problem.

 

My specs:

X9DRI-F Motherboard
2x E5-2670 v2 2.5ghz 10-Core

LSI 9207-8i Controller

16x 4gb PC3-10600R

2x 1200w PSU

USB boot drive is a 32GB Samsung MUF-32BB/AM

Silverstone PCIe adapter running the NVMe cache drive

Roswell PCIe to USB 3 expansion card.

 


If you install the Fix Common Problems plugin and put it in troubleshooting mode, it can save logs so that they are not lost when you have to do a hard reset. Salandor, I would suggest removing the Roswell card just to see if that might have something to do with it.

 

Also, both of you should check your CPU fans to make sure they are in good working order, and monitor CPU temps.

 

Netpro2k, how old is your power supply?


I'll try removing the USB expansion card. I'm running a Supermicro CSE-846A-R1200B chassis, so my CPU radiators and memory are cooled passively by the two 80mm PWM fans and an air shroud. The freezing generally happens when I leave the server alone while migrating data to the array. I have tried both local transfer over USB and network transfer. CPU temps during the transfer are likely the same as when I walked away; they never get above 37C while just writing to the array.

Edited by Salandor

The ASRock Rack was also on my list since it's the only available mini-ITX board, but I have read some complaints about it on other forums too.

FreeNAS users have also experienced lockups with the Rack board, which is one of the reasons I picked up a micro-ATX Supermicro board instead.
Unraid is rock solid on that motherboard, which also runs a C236 chipset, and I have run all the 6.4 RC releases since December on it.

 

It still feels like a memory issue to me, but I have also seen reports on FreeNAS about CPU resets.

Have you tried the 2.3 BIOS? Maybe that resolves some issues.


It's been running for 8 days since upgrading to the 6.4.0 release version, but I can't help but feel that is a coincidence. I also put the system on a UPS, but again I would be pretty surprised if that helped stability given the way it was crashing (the login screen with the blinking cursor was always visible on the onboard video, just unresponsive). If it does lock up again, the Fix Common Problems log has been running since boot, so there should be quite a nice log.

 

@ashman70 power supply is new, purchased at the same time as the mobo and CPU.


Doubtful; RAM has yet to go over 5% usage while copying files. I have all dockers except Krusader turned off, so the server was literally doing nothing but writing data to the array. Well, it's been all night now and it's still copying. I'm becoming more confident that it has something to do with CPU scaling. I read that Haswell and Ivy Bridge Xeons had a problem with the Intel P-state driver properly scaling in and out of power save.
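
One way to check whether the Intel P-state driver is actually in use is to read the cpufreq entries under sysfs. A minimal sketch, assuming a Linux system that exposes `/sys/devices/system/cpu/cpu0/cpufreq`; the function name `read_scaling_info` is made up for illustration, and the values come back as `None` if the files are missing.

```python
from pathlib import Path

# Standard Linux cpufreq location for the first CPU (assumed to be present).
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_scaling_info(cpufreq_dir=CPUFREQ_DIR):
    """Return the scaling driver and governor for one CPU, or None if absent."""
    info = {}
    for name in ("scaling_driver", "scaling_governor"):
        path = Path(cpufreq_dir) / name
        info[name] = path.read_text().strip() if path.exists() else None
    return info


print(read_scaling_info())
```

If `scaling_driver` reports `intel_pstate`, that driver is the one handling frequency scaling; a different value such as `acpi-cpufreq` would mean it is not in play.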

Edited by Salandor
2 hours ago, Salandor said:

I read that haswell and ivy bridge xeons had a problem with the intel Pstate driver and properly scaling in and out of power save.

You might want to read through the first page of the 6.4.1-rc1 release thread. There are some posts that discuss the Intel Meltdown/Spectre patch in 6.4.0 that caused problems with certain Intel processors. One of the changes in 6.4.1-rc1 was the removal of that patch. No guarantee that this would address your problem, but it is another factor in the bigger picture.

  • 2 weeks later...

I had several weeks of uptime with no issues, but it looks like the system has locked up again. I was running FCP troubleshooting mode, so I have a very long log covering the whole time. I don't notice anything at the end of the log or in the last zip, though I do see some interesting kernel messages appear a few times which I hadn't seen in logs from previous crashes. Is this some sort of memory issue?

 

If I look at the memory.txt from the zip captured about 10 minutes before this log message, I see that there is quite a small amount of free memory:

              total        used        free      shared  buff/cache   available
Mem:            15G        2.3G        823M        611M         12G         11G
Swap:            0B          0B          0B
Total:          15G        2.3G        823M

and from the zip after:

              total        used        free      shared  buff/cache   available
Mem:            15G        2.2G        3.2G        611M         10G         12G
Swap:            0B          0B          0B
Total:          15G        2.2G        3.2G

It's almost entirely cache, which I assume can be immediately evicted if needed. Is this an issue of memory held by cache not being cleared fast enough, with no swap to spill over into? If so, is there something I have misconfigured? I assume 16GB of RAM should be enough to run the few Docker containers I am running (I have never seen non-cache RAM usage go above about 3GB). I suspect this might be a red herring though, since the last zip before the crash shows over 9GB free even after cache.
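
For what it's worth, the `available` column in `free` is the kernel's estimate of how much memory could be handed to new work without swapping, and it already counts most of the page cache as reclaimable, so it is usually a better number to watch than `free`. A minimal sketch of pulling both out of a memory.txt snapshot follows; the helper names (`to_bytes`, `parse_mem_line`) are made up for illustration, and the units are assumed to be powers of 1024.

```python
import re

# Unit multipliers for the human-readable sizes in the snapshot
# (assumed to be powers of 1024).
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a value such as '823M' or '2.3G' to a byte count."""
    match = re.fullmatch(r"([\d.]+)([BKMGT])i?", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def parse_mem_line(free_output):
    """Return the columns of the 'Mem:' row as a dict of byte counts."""
    columns = ["total", "used", "free", "shared", "buff/cache", "available"]
    for line in free_output.splitlines():
        if line.startswith("Mem:"):
            values = line.split()[1:]
            return {name: to_bytes(value) for name, value in zip(columns, values)}
    raise ValueError("no 'Mem:' row found")


# The snapshot taken about 10 minutes before the kernel messages.
before = """\
              total        used        free      shared  buff/cache   available
Mem:            15G        2.3G        823M        611M         12G         11G
Swap:            0B          0B          0B
"""

mem = parse_mem_line(before)
gib = 1024**3
print(f"free:      {mem['free'] / gib:.1f} GiB")
print(f"available: {mem['available'] / gib:.1f} GiB")
```

On that snapshot this reports well under 1 GiB free but 11 GiB available, which fits the guess that a low `free` figure by itself is probably not the problem.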

 

Snippet of the log here; the full log is attached, as well as the zips from before and after this message and the last one before the crash.

Jan 26 04:28:30 Panoptes kernel: emhttpd: page allocation stalls for 14959ms, order:1, mode:0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null)
Jan 26 04:28:30 Panoptes kernel: emhttpd cpuset=/ mems_allowed=0
Jan 26 04:28:30 Panoptes kernel: CPU: 0 PID: 12457 Comm: emhttpd Not tainted 4.14.13-unRAID #1
Jan 26 04:28:30 Panoptes kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C236 WSI, BIOS P1.00 10/21/2015
Jan 26 04:28:30 Panoptes kernel: Call Trace:
Jan 26 04:28:30 Panoptes kernel: dump_stack+0x5d/0x79
Jan 26 04:28:30 Panoptes kernel: warn_alloc+0xdf/0x160
Jan 26 04:28:30 Panoptes kernel: ? wakeup_kswapd+0x2c/0xb2
Jan 26 04:28:30 Panoptes kernel: __alloc_pages_nodemask+0x578/0xb03
Jan 26 04:28:30 Panoptes kernel: ? get_mem_cgroup_from_mm+0x82/0x88
Jan 26 04:28:30 Panoptes kernel: ? memcg_kmem_get_cache+0x55/0x16a
Jan 26 04:28:30 Panoptes kernel: __get_free_pages+0x5/0x32
Jan 26 04:28:30 Panoptes kernel: pgd_alloc+0x14/0xf5
Jan 26 04:28:30 Panoptes kernel: mm_init+0x168/0x213
Jan 26 04:28:30 Panoptes kernel: copy_process.part.4+0xa4f/0x1767
Jan 26 04:28:30 Panoptes kernel: _do_fork+0xaf/0x290
Jan 26 04:28:30 Panoptes kernel: ? __set_current_blocked+0x38/0x50
Jan 26 04:28:30 Panoptes kernel: do_syscall_64+0x5b/0xf8
Jan 26 04:28:30 Panoptes kernel: entry_SYSCALL64_slow_path+0x25/0x25
Jan 26 04:28:30 Panoptes kernel: RIP: 0033:0x1524df2a57cb
Jan 26 04:28:30 Panoptes kernel: RSP: 002b:00001524de640c10 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Jan 26 04:28:30 Panoptes kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00001524df2a57cb
Jan 26 04:28:30 Panoptes kernel: RDX: 00001524de640c2c RSI: 0000000000000000 RDI: 0000000000100011
Jan 26 04:28:30 Panoptes kernel: RBP: 00001524de640c70 R08: 00001524df63a5c0 R09: 00001524c8006760
Jan 26 04:28:30 Panoptes kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 00001524c8006760
Jan 26 04:28:30 Panoptes kernel: R13: 00007ffd4b819e10 R14: 0000000000000000 R15: 0000000000000000
Jan 26 04:28:30 Panoptes kernel: Mem-Info:
Jan 26 04:28:30 Panoptes kernel: active_anon:635451 inactive_anon:24193 isolated_anon:0
Jan 26 04:28:30 Panoptes kernel: active_file:112792 inactive_file:3067951 isolated_file:0
Jan 26 04:28:30 Panoptes kernel: unevictable:0 dirty:290750 writeback:1536 unstable:0
Jan 26 04:28:30 Panoptes kernel: slab_reclaimable:95081 slab_unreclaimable:25153
Jan 26 04:28:30 Panoptes kernel: mapped:46503 shmem:156522 pagetables:5276 bounce:0
Jan 26 04:28:30 Panoptes kernel: free:52317 free_pcp:4 free_cma:0
Jan 26 04:28:30 Panoptes kernel: Node 0 active_anon:2541804kB inactive_anon:96772kB active_file:451168kB inactive_file:12271804kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:186012kB dirty:1163000kB writeback:6144kB shmem:626088kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1398784kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jan 26 04:28:30 Panoptes kernel: Node 0 DMA free:15892kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB managed:15892kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 26 04:28:30 Panoptes kernel: lowmem_reserve[]: 0 1730 15628 15628
Jan 26 04:28:30 Panoptes kernel: Node 0 DMA32 free:70184kB min:14944kB low:18680kB high:22416kB active_anon:279908kB inactive_anon:16kB active_file:16656kB inactive_file:1482552kB unevictable:0kB writepending:129416kB present:1934320kB managed:1920952kB mlocked:0kB kernel_stack:80kB pagetables:204kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 26 04:28:30 Panoptes kernel: lowmem_reserve[]: 0 0 13898 13898
Jan 26 04:28:30 Panoptes kernel: Node 0 Normal free:123192kB min:120084kB low:150104kB high:180124kB active_anon:2261896kB inactive_anon:96756kB active_file:434512kB inactive_file:10789252kB unevictable:0kB writepending:1039728kB present:14491648kB managed:14232784kB mlocked:0kB kernel_stack:8944kB pagetables:20900kB bounce:0kB free_pcp:16kB local_pcp:0kB free_cma:0kB
Jan 26 04:28:30 Panoptes kernel: lowmem_reserve[]: 0 0 0 0
Jan 26 04:28:30 Panoptes kernel: Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15892kB
Jan 26 04:28:30 Panoptes kernel: Node 0 DMA32: 525*4kB (UME) 348*8kB (UME) 359*16kB (UME) 255*32kB (UME) 133*64kB (UME) 88*128kB (UE) 12*256kB (U) 10*512kB (UM) 15*1024kB (UM) 4*2048kB (M) 0*4096kB = 70308kB
Jan 26 04:28:30 Panoptes kernel: Node 0 Normal: 3878*4kB (UME) 2068*8kB (UMH) 2193*16kB (UMEH) 744*32kB (UME) 228*64kB (UME) 26*128kB (UM) 3*256kB (UM) 1*512kB (U) 11*1024kB (UME) 1*2048kB (H) 0*4096kB = 123464kB
Jan 26 04:28:30 Panoptes kernel: 3337265 total pagecache pages
Jan 26 04:28:30 Panoptes kernel: 0 pages in swap cache
Jan 26 04:28:30 Panoptes kernel: Swap cache stats: add 0, delete 0, find 0/0
Jan 26 04:28:30 Panoptes kernel: Free swap  = 0kB
Jan 26 04:28:30 Panoptes kernel: Total swap = 0kB
Jan 26 04:28:30 Panoptes kernel: 4110486 pages RAM
Jan 26 04:28:30 Panoptes kernel: 0 pages HighMem/MovableOnly
Jan 26 04:28:30 Panoptes kernel: 68079 pages reserved
Jan 26 04:28:30 Panoptes kernel: 0 pages cma reserved
Jan 26 04:28:46 Panoptes kernel: diskload: page allocation stalls for 26370ms, order:1, mode:0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null)
Jan 26 04:28:46 Panoptes kernel: diskload cpuset=/ mems_allowed=0
Jan 26 04:28:46 Panoptes kernel: CPU: 2 PID: 8707 Comm: diskload Not tainted 4.14.13-unRAID #1
Jan 26 04:28:46 Panoptes kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C236 WSI, BIOS P1.00 10/21/2015
Jan 26 04:28:46 Panoptes kernel: Call Trace:
Jan 26 04:28:46 Panoptes kernel: dump_stack+0x5d/0x79
Jan 26 04:28:46 Panoptes kernel: warn_alloc+0xdf/0x160
Jan 26 04:28:46 Panoptes kernel: ? wakeup_kswapd+0x2c/0xb2
Jan 26 04:28:46 Panoptes kernel: __alloc_pages_nodemask+0x578/0xb03
Jan 26 04:28:46 Panoptes kernel: ? get_mem_cgroup_from_mm+0x82/0x88
Jan 26 04:28:46 Panoptes kernel: ? memcg_kmem_get_cache+0x55/0x16a
Jan 26 04:28:46 Panoptes kernel: __get_free_pages+0x5/0x32
Jan 26 04:28:46 Panoptes kernel: pgd_alloc+0x14/0xf5
Jan 26 04:28:46 Panoptes kernel: mm_init+0x168/0x213
Jan 26 04:28:46 Panoptes kernel: copy_process.part.4+0xa4f/0x1767
Jan 26 04:28:46 Panoptes kernel: ? kmem_cache_alloc+0xde/0xea
Jan 26 04:28:46 Panoptes kernel: ? get_empty_filp+0x9f/0x157
Jan 26 04:28:46 Panoptes kernel: _do_fork+0xaf/0x290
Jan 26 04:28:46 Panoptes kernel: ? __set_current_blocked+0x38/0x50
Jan 26 04:28:46 Panoptes kernel: do_syscall_64+0x5b/0xf8
Jan 26 04:28:46 Panoptes kernel: entry_SYSCALL64_slow_path+0x25/0x25
Jan 26 04:28:46 Panoptes kernel: RIP: 0033:0x14a1e424d39c
Jan 26 04:28:46 Panoptes kernel: RSP: 002b:00007ffdf3033100 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Jan 26 04:28:46 Panoptes kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014a1e424d39c
Jan 26 04:28:46 Panoptes kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Jan 26 04:28:46 Panoptes kernel: RBP: 00007ffdf3033140 R08: 000014a1e4b94740 R09: 00007ffdf3033170
Jan 26 04:28:46 Panoptes kernel: R10: 000014a1e4b94a10 R11: 0000000000000246 R12: 0000000000000000
Jan 26 04:28:46 Panoptes kernel: R13: 00007ffdf30331f0 R14: 0000000000000000 R15: 00007ffdf30334f4
Jan 26 04:28:46 Panoptes kernel: Mem-Info:
Jan 26 04:28:46 Panoptes kernel: active_anon:633831 inactive_anon:24192 isolated_anon:0
Jan 26 04:28:46 Panoptes kernel: active_file:112808 inactive_file:3070305 isolated_file:0
Jan 26 04:28:46 Panoptes kernel: unevictable:0 dirty:229128 writeback:4330 unstable:0
Jan 26 04:28:46 Panoptes kernel: slab_reclaimable:95009 slab_unreclaimable:25179
Jan 26 04:28:46 Panoptes kernel: mapped:46468 shmem:156522 pagetables:5239 bounce:0
Jan 26 04:28:46 Panoptes kernel: free:51826 free_pcp:0 free_cma:0
Jan 26 04:28:46 Panoptes kernel: Node 0 active_anon:2535324kB inactive_anon:96768kB active_file:451232kB inactive_file:12281220kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:185872kB dirty:916512kB writeback:17320kB shmem:626088kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1396736kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jan 26 04:28:46 Panoptes kernel: Node 0 DMA free:15892kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB managed:15892kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 26 04:28:46 Panoptes kernel: lowmem_reserve[]: 0 1730 15628 15628
Jan 26 04:28:46 Panoptes kernel: Node 0 DMA32 free:69940kB min:14944kB low:18680kB high:22416kB active_anon:277764kB inactive_anon:16kB active_file:16656kB inactive_file:1485080kB unevictable:0kB writepending:121148kB present:1934320kB managed:1920952kB mlocked:0kB kernel_stack:80kB pagetables:204kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 26 04:28:46 Panoptes kernel: lowmem_reserve[]: 0 0 13898 13898
Jan 26 04:28:46 Panoptes kernel: Node 0 Normal free:121472kB min:120084kB low:150104kB high:180124kB active_anon:2257560kB inactive_anon:96752kB active_file:434576kB inactive_file:10796140kB unevictable:0kB writepending:812684kB present:14491648kB managed:14232784kB mlocked:0kB kernel_stack:8880kB pagetables:20752kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 26 04:28:46 Panoptes kernel: lowmem_reserve[]: 0 0 0 0
Jan 26 04:28:46 Panoptes kernel: Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15892kB
Jan 26 04:28:46 Panoptes kernel: Node 0 DMA32: 531*4kB (UME) 351*8kB (UME) 359*16kB (UME) 232*32kB (UME) 126*64kB (UME) 88*128kB (UE) 13*256kB (UM) 7*512kB (UM) 15*1024kB (UM) 5*2048kB (M) 0*4096kB = 69940kB
Jan 26 04:28:46 Panoptes kernel: Node 0 Normal: 3861*4kB (UME) 2071*8kB (UMEH) 2069*16kB (UMEH) 752*32kB (UME) 228*64kB (UME) 27*128kB (UMH) 4*256kB (UMH) 2*512kB (UH) 12*1024kB (UMEH) 0*2048kB 0*4096kB = 121564kB
Jan 26 04:28:46 Panoptes kernel: 3339639 total pagecache pages
Jan 26 04:28:46 Panoptes kernel: 0 pages in swap cache
Jan 26 04:28:46 Panoptes kernel: Swap cache stats: add 0, delete 0, find 0/0
Jan 26 04:28:46 Panoptes kernel: Free swap  = 0kB
Jan 26 04:28:46 Panoptes kernel: Total swap = 0kB
Jan 26 04:28:46 Panoptes kernel: 4110486 pages RAM
Jan 26 04:28:46 Panoptes kernel: 0 pages HighMem/MovableOnly
Jan 26 04:28:46 Panoptes kernel: 68079 pages reserved
Jan 26 04:28:46 Panoptes kernel: 0 pages cma reserved
Jan 26 04:28:46 Panoptes kernel: diskload: page allocation stalls for 26371ms, order:1, mode:0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null)
Jan 26 04:28:46 Panoptes kernel: diskload cpuset=/ mems_allowed=0
Jan 26 04:28:46 Panoptes kernel: CPU: 2 PID: 8707 Comm: diskload Not tainted 4.14.13-unRAID #1
Jan 26 04:28:46 Panoptes kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C236 WSI, BIOS P1.00 10/21/2015
Jan 26 04:28:46 Panoptes kernel: Call Trace:
Jan 26 04:28:46 Panoptes kernel: dump_stack+0x5d/0x79
Jan 26 04:28:46 Panoptes kernel: warn_alloc+0xdf/0x160
Jan 26 04:28:46 Panoptes kernel: ? wakeup_kswapd+0x2c/0xb2
Jan 26 04:28:46 Panoptes kernel: __alloc_pages_nodemask+0x578/0xb03
Jan 26 04:28:46 Panoptes kernel: ? get_mem_cgroup_from_mm+0x82/0x88
Jan 26 04:28:46 Panoptes kernel: ? memcg_kmem_get_cache+0x55/0x16a
Jan 26 04:28:46 Panoptes kernel: __get_free_pages+0x5/0x32
Jan 26 04:28:46 Panoptes kernel: pgd_alloc+0x14/0xf5
Jan 26 04:28:46 Panoptes kernel: mm_init+0x168/0x213
Jan 26 04:28:46 Panoptes kernel: copy_process.part.4+0xa4f/0x1767
Jan 26 04:28:46 Panoptes kernel: ? kmem_cache_alloc+0xde/0xea
Jan 26 04:28:46 Panoptes kernel: ? get_empty_filp+0x9f/0x157
Jan 26 04:28:46 Panoptes kernel: _do_fork+0xaf/0x290
Jan 26 04:28:46 Panoptes kernel: ? __set_current_blocked+0x38/0x50
Jan 26 04:28:46 Panoptes kernel: do_syscall_64+0x5b/0xf8
Jan 26 04:28:46 Panoptes kernel: entry_SYSCALL64_slow_path+0x25/0x25
Jan 26 04:28:46 Panoptes kernel: RIP: 0033:0x14a1e424d39c
Jan 26 04:28:46 Panoptes kernel: RSP: 002b:00007ffdf3033100 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Jan 26 04:28:46 Panoptes kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014a1e424d39c
Jan 26 04:28:46 Panoptes kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Jan 26 04:28:46 Panoptes kernel: RBP: 00007ffdf3033140 R08: 000014a1e4b94740 R09: 00007ffdf3033170
Jan 26 04:28:46 Panoptes kernel: R10: 000014a1e4b94a10 R11: 0000000000000246 R12: 0000000000000000
Jan 26 04:28:46 Panoptes kernel: R13: 00007ffdf30331f0 R14: 0000000000000000 R15: 00007ffdf30334f4
Jan 26 04:28:46 Panoptes kernel: cache_dirs: page allocation stalls for 25156ms, order:1, mode:0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null)
Jan 26 04:28:46 Panoptes kernel: cache_dirs cpuset=/ mems_allowed=0
Jan 26 04:28:46 Panoptes kernel: CPU: 2 PID: 8884 Comm: cache_dirs Not tainted 4.14.13-unRAID #1
Jan 26 04:28:46 Panoptes kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C236 WSI, BIOS P1.00 10/21/2015
Jan 26 04:28:46 Panoptes kernel: Call Trace:
Jan 26 04:28:46 Panoptes kernel: dump_stack+0x5d/0x79
Jan 26 04:28:46 Panoptes kernel: warn_alloc+0xdf/0x160
Jan 26 04:28:46 Panoptes kernel: ? wakeup_kswapd+0x2c/0xb2
Jan 26 04:28:46 Panoptes kernel: __alloc_pages_nodemask+0x578/0xb03
Jan 26 04:28:46 Panoptes kernel: ? get_mem_cgroup_from_mm+0x82/0x88
Jan 26 04:28:46 Panoptes kernel: ? memcg_kmem_get_cache+0x55/0x16a
Jan 26 04:28:46 Panoptes kernel: __get_free_pages+0x5/0x32
Jan 26 04:28:46 Panoptes kernel: pgd_alloc+0x14/0xf5
Jan 26 04:28:46 Panoptes kernel: mm_init+0x168/0x213
Jan 26 04:28:46 Panoptes kernel: copy_process.part.4+0xa4f/0x1767
Jan 26 04:28:46 Panoptes kernel: ? kmem_cache_alloc+0xde/0xea
Jan 26 04:28:46 Panoptes kernel: ? get_empty_filp+0x9f/0x157
Jan 26 04:28:46 Panoptes kernel: _do_fork+0xaf/0x290
Jan 26 04:28:46 Panoptes kernel: ? __set_current_blocked+0x38/0x50
Jan 26 04:28:46 Panoptes kernel: do_syscall_64+0x5b/0xf8
Jan 26 04:28:46 Panoptes kernel: entry_SYSCALL64_slow_path+0x25/0x25
Jan 26 04:28:46 Panoptes kernel: RIP: 0033:0x153b9970a39c
Jan 26 04:28:46 Panoptes kernel: RSP: 002b:00007ffc21b41330 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Jan 26 04:28:46 Panoptes kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000153b9970a39c
Jan 26 04:28:46 Panoptes kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Jan 26 04:28:46 Panoptes kernel: RBP: 00007ffc21b41370 R08: 0000153b9a051740 R09: 00007ffc21b413a0
Jan 26 04:28:46 Panoptes kernel: R10: 0000153b9a051a10 R11: 0000000000000246 R12: 0000000000000000
Jan 26 04:28:46 Panoptes kernel: R13: 00007ffc21b41420 R14: 0000000000000000 R15: 00007ffc21b41724
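
To see at a glance how often these stalls occur and which zone was short of memory, the log can be scanned with a short script. This is only a sketch: the function names are hypothetical, and the regular expressions assume the message layout shown in the log above (kernel 4.14), which may differ on other kernels. For context, `order:1` is a request for two contiguous pages, the `Mem-Info` counters are in pages (free:52317 x 4 kB = 209268 kB, which matches the sum of the three zone `free` values), and as I understand it a zone whose `free` drops below its `low` watermark is one where the kernel has started background reclaim.

```python
import re

# Matches the allocation-stall warning in the log above, e.g.
# "kernel: emhttpd: page allocation stalls for 14959ms, order:1, ..."
STALL_RE = re.compile(
    r"kernel: (?P<comm>\S+): page allocation stalls for "
    r"(?P<ms>\d+)ms, order:(?P<order>\d+)"
)

# Matches the per-zone summary lines, e.g.
# "Node 0 Normal free:123192kB min:120084kB low:150104kB high:180124kB ..."
ZONE_RE = re.compile(
    r"Node \d+ (?P<zone>\w+) free:(?P<free>\d+)kB min:(?P<min>\d+)kB "
    r"low:(?P<low>\d+)kB high:(?P<high>\d+)kB"
)


def find_stalls(log_text):
    """Return (process name, stall in ms, allocation order) per warning."""
    return [
        (m["comm"], int(m["ms"]), int(m["order"]))
        for m in STALL_RE.finditer(log_text)
    ]


def zones_below_low(log_text):
    """Return (zone, free kB, low kB) for zone lines with free < low."""
    hits = []
    for m in ZONE_RE.finditer(log_text):
        free_kb, low_kb = int(m["free"]), int(m["low"])
        if free_kb < low_kb:
            hits.append((m["zone"], free_kb, low_kb))
    return hits


# Three lines copied (and truncated) from the 04:28:30 report above.
sample = """\
Jan 26 04:28:30 Panoptes kernel: emhttpd: page allocation stalls for 14959ms, order:1, mode:0x17080c0
Jan 26 04:28:30 Panoptes kernel: Node 0 DMA32 free:70184kB min:14944kB low:18680kB high:22416kB
Jan 26 04:28:30 Panoptes kernel: Node 0 Normal free:123192kB min:120084kB low:150104kB high:180124kB
"""

for comm, ms, order in find_stalls(sample):
    print(f"{comm}: stalled {ms / 1000:.1f}s on an order-{order} allocation")
for zone, free_kb, low_kb in zones_below_low(sample):
    print(f"{zone}: free {free_kb} kB is below the low watermark {low_kb} kB")
```

In the 04:28:30 report only the Normal zone is flagged: 123192 kB free against a low watermark of 150104 kB and a min of 120084 kB. The same report lists about 1.1 GB of dirty pages (dirty:1163000kB), and dirty cache has to be written out to disk before it can be freed, so it may not be evictable as quickly as clean cache.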

FCPsyslog_tail.txt

panoptes-diagnostics-20180126-0415.zip

panoptes-diagnostics-20180126-0447.zip

panoptes-diagnostics-20180208-0843.zip

Edited by netpro2k

Not sure whether this is relevant, but I built a few servers using a Gigabyte x170 motherboard and had the same stability problem, especially on one that was running a VM with Small Business Server. It would go for a week, then nothing. Lights were on but nobody home. All hardware was new and tested OK.

 

The problem was a bug in the E3 Xeon processors.

The fix was a firmware update for the motherboard.

 

After the fix, rock solid.  Lost functionality in a couple of the pcie slots though.

 

Edited by Jessie
On 2/11/2018 at 11:46 AM, Salandor said:

Crashed again, I attached my FCPsyslog. Can someone provide some insight here?

 

FCPsyslog_tail.txt

 

Your syslog is full of errors related to DIMM 0 and DIMM 1:

Feb 11 14:27:35 Maximus kernel: EDAC MC0: 465 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x16d6c6 offset:0x40 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:2 rank:0)
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: cc007fc000010090
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: TSC 0 
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: ADDR 16e608040 
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: MISC 2050204486 
Feb 11 14:27:36 Maximus kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1518377256 SOCKET 0 APIC 0

Feb 11 14:27:36 Maximus kernel: EDAC MC0: 511 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#1 (channel:1 slot:1 page:0x16e608 offset:0x40 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:2 rank:4)
Feb 11 14:27:38 Maximus kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Feb 11 14:27:38 Maximus kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: cc00ee0000010090
Feb 11 14:27:38 Maximus kernel: EDAC sbridge MC0: TSC 0 
Feb 11 14:27:38 Maximus kernel: EDAC sbridge MC0: ADDR 16dff0200 
Feb 11 14:27:38 Maximus kernel: EDAC sbridge MC0: MISC 2040681a86 
Feb 11 14:27:38 Maximus kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1518377258 SOCKET 0 APIC 0

 

Looks like a hardware problem to me. I'd start by removing and reseating the DIMMs; maybe you'll get lucky. The next step is probably to replace them, unless someone else has ideas.
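
If it helps to see which modules are affected across the whole syslog, the corrected-error lines can be tallied per DIMM label. A rough sketch, assuming the EDAC line format shown above and that the number in front of `CE` is the count of corrected errors in that report; the function name `corrected_errors_by_dimm` is made up for illustration.

```python
import re
from collections import Counter

# Matches the corrected-error (CE) lines from the EDAC driver, e.g.
# "EDAC MC0: 465 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 (...)"
CE_RE = re.compile(
    r"EDAC MC\d+: (?P<count>\d+) CE memory read error on (?P<label>\S+)"
)


def corrected_errors_by_dimm(log_text):
    """Sum the CE counts reported for each DIMM label."""
    totals = Counter()
    for m in CE_RE.finditer(log_text):
        totals[m["label"]] += int(m["count"])
    return totals


# Two lines copied (and truncated) from the syslog excerpt above.
sample = """\
Feb 11 14:27:35 Maximus kernel: EDAC MC0: 465 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x16d6c6)
Feb 11 14:27:36 Maximus kernel: EDAC MC0: 511 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#1 (channel:1 slot:1 page:0x16e608)
"""

for label, count in corrected_errors_by_dimm(sample).most_common():
    print(f"{label}: {count} corrected errors")
```

Both labels in the excerpt sit on channel 1 of the same CPU, so moving those DIMMs to a different channel is one way to tell a bad module from a bad slot.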
 

 
