

Posts posted by pfields

  1. I was still suffering from crashes after changing the USB drive.

     

    In the end I changed the Ryzen 7 1700 for a Ryzen 5 2600 and haven't had a crash in over 19 hours. 

     

    It seems Unraid still doesn't play well with first-gen Ryzen, even with the BIOS options set.

  2. 5 hours ago, XceRpt said:

    A long shot, but always worth checking where random crashes are concerned: double-check your RAM speed settings. I ran for nearly a year, then suddenly started having crashes. Changing the speed to the stock non-OC speed of 2133 instead of 3200 stopped the crashes for me.

     

     

     

    I set XMP Profile 1 to 2400 MHz this morning, but the server has just crashed again.

     

    There are always a few lines in the syslog just before it goes dead.

     

    Dec 15 08:21:04 Tower nmbd[2659]: [2021/12/15 08:21:04.787064,  0] ../../source3/nmbd/nmbd_become_lmb.c:397(become_local_master_stage2)
    Dec 15 08:21:04 Tower nmbd[2659]:   *****
    Dec 15 08:21:04 Tower nmbd[2659]:   
    Dec 15 08:21:04 Tower nmbd[2659]:   Samba name server TOWER is now a local master browser for workgroup WORKGROUP on subnet 172.17.0.1
    Dec 15 08:21:04 Tower nmbd[2659]:   
    Dec 15 08:21:04 Tower nmbd[2659]:   *****
    Dec 15 08:25:54 Tower ntpd[2029]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

     

    I assume this has nothing to do with the time settings in Unraid, because NTP within Unraid shows the right time and I can communicate correctly with time1.google.com.
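
A note on the 0x41 value: assuming it is the kernel clock status word that ntpd reads back from the kernel (the bit names below are the ones in Linux's `timex.h`), it can be split into individual flags. This is only a rough sketch for reading the number; `decode_status` is a name made up for illustration, not part of ntpd or Unraid.

```python
# Kernel clock status bits as named in Linux's include/uapi/linux/timex.h.
STATUS_BITS = {
    0x0001: "STA_PLL",
    0x0002: "STA_PPSFREQ",
    0x0004: "STA_PPSTIME",
    0x0008: "STA_FLL",
    0x0010: "STA_INS",
    0x0020: "STA_DEL",
    0x0040: "STA_UNSYNC",
    0x0080: "STA_FREQHOLD",
    0x0100: "STA_PPSSIGNAL",
    0x0200: "STA_PPSJITTER",
    0x0400: "STA_PPSWANDER",
    0x0800: "STA_PPSERROR",
    0x1000: "STA_CLOCKERR",
    0x2000: "STA_NANO",
    0x4000: "STA_MODE",
    0x8000: "STA_CLK",
}


def decode_status(status):
    """Return the names of all status bits set in `status` (hypothetical helper)."""
    return [name for bit, name in STATUS_BITS.items() if status & bit]


print(decode_status(0x41))  # ['STA_PLL', 'STA_UNSYNC']
```

Read that way, 0x41 only says the kernel considers the clock not yet synchronized (STA_UNSYNC) while PLL updates are enabled (STA_PLL). It does not by itself point at a hardware fault.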

  3. The server crashed again after about an hour and a half: black screen on the physical monitor and nothing responding.

     

    Can anyone decipher the following, and explain why I am getting these Clock Unsynchronized errors?

     

     

    Quote

    Dec 14 14:02:34 Tower ntpd[2031]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
    Dec 14 15:19:31 Tower kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 9-... 11-... } 62078 jiffies s: 289 root: 0x1/.
    Dec 14 15:19:31 Tower kernel: rcu: blocking rcu_node structures: l=1:0-15:0xa00/.
    Dec 14 15:19:31 Tower kernel: Task dump for CPU 9:
    Dec 14 15:19:31 Tower kernel: task:Plex Media Serv state:R  running task     stack:    0 pid: 4850 ppid:  3545 flags:0x00000328
    Dec 14 15:19:31 Tower kernel: Call Trace:
    Dec 14 15:19:31 Tower kernel: ? smp_call_function_many_cond+0x272/0x285
    Dec 14 15:19:31 Tower kernel: ? smp_call_function_many_cond+0x250/0x285
    Dec 14 15:19:31 Tower kernel: ? flush_tlb_func_common.constprop.0+0xcc/0xcc
    Dec 14 15:19:31 Tower kernel: ? native_flush_tlb_local+0x10/0x17
    Dec 14 15:19:31 Tower kernel: ? __flush_tlb_others+0x5/0x8
    Dec 14 15:19:31 Tower kernel: ? flush_tlb_mm_range+0xba/0xc0
    Dec 14 15:19:31 Tower kernel: ? tlb_flush_mmu_tlbonly+0x6d/0x92
    Dec 14 15:19:31 Tower kernel: ? tlb_flush_mmu+0xc/0x65
    Dec 14 15:19:31 Tower kernel: ? tlb_finish_mmu+0x27/0x54
    Dec 14 15:19:31 Tower kernel: ? madvise_free_single_vma+0x151/0x175
    Dec 14 15:19:31 Tower kernel: ? find_vma+0xe/0x54
    Dec 14 15:19:31 Tower kernel: ? find_vma_prev+0xf/0x3b
    Dec 14 15:19:31 Tower kernel: ? do_madvise+0x578/0x86d
    Dec 14 15:19:31 Tower kernel: ? __seccomp_filter+0x185/0x368
    Dec 14 15:19:31 Tower kernel: ? __x64_sys_madvise+0x21/0x24
    Dec 14 15:19:31 Tower kernel: ? __x64_sys_madvise+0x21/0x24
    Dec 14 15:19:31 Tower kernel: ? do_syscall_64+0x5d/0x6a
    Dec 14 15:19:31 Tower kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Dec 14 15:19:31 Tower kernel: Task dump for CPU 11:
    Dec 14 15:19:31 Tower kernel: task:Plex Media Serv state:R  running task     stack:    0 pid: 4851 ppid:  3545 flags:0x00000328
    Dec 14 15:19:31 Tower kernel: Call Trace:
    Dec 14 15:19:31 Tower kernel: ? tlb_finish_mmu+0x27/0x54
    Dec 14 15:19:31 Tower kernel: ? madvise_free_single_vma+0x151/0x175
    Dec 14 15:19:31 Tower kernel: ? find_vma+0xe/0x54
    Dec 14 15:19:31 Tower kernel: ? find_vma_prev+0xf/0x3b
    Dec 14 15:19:31 Tower kernel: ? do_madvise+0x578/0x86d
    Dec 14 15:19:31 Tower kernel: ? __seccomp_filter+0x185/0x368
    Dec 14 15:19:31 Tower kernel: ? __x64_sys_madvise+0x21/0x24
    Dec 14 15:19:31 Tower kernel: ? __x64_sys_madvise+0x21/0x24
    Dec 14 15:19:31 Tower kernel: ? do_syscall_64+0x5d/0x6a
    Dec 14 15:19:31 Tower kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Dec 14 15:22:31 Tower kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 9-... 11-... } 242302 jiffies s: 289 root: 0x1/.
    Dec 14 15:22:31 Tower kernel: rcu: blocking rcu_node structures: l=1:0-15:0xa00/.
    Dec 14 15:22:31 Tower kernel: Task dump for CPU 9:
    Dec 14 15:22:31 Tower kernel: task:Plex Media Serv state:R  running task     stack:    0 pid: 4850 ppid:  3545 flags:0x00000328
    Dec 14 15:22:31 Tower kernel: Call Trace:
    Dec 14 15:22:31 Tower kernel: ? smp_call_function_many_cond+0x272/0x285
    Dec 14 15:22:31 Tower kernel: ? smp_call_function_many_cond+0x250/0x285
    Dec 14 15:22:31 Tower kernel: ? flush_tlb_func_common.constprop.0+0xcc/0xcc
    Dec 14 15:22:31 Tower kernel: ? native_flush_tlb_local+0x10/0x17
    Dec 14 15:22:31 Tower kernel: ? __flush_tlb_others+0x5/0x8
    Dec 14 15:22:31 Tower kernel: ? flush_tlb_mm_range+0xba/0xc0
    Dec 14 15:22:31 Tower kernel: ? tlb_flush_mmu_tlbonly+0x6d/0x92
    Dec 14 15:22:31 Tower kernel: ? tlb_flush_mmu+0xc/0x65
    Dec 14 15:22:31 Tower kernel: ? tlb_finish_mmu+0x27/0x54
    Dec 14 15:22:31 Tower kernel: ? madvise_free_single_vma+0x151/0x175
    Dec 14 15:22:31 Tower kernel: ? find_vma+0xe/0x54
    Dec 14 15:22:31 Tower kernel: ? find_vma_prev+0xf/0x3b
    Dec 14 15:22:31 Tower kernel: ? do_madvise+0x578/0x86d
    Dec 14 15:22:31 Tower kernel: ? __seccomp_filter+0x185/0x368
    Dec 14 15:22:31 Tower kernel: ? __x64_sys_madvise+0x21/0x24
    Dec 14 15:22:31 Tower kernel: ? __x64_sys_madvise+0x21/0x24
    Dec 14 15:22:31 Tower kernel: ? do_syscall_64+0x5d/0x6a
    Dec 14 15:22:31 Tower kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Dec 14 15:22:31 Tower kernel: Task dump for CPU 11:
    Dec 14 15:22:31 Tower kernel: task:Plex Media Serv state:R  running task     stack:    0 pid: 4851 ppid:  3545 flags:0x00000328
    Dec 14 15:22:31 Tower kernel: Call Trace:
    Dec 14 15:22:31 Tower kernel: ? tlb_finish_mmu+0x27/0x54
    Dec 14 15:22:31 Tower kernel: ? madvise_free_single_vma+0x151/0x175
    Dec 14 15:22:31 Tower kernel: ? find_vma+0xe/0x54
    Dec 14 15:22:31 Tower kernel: ? find_vma_prev+0xf/0x3b
    Dec 14 15:22:31 Tower kernel: ? do_madvise+0x578/0x86d
    Dec 14 15:22:31 Tower kernel: ? __seccomp_filter+0x185/0x368
    Dec 14 15:22:31 Tower kernel: ? __x64_sys_madvise+0x21/0x24
    Dec 14 15:22:31 Tower kernel: ? __x64_sys_madvise+0x21/0x24
    Dec 14 15:22:31 Tower kernel: ? do_syscall_64+0x5d/0x6a
    Dec 14 15:22:31 Tower kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Dec 14 15:25:31 Tower kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 9-... 11-... } 422527 jiffies s: 289 root: 0x1/.
    Dec 14 15:25:31 Tower kernel: rcu: blocking rcu_node structures: l=1:0-15:0xa00/.
    Dec 14 15:25:31 Tower kernel: Task dump for CPU 9:
    Dec 14 15:25:31 Tower kernel: task:Plex Media Serv state:R  running task     stack:    0 pid: 4850 ppid:  3545 flags:0x00000328
    Dec 14 15:25:31 Tower kernel: Call Trace:
    Dec 14 15:25:31 Tower kernel: ? smp_call_function_many_cond+0x26c/0x285
    Dec 14 15:25:31 Tower kernel: ? smp_call_function_many_cond+0x250/0x285
    Dec 14 15:25:31 Tower kernel: ? flush_tlb_func_common.constprop.0+0xcc/0xcc
    Dec 14 15:25:31 Tower kernel: ? native_flush_tlb_local+0x10/0x17
    Dec 14 15:25:31 Tower kernel: ? __flush_tlb_others+0x5/0x8
    Dec 14 15:25:31 Tower kernel: ? flush_tlb_mm_range+0xba/0xc0
    Dec 14 15:25:31 Tower kernel: ? tlb_flush_mmu_tlbonly+0x6d/0x92
    Dec 14 15:25:31 Tower kernel: ? tlb_flush_mmu+0xc/0x65
    Dec 14 15:25:31 Tower kernel: ? tlb_finish_mmu+0x27/0x54
    Dec 14 15:25:31 Tower kernel: ? madvise_free_single_vma+0x151/0x175
    Dec 14 15:25:31 Tower kernel: ? find_vma+0xe/0x54
    Dec 14 15:25:31 Tower kernel: ? find_vma_prev+0xf/0x3b
    Dec 14 15:25:31 Tower kernel: ? do_madvise+0x578/0x86d
    Dec 14 15:25:31 Tower kernel: ? __seccomp_filter+0x185/0x368
    Dec 14 15:25:31 Tower kernel: ? __x64_sys_madvise+0x21/0x24
    Dec 14 15:25:31 Tower kernel: ? __x64_sys_madvise+0x21/0x24
    Dec 14 15:25:31 Tower kernel: ? do_syscall_64+0x5d/0x6a
    Dec 14 15:25:31 Tower kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Dec 14 15:25:31 Tower kernel: Task dump for CPU 11:
    Dec 14 15:25:31 Tower kernel: task:Plex Media Serv state:R  running task     stack:    0 pid: 4851 ppid:  3545 flags:0x00000328
    Dec 14 15:25:31 Tower kernel: Call Trace:
    Dec 14 15:25:31 Tower kernel: ? tlb_finish_mmu+0x27/0x54
    Dec 14 15:25:31 Tower kernel: ? madvise_free_single_vma+0x151/0x175
    Dec 14 15:25:31 Tower kernel: ? find_vma+0xe/0x54
    Dec 14 15:25:31 Tower kernel: ? find_vma_prev+0xf/0x3b
    Dec 14 15:25:31 Tower kernel: ? do_madvise+0x578/0x86d
    Dec 14 15:25:31 Tower kernel: ? __seccomp_filter+0x185/0x368
    Dec 14 15:25:31 Tower kernel: ? __x64_sys_madvise+0x21/0x24
    Dec 14 15:25:31 Tower kernel: ? __x64_sys_madvise+0x21/0x24
    Dec 14 15:25:31 Tower kernel: ? do_syscall_64+0x5d/0x6a
    Dec 14 15:25:31 Tower kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
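
The repeated blocks above can be condensed with a small script. The following is a minimal sketch, assuming the syslog line format shown in this post and the year 2021 (taken from the Samba timestamps elsewhere in the thread); `parse_stalls` and `estimate_hz` are hypothetical helper names. It pulls out the stalled CPUs and the jiffies counter from each report and estimates the kernel tick rate from how fast that counter grows.

```python
import re
from datetime import datetime

STALL_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) .*rcu_sched detected expedited stalls "
    r"on CPUs/tasks: \{(?P<cpus>[^}]*)\} (?P<jiffies>\d+) jiffies"
)

# The three stall headers from the log above.
SAMPLE = [
    "Dec 14 15:19:31 Tower kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 9-... 11-... } 62078 jiffies s: 289 root: 0x1/.",
    "Dec 14 15:22:31 Tower kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 9-... 11-... } 242302 jiffies s: 289 root: 0x1/.",
    "Dec 14 15:25:31 Tower kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 9-... 11-... } 422527 jiffies s: 289 root: 0x1/.",
]


def parse_stalls(lines):
    """Return (timestamp, stalled CPU list, jiffies) for every stall header."""
    stalls = []
    for line in lines:
        m = STALL_RE.search(line.strip())
        if m:
            cpus = [int(c) for c in re.findall(r"(\d+)-", m.group("cpus"))]
            stalls.append((m.group("stamp"), cpus, int(m.group("jiffies"))))
    return stalls


def estimate_hz(stalls, year=2021):
    """Estimate ticks per second from the first and last report (year is assumed)."""
    fmt = "%Y %b %d %H:%M:%S"
    (t0, _, j0), (t1, _, j1) = stalls[0], stalls[-1]
    seconds = (datetime.strptime(f"{year} {t1}", fmt)
               - datetime.strptime(f"{year} {t0}", fmt)).total_seconds()
    return (j1 - j0) / seconds


stalls = parse_stalls(SAMPLE)
hz = estimate_hz(stalls)
for stamp, cpus, jiffies in stalls:
    print(f"{stamp}: CPUs {cpus} stalled for about {jiffies / hz:.0f} s")
```

On these three lines the counter grows by roughly 1000 jiffies per second, so the first report would correspond to about a minute of stall and the last to about seven minutes, always on CPUs 9 and 11. In every report both CPUs are running Plex Media Server threads inside `madvise`, and CPU 9 appears to be waiting in `smp_call_function_many_cond` for other CPUs to acknowledge a TLB flush.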
     

     

  4. OK, I've worked out the issue with 'Typical Current Idle' on the Gigabyte board: there are two ways to access the option in the BIOS. Set one way, it resets; set the other way, it sticks.

     

    The server doesn't seem to crash like before, but I keep getting restarts. For example, I will randomly log in and see the uptime at 5 minutes even though the server has been on for an hour.

     

    The only errors or warnings I can see in the log when I check are the following:

     

    Dec 14 13:36:03 Tower kernel: mce: [Hardware Error]: Machine check events logged
    Dec 14 13:36:03 Tower kernel: mce: [Hardware Error]: CPU 8: Machine Check: 0 Bank 5: bea0000000000108
    Dec 14 13:36:03 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff81064b1e MISC d012000100000000 SYND 4d000000 IPID 500b000000000 
    Dec 14 13:36:03 Tower kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1639488944 SOCKET 0 APIC 1 microcode 8001138
    Dec 14 13:36:03 Tower kernel: floppy0: no floppy controllers found
    Dec 14 13:36:03 Tower kernel: random: 7 urandom warning(s) missed due to ratelimiting
    Dec 14 13:36:03 Tower kernel: ACPI Warning: SystemIO range 0x0000000000000B00-0x0000000000000B08 conflicts with OpRegion 0x0000000000000B00-0x0000000000000B0F (\GSA1.SMBI) (20200925/utaddress-204)
    Dec 14 13:36:07 Tower rpc.statd[1976]: Failed to read /var/lib/nfs/state: Success

     

    What should I do going forward in order to try and diagnose the restarts? 

     

    Cheers

  5. Hmm, it seems this might be the same crash as before: I went to revert the change just now and found the option was already set back to Auto. So the BIOS setting is reverting to Auto every time.

     

    It's a Gigabyte B450 AORUS M (rev. 1.1), if anyone knows why this option would keep reverting to Auto. It's not the CMOS battery, as the date and time are being retained.

  6. Well it works to begin with but I have just reproduced the error twice. It works initially, but then becomes unresponsive over the network. It responds to ping but nothing else works/loads. 

     

    The server hasn't crashed, as I can still log in on the physical machine. I logged in and then didn't do anything for 10-15 minutes, at which point it was unresponsive. The syslog shows nothing after my successful login.
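
When ping still answers but nothing else loads, it can help to record exactly which TCP services stop responding. A minimal sketch, to be run from another machine on the network: the host name and the port numbers in the comment (22 for SSH, 80 for the web GUI, 445 for SMB shares) are assumptions about a default setup, and `probe` and `probe_services` are hypothetical helper names.

```python
import socket


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def probe_services(host, ports, timeout=3.0):
    """Probe every named port and return a {name: reachable} mapping."""
    return {name: probe(host, port, timeout) for name, port in ports.items()}


# Example call (assumed host name and default ports, adjust to your setup):
#   probe_services("tower.local", {"ssh": 22, "webgui": 80, "smb": 445})
```

Running this every minute or so and logging the result with a timestamp would show whether all services drop at once or one after another, which is hard to tell from the syslog alone.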

  7. I set the Power Supply Idle Control to Typical Current Idle as suggested in the article and rebooted. 

     

    Now the server seems to be in a weird state: it's apparently working OK when I check the physical screen, and I can ping it, but there's no SSH, no GUI and no Docker apps.

     

    Is there something else I should do with the C-States? 

  8. Hello, 

     

    So first of all, I know that it's not random, but at the minute I can't find any rhyme or reason for the crashes. After the first crash I checked the physical screen and it was all black with no life. Nothing was working, and I got it going again with a reboot. This morning I woke up to an Nginx 500 error when I tried to look at the GUI and had to reboot again.

     

    I took the server to the office this morning to try and diagnose what's happening. I updated the BIOS and restarted, then left it to idle for about an hour. It seems to have crashed again: no SSH, no GUI, no shares, but some text on the screen.

     

    [Attached image: screen-text.jpg]

     

    I have attached diagnostics below, but they were generated after the reboot. The syslog was being mirrored to flash and has very little information from before the crash. Could the kernel time error be causing this?

     

    tower-diagnostics-20211207-1127.zip

     

    syslog

     

    Any help would be much appreciated. 

     

    Thanks
