Unraid Crashes almost every day


Hello,

I have been using Unraid for the past year and it has been a pleasure. However, over the past month or so it started crashing every couple of days. I would restart it, hoping it was a transient thing. Unfortunately it keeps coming back, and today it happened twice. As a first step I disabled the VM manager to see if that had anything to do with it.

 

Below is the log dump from the time of the crash. I would appreciate it if anyone could point me in the right direction to solve this.

 

//////////////////

 

Oct  5 14:36:16 unraid1 webGUI: Successful login user root from 192.168.1.174

Oct  5 14:45:09 unraid1 kernel: docker0: port 6(veth6499aa1) entered blocking state

Oct  5 14:45:09 unraid1 kernel: docker0: port 6(veth6499aa1) entered disabled state

Oct  5 14:45:09 unraid1 kernel: device veth6499aa1 entered promiscuous mode

Oct  5 14:45:11 unraid1 kernel: eth0: renamed from vethf05059e

Oct  5 14:45:11 unraid1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth6499aa1: link becomes ready

Oct  5 14:45:11 unraid1 kernel: docker0: port 6(veth6499aa1) entered blocking state

Oct  5 14:45:11 unraid1 kernel: docker0: port 6(veth6499aa1) entered forwarding state

Oct  5 14:45:13 unraid1  avahi-daemon[26200]: Joining mDNS multicast group on interface veth6499aa1.IPv6 with address fe80::740e:a9ff:fef8:370b.

Oct  5 14:45:13 unraid1  avahi-daemon[26200]: New relevant interface veth6499aa1.IPv6 for mDNS.

Oct  5 14:45:13 unraid1  avahi-daemon[26200]: Registering new address record for fe80::740e:a9ff:fef8:370b on veth6499aa1.*.

Oct  5 15:12:15 unraid1 kernel: BUG: unable to handle page fault for address: ffff88883f37ce90

Oct  5 15:12:15 unraid1 kernel: #PF: supervisor read access in kernel mode

Oct  5 15:12:15 unraid1 kernel: #PF: error_code(0x0000) - not-present page

Oct  5 15:12:15 unraid1 kernel: PGD 2801067 P4D 2801067 PUD 2804067 PMD 0

Oct  5 15:12:15 unraid1 kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI

Oct  5 15:12:15 unraid1 kernel: CPU: 10 PID: 17042 Comm: sh Tainted: G S                5.19.9-Unraid #1

Oct  5 15:12:15 unraid1 kernel: Hardware name: System manufacturer System Product Name/TUF GAMING X570-PLUS (WI-FI), BIOS 3402 01/13/2021

Oct  5 15:12:15 unraid1 kernel: RIP: 0010:__free_one_page+0x1f6/0x285

Oct  5 15:12:15 unraid1 kernel: Code: ef 48 c1 e7 06 4c 01 f7 e8 8f e8 ff ff 4c 8b 44 24 08 48 85 c0 75 98 49 6b d7 68 48 c1 e3 04 48 8d 84 13 00 01 00 00 48 01 d3 <49> 8b 8c 1c 00 01 00 00 4c 01 e0 4c 89 41 08 49 89 4e 08 49 89 46

Oct  5 15:12:15 unraid1 kernel: RSP: 0018:ffffc9000234fc40 EFLAGS: 00010002

Oct  5 15:12:15 unraid1 kernel: RAX: 0000000010000110 RBX: 0000000010000010 RCX: 00000000f0000080

Oct  5 15:12:15 unraid1 kernel: RDX: 0000000000000000 RSI: 00000000004b6e7e RDI: 00000000004b6e7c

Oct  5 15:12:15 unraid1 kernel: RBP: 0000000000000000 R08: ffffea0012db9f88 R09: 0000000000000000

Oct  5 15:12:15 unraid1 kernel: R10: ffff8880c479e540 R11: 0000000000000000 R12: ffff88882f37cd80

Oct  5 15:12:15 unraid1 kernel: R13: 00000000004b6e7e R14: ffffea0012db9f80 R15: 0000000000000000

Oct  5 15:12:15 unraid1 kernel: FS:  0000000000000000(0000) GS:ffff88880ea80000(0000) knlGS:0000000000000000

Oct  5 15:12:15 unraid1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Oct  5 15:12:15 unraid1 kernel: CR2: ffff88883f37ce90 CR3: 000000025533a000 CR4: 0000000000350ee0

Oct  5 15:12:15 unraid1 kernel: Call Trace:

Oct  5 15:12:15 unraid1 kernel: <TASK>

Oct  5 15:12:15 unraid1 kernel: free_pcppages_bulk+0x158/0x1d2

Oct  5 15:12:15 unraid1 kernel: free_unref_page+0x8d/0xa9

Oct  5 15:12:15 unraid1 kernel: __mmdrop+0x4b/0x104

Oct  5 15:12:15 unraid1 kernel: begin_new_exec+0x6f7/0x945

Oct  5 15:12:15 unraid1 kernel: load_elf_binary+0x22c/0x12ae

Oct  5 15:12:15 unraid1 kernel: ? __kernel_read+0x100/0x145

Oct  5 15:12:15 unraid1 kernel: ? __kernel_read+0x100/0x145

Oct  5 15:12:15 unraid1 kernel: bprm_execve+0x23a/0x52b

Oct  5 15:12:15 unraid1 kernel: do_execveat_common.isra.0+0x1a9/0x1d2

Oct  5 15:12:15 unraid1 kernel: __x64_sys_execve+0x38/0x44

Oct  5 15:12:15 unraid1 kernel: do_syscall_64+0x6b/0x81

Oct  5 15:12:15 unraid1 kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd

Oct  5 15:12:15 unraid1 kernel: RIP: 0033:0x1483c12b0e47

Oct  5 15:12:15 unraid1 kernel: Code: Unable to access opcode bytes at RIP 0x1483c12b0e1d.

Oct  5 15:12:15 unraid1 kernel: RSP: 002b:00007fff815b6198 EFLAGS: 00000202 ORIG_RAX: 000000000000003b

Oct  5 15:12:15 unraid1 kernel: RAX: ffffffffffffffda RBX: 000000000054cd30 RCX: 00001483c12b0e47

Oct  5 15:12:15 unraid1 kernel: RDX: 000000000054c010 RSI: 0000000000541b10 RDI: 000000000054f150

Oct  5 15:12:15 unraid1 kernel: RBP: 000000000054f150 R08: 0000000000541b10 R09: 00706572672f6e69

Oct  5 15:12:15 unraid1 kernel: R10: 0000000000000005 R11: 0000000000000202 R12: 000000000054f150

Oct  5 15:12:15 unraid1 kernel: R13: 0000000000541b10 R14: 000000000054c010 R15: 000000000052a5e4

Oct  5 15:12:15 unraid1 kernel: </TASK>

Oct  5 15:12:15 unraid1 kernel: Modules linked in: dm_mod dax xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap tcp_diag udp_diag inet_diag cmac cifs asn1_decoder cifs_arc4 cifs_md4 oid_registry dns_resolver ipvlan veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls ipv6 btusb btrtl btbcm edac_mce_amd edac_core wmi_bmof kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel btintel aesni_intel crypto_simd cryptd bluetooth mpt3sas rapl input_leds led_class joydev nvme i2c_piix4

Oct  5 15:12:15 unraid1 kernel: r8169 k10temp ahci i2c_core cdc_acm ecdh_generic raid_class nvme_core ecc libahci ccp realtek scsi_transport_sas wmi tpm_crb tpm_tis tpm_tis_core tpm button acpi_cpufreq unix

Oct  5 15:12:15 unraid1 kernel: CR2: ffff88883f37ce90

Oct  5 15:12:15 unraid1 kernel: ---[ end trace 0000000000000000 ]---

Oct  5 15:12:15 unraid1 kernel: RIP: 0010:__free_one_page+0x1f6/0x285

Oct  5 15:12:15 unraid1 kernel: Code: ef 48 c1 e7 06 4c 01 f7 e8 8f e8 ff ff 4c 8b 44 24 08 48 85 c0 75 98 49 6b d7 68 48 c1 e3 04 48 8d 84 13 00 01 00 00 48 01 d3 <49> 8b 8c 1c 00 01 00 00 4c 01 e0 4c 89 41 08 49 89 4e 08 49 89 46

Oct  5 15:12:15 unraid1 kernel: RSP: 0018:ffffc9000234fc40 EFLAGS: 00010002

Oct  5 15:12:15 unraid1 kernel: RAX: 0000000010000110 RBX: 0000000010000010 RCX: 00000000f0000080

Oct  5 15:12:15 unraid1 kernel: RDX: 0000000000000000 RSI: 00000000004b6e7e RDI: 00000000004b6e7c

 

 

////////////////
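For anyone comparing dumps like the one above, the lines that identify a crash are the `BUG:` line, the `Oops:` line, the `CPU: ... PID: ... Comm:` line (it also carries the kernel version) and the kernel-mode `RIP: 0010:` line naming the faulting function. Here is a minimal sketch that pulls just those lines out of a saved syslog. The function name `oops_summary` is made up for illustration, and the file path is whatever you saved your log as; nothing here is an Unraid tool.

```shell
#!/bin/sh
# Print the signature lines of every kernel oops in a saved syslog file.
# $1 is the path to the log; the name oops_summary is hypothetical.
oops_summary() {
    # BUG:/Oops: mark the fault, the CPU/PID line names the process and kernel,
    # and RIP with code segment 0010 is the kernel-mode instruction pointer.
    grep -E 'kernel: (BUG:|Oops:|CPU: [0-9]+ PID:|RIP: 0010:)' "$1"
}

# Usage (path is an example only):
#   oops_summary /path/to/saved-syslog.txt
```

Running it over logs from several crashes makes it easy to see whether the faulting function is the same each time or moves around, which is one hint (not proof) when weighing software against hardware causes.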

 

System information:

 

Model: Custom
M/B: ASUSTeK COMPUTER INC. TUF GAMING X570-PLUS (WI-FI) Version Rev X.0x
BIOS: American Megatrends Inc. Version 3402. Dated: 01/13/2021
CPU: AMD Ryzen 7 3700X 8-Core @ 3600 MHz
HVM: Enabled
IOMMU: Enabled
Cache: 512 KiB, 4 MB, 32 MB
Memory: 32 GiB DDR4 (max. installable capacity 128 GiB)
Network: bond0: fault-tolerance (active-backup), mtu 1500
Kernel: Linux 5.19.9-Unraid x86_64
OpenSSL: 1.1.1q

Uptime:

Link to comment

Hi,

 

I have had the syslog enabled for a few days now and it went through a couple of crashes. That's why I posted it. We can delete it, start over, and wait for another crash. The most recent data I have is what was originally posted, which I had in the live log on the screen. I was able to get it because it was in the browser.

 

Thanks

 

Link to comment

Hi,

 

Yes, before I posted here I disabled C-states as per these instructions I found on the forum. I will reboot the server later and look for the BIOS setting.

 

///////////////////////////////

You have a first gen Ryzen, which UnRAID doesn't like to play nice with due to sleep states.

Make your /config/go file look like this:

#!/bin/bash
# Start the Management Utility
/usr/local/sbin/zenstates --c6-disable
/usr/local/sbin/emhttp &

///////////////////////////////////////
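A quick way to confirm the quoted line is really active in a go file is to search for it while ignoring commented-out copies. This is only a sketch: the function name `has_c6_disable` is made up, and the path in the usage comment is an assumption, so point it at wherever your go file actually lives.

```shell
#!/bin/sh
# Succeed if the given go file contains an uncommented zenstates C6 line.
# $1 is the path to the file; has_c6_disable is a hypothetical helper name.
has_c6_disable() {
    # The first non-blank character must not be '#', so commented lines are skipped.
    grep -Eq '^[[:space:]]*[^#[:space:]].*zenstates --c6-disable' "$1"
}

# Usage (path is an assumption, adjust to your system):
#   has_c6_disable /boot/config/go && echo "C6 disable line is present"
```

This only checks that the line is in the file. It does not tell you whether the command succeeded at boot.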

 

Link to comment

Nothing relevant logged. If the server was stable and started crashing recently without any change, that suggests a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled, and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem. If it doesn't, start turning the other services back on one by one.

Link to comment

Hi,

I looked through there as well and could not see anything that jumped out. I thought maybe someone with more experience than me would spot something.

 

Anyway, thanks for the help on this. I will take your advice and try to pinpoint the error using safe mode.

 

I will post my findings in a few days' time.

 

Take care

Link to comment
1 hour ago, ds1414 said:

Hi,

I looked through there as well and could not see anything that jumped out. I thought maybe someone with more experience than me would spot something.

 

Anyway, thanks for the help on this. I will take your advice and try to pinpoint the error using safe mode.

 

I will post my findings in a few days' time.

 

Take care

Your first log looks very similar to my log and problem.
I never had this problem before. I figured out it has something to do with my torrent program, because every time this happens I lose access to both my torrent web GUI and the Unraid GUI. All other containers keep running fine.
All this started happening after upgrading Unraid to v6.11.0.

Sep 27 22:07:18 Pegasus kernel: BUG: kernel NULL pointer dereference, address: 0000000000000056
Sep 27 22:07:18 Pegasus kernel: #PF: supervisor read access in kernel mode
Sep 27 22:07:18 Pegasus kernel: #PF: error_code(0x0000) - not-present page
Sep 27 22:07:18 Pegasus kernel: PGD 0 P4D 0 
Sep 27 22:07:18 Pegasus kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Sep 27 22:07:18 Pegasus kernel: CPU: 18 PID: 27462 Comm: Disk Tainted: P           O      5.19.9-Unraid #1
Sep 27 22:07:18 Pegasus kernel: Hardware name: ASUS System Product Name/PRIME B560M-K, BIOS 1605 05/13/2022
Sep 27 22:07:18 Pegasus kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21
Sep 27 22:07:18 Pegasus kernel: Code: e8 84 5e 63 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 15 95 64 00 48 81 c4 88 00 00 00 5b c3 cc cc cc cc <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb
Sep 27 22:07:18 Pegasus kernel: RSP: 0000:ffffc9000ea57cc0 EFLAGS: 00010246
Sep 27 22:07:18 Pegasus kernel: RAX: 0000000000000022 RBX: 0000000000000022 RCX: 0000000000000022
Sep 27 22:07:18 Pegasus kernel: RDX: 0000000000000001 RSI: ffff88853b93eda0 RDI: 0000000000000022
Sep 27 22:07:18 Pegasus kernel: RBP: 0000000000000000 R08: 000000000000000c R09: ffffc9000ea57cd0
Sep 27 22:07:18 Pegasus kernel: R10: ffffc9000ea57cd0 R11: ffffc9000ea57d48 R12: 0000000000000000
Sep 27 22:07:18 Pegasus kernel: R13: ffff888010febab8 R14: 000000000000340d R15: ffff888010febac0
Sep 27 22:07:18 Pegasus kernel: FS:  0000145db2182640(0000) GS:ffff88883c480000(0000) knlGS:0000000000000000
Sep 27 22:07:18 Pegasus kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 27 22:07:18 Pegasus kernel: CR2: 0000000000000056 CR3: 000000063ded2005 CR4: 00000000007726e0
Sep 27 22:07:18 Pegasus kernel: PKRU: 55555554
Sep 27 22:07:18 Pegasus kernel: Call Trace:
Sep 27 22:07:18 Pegasus kernel: <TASK>
Sep 27 22:07:18 Pegasus kernel: __filemap_get_folio+0x98/0x1ff
Sep 27 22:07:18 Pegasus kernel: ? _raw_spin_unlock+0x14/0x29
Sep 27 22:07:18 Pegasus kernel: filemap_fault+0x6e/0x524
Sep 27 22:07:18 Pegasus kernel: __do_fault+0x2d/0x6e
Sep 27 22:07:18 Pegasus kernel: __handle_mm_fault+0x9a5/0xc7d
Sep 27 22:07:18 Pegasus kernel: handle_mm_fault+0x113/0x1d7
Sep 27 22:07:18 Pegasus kernel: do_user_addr_fault+0x36a/0x514
Sep 27 22:07:18 Pegasus kernel: exc_page_fault+0xfc/0x11e
Sep 27 22:07:18 Pegasus kernel: asm_exc_page_fault+0x22/0x30
Sep 27 22:07:18 Pegasus kernel: RIP: 0033:0x145db6bd2e8d
Sep 27 22:07:18 Pegasus kernel: Code: 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 48 89 f8 48 83 fa 20 72 23 <c5> fe 6f 06 48 83 fa 40 0f 87 a5 00 00 00 c5 fe 6f 4c 16 e0 c5 fe
Sep 27 22:07:18 Pegasus kernel: RSP: 002b:0000145db2181798 EFLAGS: 00010202
Sep 27 22:07:18 Pegasus kernel: RAX: 0000145b9400b6d0 RBX: 0000145b940062f8 RCX: 0000145db21819d0
Sep 27 22:07:18 Pegasus kernel: RDX: 0000000000004000 RSI: 000014566580d691 RDI: 0000145b9400b6d0
Sep 27 22:07:18 Pegasus kernel: RBP: 0000000000000000 R08: 0000000000000104 R09: 0000000000000000
Sep 27 22:07:18 Pegasus kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
Sep 27 22:07:18 Pegasus kernel: R13: 0000145db2181a50 R14: 0000000000000104 R15: 0000145b94004e90
Sep 27 22:07:18 Pegasus kernel: </TASK>
Sep 27 22:07:18 Pegasus kernel: Modules linked in: nvidia_uvm(PO) veth cmac cifs asn1_decoder cifs_arc4 cifs_md4 oid_registry dns_resolver xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat macvlan iptable_mangle vhost_net tun vhost vhost_iotlb tap xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod nct6775 nct6775_core hwmon_vid efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls ipv6 e1000e r8169 realtek nvidia_drm(PO) nvidia_modeset(PO) i915 iosf_mbi drm_buddy i2c_algo_bit nvidia(PO) ttm drm_display_helper x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel wmi_bmof kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm_kms_helper aesni_intel crypto_simd drm cryptd rapl intel_cstate intel_gtt i2c_i801 agpgart i2c_smbus ahci intel_uncore libahci i2c_core input_leds syscopyarea
Sep 27 22:07:18 Pegasus kernel: led_class joydev sysfillrect corsair_psu sysimgblt fb_sys_fops thermal fan wmi tpm_crb tpm_tis video tpm_tis_core backlight tpm acpi_tad acpi_pad button unix [last unloaded: e1000e]
Sep 27 22:07:18 Pegasus kernel: CR2: 0000000000000056
Sep 27 22:07:18 Pegasus kernel: ---[ end trace 0000000000000000 ]---
Sep 27 22:07:18 Pegasus kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21
Sep 27 22:07:18 Pegasus kernel: Code: e8 84 5e 63 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 15 95 64 00 48 81 c4 88 00 00 00 5b c3 cc cc cc cc <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb
Sep 27 22:07:18 Pegasus kernel: RSP: 0000:ffffc9000ea57cc0 EFLAGS: 00010246
Sep 27 22:07:18 Pegasus kernel: RAX: 0000000000000022 RBX: 0000000000000022 RCX: 0000000000000022
Sep 27 22:07:18 Pegasus kernel: RDX: 0000000000000001 RSI: ffff88853b93eda0 RDI: 0000000000000022
Sep 27 22:07:18 Pegasus kernel: RBP: 0000000000000000 R08: 000000000000000c R09: ffffc9000ea57cd0
Sep 27 22:07:18 Pegasus kernel: R10: ffffc9000ea57cd0 R11: ffffc9000ea57d48 R12: 0000000000000000
Sep 27 22:07:18 Pegasus kernel: R13: ffff888010febab8 R14: 000000000000340d R15: ffff888010febac0
Sep 27 22:07:18 Pegasus kernel: FS:  0000145db2182640(0000) GS:ffff88883c480000(0000) knlGS:0000000000000000
Sep 27 22:07:18 Pegasus kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 27 22:07:18 Pegasus kernel: CR2: 0000000000000056 CR3: 000000063ded2005 CR4: 00000000007726e0
Sep 27 22:07:18 Pegasus kernel: PKRU: 55555554

 

 

Edited by CiscoCoreX
Link to comment

I just started getting this error too; very frustrating.

 

To be transparent, I upgraded my hardware 2 weeks before my upgrade to 6.11.1. So while I thought the hardware is stable after running without issue on 6.10.3 for that time before upgrading, it is possible (though unlikely) that it is not an Unraid issue.

 

What's interesting is that all 3 of us (so far) use an ASUS motherboard. Maybe it's a coincidence. Maybe it's not.

 

I am cross posting for traction on my other thread I started here:

 

-JesterEE

Edited by JesterEE
Grammar
Link to comment

So I solved the problem with the server.  

 

After the five days mentioned in the previous post, we had to cut electricity to the house for some electrical work. After power was restored, the server would not stay up for more than 24 hours. It was very frustrating, as I have become dependent on new apps running there for my daily work. So I upgraded the BIOS to the latest version, as per everyone's suggestion. Unfortunately that did not solve the issue. In desperation I decided to swap the motherboard for an Intel one, hoping that would work. Due to a motherboard limitation I could not do it. Since all suggestions were pointing towards motherboard power, I swapped the system power supply. It has been two days now and it is running smoothly.

 

I will post again in a few more days on the progress.

 

DS

Link to comment
3 minutes ago, ds1414 said:

Since all suggestions were pointing towards motherboard power, I swapped the system power supply. It has been two days now and it is running smoothly.

 

Glad you got something working, but I'd hardly consider this a solution, given that the OS version just before this one didn't show a hardware issue. So unless the update somehow messed up the physical power supply electronics (extremely unlikely), I'm skeptical this is it.

 

There are a lot more of us with this issue and I doubt that everyone has faulty hardware. You can track the problem thread here:

 

 

Link to comment
  • 2 weeks later...

Hi, 

 

I am new to Unraid but I have been getting crashes every couple of days.

 

I thought it might be my HBA getting hot, so I put a fan on it and it lasted almost 4 days of uptime, whereas previously it would only manage just under 3 days.

 

I have done a memtest, CPU test, GPU test, etc., and nothing shows a fault.

 

I have Qbit and Sab on my server as well as Plex so I will try running it first for a few days without them and see how it goes.

 

This thread is rather useful, thanks.

Link to comment
