Plex causing BTRFS corruption and killing a Hyperthread?


Entxawp


Hi everyone,

 

I have an issue with my Unraid server: it kept crashing under a heavy Plex load. That turned out to be because the UPS "runtime left" threshold was set too high. After fixing that, however, the server is still spitting out BTRFS errors, and Plex even seems to have killed a hyperthread (#18), which is now stuck at 100%. How should I proceed?

(Recreate docker and BTRFS?)

 

Many thanks, Ent

 

Attachments: 61844370_PlexkilledaHT.thumb.png.080971356419da16c193e80053efc567.png, entxvault-diagnostics-20201023-1417.zip

Oct 23 15:31:15 entxvault kernel: BTRFS warning (device loop2): csum failed root 1011 ino 45837 off 360448 csum 0x79fcb17b expected csum 0x528975a8 mirror 1
### [PREVIOUS LINE REPEATED 10 TIMES] ###
Oct 23 15:31:34 entxvault kernel: general protection fault: 0000 [#1] SMP NOPTI
Oct 23 15:31:34 entxvault kernel: CPU: 16 PID: 1224 Comm: kswapd0 Tainted: G           O      4.19.107-Unraid #1
Oct 23 15:31:34 entxvault kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.90 12/04/2019
Oct 23 15:31:34 entxvault kernel: RIP: 0010:dentry_unlink_inode+0xa4/0xc6
Oct 23 15:31:34 entxvault kernel: Code: e7 45 31 c9 45 31 c0 b9 02 00 00 00 4c 89 e2 be 00 04 00 00 e8 df 20 02 00 4c 89 e7 e8 cb 20 02 00 48 8b 45 60 48 85 c0 74 17 <48> 8b 40 40 48 85 c0 74 0e 4c 89 e6 48 89 ef 5d 41 5c e9 1b 0b 8a
Oct 23 15:31:34 entxvault kernel: RSP: 0018:ffffc900042cbc18 EFLAGS: 00010282
Oct 23 15:31:34 entxvault kernel: RAX: fffbffff81c1fa40 RBX: ffff88822edccfc0 RCX: 0000000000000000
Oct 23 15:31:34 entxvault kernel: RDX: ffff88822ed6b730 RSI: ffffc900012416e0 RDI: ffff88822ed6b680
Oct 23 15:31:34 entxvault kernel: RBP: ffff8882d5abb5c0 R08: 0000000000000001 R09: ffffc900042cbb78
Oct 23 15:31:34 entxvault kernel: R10: 0000000000000001 R11: ffff88885cf5fb40 R12: ffff88822ed6b600
Oct 23 15:31:34 entxvault kernel: R13: ffff8882d5abb5c0 R14: ffffc900042cbc98 R15: ffff88822edccfc0
Oct 23 15:31:34 entxvault kernel: FS:  0000000000000000(0000) GS:ffff88885d200000(0000) knlGS:0000000000000000
Oct 23 15:31:34 entxvault kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 23 15:31:34 entxvault kernel: CR2: 000014fa7c15c000 CR3: 00000005b7098000 CR4: 00000000003406e0
Oct 23 15:31:34 entxvault kernel: Call Trace:
Oct 23 15:31:34 entxvault kernel: __dentry_kill+0xcb/0x135
Oct 23 15:31:34 entxvault kernel: shrink_dentry_list+0x149/0x185
Oct 23 15:31:34 entxvault kernel: prune_dcache_sb+0x56/0x74
Oct 23 15:31:34 entxvault kernel: super_cache_scan+0xee/0x16d
Oct 23 15:31:34 entxvault kernel: do_shrink_slab+0x128/0x194
Oct 23 15:31:34 entxvault kernel: shrink_slab+0x11b/0x276
Oct 23 15:31:34 entxvault kernel: shrink_node+0x108/0x3cb
Oct 23 15:31:34 entxvault kernel: kswapd+0x451/0x58a
Oct 23 15:31:34 entxvault kernel: ? __switch_to_asm+0x41/0x70
Oct 23 15:31:34 entxvault kernel: ? mem_cgroup_shrink_node+0xa4/0xa4
Oct 23 15:31:34 entxvault kernel: kthread+0x10c/0x114
Oct 23 15:31:34 entxvault kernel: ? kthread_park+0x89/0x89
Oct 23 15:31:34 entxvault kernel: ret_from_fork+0x22/0x40
Oct 23 15:31:34 entxvault kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net vhost tap xt_nat veth ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod tun nct6775 hwmon_vid bonding igb(O) edac_mce_amd kvm_amd kvm btusb btrtl btbcm btintel bluetooth mpt3sas k10temp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc wmi_bmof mxm_wmi aesni_intel ecdh_generic aes_x86_64 ccp ahci i2c_piix4 crypto_simd i2c_core nvme raid_class pcc_cpufreq cryptd libahci glue_helper scsi_transport_sas nvme_core wmi button acpi_cpufreq [last unloaded: igb]
Oct 23 15:31:34 entxvault kernel: ---[ end trace 4db28ab27bc080eb ]---
Oct 23 15:31:34 entxvault kernel: RIP: 0010:dentry_unlink_inode+0xa4/0xc6
Oct 23 15:31:34 entxvault kernel: Code: e7 45 31 c9 45 31 c0 b9 02 00 00 00 4c 89 e2 be 00 04 00 00 e8 df 20 02 00 4c 89 e7 e8 cb 20 02 00 48 8b 45 60 48 85 c0 74 17 <48> 8b 40 40 48 85 c0 74 0e 4c 89 e6 48 89 ef 5d 41 5c e9 1b 0b 8a
Oct 23 15:31:34 entxvault kernel: RSP: 0018:ffffc900042cbc18 EFLAGS: 00010282
Oct 23 15:31:34 entxvault kernel: RAX: fffbffff81c1fa40 RBX: ffff88822edccfc0 RCX: 0000000000000000
Oct 23 15:31:34 entxvault kernel: RDX: ffff88822ed6b730 RSI: ffffc900012416e0 RDI: ffff88822ed6b680
Oct 23 15:31:34 entxvault kernel: RBP: ffff8882d5abb5c0 R08: 0000000000000001 R09: ffffc900042cbb78
Oct 23 15:31:34 entxvault kernel: R10: 0000000000000001 R11: ffff88885cf5fb40 R12: ffff88822ed6b600
Oct 23 15:31:34 entxvault kernel: R13: ffff8882d5abb5c0 R14: ffffc900042cbc98 R15: ffff88822edccfc0
Oct 23 15:31:34 entxvault kernel: FS:  0000000000000000(0000) GS:ffff88885d200000(0000) knlGS:0000000000000000
Oct 23 15:31:34 entxvault kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 23 15:31:34 entxvault kernel: CR2: 000014fa7c15c000 CR3: 00000005b7098000 CR4: 00000000003406e0
Oct 23 15:31:44 entxvault kernel: BTRFS warning (device loop2): csum failed root 1011 ino 49129 off 4558848 csum 0x946039f0 expected csum 0xbf15fd23 mirror 1
### [PREVIOUS LINE REPEATED 8 TIMES] ###
Oct 23 15:31:54 entxvault kernel: BTRFS warning (device loop2): csum failed root 1011 ino 45837 off 360448 csum 0x79fcb17b expected csum 0x528975a8 mirror 1
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Oct 23 15:32:00 entxvault kernel: BTRFS warning (device loop2): csum failed root 1011 ino 49129 off 4558848 csum 0x946039f0 expected csum 0xbf15fd23 mirror 1
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Oct 23 15:32:04 entxvault kernel: ------------[ cut here ]------------
Oct 23 15:32:04 entxvault kernel: WARNING: CPU: 18 PID: 17059 at fs/btrfs/inode.c:9333 btrfs_destroy_inode+0xaa/0x206
Oct 23 15:32:04 entxvault kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net vhost tap xt_nat veth ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod tun nct6775 hwmon_vid bonding igb(O) edac_mce_amd kvm_amd kvm btusb btrtl btbcm btintel bluetooth mpt3sas k10temp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc wmi_bmof mxm_wmi aesni_intel ecdh_generic aes_x86_64 ccp ahci i2c_piix4 crypto_simd i2c_core nvme raid_class pcc_cpufreq cryptd libahci glue_helper scsi_transport_sas nvme_core wmi button acpi_cpufreq [last unloaded: igb]
Oct 23 15:32:04 entxvault kernel: CPU: 18 PID: 17059 Comm: shfs Tainted: G      D    O      4.19.107-Unraid #1
Oct 23 15:32:04 entxvault kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.90 12/04/2019
Oct 23 15:32:04 entxvault kernel: RIP: 0010:btrfs_destroy_inode+0xaa/0x206
Oct 23 15:32:04 entxvault kernel: Code: ff ff ff 00 74 0e 48 c7 c7 79 fc d2 81 e8 bf 35 e6 ff 0f 0b 48 83 bb 20 ff ff ff 00 74 0e 48 c7 c7 79 fc d2 81 e8 a7 35 e6 ff <0f> 0b 48 83 bb 28 ff ff ff 00 74 0e 48 c7 c7 79 fc d2 81 e8 8f 35
Oct 23 15:32:04 entxvault kernel: RSP: 0018:ffffc90006323998 EFLAGS: 00010246
Oct 23 15:32:04 entxvault kernel: RAX: 0000000000000024 RBX: ffff8882e1ba3700 RCX: 0000000000000000
Oct 23 15:32:04 entxvault kernel: RDX: 0000000000000000 RSI: ffff88885d2964f8 RDI: ffff88885d2964f8
Oct 23 15:32:04 entxvault kernel: RBP: ffff888854eb2800 R08: 000000000000000f R09: ffff8880000b9900
Oct 23 15:32:04 entxvault kernel: R10: 0000000000000000 R11: 0000000000000044 R12: ffff888858bb8000
Oct 23 15:32:04 entxvault kernel: R13: 0000000000000000 R14: 0000000000025eba R15: 0000000000000000
Oct 23 15:32:04 entxvault kernel: FS:  0000148d0653b700(0000) GS:ffff88885d280000(0000) knlGS:0000000000000000
Oct 23 15:32:04 entxvault kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 23 15:32:04 entxvault kernel: CR2: 0000148d0507b020 CR3: 0000000810e50000 CR4: 00000000003406e0
Oct 23 15:32:04 entxvault kernel: Call Trace:
Oct 23 15:32:04 entxvault kernel: dispose_list+0x30/0x39
Oct 23 15:32:04 entxvault kernel: prune_icache_sb+0x56/0x74
Oct 23 15:32:04 entxvault kernel: super_cache_scan+0x11a/0x16d
Oct 23 15:32:04 entxvault kernel: do_shrink_slab+0x128/0x194
Oct 23 15:32:04 entxvault kernel: shrink_slab+0x20c/0x276
Oct 23 15:32:04 entxvault kernel: shrink_node+0x108/0x3cb
Oct 23 15:32:04 entxvault kernel: do_try_to_free_pages+0x1a1/0x300
Oct 23 15:32:04 entxvault kernel: try_to_free_pages+0xb2/0xcd
Oct 23 15:32:04 entxvault kernel: __alloc_pages_nodemask+0x423/0xae1
Oct 23 15:32:04 entxvault kernel: ? __switch_to_asm+0x41/0x70
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Oct 23 15:32:04 entxvault kernel: ? __switch_to_asm+0x35/0x70
Oct 23 15:32:04 entxvault kernel: ? __switch_to+0x2a6/0x2fb
Oct 23 15:32:04 entxvault kernel: fuse_copy_fill.part.0+0x9e/0x147
Oct 23 15:32:04 entxvault kernel: fuse_copy_one+0x43/0x5c
Oct 23 15:32:04 entxvault kernel: fuse_dev_do_read.isra.0+0x4f7/0x650
Oct 23 15:32:04 entxvault kernel: ? do_iter_write+0x14a/0x15c
Oct 23 15:32:04 entxvault kernel: ? wait_woken+0x6a/0x6a
Oct 23 15:32:04 entxvault kernel: fuse_dev_splice_read+0x91/0x14d
Oct 23 15:32:04 entxvault kernel: __se_sys_splice+0x4c2/0x54f
Oct 23 15:32:04 entxvault kernel: do_syscall_64+0x57/0xf2
Oct 23 15:32:04 entxvault kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 23 15:32:04 entxvault kernel: RIP: 0033:0x148d07e78c6b
Oct 23 15:32:04 entxvault kernel: Code: e8 ca 92 f7 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 4c 8b 54 24 18 8b 54 24 28 b8 13 01 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 31 89 ef 48 89 44 24 08 e8 f1 92 f7 ff 48 8b
Oct 23 15:32:04 entxvault kernel: RSP: 002b:0000148d0653add0 EFLAGS: 00000293 ORIG_RAX: 0000000000000113
Oct 23 15:32:04 entxvault kernel: RAX: ffffffffffffffda RBX: 0000000000000036 RCX: 0000148d07e78c6b
Oct 23 15:32:04 entxvault kernel: RDX: 0000000000000036 RSI: 0000000000000000 RDI: 0000000000000004
Oct 23 15:32:04 entxvault kernel: RBP: 0000000000000000 R08: 0000000000021000 R09: 0000000000000000
Oct 23 15:32:04 entxvault kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000148ce4000ba0
Oct 23 15:32:04 entxvault kernel: R13: 0000000000000000 R14: 0000148cd8000b60 R15: 0000148d0653b700
Oct 23 15:32:04 entxvault kernel: ---[ end trace 4db28ab27bc080ec ]---
Oct 23 15:32:13 entxvault kernel: BTRFS warning (device loop2): csum failed root 1011 ino 49129 off 4558848 csum 0x946039f0 expected csum 0xbf15fd23 mirror 1
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Oct 23 15:32:28 entxvault kernel: BTRFS warning (device loop2): csum failed root 1011 ino 45837 off 360448 csum 0x79fcb17b expected csum 0x528975a8 mirror 1
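
To see at a glance which inodes the checksum warnings point at, something like the following rough sketch could be run over the syslog. It assumes the line layout shown in the log above, including the "PREVIOUS LINE REPEATED" markers; the function name `summarize_csum_failures` is made up for this example and is not part of any Unraid or BTRFS tool.

```python
import re
from collections import Counter

# Matches "csum failed" warnings in the shape shown in the log above.
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<device>[^)]+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<csum>0x[0-9a-f]+) expected csum (?P<expected>0x[0-9a-f]+)"
)
# Repeat marker as it appears in the log above (assumed to refer to the
# line immediately before it).
REPEAT_RE = re.compile(r"### \[PREVIOUS LINE REPEATED (\d+) TIMES\] ###")


def summarize_csum_failures(lines):
    """Count csum failures per (device, root, inode, offset)."""
    counts = Counter()
    last_key = None
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            last_key = (
                match["device"],
                int(match["root"]),
                int(match["ino"]),
                int(match["off"]),
            )
            counts[last_key] += 1
            continue
        repeat = REPEAT_RE.search(line)
        if repeat and last_key is not None:
            # The marker means the previous csum line occurred N more times.
            counts[last_key] += int(repeat.group(1))
            continue
        # Any other line ends the chain, so a later repeat marker
        # is not credited to an older csum warning.
        last_key = None
    return counts
```

Run against the excerpt above, this would show that every warning falls on one of just two inode/offset pairs (ino 45837 at offset 360448 and ino 49129 at offset 4558848) on device loop2, which fits with a small number of damaged blocks being read over and over rather than widespread damage.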

 

5 minutes ago, JorgeB said:

There are several crashes, possibly RAM related; see here. You're overclocking the RAM, and that is a known issue with those CPUs. After fixing that, it's also a good idea to recreate the docker image.

Hi, the RAM is rated for 3200, which is what I'm running it at. I also ran Memtest for 2 passes without any issues. I ran the server with the exact same setup for months (sometimes with up to 50 days of uptime) without any problems.

2 minutes ago, JorgeB said:

Did you look at the link? It's still way above the max AMD-supported speeds, so it's an overclock, and it's known to corrupt data in some cases, even if no errors are detected with Memtest.

Thank you for the help, then. It is a little sad that I will have to sacrifice this much speed for this issue.
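
For anyone wanting to check what speed their DIMMs are actually configured at, here is a small sketch that scans the text output of `dmidecode -t memory` and lists any configured speed above a limit you supply. The field labels in the regex are an assumption about dmidecode's output (they differ between versions), the helper name `dimms_over_limit` is made up for this example, and the limit itself has to come from AMD's table for your CPU and DIMM layout.

```python
import re

# Assumed field labels from `dmidecode -t memory`: older releases print
# "Configured Clock Speed", newer ones "Configured Memory Speed".
SPEED_RE = re.compile(
    r"^\s*Configured (?:Memory|Clock) Speed:\s*(\d+)\s*(?:MT/s|MHz)",
    re.MULTILINE,
)


def dimms_over_limit(dmidecode_text, max_supported_mts):
    """Return the configured speeds that exceed max_supported_mts.

    Empty slots report "Unknown" and are skipped because they do not
    match the numeric pattern.
    """
    speeds = [int(value) for value in SPEED_RE.findall(dmidecode_text)]
    return [speed for speed in speeds if speed > max_supported_mts]
```

The text can be captured by running `dmidecode -t memory` as root and passing its output to the function. An empty list means nothing is configured above the limit you passed in; it says nothing about whether the RAM is otherwise healthy.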

37 minutes ago, Entxawp said:

Thank you for the help, then. It is a little sad that I will have to sacrifice this much speed for this issue.

I doubt that you will see a speed decrease; in fact, I suspect things will work much faster with the memory working in sync with the processor. Pushing speeds past spec causes multiple retries for data that is corrupted but correctable, and crashes when the corruption is uncorrectable. Both should be eliminated when things are running in spec.

1 minute ago, jonathanm said:

I doubt that you will see a speed decrease; in fact, I suspect things will work much faster with the memory working in sync with the processor. Pushing speeds past spec causes multiple retries for data that is corrupted but correctable, and crashes when the corruption is uncorrectable. Both should be eliminated when things are running in spec.

I do have one question left: the link doesn't mention Threadrippers with 8 slots running in quad channel (not that I have that right now; I have 4x8GB). Should I ever fill all eight slots, can I still run them at 1866?

2 hours ago, JorgeB said:

It does:

Hey, I just want to say that I stress-tested the system for about an hour and it no longer crashes. Thank you for your help. I do have one more question: can I still overclock my CPU, or would that cause the memory to be out of sync again?

 

Thank you, Ent
