kbareis

Members
  • Posts

    8
kbareis's Achievements

Noob (1/14)

1

Reputation

  1. Marking @JorgeB as the correct answer. I swapped basically everything, and what I finally landed on was unstable RAM. Each stick passed 3 passes individually, and both passed 3 passes together, but they failed during a 7-pass run. Currently I have an 8x2 set of Corsair in there and uptime is at 5+ days.
  2. I’ll try, but this hardware has been running in this config since last year. There are numerous posts on these forums about issues on 6.12.x. I worry there is an unknown software issue. I am considering either moving to zfs for the cache disks and container images or leaving Unraid completely.
  3. Well, I spoke too soon... attaching syslog and diagnostics. Same thing: I can ping the server, but ssh and the webgui are down. It looks like the issues started at 19:28 in the logs with some kernel errors on several cores that quickly spread into btrfs issues, which bricked the whole thing. syslog-192.168.1.30.log oldtown-diagnostics-20240224-1954.zip
  4. Dropping an update here for anyone who might be having stability issues with Ryzen and Unraid 6.12.8.
       • Memtested my RAM again and ensured 3 successful passes
       • Downclocked it from 3600 to 2400 (could do a bit higher per the document below, but I decided to just run DDR4 default spec)
       • Disabled C-States on the processor (BIOS)
       • Set my Power Idle Control to typical (BIOS)
       • Removed both cache SSDs
       • Did a parity check to ensure data on the array
       • Changed Docker from macvlan to ipvlan per some stability issue reports on these forums and reddit
       • Used a recovery tool to pull files off the BTRFS cache drives
       • Rebuilt the cache drives from scratch, restoring only critical data that I had confirmed was corruption free
     Overall I have been up for longer than I have been in two weeks. Will keep posting on this thread with any additional updates or recommendations. My best guess at this point is that some sort of corruption event, from either C-states, power, memory timings, or macvlan, caused btrfs, the VM container, or the Docker container to become unstable, which made the UI freeze and stop responding after 24-48 hours.
  5. Thanks all! Parity check is just finishing up and then I will run the xfs repair. One thing I have tweaked since posting is unassigning my cache drives, because btrfs errors were constantly being logged to syslog. Things appear to be running better now that they are out. 18 hours uptime so far! If I had to guess at this point, I believe the btrfs filesystem was hosed and was causing full system instability. I also see instability when I try to do data recovery by mounting the btrfs cache drives read-only. Going to run the system without cache drives for a week or so to test stability and, if good, likely reformat them and test them for issues prior to dropping them back in. Memtest also passed three loops without issue.
  6. Hey @trurl- Diags are hard to get because the device becomes unresponsive to the webui, ssh, and input from a hardwired keyboard and mouse. I set up an external syslog server just to try to diagnose this. Attaching fresh diagnostics from when it rebooted. They likely won't have all the errors that we are looking for. Happy to gather more off the syslog server if that is of interest. Currently doing a parity check on the drives. oldtown-diagnostics-20240220-1536.zip
  7. For those who don't want to download the log, here's a snippet of one of the syslog errors:
     Feb 20 13:18:33 Oldtown kernel: rcu: INFO: rcu_preempt self-detected stall on CPU
     Feb 20 13:18:33 Oldtown kernel: rcu: #0119-....: (1319739 ticks this GP) idle=ca84/1/0x4000000000000000 softirq=781013/781013 fqs=516199
     Feb 20 13:18:33 Oldtown kernel: #011(t=1320207 jiffies g=2460473 q=3174295 ncpus=16)
     Feb 20 13:18:33 Oldtown kernel: CPU: 9 PID: 16255 Comm: find Tainted: P D O 6.1.74-Unraid #1
     Feb 20 13:18:33 Oldtown kernel: Hardware name: System manufacturer System Product Name/TUF GAMING X570-PLUS (WI-FI), BIOS 4602 02/23/2023
     Feb 20 13:18:33 Oldtown kernel: RIP: 0010:xfs_buf_get_map+0x108/0x804 [xfs]
     Feb 20 13:18:33 Oldtown kernel: Code: e8 ee 65 03 00 0f 0b bd 8b ff ff ff e9 eb 06 00 00 48 8d 54 f5 40 4c 8b 22 49 83 e4 fe 75 07 49 89 d4 49 83 cc 01 41 f6 c4 01 <74> 41 48 89 d0 48 83 c8 01 49 39 c4 75 de 48 8b 6d 30 48 85 ed 74
     Feb 20 13:18:33 Oldtown kernel: RSP: 0018:ffffc90017827b10 EFLAGS: 00000202
     Feb 20 13:18:33 Oldtown kernel: RAX: 0000000000000001 RBX: ffff88814b446c00 RCX: 000000003b627298
     Feb 20 13:18:33 Oldtown kernel: RDX: ffff888112411290 RSI: ffff8882759e3a80 RDI: ffffc90017827b60
     Feb 20 13:18:33 Oldtown kernel: RBP: ffff888112410000 R08: ffffffffa0d7fac7 R09: 0000000000000000
     Feb 20 13:18:33 Oldtown kernel: R10: 0000000000000000 R11: ffff8881069ee018 R12: ffff888112411a91
     Feb 20 13:18:33 Oldtown kernel: R13: ffff8882759e3a80 R14: ffff888106e7e000 R15: ffffc90017827c40
     Feb 20 13:18:33 Oldtown kernel: FS: 000014f85bf33740(0000) GS:ffff88880ea40000(0000) knlGS:0000000000000000
     Feb 20 13:18:33 Oldtown kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Feb 20 13:18:33 Oldtown kernel: CR2: 0000000000456020 CR3: 0000000150ace000 CR4: 0000000000750ee0
     Feb 20 13:18:33 Oldtown kernel: PKRU: 55555554
     Feb 20 13:18:33 Oldtown kernel: Call Trace:
     Feb 20 13:18:33 Oldtown kernel: <IRQ>
     Feb 20 13:18:33 Oldtown kernel: ? rcu_dump_cpu_stacks+0x95/0xb9
     Feb 20 13:18:33 Oldtown kernel: ? rcu_sched_clock_irq+0x345/0xa45
     Feb 20 13:18:33 Oldtown kernel: ? tick_init_jiffy_update+0x7c/0x7c
     Feb 20 13:18:33 Oldtown kernel: ? update_process_times+0x62/0x81
     Feb 20 13:18:33 Oldtown kernel: ? tick_sched_timer+0x43/0x71
     Feb 20 13:18:33 Oldtown kernel: ? __hrtimer_run_queues+0xeb/0x190
     Feb 20 13:18:33 Oldtown kernel: ? hrtimer_interrupt+0x9c/0x16e
     Feb 20 13:18:33 Oldtown kernel: ? __sysvec_apic_timer_interrupt+0xc5/0x12f
     Feb 20 13:18:33 Oldtown kernel: ? sysvec_apic_timer_interrupt+0x80/0xa6
     Feb 20 13:18:33 Oldtown kernel: </IRQ>
     Feb 20 13:18:33 Oldtown kernel: <TASK>
     Feb 20 13:18:33 Oldtown kernel: ? asm_sysvec_apic_timer_interrupt+0x16/0x20
     Feb 20 13:18:33 Oldtown kernel: ? xfs_buf_get_map+0x9b/0x804 [xfs]
     Feb 20 13:18:33 Oldtown kernel: ? xfs_buf_get_map+0x108/0x804 [xfs]
     Feb 20 13:18:33 Oldtown kernel: xfs_buf_read_map+0x51/0x1b3 [xfs]
     Feb 20 13:18:33 Oldtown kernel: ? xfs_buf_readahead_map+0x5/0x50 [xfs]
     Feb 20 13:18:33 Oldtown kernel: xfs_buf_readahead_map+0x30/0x50 [xfs]
     Feb 20 13:18:33 Oldtown kernel: ? xfs_buf_readahead_map+0x5/0x50 [xfs]
     Feb 20 13:18:33 Oldtown kernel: xfs_da_reada_buf+0x6c/0xa1 [xfs]
     Feb 20 13:18:33 Oldtown kernel: xfs_dir2_leaf_readbuf+0x260/0x2f5 [xfs]
     Feb 20 13:18:33 Oldtown kernel: xfs_dir2_leaf_getdents+0xe0/0x322 [xfs]
     Feb 20 13:18:33 Oldtown kernel: ? xfs_bmap_last_offset+0x8a/0xc2 [xfs]
     Feb 20 13:18:33 Oldtown kernel: xfs_readdir+0x14e/0x190 [xfs]
     Feb 20 13:18:33 Oldtown kernel: iterate_dir+0x97/0x146
     Feb 20 13:18:33 Oldtown kernel: __do_sys_getdents64+0x6b/0xd8
     Feb 20 13:18:33 Oldtown kernel: ? compat_filldir+0x17a/0x17a
     Feb 20 13:18:33 Oldtown kernel: do_syscall_64+0x6b/0x81
     Feb 20 13:18:33 Oldtown kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
     Feb 20 13:18:33 Oldtown kernel: RIP: 0033:0x14f85c00d283
     Feb 20 13:18:33 Oldtown kernel: Code: 89 df e8 20 05 fb ff 48 83 c4 08 48 89 e8 5b 5d c3 66 0f 1f 44 00 00 b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 61 0b 11 00 f7 d8
     Feb 20 13:18:33 Oldtown kernel: RSP: 002b:00007fffb298e2b8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
     Feb 20 13:18:33 Oldtown kernel: RAX: ffffffffffffffda RBX: 00000000004588f0 RCX: 000014f85c00d283
     Feb 20 13:18:33 Oldtown kernel: RDX: 0000000000008000 RSI: 0000000000458920 RDI: 0000000000000008
     Feb 20 13:18:33 Oldtown kernel: RBP: 00000000004588f4 R08: 00000000ffffffff R09: 000000000044f6c0
     Feb 20 13:18:33 Oldtown kernel: R10: 0000000000000100 R11: 0000000000000293 R12: ffffffffffffff88
     Feb 20 13:18:33 Oldtown kernel: R13: 0000000000000000 R14: 0000000000444c90 R15: 000000000000106f
     Feb 20 13:18:33 Oldtown kernel: </TASK>
  8. Hi all, I have been banging my head against an issue I am seeing with Unraid. My server has been solid for months but within the last two weeks or so it has taken a nosedive.
     System Specs:
       • 5800x Ryzen
       • 32GB Gskill
       • Asus X570 Tuf
       • 2x Crucial NVME drives
       • Smattering of 3-4 TB WD and Seagate drives
     BIOS Changes:
       • C States disabled
       • Power set to typical
       • DOCP off
     Last night the lockup was so bad it broke the filesystem on my mirrored btrfs cache drives. Booted today without Docker and VMs running and the thing still crashed. Dump from syslog server attached. syslogDump.log
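For anyone digging through a remote syslog dump like the ones attached in these posts, here is a rough Python sketch that scans kernel lines for a few error signatures and reports when each first appeared. It is only a starting point, not a diagnostic tool: the `SIGNATURES` table, the `scan` helper, and the `btrfs` pattern are my own assumptions about what is worth flagging, and the sample lines are copied from the rcu stall snippet above.

```python
import re
from collections import Counter

# Traditional syslog prefix, e.g. "Feb 20 13:18:33 Oldtown kernel: message"
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) kernel: (?P<msg>.*)$"
)

# Hypothetical signature table (an assumption, not an official list).
# Extend it with whatever patterns show up in your own logs.
SIGNATURES = {
    "rcu_stall": re.compile(r"rcu: INFO: \S+ self-detected stall on CPU"),
    "btrfs": re.compile(r"BTRFS (error|critical|warning)", re.IGNORECASE),
    "tainted": re.compile(r"Tainted:"),
}


def scan(lines):
    """Return (first_seen, counts) for every signature matched in kernel lines."""
    first_seen = {}
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a kernel line in the expected format
        for name, pattern in SIGNATURES.items():
            if pattern.search(match["msg"]):
                counts[name] += 1
                first_seen.setdefault(name, match["ts"])
    return first_seen, counts


# Sample lines taken from the snippet posted above.
SAMPLE = [
    "Feb 20 13:18:33 Oldtown kernel: rcu: INFO: rcu_preempt self-detected stall on CPU",
    "Feb 20 13:18:33 Oldtown kernel: CPU: 9 PID: 16255 Comm: find Tainted: P D O 6.1.74-Unraid #1",
    "Feb 20 13:18:33 Oldtown kernel: RIP: 0010:xfs_buf_get_map+0x108/0x804 [xfs]",
]

if __name__ == "__main__":
    first_seen, counts = scan(SAMPLE)
    for name, count in counts.items():
        print(f"{name}: {count} hit(s), first at {first_seen[name]}")
```

To run it against a real file, pass an open file handle instead of `SAMPLE`, for example `scan(open("syslogDump.log", errors="replace"))`. The first-seen timestamps can help narrow down where the trouble started before the box locked up.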