omartian

Members · 105 posts
Everything posted by omartian

  1. Hi guys, I had a weird issue where I couldn't access my Plex docker this morning. I went to log in to my Unraid server via IP address and was able to log in and get to the Main page. All the disks were spun down. I could access the Tools, Docker, and Apps pages as well as the terminal. However, when I tried to access the Dashboard, it would not load. I was hoping to initiate a shutdown via the Main page, but no go. I was able to access the terminal via the Unraid UI and typed in shutdown. I was greeted with a "your system is rebooting now" message, but it didn't shut down/reboot. Eventually, the other tabs became unresponsive and I couldn't access the server, prompting a hard reset. Currently doing a correcting parity check. Attached are the diagnostics. Any idea what went wrong here? Should the parity check be correcting or non-correcting? nasgard-diagnostics-20210820-1256.zip
  2. Seems to have worked. Just ran my first "0 error" parity check in months on my server. Thank you.
  3. Reverting helped, and I haven't crashed in the last 2 weeks. Will keep an eye out for a better upgrade.
  4. Server has been up for over 4 days w/o crashing. Hoping it's the latest update. Let me know if you have success on a future update.
  5. I'm hoping that's the case for me. Currently running a parity check with zero errors, and the system has not crashed in the last 36 hours. Fingers crossed.
  6. CPU issue? How do I determine which hardware is at fault?
  7. Just downgraded to 6.8.3. Oddly enough, it unassigned my cache drives, but they're back up now. After a bit of snooping, I'm seeing similar kernel errors on 6.9.2. I'm running a parity check now and will see how it goes. Let me know if you find an update that works for you.
  8. Think I'm having the same issue. Going to try and revert to 6.8. syslog
  9. Happened again tonight. Had to power-cycle the server. This is what I see in syslog before the kernel panic:
     Jun 8 19:12:30 Nasgard emhttpd: spinning down /dev/sdm
     Jun 8 19:57:31 Nasgard emhttpd: spinning down /dev/sdm
     Jun 8 20:42:32 Nasgard emhttpd: spinning down /dev/sdm
     Jun 8 20:43:27 Nasgard kernel: rcu: INFO: rcu_sched self-detected stall on CPU
     Jun 8 20:43:27 Nasgard kernel: rcu: 3-....: (59999 ticks this GP) idle=0fe/1/0x4000000000000000 softirq=1334521/1334530 fqs=14397
     Jun 8 20:43:27 Nasgard kernel: (t=60001 jiffies g=3697701 q=125290)
     Jun 8 20:43:27 Nasgard kernel: NMI backtrace for cpu 3
     Jun 8 20:43:27 Nasgard kernel: CPU: 3 PID: 24985 Comm: 7 Not tainted 5.10.28-Unraid #1
     Jun 8 20:43:27 Nasgard kernel: Hardware name: Gigabyte Technology Co., Ltd. B450 AORUS PRO WIFI/B450 AORUS PRO WIFI-CF, BIOS F60e 12/09/2020
     Jun 8 20:43:27 Nasgard kernel: Call Trace:
     Jun 8 20:43:27 Nasgard kernel: <IRQ>
     Jun 8 20:43:27 Nasgard kernel: dump_stack+0x6b/0x83
     Jun 8 20:43:27 Nasgard kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
     Jun 8 20:43:27 Nasgard kernel: nmi_cpu_backtrace+0x7d/0x8f
     Jun 8 20:43:27 Nasgard kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
     Jun 8 20:43:27 Nasgard kernel: rcu_dump_cpu_stacks+0x9f/0xc6
     Jun 8 20:43:27 Nasgard kernel: rcu_sched_clock_irq+0x1ec/0x543
     Jun 8 20:43:27 Nasgard kernel: ? trigger_load_balance+0x5a/0x1ca
     Jun 8 20:43:27 Nasgard kernel: update_process_times+0x50/0x6e
     Jun 8 20:43:27 Nasgard kernel: tick_sched_timer+0x36/0x64
     Jun 8 20:43:27 Nasgard kernel: __hrtimer_run_queues+0xb7/0x10b
     Jun 8 20:43:27 Nasgard kernel: ? tick_sched_do_timer+0x39/0x39
     Jun 8 20:43:27 Nasgard kernel: hrtimer_interrupt+0x8d/0x15b
     Jun 8 20:43:27 Nasgard kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
     Jun 8 20:43:27 Nasgard kernel: asm_call_irq_on_stack+0x12/0x20
     Jun 8 20:43:27 Nasgard kernel: </IRQ>
     Jun 8 20:43:27 Nasgard kernel: sysvec_apic_timer_interrupt+0x71/0x95
     Jun 8 20:43:27 Nasgard kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
     Jun 8 20:43:27 Nasgard kernel: RIP: 0010:xas_descend+0x45/0x49
     Jun 8 20:43:27 Nasgard kernel: Code: 44 c6 08 48 89 77 18 4c 89 c7 e8 77 ff ff ff 84 c0 74 13 49 c1 e8 02 44 89 c1 45 89 c0 49 83 c0 04 4e 8b 44 c6 08 41 88 49 12 <4c> 89 c0 c3 4c 8b 4f 18 48 89 fe 41 f6 c1 03 75 6b 4d 85 c9 4c 8b
     Jun 8 20:43:27 Nasgard kernel: RSP: 0018:ffffc90001467ba8 EFLAGS: 00000246
     Jun 8 20:43:27 Nasgard kernel: RAX: 0000000000000000 RBX: ffff888106325370 RCX: 0000000000000007
     Jun 8 20:43:27 Nasgard kernel: RDX: 0000000000000000 RSI: ffff888100686480 RDI: ffffea000464dd80
     Jun 8 20:43:27 Nasgard kernel: RBP: 0000000000000b47 R08: ffffea000464dd80 R09: ffffc90001467bb8
     Jun 8 20:43:27 Nasgard kernel: R10: ffffc90001467bb8 R11: ffff88842e8e2400 R12: ffffea000464dd80
     Jun 8 20:43:27 Nasgard kernel: R13: ffff88810005c480 R14: 0000000000000b47 R15: 00000000ffffffe5
     Jun 8 20:43:27 Nasgard kernel: ? xas_descend+0x2a/0x49
     Jun 8 20:43:27 Nasgard kernel: xas_load+0x2d/0x39
     Jun 8 20:43:27 Nasgard kernel: find_get_entry+0x57/0xba
     Jun 8 20:43:27 Nasgard kernel: find_lock_entry+0x15/0x4a
     Jun 8 20:43:27 Nasgard kernel: shmem_getpage_gfp.isra.0+0xf6/0x543
     Jun 8 20:43:27 Nasgard kernel: shmem_getpage+0x12/0x14
     Jun 8 20:43:27 Nasgard kernel: shmem_file_read_iter+0xaa/0x22f
     Jun 8 20:43:27 Nasgard kernel: generic_file_splice_read+0xf0/0x15e
     Jun 8 20:43:27 Nasgard kernel: splice_direct_to_actor+0xe4/0x1cd
     Jun 8 20:43:27 Nasgard kernel: ? generic_file_splice_read+0x15e/0x15e
     Jun 8 20:43:27 Nasgard kernel: do_splice_direct+0x94/0xbd
     Jun 8 20:43:27 Nasgard kernel: do_sendfile+0x185/0x24f
     Jun 8 20:43:27 Nasgard kernel: __do_sys_sendfile64+0x81/0xa7
     Jun 8 20:43:27 Nasgard kernel: do_syscall_64+0x5d/0x6a
     Jun 8 20:43:27 Nasgard kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
     Jun 8 20:43:27 Nasgard kernel: RIP: 0033:0x951dca
     Full syslog attached. syslog
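The RCU stall above is logged as plain text, so the relevant lines can be pulled out of a saved syslog with standard tools. A minimal sketch, with an assumed file path and invented sample lines standing in for the real log:

```shell
#!/bin/sh
# Build a small sample log for illustration; on a live system you would
# point the grep at your actual saved syslog instead (path is an assumption).
cat > /tmp/sample_syslog <<'EOF'
Jun 8 20:42:32 Nasgard emhttpd: spinning down /dev/sdm
Jun 8 20:43:27 Nasgard kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Jun 8 20:43:27 Nasgard kernel: NMI backtrace for cpu 3
EOF

# Show each stall report with its line number and one line of trailing
# context, so you can see which CPU stalled and what followed.
grep -n -A1 'self-detected stall' /tmp/sample_syslog
```

Matching on the timestamp prefix (e.g. `Jun 8 20:43`) instead would show everything logged in the minute leading up to the crash.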
  10. Can this be SW related? I updated to 6.9.2 last week. Looks like some weird kernel error on my monitor.
  11. OK. Does this option write to my USB boot drive? I added an additional USB thumb drive to my server. Local syslog is disabled, remote syslog is blank, and mirror syslog to flash is set to yes. Looks OK?
  12. Hi everyone, I woke up today to find that my Unraid server is unresponsive. I took a pic of what the server was sending to my monitor and attached the diagnostics. Last time this happened, 4 days ago, it said that my 2x btrfs cache drives were unmountable and had to be reformatted. I was able to recover my cache data (Plex metadata and binhex krusader docker data) and re-integrate it into my server. I ran a parity check that finished yesterday and didn't have any new sync errors. When it occurred this morning, I couldn't initiate a clean shutdown because the keyboard was unresponsive and I couldn't access the server remotely. When the server came back up, it initiated a parity check, which I cancelled. This time, the cache drives are still mounted and part of the pool. In the past week, I've run memtest for 10 passes (>24 hrs) and reseated RAM and SATA cables. I haven't the slightest idea why this keeps happening. A little guidance would be helpful. Thank you. Server Down.zip
  13. What are your thoughts on my post about the steps I should take above?
  14. It's weird because with every non-correcting check, I'll get a different # of parity errors. Now with both cache drives down, I'm thinking it's hardware. So, if I understand this right, my course of action should be:
      1. Get the SSDs back online.
      2. Run a non-correcting check to get a baseline # of errors with both DIMMs of RAM in (since I never ran a correcting check).
      3. Then run a correcting check.
      4. Make sure the # of errors corrected in step 3 matches the # identified in step 2.
      5. Run scheduled non-correcting checks monthly.
      6. If more errors are identified, try taking out one DIMM and repeat steps 2 & 3.
      Does that sound OK? Also, what do you think about keeping disks spinning at all times?
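Step 4 of the plan above (checking that the correcting run fixed exactly the errors the baseline found) can be sketched by counting the per-sector lines each run writes to syslog. A rough sketch under assumptions: the file names and contents are invented, and I'm assuming a correcting check logs `corrected, sector=` the same way the non-correcting check in these posts logs `incorrect, sector=`:

```shell
#!/bin/sh
# Sample logs standing in for the two saved syslogs (contents invented for
# illustration; a real log has one such line per affected sector).
cat > /tmp/noncorrect.log <<'EOF'
Jun 4 00:32:22 Nasgard kernel: md: recovery thread: Q incorrect, sector=20466240808
EOF
cat > /tmp/correct.log <<'EOF'
Jun 6 01:10:05 Nasgard kernel: md: recovery thread: Q corrected, sector=20466240808
EOF

# Count errors found by the baseline vs errors fixed by the correcting run.
found=$(grep -c 'incorrect, sector=' /tmp/noncorrect.log)
fixed=$(grep -c 'corrected, sector=' /tmp/correct.log)

if [ "$found" -eq "$fixed" ]; then
  echo "counts match: $found error(s)"
else
  echo "mismatch: found=$found fixed=$fixed"
fi
```

Comparing the sector numbers themselves (not just the counts) would be an even stronger check, since the same count at different sectors would still indicate an ongoing problem.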
  15. OK. So run memtest with 1 DIMM for 24 hrs, then run a non-correcting check. If I get more than 1 error, repeat with the other DIMM? If RAM is not the culprit, where do I look next?
  16. Cache is currently set to btrfs; the data drives are xfs. Should I reformat the cache to xfs?
  17. I ran a 10-pass memtest with 0 errors prior to the cache drive becoming unmountable. Not sure why both cache drives went down at the exact same time. I reseated their cables, and they are visible in the BIOS. I'll look at the recovery options, but this has never happened in the 2 years I've been using Unraid. I feel that this might be related to my parity errors. Is there anything in the diagnostics that can identify the problem?
  18. I've been having some issues with random parity sync errors over the last few weeks. I will get anywhere between 1-9 parity errors on my non-correcting checks. I've reseated SATA cables/RAM and run memtest for >24 hrs without errors, but would still get parity errors on checks. I just finished a non-correcting check this morning and had 1 error (see attached). The consensus on this forum was to keep my drives spinning at all times, because spin-downs might be what's contributing to the random errors I get per parity check. When I tried to access my server later today to do this, I wasn't able to. I couldn't initiate a clean shutdown via the Linux terminal, so I had to reboot the system. A non-correcting parity check just started, but I noticed an issue with my dual cache drives. Under the Main page, it states that both cache drives are "unmountable" and then asks if I want to format and create a file system on the unmountable drives. Is this a mobo issue, and could this be the underlying cause of the sync errors I've had over the last few weeks? 1 error noncorrect 6042021.zip unclean shutdown, cache drives issue.zip
  19. Did you notice this issue after an Unraid update, or has it always been the case?
  20. Very cool. But the % option should still work, right? Not sure why that one isn't going through.
  21. Hi. Having some issues with the plugin. I thought my first issue was because I started a parity check after my 6-error run and then aborted it, so the plugin was trying to use info from the aborted check. So I ran a non-correcting check (attached below) while also keeping an eye on my percentages and when the errors occurred. I completed this non-correcting check with 1 error. At 72.3% I had 0 errors; at 81.9% I had 1 error. At 72.3%, all disks were spinning except d1, d2, and d3. At 81.9%, the only disks spinning were my 2 parity drives and disk 7. Per my syslog, it looks like it happened at sector 20466240808:
      Jun 4 00:32:22 Nasgard kernel: md: recovery thread: Q incorrect, sector=20466240808
      I entered these percentages, hit Apply, and then Start Check. It removes my starting percentage from the field and never initiates the check. When I put in a start and end sector of 20466240600 and 20466241000, respectively, I get the same error of "End point too large. The end has been set to more than the size of the disk." Am I doing something wrong? Wanted to get this working while keeping the disks spinning. 1 error noncorrect 6042021.zip
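The "End point too large" message in the post above can be sanity-checked by comparing the requested end sector against the disk's capacity in 512-byte sectors. On the server itself, `blockdev --getsz /dev/sdX` reports that capacity; the disk size used below is an assumed example value, not read from the real array:

```shell
#!/bin/sh
# Assumed parity disk size in 512-byte sectors (hypothetical value; on real
# hardware you would substitute the output of: blockdev --getsz /dev/sdX).
disk_sectors=23437770752

# End sector requested in the post above.
end_sector=20466241000

# The end point only fits if it does not exceed the disk's sector count.
if [ "$end_sector" -le "$disk_sectors" ]; then
  echo "end sector is within the disk"
else
  echo "End point too large"
fi
```

If the plugin compares against a different size (for example, the array stripe size rather than the raw device), the same comparison applies with that bound swapped in; which size it actually uses is a guess here.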
  22. I'm counting 11 digits for the sector. That's what I thought I put in. That popup box would be helpful.