backlands

Members
  • Posts

    8

Everything posted by backlands

  1. Thanks for putting this one together. You should add parameters for UID and GID to the docker so we can specify which user it runs as; otherwise, if you have multiple users, the file ownership can get mixed up. Correction to my wording: you should add them to the default template as 99/100, as other unRAID dockers do, so that we can set them easily, since the image already supports this.
  2. Alright, I have been monitoring things and have not seen this issue again after cleaning the PCIe card for my backplane and reseating it in a new slot, and doing the same for the connectors at both ends. I think this was a random fault due to something at one of these points. Thanks for the help on this; the issue is solved now.
  3. I have been continuing to monitor the system and was having a run of stability, but once again it went down. The error notice from iDRAC is "OS Stop: unknown event" and the syslog ends with the following messages and nothing after. Let me know if you would like the full syslog; unfortunately I can't provide diagnostics, as the system was unreachable after this error, so I can't quickly anonymize the log file.

     Aug 12 17:09:57 NUCLEAR-WINTER emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
     Aug 12 17:37:37 NUCLEAR-WINTER emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
     Aug 13 04:00:01 NUCLEAR-WINTER Plugin Auto Update: Checking for available plugin updates
     Aug 13 04:00:06 NUCLEAR-WINTER Plugin Auto Update: Community Applications Plugin Auto Update finished
     Aug 13 04:40:01 NUCLEAR-WINTER root: Fix Common Problems Version 2020.08.02
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: BUG: unable to handle kernel paging request at 0000000000ffffa0
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: PGD 80000011364ec067 P4D 80000011364ec067 PUD 11af050067 PMD 0
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: Oops: 0000 [#1] SMP PTI
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: CPU: 12 PID: 672 Comm: curl Tainted: G W I 4.19.107-Unraid #1
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: Hardware name: Dell Inc. PowerEdge R710/00W9X3, BIOS 6.6.0 05/22/2018
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RIP: 0010:vma_interval_tree_remove+0x1d4/0x231
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: Code: 80 e6 01 48 0f 45 fa 48 89 fd eb 4e 48 8b 50 b0 48 2b 50 a8 48 8b 48 40 48 c1 ea 0c 48 8d 54 0a ff 48 8b 48 10 48 85 c9 74 0b <48> 8b 49 18 48 39 ca 48 0f 42 d1 48 8b 48 08 48 85 c9 74 0b 48 8b
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RSP: 0018:ffffc900081d3c38 EFLAGS: 00010206
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RAX: ffff8891ac7af4fc RBX: ffff889085e6f800 RCX: 0000000000ffff88
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RDX: ffffffffffffffff RSI: ffff8891eb205aa0 RDI: ffff889085e6f800
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RBP: 0000000000000000 R08: ffffffff811011d3 R09: ffff889085e6e658
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: R10: ffff889085e6f858 R11: ffff889085e6eaa0 R12: ffff8891eb205aa0
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: R13: ffff889085e6f858 R14: 0000000000000000 R15: 0000000000001000
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: FS: 0000148db728e700(0000) GS:ffff8891f7b00000(0000) knlGS:0000000000000000
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: CR2: 0000000000ffffa0 CR3: 0000000175330000 CR4: 00000000000006e0
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: Call Trace:
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: __vma_adjust+0x273/0x58c
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: ? memcg_kmem_get_cache+0xb9/0x1a0
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: __split_vma+0x10d/0x16f
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: do_munmap+0x159/0x2c0
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: ? vma_link+0x6f/0x7c
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: mmap_region+0xfe/0x41b
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: do_mmap+0x403/0x459
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: vm_mmap_pgoff+0x91/0xde
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: ksys_mmap_pgoff+0x17c/0x1bb
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: do_syscall_64+0x57/0xf2
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RIP: 0033:0x148db7f90512
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: Code: eb aa 66 0f 1f 44 00 00 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 33 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 56 5b 5d c3 0f 1f 00 c7 05 16 ec 00 00 16 00
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RSP: 002b:0000148db728cad8 EFLAGS: 00000206 ORIG_RAX: 0000000000000009
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RAX: ffffffffffffffda RBX: 0000000000000812 RCX: 0000148db7f90512
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000148db7076000
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RBP: 0000148db7076000 R08: 0000000000000005 R09: 0000000000005000
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: R10: 0000000000000812 R11: 0000000000000206 R12: 0000148db00015f0
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: R13: 0000148db728cec8 R14: 0000000000000004 R15: 0000000000000002
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: Modules linked in: macvlan veth xt_nat ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables ipmi_devintf ipmi_si md_mod xfs nfsd lockd grace sunrpc bonding bnx2 sr_mod cdrom intel_p>
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: CR2: 0000000000ffffa0
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: ---[ end trace 52838674856dc5cd ]---
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RIP: 0010:vma_interval_tree_remove+0x1d4/0x231
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: Code: 80 e6 01 48 0f 45 fa 48 89 fd eb 4e 48 8b 50 b0 48 2b 50 a8 48 8b 48 40 48 c1 ea 0c 48 8d 54 0a ff 48 8b 48 10 48 85 c9 74 0b <48> 8b 49 18 48 39 ca 48 0f 42 d1 48 8b 48 08 48 85 c9 74 0b 48 8b
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RSP: 0018:ffffc900081d3c38 EFLAGS: 00010206
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RAX: ffff8891ac7af4fc RBX: ffff889085e6f800 RCX: 0000000000ffff88
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RDX: ffffffffffffffff RSI: ffff8891eb205aa0 RDI: ffff889085e6f800
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: RBP: 0000000000000000 R08: ffffffff811011d3 R09: ffff889085e6e658
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: R10: ffff889085e6f858 R11: ffff889085e6eaa0 R12: ffff8891eb205aa0
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: R13: ffff889085e6f858 R14: 0000000000000000 R15: 0000000000001000
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: FS: 0000148db728e700(0000) GS:ffff8891f7b00000(0000) knlGS:0000000000000000
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Aug 13 04:40:01 NUCLEAR-WINTER kernel: CR2: 0000000000ffffa0 CR3: 0000000175330000 CR4: 00000000000006e0
  4. I double-checked to confirm: I do have mcelog installed, so it should be logging additional info. My current understanding is that mcelog is built into the diagnostics export, so that "The output of mcelog (if installed) has been logged". I also checked my previously posted diagnostics to confirm that it is installed and found the following in syslog2.txt:

     Jul 25 22:11:04 NUCLEAR-WINTER nerdpack: Installing mcelog-161 package...
     Jul 25 22:11:04 NUCLEAR-WINTER root:
     Jul 25 22:11:04 NUCLEAR-WINTER root: Installing mcelog-161 package...

     I haven't noticed any machine check events this boot and will continue keeping an eye on things unless you have additional steps I should take at this time. Are there any other packages I should grab from NerdPack that might assist with this further? I may also shut down and run a memtest this weekend; do you think that is worthwhile at this point?
  5. Thanks for looking into that, Johnnie! I am not sure why the HBA crashed like that. I am moving it over to another slot on the board and will keep an eye on it. Hopefully nothing shows up again. I don't think it is a cooling issue, as I am typically seeing temps across the board around 40C at load and 30C at idle. This weekend I might bring the system down and do a deep clean of the fans and everything. I also checked the iDRAC card on the system for any logged events for the Hardware Errors you mentioned and found nothing useful there. Is there anywhere else I can find more details on which devices were giving the errors? What I found when searching said to check the logging on the board (which, from my understanding, would be iDRAC for me), but let me know if I should look elsewhere. Thanks again for your help. I really just want a stable system and hope I can get that with unRAID after some troubleshooting.
  6. Hey all, I have been running unRAID for a while now, and this morning when I accessed my Plex account to watch some content my entire array crashed. Upon restarting, one of my drives came back with a "Device Disabled, Content Emulated" issue. I checked the drive and the SMART reports are coming back clean and healthy from what I can tell, aside from the usual wear and tear. Since everything on the drive looks good, I have started rebuilding the drive's data from parity, as it looks like the drive fell out of sync with the existing parity while things were in a weird state. I have attached my diagnostics from when the array crashed (before restarting) as well as the SMART report for the drive in question. I am hoping that with the expertise in these forums someone is able to point me in the right direction on this one. For details, I am running unRAID 6.8.3 on a Dell R710 with 1 parity drive, 4 data drives, and 1 cache drive. Disk 1 is the drive that was showing the Device Disabled notification and is currently having its data rebuilt. Feel free to bug me if you would like any additional information along with the diagnostics and SMART report.

     nuclear-winter-diagnostics-20200804-1236.zip
     nuclear-winter-smart-20200804-1655.zip
  7. Ah interesting, I think it may be related to a bad USB3.0 PCI card I have installed. That 4TB drive is the only one on that card. I have unplugged and rebooted. Drives are showing no errors now and parity is running (around 70% done with no parity errors). I will come back if I start seeing more errors with the 4TB external drive removed. Thank you for the assistance Johnnie! On another note, do you have any recommendations for a USB3.0 PCI card that doesn't require an additional power connector? Only needs to support 2 drives at most, although more is always better.
  8. Hey everyone, overnight I started getting read errors on all my drives, going from 0 errors to a total of 15 read errors across the 4 data drives and 1 parity drive in my array. I am at a loss at the moment and uncertain how to move forward without large amounts of data loss. I am not overly concerned about most of the data and have the really important stuff on offsite backups. I am hoping that someone can shed some light on what is happening with my array right now, because it is very unsettling to have errors on every single one of my drives overnight. The odd thing is that my cache drive shows no errors, but I also cannot access it, and my docker containers are all hanging and unresponsive to any start/stop commands. I also find it odd that I am receiving errors about BTRFS when I have no BTRFS file systems.

     Jul 18 09:00:01 NUCLEAR-WINTER kernel: print_req_error: I/O error, dev loop2, sector 5441336
     Jul 18 09:00:01 NUCLEAR-WINTER kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 43, rd 3655, flush 0, corrupt 0, gen 0

     Attaching diagnostics in the hopes that someone on here can assist me. Thank you in advance.

     nuclear-winter-diagnostics-20200718-1000.zip
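
For anyone comparing logs over time: the BTRFS line quoted in post 8 carries five per-device counters (`wr`, `rd`, `flush`, `corrupt`, `gen`), which as far as I know correspond to write, read, flush, corruption and generation errors. Below is a minimal Python sketch for pulling them out of a syslog line so they can be tracked between boots. The function name `parse_btrfs_errs` is made up for illustration, and the pattern is built only from the single line quoted above, so other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the one BTRFS line quoted in post 8 (an assumption:
# other kernels may format this message differently).
BTRFS_ERRS = re.compile(
    r"BTRFS error \(device (?P<device>[^)]+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_btrfs_errs(line):
    """Return device names and integer counters from one syslog line.

    Returns None when the line is not a BTRFS counter summary.
    (Hypothetical helper, not part of any Unraid tooling.)
    """
    match = BTRFS_ERRS.search(line)
    if match is None:
        return None
    result = {"device": match["device"], "bdev": match["bdev"]}
    for name in COUNTERS:
        result[name] = int(match[name])
    return result


if __name__ == "__main__":
    line = (
        "Jul 18 09:00:01 NUCLEAR-WINTER kernel: BTRFS error (device loop2): "
        "bdev /dev/loop2 errs: wr 43, rd 3655, flush 0, corrupt 0, gen 0"
    )
    print(parse_btrfs_errs(line))
```

Running the parser over every line of a saved syslog and keeping the last non-None result per device gives the most recent counters, which makes it easier to see whether the numbers are still climbing after a hardware change.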