Mantene

Members
  • Posts

    72
Everything posted by Mantene

  1. Let me preface this by saying everything seems to be working great, so this seems like an error of no consequence at the moment, but I am seeing this:
  2. I can probably do all of those things. I am fairly sure I have a spare, though lower-wattage, PSU. And using just two DIMMs is easy enough to try. However, I just started the Prime95 CPU test, so I will let that run for a few hours (if the PC stays up that long)! Thank you for the suggestions. I will report back when I have more information.
  3. May 5 08:58:30 Eeyore kernel: RSP: 0018:ffffc900007b78a0 EFLAGS: 00010202
     May 5 08:58:30 Eeyore kernel: RAX: ffffea0005e41d80 RBX: ffffc900007b7940 RCX: 0000000000000006
     May 5 08:58:30 Eeyore kernel: RDX: 0000000000000101 RSI: 17fec0817ed02b28 RDI: ffffc900007b78e8
     May 5 08:58:30 Eeyore kernel: RBP: ffffc900007b7930 R08: 000000000000007f R09: ffffea0005e41d80
     May 5 08:58:30 Eeyore kernel: R10: 0000000000000000 R11: ffff888103891500 R12: 000000000000000d
     May 5 08:58:30 Eeyore kernel: R13: ffff888103891500 R14: ffff8881026826c0 R15: 17fec0817ed02b08
     May 5 08:58:30 Eeyore kernel: FS: 0000000000000000(0000) GS:ffff888ffea40000(0000) knlGS:0000000000000000
     May 5 08:58:30 Eeyore kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     May 5 08:58:30 Eeyore kernel: CR2: 00001510c418d4e8 CR3: 000000000200a000 CR4: 0000000000350ee0
     May 5 08:59:20 Eeyore kernel: mce: [Hardware Error]: Machine check events logged
     May 5 08:59:20 Eeyore kernel: [Hardware Error]: Corrected error, no action required.
     May 5 08:59:20 Eeyore kernel: [Hardware Error]: CPU:9 (17:71:0) MC2_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0x9c20400000020136
     May 5 08:59:20 Eeyore kernel: [Hardware Error]: Error Addr: 0x00000001790531e0
     May 5 08:59:20 Eeyore kernel: [Hardware Error]: IPID: 0x000200b000000000, Syndrome: 0x000171f21a4418f5
     May 5 08:59:20 Eeyore kernel: [Hardware Error]: L2 Cache Ext. Error Code: 2, L2M Data Array ECC Error.
     May 5 08:59:20 Eeyore kernel: [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD
     May 5 08:59:20 Eeyore kernel: mce: [Hardware Error]: Machine check events logged
     May 5 08:59:20 Eeyore kernel: [Hardware Error]: Corrected error, no action required.
     May 5 08:59:20 Eeyore kernel: [Hardware Error]: CPU:1 (17:71:0) MC14_STATUS[Over|CE|MiscV|AddrV|-|SyndV|CECC|-|-|-]: 0xdc2040000004010b
     May 5 08:59:20 Eeyore kernel: [Hardware Error]: Error Addr: 0x00000001790531e0
     May 5 08:59:20 Eeyore kernel: [Hardware Error]: IPID: 0x000700b020f50300, Syndrome: 0x000171f21a47010a
     May 5 08:59:20 Eeyore kernel: [Hardware Error]: L3 Cache Ext. Error Code: 4, L3M Data ECC Error.
     May 5 08:59:20 Eeyore kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: GEN
  4. Yep, it still does it with the RAM at 2133. I tried safe mode with Docker and VMs disabled. It stays up for longer, but it still seems to reboot randomly. So yes, I am also of the opinion that it is hardware. I just wish I knew which component. The motherboard, CPU, or PSU are the main suspects.
  5. Thank you, @JorgeB, for moving the thread to the correct forum. Apologies to @Squid for posting in the wrong place. I was in somewhat of a panic when I created the original thread. So, to address the comments of @John_M: I let MemTest run overnight and there were no errors in the morning. The system also did not reboot at all. I am now mirroring syslog to flash. I will attach a new diags bundle. Also, I have now removed the modprobe i915. This used to be an Intel system, and that is a remnant. @ChatNoir, the PSU seems to be okay, but that is one of the more difficult pieces of hardware to know for sure. Cooling also seems okay: the CPU and motherboard temps hover around 45, and one occasionally hits 60, but only ever for a few seconds, and it has always been so. These were my first thoughts too. I have seen some errors relating to L2 or L3 cache in the syslog. Could that be the issue? Is there a way to test the CPU for faults? I am at a loss here. The system stays up if I boot into safe mode and don't mount the array. Once I mount the array, it takes just minutes until an unexpected reboot. @JorgeB, as to the memory overclocking, you are right! I had XMP turned on for my RAM. I turned it off this morning and the current speed should be 2133. I deeply appreciate all the help you are all providing. Any ideas what my next steps should be? eeyore-diagnostics-20210505-1013.zip
  6. Oh, it even does it in safe mode. I am ready to throw the box out the window
  8. Yes, I read that a while back. I do not overclock my RAM (or my CPU), I am using approved RAM, and it is the same RAM in all the slots. Also, my power settings are correct, so C-states should not be an issue. And again, I haven't made any changes to any of that recently, and I have been running Unraid for quite some time now. Also, here are diagnostics from a regular boot. It crashed about 20 seconds after I got this! eeyore-diagnostics-20210504-1730.zip
  10. I don't know what is going on. All of a sudden my server is rebooting what seems like every few minutes. I have been running 6.9.2 since it first came out, so it isn't like I am running a beta release. And I haven't made any configuration changes to the server. In fact, I was simply using the Windows 10 VM when it started this behavior. I am attaching the diagnostic data from safe mode with the array started. Yes, it seems to work in safe mode. I know that some plugins got updated today, but I honestly don't know which ones. Unassigned Devices? But that shouldn't cause this, right? Please help! eeyore-diagnostics-20210504-1719.zip
  12. I am constantly getting the following in my log file:
      Mar 8 09:58:03 Eeyore kernel: caller _nv000708rm+0x1af/0x200 [nvidia] mapping multiple BARs
      Mar 8 09:58:04 Eeyore kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
      Mar 8 09:58:04 Eeyore kernel: caller _nv000708rm+0x1af/0x200 [nvidia] mapping multiple BARs
      Mar 8 09:58:05 Eeyore kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
      Mar 8 09:58:05 Eeyore kernel: caller _nv000708rm+0x1af/0x200 [nvidia] mapping multiple BARs
      Mar 8 09:58:07 Eeyore kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
      What do I have configured incorrectly? eeyore-diagnostics-20210308-0959.zip
  13. I just wanted to come back and report that uninstalling atop did the trick. My log has been sitting at 1% Full for the past two weeks. Thank you, @JorgeB and @trurl!
  14. Yep, that is what I decided after the first response.
  15. Thanks! I will remove it and see what develops!
  16. Can someone please help me figure out why my log is filling up? Please let me know if there is anything else I can provide to help solve this issue. I would prefer not to reboot every week to keep the log from hitting 100%! Thank you, Matt eeyore-diagnostics-20201118-0841.zip
  17. I added a second NVIDIA card to my box and now I cannot get my Windows 10 VM to start up. The monitor keeps flashing on and off every minute, as though the card sends a signal and then stops. I cannot find anything in the logs. ANY help would be greatly appreciated. eeyore-diagnostics-20200927-1747.zip
  18. I think the @limetech people need to fix this. The new build of Windows 10 still does not allow AMD processors to use Hyper-V.
  19. Nothing from me. I keep hoping some Unraid update will fix this.
  20. This is very similar to my build. Can I ask, were you able to get Hyper-V enabled in Windows 10 (or is it not something you even use so you don't care)?
  21. Until that is a feature, I would think it should be possible to build WireGuard into the containers. I haven't experimented with that, but I would imagine it can be done. I could be very wrong, though.
  22. So I am guessing no one can help. Okay. I will keep playing around.
  23. Does anyone have WSL2 working with an AMD processor?
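
For the corrected machine check errors quoted in the May 5 syslog excerpt above, it can help to tally which CPU and MCA bank each event came from when trying to narrow down a hardware fault. The following is a minimal Python sketch, not an official Unraid or kernel tool. The helper names `parse_mce` and `summarize` are made up for illustration, and the regular expression assumes the exact line layout seen in that excerpt. The only status bits decoded are the architectural Overflow (bit 62) and Uncorrected (bit 61) flags.

```python
import re
from collections import Counter

# Assumed line layout, taken from the syslog excerpt in this feed:
#   "... [Hardware Error]: CPU:9 (17:71:0) MC2_STATUS[...]: 0x9c20400000020136"
MCE_RE = re.compile(
    r"CPU:(\d+) \([0-9a-fA-F:]+\) MC(\d+)_STATUS\[[^\]]*\]: (0x[0-9a-fA-F]+)"
)

# Two lines copied from the excerpt, used here as sample input.
SAMPLE = [
    "May 5 08:59:20 Eeyore kernel: [Hardware Error]: CPU:9 (17:71:0) "
    "MC2_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0x9c20400000020136",
    "May 5 08:59:20 Eeyore kernel: [Hardware Error]: CPU:1 (17:71:0) "
    "MC14_STATUS[Over|CE|MiscV|AddrV|-|SyndV|CECC|-|-|-]: 0xdc2040000004010b",
]


def parse_mce(lines):
    """Return one dict per MCx_STATUS line found in the given syslog lines."""
    events = []
    for line in lines:
        match = MCE_RE.search(line)
        if match is None:
            continue  # not an MCx_STATUS line
        status = int(match.group(3), 16)
        events.append({
            "cpu": int(match.group(1)),
            "bank": int(match.group(2)),
            "status": status,
            # Bit 62: an earlier error was overwritten before being read.
            "overflow": bool((status >> 62) & 1),
            # Bit 61: the error was NOT corrected by hardware.
            "uncorrected": bool((status >> 61) & 1),
        })
    return events


def summarize(events):
    """Count events per (cpu, bank) pair."""
    return Counter((e["cpu"], e["bank"]) for e in events)


if __name__ == "__main__":
    for (cpu, bank), count in sorted(summarize(parse_mce(SAMPLE)).items()):
        print(f"CPU {cpu}, bank {bank}: {count} event(s)")
```

To run it against a real log, pass the lines of the mirrored syslog file to `parse_mce` instead of `SAMPLE`. If the counts cluster on one core, or if any event has `uncorrected` set, that may be worth mentioning when asking for help, though it does not by itself prove which component is failing.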