AlaskanBeard

Members
  • Posts

    10

Everything posted by AlaskanBeard

  1. This doesn't add much, but I figured I'd mention I'm having the same issue on the exact same hardware. I'm running a 7302P on a Supermicro H11SSL-I with 4x16GB DDR4 (it was 8x16GB, but I took half out to troubleshoot). I was also getting some weird crashes due to my BIOS resetting and C-states getting re-enabled, but after I fixed that I started having memory crashes.
  2. Thanks again for your help! The issue was twofold. The first part is that my BIOS was getting reset: despite no warnings in IPMI, the CMOS battery was bad, and replacing it seems to have solved the hard crashes I was seeing (now that it remembers C-states are supposed to be disabled). I was able to get a little over 6 days of uptime before I re-enabled Tdarr (more on that below). The other issue is that sometimes the server wouldn't crash and reset; it would just hit 100% CPU usage and become unresponsive. I was able to kill Docker one time this happened, and while the system didn't recover, I did see CPU usage drop to a more normal level. From there, I started experimenting with the containers I'm running, and long story short, it's 100% Tdarr. For whatever reason, on this hardware, using my GPU to re-encode videos just saturates the CPU. There's a setting in the application to set the ffmpeg priority to low, and that seems to have fixed the issue. If someone somehow stumbles on this from Google, the Tdarr setting is under GPU > Options > Low FFmpeg/HandBrake process priority.
  3. Unfortunately I don't have any power supply idle control settings in my BIOS, but I do have global C-states disabled. The last crash happened ~16 hours ago. If I'm reading the logs right, the boot after the most recent crash started at 12:15:39 and finished at about 12:17 on September 17th, and then I installed rc5 and rebooted at 12:35. syslog
  4. I'm unfortunately still having issues. I haven't had an ECC error logged since I took out the one DIMM, so I do think that was an issue. unRAID has locked up a couple of times due to CPU and memory consumption. The memory consumption I've solved by just powering off a couple of my containers. The CPU consumption I think I've fixed as well. I'd read a couple of threads where cache drive corruption was an issue, and I decided now was as good a time as any to replace my cache array with a single NVMe drive, and I haven't had any CPU consumption issues since. After all that, I managed ~45 hours of uptime before unRAID crashed. I've been using Grafana to check CPU and memory usage when crashes happen, and at this most recent crash CPU usage was right at 50% and memory was at 2.5GB free, with 26GB used and 19GB for cache+buffer, so I'm thinking it's not a memory consumption issue either? I'm not sure how much memory unRAID needs free at any given time. I've attached new diagnostics generated after this most recent crash. The crash happened on rc4, and I've since updated to 6.11 rc5. tower-diagnostics-20220917-1327.zip
  5. It's hard to say for sure, since one of my crashes happened after 2 days of uptime, but it does seem to be doing fine with what I think is the problem DIMM removed. Only issue now is I'm running out of memory haha. I'm going to stop some of my containers to reduce memory usage and I'll report back in a few days if I don't have any crashes. Thanks for the suggestion!
  6. I recently moved unRAID over to a server with an AMD CPU (Epyc 7551P) and my server has been crashing ever since. So far uptime has ranged from 2 hours to ~50, with it typically crashing around the 5 hour mark. Aside from CPU, memory, and motherboard, I haven't made any other changes. So far I've run two memtest passes without error, and I've disabled global C-states in the BIOS. I have server health logging enabled in my IPMI as well, and the only thing logged there is "Correctable ECC / other correctable memory error @DIMMC1 - Assertion"; however, there's one of these messages for each DIMM each time a crash happens. I've also tried upgrading from 6.10.3 to 6.11.0-rc4 with the same behavior, and I've since reverted to 6.10.3. tower-diagnostics-20220912-1227.zip
  7. I would really like to see the new/trending/top new installs links back on the sidebar. I use them for discovery a lot of the time, and right now you have to Show More, do your browsing, click on the apps link again, then Show More on the next category. That's my only real complaint with the redesign; I'm happy to have the option to toggle descriptions as well. Thanks for all the work, Squid!
  8. I actually had the same thought but I haven't had a chance to shut my server down to verify. I'll report back when I can check (hopefully tomorrow). Thanks for the response! EDIT: I went ahead and pulled the card and it's a GTS 450.
  9. I'm having issues getting my 1050 Ti working. I see it in my system devices, but in my syslog I see it try to initialize a few times with the message "The NVIDIA GPU -device id- installed in this system is not supported by the NVIDIA 465.27 driver release", etc., ending with "None of the NVIDIA devices were initialized." According to the supported devices list on Nvidia's site for this driver version, the 1050 Ti should be supported. I've also tried the production branch with the same results. So far I've added "video=efifb:off" to my syslinux.cfg and verified I don't have it bound to vfio. Any help would be appreciated! Edit: I also just tried with driver 465.24.02 without luck. tower-diagnostics-20210502-1444.zip
  10. It looks like the OrganizrV2 docker hub was changed. It's now at: https://hub.docker.com/r/organizr/organizr and the current build is organizr/organizr. I updated my settings with the new URL & build and it updated without issue.
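
On the question in post 4 about how much memory needs to be free: on Linux, most of the cache+buffer figure can normally be reclaimed, so the MemAvailable field in /proc/meminfo is usually a better guide than the free figure alone. Below is a rough Python sketch of that check. The function names and the sample numbers are made up for illustration (they are not taken from the attached diagnostics), and on unRAID some of what shows up as cache may be RAM-backed filesystem data that cannot be dropped, so treat the result as an estimate only.

```python
# Illustrative /proc/meminfo-style text. These values are invented for the
# example and do not come from any real diagnostics file.
SAMPLE = """\
MemTotal:       50331648 kB
MemFree:         2621440 kB
MemAvailable:   20971520 kB
Buffers:         1048576 kB
Cached:         18874368 kB
"""

KIB_PER_GIB = 1024 * 1024


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: kiB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_summary(info):
    """Return total, free, cache+buffers and available memory in GiB."""
    cache = info.get("Cached", 0) + info.get("Buffers", 0)
    # Prefer the kernel's own MemAvailable estimate. If the field is
    # missing, fall back to free + cache, which is only a rough guess.
    available = info.get("MemAvailable", info.get("MemFree", 0) + cache)
    return {
        "total_gib": info.get("MemTotal", 0) / KIB_PER_GIB,
        "free_gib": info.get("MemFree", 0) / KIB_PER_GIB,
        "cache_gib": cache / KIB_PER_GIB,
        "available_gib": available / KIB_PER_GIB,
    }


summary = memory_summary(parse_meminfo(SAMPLE))
for key, value in summary.items():
    print(f"{key}: {value:.1f}")
```

To try it against a real system, the contents of /proc/meminfo could be read from the server and passed to parse_meminfo in place of SAMPLE. A low free figure next to a healthy available figure would suggest that memory pressure is not the cause of a crash.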