
meep

Members
  • Posts

    764
  • Joined

  • Last visited

About meep

  • Birthday April 29

Converted

  • Gender
    Male
  • URL
    http://mediaserver8.blogspot.ie
  • Location
    Ireland

Recent Profile Visitors

6,438 profile views

meep's Achievements

Enthusiast

Enthusiast (6/14)

74

Reputation

1

Community Answers

  1. I use HDBaseT to run displays and peripherals from a VM on my unRaid server to a desk in a different building. I write a bit about it here: https://mediaserver8.blogspot.com/2019/07/routing-vms-anywhere.html I'm currently pushing a single display at 1920x1080 @ 60Hz, and you'd need one transmitter / receiver pair per display. I'm not sure if they hit 144Hz, but you could check the specs on AV Access devices, or similar technology. Mine runs across about 60 m of Cat6A cable and is very stable.
  2. Ah, something that must not have come through when I migrated, or perhaps something new. Is this what you refer to? So maybe my Switch Ultra could have worked after all?
  3. The device shows as adopting, then goes offline for a moment, then repeats on a loop. On the first try, my logs show success, but the adopting / offline loop just keeps going. Docker Inspect shows the docker IP to be 172.17.0.10. When I had the Ultra, I couldn't SSH into it (not supported), but your post just reminded me that this switch does support SSH. I did a set inform to the IP of my unraid server, et voila, it worked! Though that brings up my next question... Whenever I have occasion to stop and restart this docker, my USG-Pro-4 'forgets' its inform URL and I need to log in to the device UI and reset it. I suspect I'm going to have the same issue with this switch. Any thoughts on that?
  4. I migrated from the legacy Unifi docker to this a few months ago and all seemed well, but now I'm having trouble adopting devices (on repo /unifi:8.3.32-unraid). I had a Switch Ultra that started acting up and went into some kind of adoption / offline loop, and though it worked OK and showed as adopted in the logs, I could not access or configure it in the UI. I contacted Unifi support, who ultimately requested that I RMA the switch. However, now I've acquired a USW-Pro-24-PoE and it's doing the exact same thing!! I like running my controller in unRaid, but with the deprecation, migration hassle, and now this nonsense, I'm thinking of abandoning it and getting a Unifi key altogether. Any insights?
  5. Oh sure. This was not a response to anything, just a general interjection.
  6. A bit late to the party, but I found the process of migrating from old Unifi-Controller to this a little obtuse with a few speed bumps along the way. I documented my process for my future self, but it may help others. https://mediaserver8.blogspot.com/2024/06/migrating-deprecated-unifi-docker-on.html
  7. Hi, sorry for the late reply. I’m not sure why I didn’t receive a thread update notification. Yes, I still have the card, and managed to get it working reliably in my system by shuffling around some cards / slots. However, it really is a nice to have as I’ve lots of other USB controllers I’m not using. Would consider sale.
  8. So after a full weekend, and re-enabling all the various hardware and apps, the server has been stable. It's reasonable to say that disabling c-states was the fix for the issues I was encountering. C-states had been enabled since I built the server in August 2019, and the system worked perfectly right through to 6.11.5. Only when I upgraded to 6.12 did regular crashes start, and these seemed to escalate in frequency with every point release I installed. At least now, thanks to the exhaustive testing, I've identified and removed a few CPU-intensive plugins I didn't really need, and identified a faulty SSD, so there's that at least.
  9. I believe power supply idle control was already set to normal, but I'll double check when I next reboot. I want to have the server running for 24 hours without issue before restarting and adding back some of my expansion cards.
  10. So we might have a winner here... It's an oldie, but a goldie. Working through any and all possible troubleshooting, I disabled c-states in my BIOS and have had my first error-free night in several weeks. Not fully confirmed yet, and super confusing why this would impact my system now and not several years ago when this was actually a thing. Will report back and hopefully close out the thread if this turns out to be the actual solution.
  11. That's the plan. Though I see quite a few similar reports here but also on Reddit. I think there's something fundamentally problematic in 6.12.x
  12. I asked ChatGPT for an analysis. Here's what I got back:

      It seems like you're experiencing a recurring issue on your UNRAID server, and I'll walk you through the analysis of the provided logs.

      The logs indicate that there is a problem with your system's kernel, and it's encountering a "general protection fault." This is a type of error that occurs when the kernel detects that a process is attempting to access a part of memory it's not allowed to, often due to a software bug.

      Let's break down the information in the log:

      • Timestamp and Process Information: The log entry starts with a timestamp indicating when the issue occurred. It mentions the process name (disk_load) and the PID (Process ID) of the process encountering the issue.
      • Hardware Information: It provides information about the hardware, including the motherboard model and BIOS version. This is important because hardware-related issues can sometimes lead to kernel faults.
      • Stack Trace: The RIP line indicates the Instruction Pointer at which the issue occurred. In this case, it points to the list_del function in the kernel code. The Code section shows the assembly code at the RIP location. The RSP register contains the Stack Pointer at the time of the fault. Registers like RAX, RBX, RCX, etc., show the values stored in those registers when the fault happened.
      • Call Trace: This section provides a call stack trace, showing the sequence of function calls that led to the issue. It starts from the top-level function (list_del) and goes deeper into the kernel functions.
      • Modules Linked In: This lists the kernel modules that are currently loaded. Kernel modules are pieces of code that can be loaded into the kernel to add functionality or drivers.
      • Additional Information: The logs also indicate that this issue is affecting CPU 21. There are mentions of "rcu_preempt detected stalls." RCU (Read-Copy-Update) is a synchronization mechanism in the Linux kernel. These stalls might be related to the issue you're experiencing.

      Based on this information, it's clear that the issue is occurring in the kernel code, specifically in the list_del function, which is part of a linked list manipulation. The most likely cause of this issue is a software bug in the kernel or a kernel module. Kernel bugs can be very complex to diagnose and often require the expertise of kernel developers.

      Here are some steps you can take to further diagnose and potentially resolve the issue:

      • Check for Updates: Ensure that your UNRAID server is running the latest kernel and kernel modules. Sometimes, updating the kernel can resolve known issues.
      • Check for Specific Software: Determine if the issue started happening after installing or updating specific software or kernel modules. Try disabling or reverting those changes if possible.
      • Kernel Debugging: Consider enabling kernel debugging and collecting more detailed logs. Kernel developers can use these logs to identify the exact cause of the issue.
      • Community Forums: Check UNRAID or Linux kernel community forums for similar issues. Others may have encountered and solved similar problems.
      • Contact Support: If the issue persists, consider contacting UNRAID support or the support channels for your specific hardware components. They may have specific insights or solutions.
      • Hardware Testing: In some cases, hardware issues can manifest as kernel faults. Test your hardware components (e.g., RAM, CPU, motherboard) for any problems.

      Keep in mind that diagnosing and resolving kernel-level issues can be challenging, and it's essential to take precautions to avoid data loss during troubleshooting. If you're not comfortable with these steps, seeking assistance from experienced professionals or support channels is advisable.
  13. @Dimtar I'm on IPVlan (not macvlan). @SirLupus I haven't tried that, but like you, I need SMB, so it's not really an option. I've now spent DAYS peeling back the onion, removing all cards etc. and adding them back in one at a time. I thought I'd solved it when I identified a bad SSD (that I wasn't even using), removed it, and seemed to gain some stability. However, overnight last night, it all came tumbling down again (during a parity check). I'll paste a bit of the captured log below, but it looks like something trips up the kernel, which then continues to generate a kernel exception every 3 minutes exactly. In a previous instance of this, I could see the exceptions were reporting issues in smartctl, which led me down the path to find the bad drive, but here I have my 3-minute exceptions back, and no smartctl references. Really stumped. The next step is to move back to 6.11.x, but I can't stay on that forever (assuming it works). What's up here @unraid ??? Here's the start of the issue overnight. The 3rd one just keeps repeating every 3 minutes exactly until the whole system locks up, or at the very least, the GUI freezes and becomes unusable.
  14. I would need to go all the way back to the last 6.11.x release, as that's the last time I had stability. I have the same inclination, which is why I'm currently focused on removing hardware, and will look into RAM, CPU and drive connections next.
  15. So with Docker and VMs disabled, the system is still generating multiple GPFs and kernel crashes. I've attached today's syslog, which shows a boot-up sequence around 8:40, with GPFs starting after 10:00. I did manage to do a clean shutdown, so there's that. Next, I'm going to remove any additional non-essential hardware such as extra GPUs, USB controller etc. If it's still problematic, I'll boot to safe mode to eliminate plugins, and after that, it's going to be a CPU and RAM re-install. Arrggghhhh. syslog_Sept15
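
For anyone chasing a similar pattern, here is a rough Python sketch of one way to check whether kernel faults in a syslog really recur on a fixed interval, as with the "every 3 minutes exactly" exceptions described above. It is not taken from the original posts. The function name `fault_intervals`, the sample log lines, the hostname `Tower`, and the placeholder year are all assumptions for illustration, and it assumes classic syslog timestamps of the form `Sep 15 10:00:01`.

```python
import re
from datetime import datetime

# Assumption: each line starts with a classic syslog timestamp, e.g. "Sep 15 10:00:01".
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s")


def fault_intervals(lines, needle="general protection fault", year=2000):
    """Return the seconds between consecutive log lines containing `needle`."""
    times = []
    for line in lines:
        if needle not in line:
            continue
        match = STAMP.match(line)
        if match is None:
            continue  # skip lines without a recognisable timestamp
        # Syslog omits the year; the placeholder year only exists to build a datetime.
        stamp = " ".join(match.group(1).split())
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# Hypothetical sample lines, made up for this sketch (not from the attached syslog).
sample = [
    "Sep 15 10:00:01 Tower kernel: general protection fault (hypothetical entry)",
    "Sep 15 10:01:10 Tower kernel: some unrelated message",
    "Sep 15 10:03:01 Tower kernel: general protection fault (hypothetical entry)",
    "Sep 15 10:06:01 Tower kernel: general protection fault (hypothetical entry)",
]
print(fault_intervals(sample))  # [180, 180]
```

To run it against a real log, something like `fault_intervals(open("syslog"))` should work, with `needle` changed to whatever string the repeating exception actually contains. A list of identical values would suggest a timer or scheduled job is triggering the fault rather than random hardware errors.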