
jonp · Members · Posts: 6,443 · Days Won: 24

Posts posted by jonp

  1. 58 minutes ago, pelux said:

    I've done this and can now access the GUI but cannot start the array due to an invalid key.  I'm assuming the next step is to replace the key with a new one bound to the new flash GUID, but this note has me a bit worried:

    Note: Replacing a Registration key results in permanently blacklisting the previous USB Flash GUID.

     

    Does this mean that if things don't work out with this new flash drive, I won't have the option to fall back on my current, semi-working drive, even by replacing the key again?

    That's correct.  Once a flash device is blacklisted, it can't be used with Unraid anymore.  There is definitely something amiss with the config/plugins on your old flash drive.  You could try going back to that device, redoing it the same way I had you set up the new one, and seeing if that works.

  2. Ok, redo your new flash drive so it's fresh and clean (no old config folder copied in there yet).  Next, copy over the following files from your old config folder to the new one:

     

    super.dat

    disk.cfg

    ident.cfg

    passwd

    Plus.key

    smbpasswd

     

    You should also copy over the "shares" subfolder, but do not copy over any of your plugins.  Then boot the server from the new flash and see if that works.
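    If you're doing the copy from a Linux shell (for example, with both flash drives plugged into another Linux machine), it could look roughly like this. This is a minimal sketch, not an official Lime Tech procedure. copy_core_config is a hypothetical helper name, and the mount points in the usage comment are assumptions you'd replace with wherever the two drives are actually mounted.

```bash
#!/usr/bin/env bash
# Sketch: copy only the core config files (plus the shares folder) from the old
# flash's config folder to a freshly prepared flash, leaving plugins behind.
# copy_core_config is a hypothetical helper; paths below are placeholders.
copy_core_config() {
  local old_config=$1 new_config=$2 f
  # The files listed in the post.
  for f in super.dat disk.cfg ident.cfg passwd Plus.key smbpasswd; do
    if [[ -e "$old_config/$f" ]]; then
      cp -p "$old_config/$f" "$new_config/"
    else
      echo "warning: $old_config/$f not found, skipping" >&2
    fi
  done
  # Copy the shares subfolder recursively; plugins/ is deliberately not copied.
  if [[ -d "$old_config/shares" ]]; then
    cp -rp "$old_config/shares" "$new_config/"
  fi
}

# Usage (hypothetical mount points):
# copy_core_config /mnt/oldflash/config /mnt/newflash/config
```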

  3. On 7/27/2020 at 6:23 AM, [email protected] said:

    Good day, all.

    I am new to the Unraid world and know my way around NAS and servers, but I keep getting freeze-ups when doing random tasks on my server.  Some days it runs fine when I let it sit and do nothing, but as soon as I copy data over to a share or run a VM like Macinabox, the whole system freezes up.

    The web UI does not respond, network drives go offline, and the server freezes.  I have also run it in GUI mode on the server, and it does the same no matter which mode I launch it in.

    Please help.

    [Attachment: hydrogen-diagnostics-20200727-1256.zip (96.12 kB)]

     

    Update, 28 July 2020:

    Ran a memtest and everything passed.

    Also, I get this error on bootup:

    [Attachment: 20200728_192210.jpg]

    [Attachment: hydrogen-diagnostics-20200728-2024.zip (109.88 kB)]

    Hi there,

     

    What I think you'll need to do is either set up a syslog server or, with a monitor and keyboard connected, log in to the console and type the following command right after it boots:

     

    tail /var/log/syslog -f

     

    This will begin printing the log out to the screen.  When the crash occurs, capture what you can off the monitor and post it back.  I'd also check for any BIOS updates to see if something may be amiss in that configuration.

  4. Hi there,

     

    One major question before we can start diagnosing the issue is in regards to this:

     

    On 7/26/2020 at 5:25 PM, Derrikdj said:

    but in the last few days the network connection to my server suddenly disconnects

     

    What happened in the last few days?  I'm asking because it's very unusual for a working system to start exhibiting this kind of behavior all of a sudden without any changes to the software or hardware.  Did you recently update Unraid, and did this start occurring after that?  What about updates to your router or switch?  Any recent hardware changes on the server or network?  Also, from looking at the logs, it appears the behavior doesn't start until a little over 7 hours after you've booted it up.  Does that sound right?

     

    Another thing you can try is removing eth0 and eth1 from the bonding group.  Stop the array, navigate to the Network Settings page, and take those unused interfaces out of the bond configuration.
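    After you apply the change, you can confirm which interfaces are still in the bond by reading the kernel bonding driver's status file. The sketch below assumes the bond is named bond0 (bond_members is a hypothetical helper name). It just lists the "Slave Interface:" entries the driver reports.

```bash
#!/usr/bin/env bash
# Sketch: list the interfaces currently enslaved to a bond, e.g. to confirm
# eth0/eth1 are gone after editing the bond on the Network Settings page.
# bond0 is an assumption; pass a different status file for another bond name.
bond_members() {
  local bond_file=${1:-/proc/net/bonding/bond0}
  # The Linux bonding driver prints one "Slave Interface: <name>" line per member.
  awk -F': ' '/^Slave Interface:/ {print $2}' "$bond_file"
}

# Usage:
# bond_members                              # reads /proc/net/bonding/bond0
# bond_members /proc/net/bonding/bond1
```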

  5. Ok, I took another look at your system diagnostics.  First, let's try this.  If you have another USB flash drive lying around, please download and install the latest Unraid release onto it (not the NVIDIA build) and try to boot your server from it (remove your primary USB flash drive and set it aside during this test).  The sole purpose of this is to determine whether a stock configuration of Unraid loads correctly.  You have a lot of plugins installed that could be causing issues, so this is a way to verify whether that is the case here.  Alternatively, you can try booting Unraid in safe mode.

  6. On 7/13/2020 at 4:04 PM, ThePockets said:

    @jonp Sorry for the late update, thank you all for the suggestions! Using OVMF + Q35 did solve the problem of it crashing the entire server. If I continue using VNC for graphics it looks like I can pass anything else through fine. I passed a USB controller and onboard audio through and both worked great. If I try to pass through my GPU though, all I get is a black screen. On the VM logs I get this: 

    
    2020-07-09 02:31:15.643+0000: Domain id=1 is tainted: high-privileges
    2020-07-09 02:31:15.643+0000: Domain id=1 is tainted: custom-argv  				<---- THIS LINE IS MARKED RED
    2020-07-09 02:31:15.643+0000: Domain id=1 is tainted: host-cpu
    char device redirected to /dev/pts/0 (label charserial0)
    2020-07-09T02:31:17.580603Z qemu-system-x86_64: -device vfio-pci,host=0000:08:00.0,id=hostdev0,bus=pci.2,addr=0x0,romfile=/mnt/cache/domains/vbios/TU116_edited.rom: Failed to mmap 0000:08:00.0 BAR 3. Performance may be slow		<----- THIS LINE IS MARKED YELLOW
    2020-07-09T02:48:15.583801Z qemu-system-x86_64: terminating on signal 15 from pid 6361 (/usr/sbin/libvirtd)

    I dumped the GPU's vBIOS using GPU-Z on another computer, and I tried using an unedited vBIOS as well as the changes that SpaceInvader One explained in his NVIDIA GPU passthrough video. Neither version seems to work. I have been trying to troubleshoot these problems (hence why I haven't replied in a while), but I haven't found anything that has helped. Thanks again for your help!

    Ok, so is it fair to say that, in your current state, GPU passthrough with OVMF + i440fx "works" but crashes the server, while OVMF + Q35 works for everything BUT GPU passthrough?  What about Q35 + SeaBIOS?  Did you try that, and did GPU passthrough work with it?

  7. Hi again,

     

    I'm a little confused on one of your updates from earlier.  You say you created a new docker image called docker2.img and then you say this:

    On 7/21/2020 at 3:01 PM, JPDom1 said:

    I have access to my dockers again, but when trying to load a docker from the templates I get a failure every time; see photo below.

    But then later you say:

    On 7/21/2020 at 3:16 PM, JPDom1 said:

    I got pi**ed off and walked away....Came back and 

     

    [Attached screenshot]

     

    Was happy to see my containers and VM again.

    So I'm confused.  Did you redownload all these into the new docker2.img or is this once again trying to use the original docker.img?  If the new docker2.img, how did you get past the error you previously reported?

     

    Another thing to try:  disable the Docker service from the Docker Settings page and see if the Community Apps plugin works again.  Maybe one of your containers is causing a weird network conflict?  If disabling Docker fixes it, turn your containers back on one at a time until you can recreate the problem.
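    Once the Docker service is enabled again with every container stopped, you could script the one-at-a-time step roughly as follows. This is a sketch under assumptions: start_one_at_a_time is a hypothetical helper, and the observation window between starts is an arbitrary value you choose. Press Ctrl+C as soon as the problem reappears. The last container started is then the prime suspect.

```bash
#!/usr/bin/env bash
# Sketch: start stopped containers one at a time, pausing between each so you
# can check whether Community Apps (or the network) breaks again.
# start_one_at_a_time is hypothetical; WAIT_SECS is an arbitrary choice.
start_one_at_a_time() {
  local wait_secs=$1 name
  shift
  for name in "$@"; do
    echo "Starting $name; watch for the problem for ${wait_secs}s..."
    docker start "$name" >/dev/null || echo "failed to start $name" >&2
    sleep "$wait_secs"
  done
}

# Usage: give each container 10 minutes before starting the next one.
# start_one_at_a_time 600 $(docker ps -a --format '{{.Names}}')
```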

     

    The key for us to help solve this is to figure out what has tripped the system into this state.  None of this is normal behavior and I am desperate to try and find a way to recreate your issue on my end so we can debug.

  8. Hi there,

     

    The challenge might be the graphics card you are using.  But if you had this working previously on your homebrew setup, it should work on Unraid.  Can you try changing the machine type of the VM to i440fx?  If that doesn't work, try changing the BIOS type to SeaBIOS and see if that works.
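    To double-check which combination a VM is actually using before and after each change, you can inspect its libvirt XML. This is a hedged sketch. vm_platform is a hypothetical helper, and it assumes (as is typical) that an OVMF VM's XML contains a <loader> element while a SeaBIOS VM's does not. "Windows 10" in the usage line is a placeholder VM name.

```bash
#!/usr/bin/env bash
# Sketch: report a VM's machine type and firmware from its libvirt domain XML.
# Assumes libvirt's usual single-quoted attributes and that only OVMF VMs carry
# a <loader> element; without one, QEMU boots its default BIOS (SeaBIOS).
vm_platform() {
  local xml machine
  xml=$(cat)   # domain XML on stdin
  # Pull the machine='...' attribute from the <type> element.
  machine=$(printf '%s\n' "$xml" | sed -n "s/.*machine='\([^']*\)'.*/\1/p" | head -n1)
  if printf '%s\n' "$xml" | grep -q '<loader'; then
    echo "machine=$machine firmware=OVMF"
  else
    echo "machine=$machine firmware=SeaBIOS"
  fi
}

# Usage (placeholder VM name):
# virsh dumpxml "Windows 10" | vm_platform
```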

  9. Hi there,

     

    If this issue has been occurring since you originally went to configure Unraid, then it is likely there is something amiss with the hardware or BIOS that is causing these issues.  If you had a working setup originally and then one day this started happening, we have to figure out what changed that caused this to start happening.

     

    Perhaps try reformatting the flash with a fresh, default Unraid install and see if you can get it to boot.  If so, then you know something is amiss in the configuration of your previous flash.

  10. This is definitely a hardware-specific issue.  I don't know why your particular gear is showing these kinds of problems, but I would expect to be hearing from a lot more people if this was a more generic issue.  The best we can do with this is try and report the issue upstream to the QEMU/KVM developers in hopes that they know what is going on.

  11. Ok, to be fair, Hyper-V and KVM are very different hypervisors, and if other underlying gear changed (including the HBA and storage), that obviously could have an impact.  What about BIOS updates?  Any available?  Another thing you could try is disabling IOMMU in the BIOS to see if that has any impact.
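    After you flip the IOMMU setting in the BIOS, it's worth confirming that the kernel sees the change. As a rough check, assuming a standard Linux sysfs layout, /sys/kernel/iommu_groups contains one numbered directory per IOMMU group when the IOMMU is active and is empty (or missing) when it is off. iommu_group_count is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: count IOMMU groups the kernel has created. 0 suggests the IOMMU is
# disabled (or unsupported). The root path is a parameter only for testing.
iommu_group_count() {
  local root=${1:-/sys/kernel/iommu_groups} n=0 d
  for d in "$root"/*/; do
    # An unmatched glob stays literal and fails the -d test, leaving n at 0.
    if [[ -d "$d" ]]; then n=$((n + 1)); fi
  done
  echo "$n"
}

# Usage:
# iommu_group_count
```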

  12. Wow, that's pretty concerning.  If there is no hardware passthrough happening and you're getting these kinds of crashes, it leads me to suspect a buggy BIOS on your hardware.  What is the underlying hardware on this system?

  13. Ok, what happens if you point the storage to something other than that PCIe NVMe Unassigned Device?  Again, the goal here is to narrow down the root cause, or which combination is causing it.

     

    Another thing you could try would be changing the Machine Type or the BIOS type to see if that has an effect.

  14. Hi there,

     

    Are you trying to pass through the NVMe drive to the VM directly?  If so, try not doing that and see if you can still reproduce the lockup.  If you can, then the issue stems from the underlying hardware/VM configuration.  If the issue goes away, then you know it's isolated to that PCIe device.

  15. A few things I'd like you to try:

     

    1)  Check for a BIOS update.

    2)  Create a new VM using SeaBIOS instead of OVMF.

    3)  Create a new VM using Q35 instead of i440fx.

     

    Try the combinations SeaBIOS + Q35, SeaBIOS + i440fx, and OVMF + Q35 and see if any of them has an impact on the crashing.  If not, this may be a hardware / BIOS issue that we can't resolve from the software side.  I know that AMD offers a good price-to-performance product, but unfortunately their testing in the VM department leaves a lot to be desired.  There is only so much we can do from the software side to quirk the kernel/QEMU.  Your last resort would be to contact the motherboard manufacturer, describe the problems you are having, and see if they have a beta BIOS that may resolve it for you.
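    To keep track of which combinations are still left to try, you can enumerate the test matrix mechanically. This trivial sketch (untested_combos is a hypothetical helper, and the labels are just the names used in the post) lists every firmware/machine-type pair except the one you're currently running.

```bash
#!/usr/bin/env bash
# Sketch: enumerate firmware x machine-type combinations, skipping the current one.
untested_combos() {
  local current=$1 fw mt
  for fw in SeaBIOS OVMF; do
    for mt in Q35 i440fx; do
      [[ "$fw + $mt" == "$current" ]] || echo "$fw + $mt"
    done
  done
}

# Usage:
# untested_combos "OVMF + i440fx"
#   SeaBIOS + Q35
#   SeaBIOS + i440fx
#   OVMF + Q35
```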

  16. 2 hours ago, bamhm182 said:

    Would you be open to it being made with Python? Seems to me that python would be a good choice since it is easily extensible, cross platform, and easier to maintain. Last I looked into it, you could easily build for Linux, Windows, and OS X. The only stipulation is that the OS X executable needs to be made on OS X.

     

    We're not opposed to it.  The big thing is that the user can't be required to download and install any additional "components" to make it work.  So it has to be a self-contained executable on all platforms.

  17. On 6/29/2020 at 6:25 PM, Fizzyade said:

    Without reinventing the wheel, why not just supply a compatible img and then tell people to use balenaEtcher, which is pretty much the go-to image writer?  You just have to supply an img in a suitable format.

     

    There are additional things that our creator does that a simple IMG file does not.  Our tool validates the USB flash GUID as usable (most of the time ;-), it allows the user to toggle EFI boot mode, and it lets them customize hostname and networking options.  It even lets the user select which release to install (from a backup, the current available release, or from our Next branch).  These features are important for ease of use for new users, and while we understand that not everyone needs them, those who do really appreciate them.
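    To show how small some of these features are, here is a sketch of the EFI boot toggle. It assumes the Unraid flash convention in which the boot folder is named EFI- when UEFI boot is disabled and EFI when it is enabled. set_uefi_boot is a hypothetical name, not the creator tool's actual code.

```bash
#!/usr/bin/env bash
# Sketch: toggle UEFI boot on a mounted Unraid flash by renaming the EFI folder.
# Assumed convention: EFI- = UEFI boot disabled, EFI = UEFI boot enabled.
set_uefi_boot() {
  local flash=$1 mode=$2
  case $mode in
    on)
      # Only rename when the target name is free, so we never nest folders.
      if [[ -d "$flash/EFI-" && ! -e "$flash/EFI" ]]; then mv "$flash/EFI-" "$flash/EFI"; fi ;;
    off)
      if [[ -d "$flash/EFI" && ! -e "$flash/EFI-" ]]; then mv "$flash/EFI" "$flash/EFI-"; fi ;;
    *)
      echo "usage: set_uefi_boot FLASH on|off" >&2
      return 2 ;;
  esac
}

# Usage (hypothetical mount point):
# set_uefi_boot /mnt/usbflash on
```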

     

    13 hours ago, jammin said:

    Is it out of scope to discuss a different method of license enforcement than the USB key serial number?  It's a scary single point of failure and kinda restricting to require a USB stick at all.

    If it has to be hardware, maybe the check could be when starting the array instead of on boot, and base it on one or more array member serial numbers?  Or the MAC address of the NIC?  

     

    It is out of scope for the purpose of this RFQ, but know that we are investigating other licensing methods for future inclusion.  Changing licensing is always a real iceberg of a problem.  Seems small and simple from above the water line, but below it is a gigantic thing just waiting to sink your ship ;-).  That is going to have to be another battle for another day.

     

    11 hours ago, Fizzyade said:

    Maybe I'm massively missing something, but I've hated the fact that I've had to use the Unraid tool and have longed to have an image that I could just flash with my favourite tool.

     

    While we definitely appreciate what certain users want, we have to address the wider market of users who aren't as savvy.  I definitely agree that if you're savvy enough to build a computer, you're probably savvy enough to figure out how to image a USB flash drive with some generic tool, but we're not just targeting that kind of customer, and longer term, users may not be building their own servers at all.  The point is, our flash creator tool should work fine, but it's been more of a bear to maintain than we'd like, so we're looking for offers from developers who want to earn a little extra cash to help build this thing.  And yeah, we will probably have to fix it again after the next Mac release comes out, but that's fine and something we're also willing to accept.

  18. Hi Unraid Community!

     

    We have a special request for anyone who is familiar with the work required and wants to make a little $cash!

     

    A few years back we released our own USB flash creator tool for Unraid OS.  For those of you who remember, installation of Unraid used to require a manual process (documented here), but we wanted this new tool to be a far easier way to get up and running.

     

    Here we are a few years later and the tool desperately needs an update, especially the macOS version.  Our problem is that the development team is heads down focused right now on getting 6.9 and 6.10 out the door.  As such, we wanted to throw out a request to our Community to see if anyone has the tools and talent to help us with this.

     

    This is a formal RFQ (Request for Quote) to correct issues in the current USB flash creator for Unraid OS, for both Windows and Mac platforms.  We're not necessarily looking for any increased functionality at this time, though creative ideas on how to make it better will be considered.

     

    To respond to this RFQ, please email [email protected] with your bid and time estimate for the work.  We will update this post once a bid has been accepted.  If you have questions regarding the RFQ, please post them here so our responses can be made in the post publicly for all to see.  Thanks everyone!!

     

    All the best,

     

    Team Lime Tech

  19. Hi there,

     

    Saw your email into support and wanted to chime in on your thread here.  Unfortunately johnnie.black is right in that you're going to need to take the "one at a time" approach to figure out the root cause.  The main problem here is that there wasn't some "event" that occurred prior to these issues that we can point to.  Everything was fine until it wasn't.  When issues like that happen, 99 times out of 100 it's because of something amiss with the hardware or a plugin/container update that broke something.  Do you have your containers set to auto-update or do you manually update them?

     

    You can absolutely check out htop from the command line (just type htop in a terminal session) to see more detailed process reporting, but even then, you will likely still have to resort to shutting down all your containers and letting the system run for a while to see whether the CPU usage spikes randomly.  If it doesn't, slowly start turning containers back on one by one until you find the culprit.  I wish I had better advice for you, but again, when issues come out of nowhere like this, with no event right before they manifested, there is just no other way to narrow it down.
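    To catch random spikes while you aren't watching htop, you could log the load average with timestamps while the containers are stopped. This is a minimal sketch that treats the load average in /proc/loadavg as a rough proxy for CPU pressure. log_load and its parameters are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: sample the load average COUNT times, INTERVAL seconds apart, with a
# timestamp on each line. The source path is a parameter only for testing.
log_load() {
  local count=$1 interval=$2 src=${3:-/proc/loadavg}
  local i load1 load5 load15 rest
  for ((i = 0; i < count; i++)); do
    # /proc/loadavg begins with the 1-, 5- and 15-minute load averages.
    read -r load1 load5 load15 rest < "$src"
    printf '%s 1m=%s 5m=%s 15m=%s\n' "$(date '+%F %T')" "$load1" "$load5" "$load15"
    if (( i + 1 < count )); then sleep "$interval"; fi
  done
}

# Usage: one sample per minute for a day, appended to a log file.
# log_load 1440 60 | tee -a load.log
```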

  20. What would be helpful is to know how long the server was operational before these issues came up and whether anything changed in the week or so before they started.  If nothing changed and the server all of a sudden started having this behavior, it really has to be something amiss with the hardware.  Maybe dust buildup shorted something out or is causing heat or other issues.  The PSU might be failing (a failing PSU could cause issues if it can't supply enough power, or supplies too much, under certain conditions).  You can try running a memtest, though memory is only one possible culprit.

     

    I wish I had more advice for you in this scenario, but unless you can point to a trigger that is causing the issue, it's really like looking for a specific needle in a pile full of needles ;-).

  21. Hi there,

     

    Saw your email into support and have read through this thread.  You're definitely getting some good advice from folks in here.  Johnnie's last post is especially important, although you can also use the iDRAC connection to print the log to the screen in real time with this command:

     

    tail /var/log/syslog -f

     

    Then when the server hard crashes, take a screenshot of what you see and post it back here.  The trouble with crashes is that the crash often occurs before the events can be written to the log file, so printing the log to the screen can be just fast enough to capture the crash event.

     

    It would be helpful to know when these issues started occurring.  If your hardware has been fine for months and months and all of a sudden these issues come out of nowhere, it is likely the result of a hardware problem.  You might need better cooling, cabling, etc., and keep in mind that the temperature warnings in Unraid are generic.  You have to look up the actual hardware you're using to find its temperature ratings and then manually set those temps on the disk settings page for those devices.  Otherwise you may be giving yourself a false impression that heat isn't the issue when the drives are actually operating at a much higher temperature than they should be.
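    For SATA drives, you can read the current temperature from smartctl's attribute table and compare it against the datasheet limit. The sketch below assumes the drive reports attribute 194 (Temperature_Celsius). NVMe drives and some others report temperature differently, so an empty result just means "not found." drive_temp and check_temp are hypothetical helpers, and the limit in the usage line is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: compare a drive's current temperature to its datasheet limit.
drive_temp() {
  # Reads `smartctl -A` output on stdin and prints the RAW_VALUE column (field 10)
  # of the Temperature_Celsius row.
  awk '$2 == "Temperature_Celsius" {print $10; exit}'
}

check_temp() {
  local dev=$1 limit=$2 t
  t=$(smartctl -A "$dev" | drive_temp)
  if [[ -z "$t" ]]; then
    echo "$dev: temperature attribute not found"
  elif (( t > limit )); then
    echo "$dev: ${t}C exceeds datasheet limit of ${limit}C"
  else
    echo "$dev: ${t}C (limit ${limit}C)"
  fi
}

# Usage (placeholder limit; use your drive's rated maximum):
# check_temp /dev/sdb 45
```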

     

    Lastly, if you are banging your head against the wall on this, the best thing to do is to reduce the number of components in your system setup (apps, containers, VMs, cache drive) until you isolate the issue.  Try running the array without the cache assigned at all.  Turn off all Docker containers.  Let it just run for a while and see if it crashes.  Then slowly start turning services on one by one until you recreate the issue.  Once you've found the key variable that causes the crash, we can try to replicate it to see if it's a bug, or give you advice on how to solve it within the software via a configuration tweak.
