Comments posted by Marshalleq

  1. Yes, ZFS.  Thanks, but I still think this is a bug, given that rolling back to rc2 makes the problem go away.  What you highlight above are the correct paths.  I rebooted a few times to confirm the problem is consistent.  The paths were already mounted when navigating via bash; somehow the Unraid system just wasn't seeing them.  Obviously I shouldn't have to mount them manually either.  So if there's something specific you want me to test, I'm happy to do that.  Do you want me to upgrade and manually mount ZFS again?  Given I've already tried that, I'm a little reluctant, but I'll do it if that's what it takes to convince you to investigate further.
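
    For completeness, here's roughly what I mean by mounting manually (a minimal sketch - 'tank' is just a placeholder, substitute your own pool and dataset names):

    zpool import tank                  # import the pool if it isn't already imported
    zfs get mountpoint,mounted tank    # confirm where the datasets should mount and whether they did
    zfs mount -a                       # mount anything that isn't mounted yet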

  2. Yeah, I need to get back onto this - I've been away for many months and should be back next week, but who knows when I'll have the time to get to it.  I too would love a higher-speed highway between the Mac and Unraid.  I have a few 10G cards, but getting a matching one for a MacBook is expensive, so Thunderbolt should be a simpler and faster solution in theory!

  3. On 10/6/2022 at 9:19 AM, CS01-HS said:

     

    Just a note because it threw me off - restarting samba didn't apply my /(flash|boot)/config/smb-fruit.conf changes. Stopping the array and restarting did though.

    @limetech - fantastic work here - I've said it before and I'll say it again: I'm constantly impressed with the way you guys focus the features on the customer in a way that nobody else does.  Regarding the comment above, is this something to do with a licensing dependency?  Otherwise, I have noticed similar behaviour requiring an array restart before, and it would certainly be much nicer if the whole Docker system and the VMs didn't have to be stopped for these kinds of small changes.  Though I must say I'm not sure a full array restart is normally required for SMB changes.
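
    For what it's worth, a quick way to test whether a plain Samba restart picks the file up - hedged, because I haven't confirmed that vfs_fruit options reload this way, and the rc script path is from memory:

    testparm -s                     # sanity-check that the include actually ends up in the effective smb.conf
    /etc/rc.d/rc.samba restart      # Slackware-style restart of smbd/nmbd
    smbcontrol all reload-config    # or just ask the running smbd processes to re-read their config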

  4. I'm another one referred here by the Fix Common Problems plugin.  I use tdarr extensively, having been in discussion with the developer since the beginning of its creation.  I have never had, and still do not have, any of these issues.  However, I do not use the Unraid array except as a dummy USB device to start the Docker services (I use ZFS), and I do not use NFS (I use SMB).  I strongly suspect this is more about tdarr triggering an Unraid bug of some kind than tdarr itself being the problem.

  5. Yes, I don't know what the cause is, but a few days ago I changed to q35-5.0 and that seemed to fix it.  Before that it didn't lock up, but it was painfully slow - which looks like a lockup if you aren't prepared to wait 20 minutes for your VM to boot. :)

     

    So possibly you could get to 5.0 as well.

     

    Edit - 5.0 still had slowness issues, just less severe than 5.1 - trying 4.2 again.

     

    Looking through the changes here, there's not a lot that's changed - so it shouldn't be too hard to pin down.
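
    If anyone wants to try the same, the machine type is just the machine attribute in the VM's XML (the XML view when editing the VM, or virsh).  Roughly - the VM name here is only an example:

    virsh edit "Windows 10"
    # then in the <os> section change, e.g.:
    #   <type arch='x86_64' machine='pc-q35-5.1'>hvm</type>
    # to:
    #   <type arch='x86_64' machine='pc-q35-5.0'>hvm</type>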

  6. Mine's been running a lot longer since (a) moving to virtio-net, (b) updating to the rc series with the newer kernel, and (c) probably most significantly, removing a rather taxing 24x7 compression routine I run (which I will turn back on at some point).  I actually haven't had a crash yet - still monitoring.

     

    I also notice, for the first time EVER for me on Unraid, that the VM gets to the TianoCore boot screen in under 5 seconds.  Previously that only ever happened on the first run after a reboot; after that it would take a minute or even longer to get there.

     

    I still suspect something about Threadripper has been causing this for me, and I doubt it's gone - just reduced.
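
    For reference, the virtio-net change in (a) is just the NIC model in the VM's XML - something like this, if I remember the element correctly (the VM name is only an example):

    virsh edit "Windows 10"
    # in the <interface> block change:
    #   <model type='virtio'/>
    # to:
    #   <model type='virtio-net'/>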

  7.  

    2 hours ago, Gnomuz said:

    a Raid 1 file system which turns unwritable when one the pool members fails while the other is up and running and requires a reboot to start over is not exactly what I expected ... So let's say we share the same unreliable experience with BTRFS mirroring.

    LOL, I know, right?!  This is what amazes me sometimes about the defenders of BTRFS - the basics 'seem' to be overlooked.  That said, I've never been sure if it's BTRFS itself or something wonky with the way Unraid formats the mirror.  I know in the past there definitely WAS a problem with the way Unraid formatted it (I got shot down for suggesting that too), but in the end that was fixed by the wonderful guys at LimeTech.

     

    In that prior example, I found that I could fix the issue from the command line but not in the GUI, yet I still had issues with BTRFS getting corrupted, so I ended up giving up.  Full disclaimer: I'm pretty light on BTRFS experience - however, having used probably more filesystems than most, in both large production and small residential installs, I think I'm qualified enough to say it's unreliable, at least in Unraid (the only place I've used it).

    2 hours ago, Gnomuz said:

    For the moment I let the cache SSDs running via the LSI, have converted the cache to XFS and forgotten the mirroring as you suggested. I've immediately noticed that with the same global I/O load on the cache, the constant write load from running VMs and containers was divided by circa 3.5 switching from btrfs raid1 to xfs. At least, btrfs write amplification is a reality I can confirm...

    That's an impressive stat.  I thought I'd read that write amplification was solved some time ago in Unraid, but perhaps it's come back (or perhaps a reformat is needed with the right cluster sizes or something).  BTW, I'm also sure I read in one of the previous beta's release notes that the spin-down issue was solved for SAS drives, which logically should also cover drives behind a controller of some sort.
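
    For anyone wanting to check their own numbers, a crude way to watch sustained writes to a cache device over time (iostat comes from sysstat, which may need installing via a plugin on Unraid, and sdX is a placeholder for your device):

    iostat -dmx sdX 60          # extended device stats every 60s; watch MB written per interval
    grep sdX /proc/diskstats    # raw sectors-written counter if iostat isn't available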

     

    You may be interested in my post this morning on my experience migrating away from Unraid's array here (though still using the Unraid product).  Not for the inexperienced, but I'm definitely looking forward to ZFS being baked in.

  8. There have been arguments on both sides of the fence (BTRFS is great, BTRFS is not great).  My experience has been the latter, and I would recommend not using it in your cache.  Switch to XFS and forget the mirror.  With a BTRFS mirror I did not have a reliable experience, and neither have others.

     

    BTRFS 'should' be able to cope with a disconnect, even if it has to be fixed manually.  However, if both devices are getting constant hardware disconnects, then I guess that's going to be pretty challenging for any filesystem.

     

    From my quick reading this is a kernel issue though, not a hardware issue, so if it were me and I wasn't already on the latest beta, I'd probably shift to that, given rc1 has the much newer kernel.

     

    Running SSDs via an LSI card will reduce their speed as well, BTW.

     

    Another thing I'd do is raise it directly on the kernel mailing list (it's probably already there if it's still a current issue), because a fix there would eventually find its way into Unraid.

     

    Hopefully you're not already on the latest beta / kernel, as moving to that alone might possibly solve it.
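
    If you do stay on BTRFS for now, the error counters and a scrub are worth checking after one of these disconnects (the path assumes the standard Unraid cache mount):

    btrfs device stats /mnt/cache       # per-device read/write/flush/corruption error counters
    btrfs scrub start -B /mnt/cache     # verify checksums; -B waits and prints a summary
    btrfs filesystem show /mnt/cache    # confirm both pool members are still present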

     

     

  9. I run a Threadripper 1950X with an Asus X399 Prime-A motherboard.  Originally I didn't have to disable C-states or set the power supply idle control to 'Typical Current Idle'; I did have to do that on my previous Ryzen 1700X system.  Anyway, recent crashes made me revisit that.  The other thing I did was adjust my VM to use the new NIC setting, which from previous testing seems to be much slower but more stable - I should check whether that's still the case.  I was getting my logs filling up due to having virtio set instead of virtio-net, and I assume a drive full of logs isn't great for stability either - maybe it's on its own partition though, I haven't checked.
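
    Checking that is easy enough - df will show whether /var/log is its own (small) mount and how full it is, and du shows what's eating it:

    df -h /var/log
    du -sh /var/log/* | sort -h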

     

    Mostly the issues do seem to come when I'm gaming in a Windows 10 VM though; other VMs seem OK.  So the main difference I can think of is GPU passthrough.

     

    Also, mine doesn't always crash per se, but dmesg shows kernel messages similar to those posted at the beginning, with kernel traces etc. (wish I knew how to read those).  My memory has also undergone extensive testing.

     

    Is any of this common to anyone else here with system crashes?

     

    • Normal humans often can't see all the ways a message can be interpreted (this is why we have comms people).
    • The most offensive-sounding things are often not intended that way at all.
    • Written text makes communication harder because there are no facial or vocal cues to support the language.

     

    I expected our community developers (given that they've clearly communicated in text, behind screens, for many years) would understand that things aren't always intended the way they sound.

     

    In this regard, I support @limetech wholeheartedly.

     

    Nevertheless, the only way to fix this is probably for @limetech to privately offer an apology, discuss as a group how to fix it, and then publish together that it's resolved.

     

    (30 years managing technical teams - seen this a few times and it's usually sorted out with an apology and a conversation).

     

  10. Assuming you're not running any impacted 3rd-party plugins like ZFS, you also need to make sure your cache drive is backed up.  I'm not sure there's an automated way - I think you have to copy it somewhere else, reformat the drive after you've downgraded, and then put the data back.  Could be wrong.  It definitely kicks it out of the array though.
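
    Something like this is what I had in mind for the manual copy - the destination is only an example, and I'd stop Docker and the VMs first so files aren't changing underneath it:

    rsync -avhX --progress /mnt/cache/ /mnt/disk1/cache-backup/    # copy the cache contents onto an array disk
    # ...downgrade and reformat the cache...
    rsync -avhX --progress /mnt/disk1/cache-backup/ /mnt/cache/    # copy it back afterwards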

  11. I've just been bitten again by the GPU bug whereby I have to shut down (not restart) the whole host to get it to work.  Logs have been submitted before.  Basically the VM does start and runs perfectly, but the screen stays stuck in text mode.  Shutting the host down seems to get it back into gear.  Latest beta 30.

     

    There is of course a possibility that faulty hardware appeared at the same time as the beta upgrade.  That's a hard one to test.

     

    Scratch that - that doesn't work either.  Downgrading instead - FYI, I can't downgrade to stable; even though it's offered as an option, it just reverts back to beta 25 every time.

    OK, downgrading back to beta 25 got it working.  This might in fact be down to the machine version being downgraded.  Haven't checked.

  12. 3 hours ago, limetech said:

    I saved a link to this post to have someone look at vnc on safari but it would be much better to open bug report.

    Can I just add, VNC on Safari (from Unraid specifically) had never worked for me until macOS Big Sur (currently in beta).  I used to have to open a separate Firefox window, copy the link over from Safari, and it would work.

     

    I tested it on multiple computers, installs and so on, and could never get it to work before.

     

    Big Sur is still in beta and its browser does seem to have a few minor issues here and there, but clearly there's a new engine or something behind it - I've never seen a new version of macOS change the browser quite so much before.

     

    But VNC now works from Safari on Unraid, so I'm pretty excited about that.  Hope it helps someone.

  13.  

    23 minutes ago, Dava2k7 said:

    Hey all was just playing a game on my VM on the Tv and suddenly my VM switched its self off whilst in game and now it wont passthrough to TV i have these errors in my VM logs. Ive tried recreating VM using different machines q35 ect nothing seems to work

     

    2020-10-01 20:54:10.828+0000: Domain id=23 is tainted: high-privileges
    2020-10-01 20:54:10.828+0000: Domain id=23 is tainted: host-cpu

    char device redirected to /dev/pts/0 (label charserial0)

     

    any ideas??

     

    I've been getting this kind of thing (where stuff just stops working in a VM, etc.) for a good year or so, and it's worse in the current beta if that's what you're using.  I find that rebooting the host usually fixes it, although you may have to do a complete power-off to reset the hardware.  Failing that, try deleting the VM template (not the disks) and recreating it - for some reason I still need to do that on a regular basis, and although Limetech have said it's not normal, it's worth a try for you.
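
    By "deleting the VM template, not the disks" I mean removing just the libvirt definition - either Remove VM in the GUI without ticking the vdisks, or roughly this from the command line (the VM name is only an example):

    virsh list --all                       # find the exact VM name (shut the VM down first)
    virsh undefine --nvram "Windows 10"    # removes the definition and its nvram file, leaves the vdisks alone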

  14. I did some testing on one of my own problems last night, but it turns out it exists in beta 25 also.  The problem, which I accidentally discovered due to an unbootable install ISO, can be replicated on my machine over and over.

     

    The problem is that if you force-close a VM at the first install screen, or at the screen where you're presented with the 'failed to boot' GRUB text (e.g. using VNC), you get something similar to the following screenshot:

     

    [Screenshot attachment: Screen Shot 2020-10-01 at 5.29.13 PM]

    This seems to result in two issues that I've noticed: 1) I can no longer access the Virtual Machines tab or its contents, and 2) I can't delete files from the virtual machines folder on the SSD I'm using, which requires a reboot of the host.
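
    Next time it happens I'll try to see what's actually pinning those files before rebooting - something along these lines, with the domains path being whatever yours is (lsof may or may not be present on a stock install):

    lsof +D /mnt/ssd/domains    # list processes holding files under the VM folder open
    ps aux | grep qemu          # look for a leftover qemu process for that VM
    virsh destroy "VM name"     # force-stop it via libvirt if it still shows as running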

     

     

    I've run a full memory test overnight, including with SMP enabled, to rule out any memory-related issues, and it all came up clean.

     

    I'm posting this here in case it's down to some other combination of hardware I have.  To that end, please note I'm on a Threadripper 1950X, which has never given me a single issue on Unraid, and I'm storing this VM on an SSD formatted with ZFS.  If that turns out to be the issue, I can store it on an official Unraid filesystem instead.

     

    I have performed the same test on my Xeon system and cannot get it to occur there.  My suspicion is that it's AMD-related: both systems run the beta and ZFS, yet on the AMD system it appears to be 100% repeatable and seems to slow the system down too.

     

    I suspect that force-closing the VM at any other point will trigger this too.  Logs attached.

    obi-wan-diagnostics-20201002-0852beta25.zip

    obi-wan-diagnostics-20201001-1724beta29.zip