• VM Manager Settings REBOOT REQUIRED


    Tward2
    • Minor

    Upgraded to 6.6.1 and all of my VMs have disappeared. If I install a new one, it is gone the next day. I need to enable PCIe ACS override so that I can isolate my GPU, and it says "System must be rebooted for changes to take effect!" When I reboot, the message is still there and the IOMMU groupings are still the same.
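To confirm whether the IOMMU groupings really are unchanged after a reboot, the groups can be read directly from sysfs. The following is a minimal sketch, assuming the standard Linux layout under `/sys/kernel/iommu_groups`; the function name `list_iommu_groups` is made up for illustration.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} as found under `root`."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU is disabled or the sysfs path is not present on this system.
        return groups
    for group_dir in base.iterdir():
        if not group_dir.name.isdigit():
            continue
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    # Sort by group number so two runs can be compared easily.
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    for number, devices in list_iommu_groups().items():
        print(f"IOMMU group {number}: {', '.join(devices)}")
```

Saving the output before and after toggling the override makes it easy to see whether the GPU ended up in its own group.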

    tower-diagnostics-20181002-1832.zip





    49 minutes ago, Tward2 said:

    Now I am getting the error from fix common problems:

    /var/log is getting full (currently 100% is being used)

    Post a new set of diagnostics
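Before posting diagnostics, it can help to see how full the log filesystem is and which files take up the space. This is a rough sketch using only the Python standard library; the helper names are hypothetical, and the percentage may differ slightly from what `df` or the Fix Common Problems plugin reports.

```python
import os
import shutil


def percent_used(path="/var/log"):
    """Approximate percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def largest_files(directory="/var/log", count=5):
    """Return up to `count` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(sizes, reverse=True)[:count]


if __name__ == "__main__":
    print(f"/var/log is {percent_used():.0f}% full")
    for size, path in largest_files():
        print(f"{size:>12} {path}")
```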

    Link to comment

    One of your cache devices has hardware issues. The log shows write and read errors, so it likely dropped offline at some point in the past:

     

    Oct  2 18:16:43 Tower kernel: BTRFS info (device sdb1): bdev /dev/sdc1 errs: wr 7978000, rd 4925468, flush 64968, corrupt 0, gen 0

    Due to that there are several checksum errors, e.g.:

    Oct  2 18:40:39 Tower kernel: BTRFS warning (device sdb1): csum failed root 5 ino 4289000 off 4096 csum 0x1ce89bb4 expected csum 0xa6a98abc mirror 2
    Oct  2 18:40:39 Tower kernel: BTRFS warning (device sdb1): csum failed root 5 ino 4289000 off 8192 csum 0xdb87daca expected csum 0xe250b854 mirror 2
    Oct  2 18:40:39 Tower kernel: BTRFS info (device sdb1): read error corrected: ino 4289000 off 0 (dev /dev/sdc1 sector 56751696)
    Oct  2 18:40:39 Tower kernel: BTRFS info (device sdb1): read error corrected: ino 4289000 off 4096 (dev /dev/sdc1 sector 56751704)
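The per-device error counters in the first log line above can be pulled out of a syslog with a regular expression. This is a small sketch that assumes the line format matches the one quoted in this thread; `parse_btrfs_errs` is a made-up helper name.

```python
import re

# Matches the "bdev ... errs: wr N, rd N, flush N, corrupt N, gen N" part
# of a BTRFS device statistics line as quoted above.
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return a dict of error counters for the line, or None if it does not match."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    stats = {key: int(value) for key, value in match.groupdict().items() if key != "dev"}
    stats["dev"] = match.group("dev")
    return stats


if __name__ == "__main__":
    sample = ("Oct  2 18:16:43 Tower kernel: BTRFS info (device sdb1): "
              "bdev /dev/sdc1 errs: wr 7978000, rd 4925468, flush 64968, corrupt 0, gen 0")
    print(parse_btrfs_errs(sample))
```

Non-zero `wr`, `rd` or `flush` counters on only one pool member are what point to that device having dropped out.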

     

    You should run a scrub to bring the dropped device up to date. Note, though, that if you're using NODATACOW shares (and some are set like that by default), those can't be fixed, since they aren't checksummed. NODATACOW is outright dangerous when used with btrfs raid1: when a device drops offline and then rejoins the pool, btrfs has no way of knowing which member has the correct data and will read from all members, including the one with old (and now corrupt) data, resulting in unfixable pool corruption.
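One way to see whether a share's folder has copy-on-write disabled is to look for the `C` (No_COW) attribute in `lsattr -d` output. The sketch below assumes `lsattr` is available on the server; the path `/mnt/cache/system` and the sample output line are only examples, and the helper names are hypothetical.

```python
import subprocess


def has_nocow_flag(lsattr_line):
    """Return True if an `lsattr` output line shows the 'C' (No_COW) attribute."""
    parts = lsattr_line.split(None, 1)
    if not parts:
        return False
    # The first field is the attribute string; the rest is the path.
    return "C" in parts[0]


def nocow_enabled(path):
    """Run `lsattr -d` on a path and report whether No_COW is set on it."""
    result = subprocess.run(
        ["lsattr", "-d", path], capture_output=True, text=True, check=True
    )
    return has_nocow_flag(result.stdout.strip())


if __name__ == "__main__":
    # Example path only; adjust to the share being checked.
    print(nocow_enabled("/mnt/cache/system"))
```

Files created inside a folder with this attribute inherit it, which is why whole shares end up without checksums.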

    Link to comment
    19 minutes ago, limetech said:

    How is this possible?

    It's a current problem with how btrfs works. It was recently discussed in the mailing list, and I can send you the link if you want. NODATACOW disables checksums, and without them btrfs has no way of knowing that one of the members has stale data, so it will happily continue to read from it, corrupting the pool.

     

    It's not a problem for single pools, and not a problem without NODATACOW. IMO, since most users use SSDs for cache, and COW shouldn't have a very noticeable impact on VM-type files when used with an SSD, it would be safer to default to COW for the system shares.

    Link to comment
    58 minutes ago, johnnie.black said:

    it was recently discussed in the mailing list

    Sure, I'm aware of the limitation that NODATACOW also disables checksums, but more interested in how a pool member can go "offline" and then magically go back "online" - wasn't aware btrfs does this.

    Link to comment
    3 minutes ago, limetech said:

    but more interested in how a pool member can go "offline" and then magically go back "online"

    It happens frequently, like in the case above, I see it almost every week on various diagnostics, one device drops offline, usually from a bad cable, on the next reboot it comes back online, and the pool continues to work without apparent problems since any checksum errors are corrected by the good mirror, but for NODATACOW shares btrfs won't notice and will alternately read from both mirrors since it considers both to have the same data, when in fact they might not, and even running a scrub can't fix anything, since without checksums it will skip those files.
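The failure mode described above can be illustrated with a toy model. This is not btrfs code, just a simplified sketch: two in-memory "mirrors", a stored checksum that is either present (COW data) or absent (NODATACOW data), and a read that may start from either mirror. It uses `zlib.crc32` purely as a stand-in checksum.

```python
import zlib


class Mirror:
    """One member of a toy two-device raid1 pool holding a single block."""

    def __init__(self, data):
        self.data = data


def read_block(mirrors, expected_csum, preferred):
    """Read the block, starting with mirror number `preferred`.

    With a checksum, a bad copy is detected, the good copy is returned and
    the bad copy is rewritten. Without one (expected_csum is None), whatever
    the preferred mirror holds is returned unchecked.
    """
    order = [mirrors[preferred]] + [m for i, m in enumerate(mirrors) if i != preferred]
    if expected_csum is None:
        return order[0].data
    for mirror in order:
        if zlib.crc32(mirror.data) == expected_csum:
            for other in mirrors:
                if zlib.crc32(other.data) != expected_csum:
                    other.data = mirror.data  # "read error corrected"
            return mirror.data
    raise IOError("no mirror holds data matching the checksum")


if __name__ == "__main__":
    # Mirror 1 dropped offline and missed the latest write.
    pool = [Mirror(b"new data"), Mirror(b"old data")]
    print(read_block(pool, zlib.crc32(b"new data"), preferred=1))

    pool = [Mirror(b"new data"), Mirror(b"old data")]
    print(read_block(pool, None, preferred=0))
    print(read_block(pool, None, preferred=1))
```

In the checksummed case the stale copy is caught and repaired; in the unchecksummed case the result depends only on which mirror happens to serve the read, and a scrub has nothing to compare against.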

    Link to comment
    23 minutes ago, johnnie.black said:

    It happens frequently, like in the case above, I see it almost every week on various diagnostics, one device drops offline, usually from a bad cable, on the next reboot it comes back online, and the pool continues to work without apparent problems since any checksum errors are corrected by the good mirror, but for NODATACOW shares btrfs won't notice and will alternately read from both mirrors since it considers both to have the same data, when in fact they might not, and even running a scrub can't fix anything, since without checksums it will skip those files.

     

    Please post the mailing list link.  If a device comes back "online" and is automatically rejoined to the pool, how does btrfs know which device has the proper checksum?

    Link to comment

    I shut down the server, checked the power connectors on both cache drives, plugged the SATA cables into different ports on the motherboard, and rebooted the system. The VM Settings page still says it needs to be rebooted for changes to take effect, and my docker.img is now filling up (42.9 GB).

    tower-diagnostics-20181004-1915.zip

    Link to comment

    Try creating a new libvirt.img in case the current one is corrupt. You need to specify the file itself; pointing at the folder, as you tried earlier, won't work:

     

    Oct  4 18:44:56 Tower emhttpd: shcmd (174): /usr/local/sbin/mount_image '/mnt/user/system/libvirt/' /etc/libvirt 10
    Oct  4 18:44:56 Tower root: /mnt/user/system/libvirt/ is not a file
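The log above shows the image location set to a folder rather than a file. A quick sanity check along these lines could catch that before applying the setting. This is only an illustrative sketch, not how `mount_image` itself works; the helper name is made up, and the example file name `libvirt.img` inside that folder is an assumption.

```python
import os


def image_path_problem(path):
    """Return a description of the problem with an image path, or None if it looks usable."""
    if path.endswith(os.sep):
        return f"{path} ends with a path separator, so it names a folder, not a file"
    if os.path.isdir(path):
        return f"{path} is an existing folder, not a file"
    return None


if __name__ == "__main__":
    # The first path is the one from the log; the second is an assumed file name.
    for candidate in ("/mnt/user/system/libvirt/", "/mnt/user/system/libvirt/libvirt.img"):
        print(candidate, "->", image_path_problem(candidate) or "looks like a file path")
```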

     

    A Docker image filling up is usually the sign of a misconfigured docker container.

    Link to comment

    The reboot message appears when either the "PCIe ACS override" or the "VFIO allow unsafe interrupts" setting differs from the startup (syslinux.cfg) configuration.
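The comparison described above can be pictured as checking the parameters the kernel was actually booted with against the ones the settings page wants. The sketch below is an assumption about the general idea, not the actual Unraid implementation; the function names and the sample command lines are hypothetical, and on a live system the running command line would come from `/proc/cmdline`.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {parameter: value or None}."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def reboot_needed(running_cmdline, wanted):
    """Return True if the running kernel parameters differ from the wanted ones.

    `wanted` maps a parameter name to its required value, or to None when the
    parameter should be absent.
    """
    running = parse_cmdline(running_cmdline)
    for key, value in wanted.items():
        if value is None:
            if key in running:
                return True
        elif running.get(key) != value:
            return True
    return False


if __name__ == "__main__":
    wanted = {"pcie_acs_override": "downstream"}
    # Made-up command lines for illustration.
    print(reboot_needed("initrd=/bzroot", wanted))
    print(reboot_needed("pcie_acs_override=downstream initrd=/bzroot", wanted))
```

If the boot entry that was actually started never contains the parameter, a check like this keeps reporting a mismatch no matter how often the server is rebooted.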

     

    Click on Main -> Boot Device - Flash and post a screenshot of the Syslinux Configuration section.

    Link to comment

    Your syslinux.cfg file includes the pcie_acs_override statement, but it is not being applied at startup.

     

    I once had an issue where my BIOS cached an earlier syslinux.cfg file, so changes were not propagated. I needed to reset my BIOS to default settings and let the system start in legacy mode.

    Perhaps something similar is happening to you?

    Edited by bonienl
    Link to comment
    4 hours ago, Tward2 said:

    But also why is my number 6 disc not filling up?

    It all depends on the "Allocation method" and "Split level" settings, or perhaps on included/excluded disks.

    Link to comment

    I reset my BIOS to factory defaults. It still gives me the reboot error. If I put in a graphics card and set it to primary instead of the iGPU, it will accept it and ACS override works. But the PCI graphics card resolution is terrible and huge, so it's hard to read anything.

    Link to comment
    On 10/5/2018 at 11:29 AM, Tward2 said:

    image.thumb.png.04202fb650cf42a4821c858c7b8fbb8e.png

    Looks like you manually booted into 'Unraid OS GUI Mode', which doesn't have 'pcie_acs_override=downstream' in its append line.  This explains why the settings page says to reboot: it expects 'pcie_acs_override=downstream' to be in the append line for 'Unraid OS GUI Mode' too.  @bonienl is this a bug for the PCIe ACS Override setting not updating all the label sections in the syslinux.cfg file?
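To check which boot entries carry the parameter, the `label` sections of syslinux.cfg can be scanned for it in their `append` lines. This is a minimal sketch; the sample configuration is an assumed layout for illustration, not the poster's actual file, and `labels_missing_param` is a made-up name.

```python
def labels_missing_param(cfg_text, param):
    """Return the syslinux labels whose append line lacks `param`."""
    missing = []
    current = None
    has_param = False
    for raw in cfg_text.splitlines():
        line = raw.strip()
        lower = line.lower()
        if lower.startswith("label "):
            # Close the previous label section before starting a new one.
            if current is not None and not has_param:
                missing.append(current)
            current = line[len("label "):].strip()
            has_param = False
        elif lower.startswith("append ") and current is not None:
            if param in line.split()[1:]:
                has_param = True
    if current is not None and not has_param:
        missing.append(current)
    return missing


if __name__ == "__main__":
    # Assumed example configuration, for illustration only.
    sample = """\
label Unraid OS
  menu default
  kernel /bzimage
  append pcie_acs_override=downstream initrd=/bzroot
label Unraid OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
"""
    print(labels_missing_param(sample, "pcie_acs_override=downstream"))
```

Any label reported here will keep triggering the reboot message when it is the entry the server was started from.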

    Link to comment
    8 minutes ago, eschultz said:

    is this a bug for the PCIe ACS Override setting not updating all the label sections in the syslinux.cfg file

    Not a bug. By design, the setting in the GUI is only applied to the active startup selection.

    The GUI also does not allow changes when started in safe mode.

     

    When started in GUI mode, simply apply the settings and reboot.

    Link to comment




  • Status Definitions

    Open = Under consideration.
    Solved = The issue has been resolved.
    Solved version = The issue has been resolved in the indicated release version.
    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.
    Retest = Please retest in latest release.

    Priority Definitions

    Minor = Something not working correctly.
    Urgent = Server crash, data loss, or other showstopper.
    Annoyance = Doesn't affect functionality but should be fixed.
    Other = Announcement or other non-issue.