pervel

Posts posted by pervel

  1. I just got two Samsung 870 QVO 2.5" SSDs and set them up in a new pool. I'm just curious whether trim will run automatically on the new pool, just as it does on the original cache pool, or do I have to do something special to trigger it?

     

    Also, they are set up in a BTRFS raid1. Will trim be performed on both drives?
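
    In the meantime, I assume I could trigger it by hand with fstrim (a sketch; the mount point is a placeholder for wherever the new pool is mounted):

    # fstrim discards unused blocks on the mounted filesystem; on a
    # BTRFS raid1 the discard is passed to both member devices
    fstrim -v /mnt/newpool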

  2. I have now also tested the vdisk with cache='none' again, this time without any errors. Yay!

     

    I think I can finally conclude that my problems were all hardware related and that new RAM modules have fixed the problem - at least for now. It is still quite strange to me that all of my four RAM modules apparently failed. That suggests an external cause. But I have no clue what that cause might be. I guess I just have to wait and see if it happens again.

  3. I've installed the new RAM. So far that really seems to have helped. I'm getting none of the BTRFS errors from before. 😀

     

    So I guess the conclusion so far is that it really was bad RAM. Of course that worries me since it's hard to believe all 4 RAM sticks failed without some external reason.

     

    I have not yet tried to change my vdisks back to cache='none'. I would like to see the system run stable for a few days straight first.

     

    Thanks for the assistance, @JorgeB!

  4. 2 hours ago, JorgeB said:

    Not sure, don't remember previous complaints, but let me see if I can duplicate that.

     

    Sadly, I did get a couple of BTRFS errors today even with these changes. But far fewer than before, when the log was filling up with them. So I don't know what to conclude right now. I'm getting new RAM sticks today. I will try those and report back.

  5. I think I have found the problem! It's completely unrelated to any RAM issues. It has to do with how I have defined my vdisk. For a long time I have been using the following definition:

     

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/mnt/user/domains/WinVPN/vdisk1.qcow2'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

     

    Notice the line with cache='none'. I have changed that from the default cache='writeback'. I honestly cannot remember why I did that. I think I read somewhere that this had better performance. However, this appears to be the reason for all the BTRFS errors. If I change back to cache='writeback', I no longer get any BTRFS errors.
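
    For comparison, restoring the default only changes the driver line; everything else in the definition stays the same:

    <driver name='qemu' type='qcow2' cache='writeback'/>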

     

    I'm happy to change back to the default but I do wonder if this is perhaps a bug?

  6. 1 hour ago, JorgeB said:

    It's strange, try moving the vdisk to the array, VM must be off, see if there are any errors, if there aren't move it back.

    I did the following:

    1. Copied the vdisk from cache to array
    2. Changed the VM to point to the new vdisk location on the array
    3. Ran the VM. Did not get any errors in the log
    4. Copied the vdisk back from array to cache
    5. Changed the VM back to point to the vdisk location on the cache
    6. Ran the VM. Got the same BTRFS errors as before all pointing to the inode for the vdisk
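
    In shell terms, the copy steps looked roughly like this (a sketch; the exact paths are illustrative, not my real ones):

    # VM must be off before touching the vdisk
    virsh shutdown WinVPN
    # cache -> array, then repoint the VM XML at the new path and boot
    cp /mnt/cache/domains/WinVPN/vdisk1.qcow2 /mnt/disk1/domains/WinVPN/
    # afterwards, the reverse to move it back
    cp /mnt/disk1/domains/WinVPN/vdisk1.qcow2 /mnt/cache/domains/WinVPN/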

    What can I learn from this? The array is formatted XFS, not BTRFS. Would XFS give the same kind of errors if the file were damaged?

     

    Btw, the VM appears to run just fine despite these errors. But maybe that's just luck.

     

  7. I've run 4 passes of memtest on the remaining good RAM stick. It gives no errors. I've used these 4 sticks for 2 years now without issues. So it is really strange.

     

    There are no errors reported when running scrub. But when I run the VMs, I get BTRFS errors. The inode in the error message does indeed belong to the virtual disk file for the corresponding VMs. These files were copied over from the SSD at the time when I was still using the bad RAM sticks. Should that matter?
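
    For reference, I matched the inode to the file with something like this (assuming the pool is mounted at /mnt/cache; the inode number is one from my logs):

    # resolves a btrfs inode number to the path(s) that reference it
    btrfs inspect-internal inode-resolve 20188 /mnt/cache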

     

    I plan to buy new RAM sticks today. I really hope there are no other hardware issues present. But it is worrying that 3 sticks have failed.

  8. When I ran the scrub I got BTRFS errors that referenced paths to files on the cache, such as:

    Nov 22 21:33:06 Basse kernel: BTRFS error (device dm-1): bdev /dev/mapper/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 763, gen 0
    Nov 22 21:33:06 Basse kernel: BTRFS warning (device dm-1): checksum error at logical 264171896832 on dev /dev/mapper/nvme0n1p1, physical 135347519488, root 5, inode 20179, offset 159330914304, length 4096, links 1 (path: domains/GamePC/vdisk2.qcow2)

    After deleting those files I no longer get scrub errors.
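
    For reference, this is the command-line equivalent of what I'm running (assuming the pool is mounted at /mnt/cache):

    # -B runs the scrub in the foreground and prints a summary when done
    btrfs scrub start -B /mnt/cache
    # per-device error counters (the wr/rd/flush/corrupt numbers in the log lines)
    btrfs dev stats /mnt/cache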

     

    But I still get errors when running a VM from the cache; now it's a more generic error such as:

    Nov 22 21:42:06 Basse kernel: BTRFS error (device dm-1): bdev /dev/mapper/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 780, gen 0
    Nov 22 21:42:08 Basse kernel: BTRFS warning (device dm-1): csum failed root 5 ino 20188 off 10248269824 csum 0xbf691df4 expected csum 0xd7cd4596 mirror 1

     

    Can you explain what is going on? Are some files still damaged? Or is the physical SSD damaged? What is the best way forward for me?

     

    I should note that it's only the two big Windows VMs that give these errors. I have a number of Linux VMs that don't seem to give errors. But they are also noticeably smaller.

  9. I don't understand what's going on. I have now run memtest on each of the 4 RAM sticks. 3 of them showed errors. Only one did not. What are the chances of that? I wonder if this means there is something else wrong with the computer.

     

    Then I started Unraid with the single good RAM stick and deleted the files that scrub had complained about in syslog. Then I ran a new scrub and it finished without reporting errors.

     

    However, when I start both of my Windows VMs, I still get loads of BTRFS errors in syslog. But scrub says nothing. How can this be?

     

    basse-diagnostics-20221122-2152.zip

  10. 20 minutes ago, JorgeB said:

    Btrfs is detecting data corruption, this is usually RAM related, and Ryzen with overclocked RAM like you have is known to corrupt data in some cases, start by correcting that then run a scrub.

     

    Thanks for the fast response. I adjusted the RAM down to 2133 MHz (this was the "Auto" setting in the BIOS) and ran a scrub. Sadly, I'm seeing the same errors both when running the scrub and when running a Windows VM from the cache pool.

     

    basse-diagnostics-20221122-1811.zip

  11. My cache pool, consisting of 2 Samsung SSD 970 EVO NVMe drives, seems to be having issues. First, it was marked read-only. Then I decided to move all data to the array, reformat the cache pool, and move the files back. But even after that I still get lots of errors (though so far it has not been marked read-only again):

     

    Nov 22 13:35:59 Basse kernel: BTRFS error (device dm-1): bdev /dev/mapper/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 67, gen 0
    Nov 22 13:35:59 Basse kernel: BTRFS warning (device dm-1): csum failed root 5 ino 20192 off 283238006784 csum 0x270bc44d expected csum 0x0c0e90f7 mirror 1
    Nov 22 13:35:59 Basse kernel: BTRFS error (device dm-1): bdev /dev/mapper/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 42, gen 0
    Nov 22 13:35:59 Basse kernel: BTRFS warning (device dm-1): csum failed root 5 ino 20179 off 133788278784 csum 0xaa8b2d45 expected csum 0x3b4fa3e7 mirror 1
    Nov 22 13:35:59 Basse kernel: BTRFS warning (device dm-1): csum failed root 5 ino 20179 off 133788286976 csum 0x30968432 expected csum 0x0d3e5c34 mirror 1
    Nov 22 13:35:59 Basse kernel: BTRFS error (device dm-1): bdev /dev/mapper/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 43, gen 0
    Nov 22 13:35:59 Basse kernel: BTRFS error (device dm-1): bdev /dev/mapper/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 44, gen 0
    

     

    What could be the cause of this? Are the SSDs failing?

    basse-diagnostics-20221122-1605.zip

  12. On 10/16/2022 at 12:29 PM, JorgeB said:

    Backup current /config folder and recreate the flash drive using the USB tool (with v6.11.1), if OK restore the config folder, if it also fails replace the flash drive.

     

    Is /config really the only folder you need to back up and restore? On my flash drive I can see some potentially relevant folders like /extra (from NerdTools, I think) and /private (I may have created that myself at some point).
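
    In the meantime I'm backing up everything relevant like this (a sketch; Unraid mounts the flash drive at /boot, and the destination is just an example):

    # Unraid mounts the flash drive at /boot
    mkdir -p /mnt/user/backup/flash
    cp -r /boot/config  /mnt/user/backup/flash/   # the folder to restore
    cp -r /boot/extra   /mnt/user/backup/flash/   # NerdTools packages
    cp -r /boot/private /mnt/user/backup/flash/   # just in case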

     

    Btw, I came here because of a similar error when trying to upgrade from 6.11.3 to 6.11.4. A second attempt upgraded without errors. Should I still consider replacing my USB stick?

  13. On 8/25/2021 at 9:32 PM, cheesemarathon said:

     

    Easy fix. You're using the env variables that minio has deprecated. I updated the template but they don't seem to have pulled through to you. Remove your container and create it again from community apps. Just don't delete your minio appdata as well!!!! @pervel This should also fix your issue.

     

    I have just done the above as well and had no issues. Just make sure to fill in the same user and password as before.

    Thanks! Removing and re-creating the container did the job.
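
    For anyone hitting the same error: the change boils down to the credential variables. A plain docker run sketch (not the exact Unraid template; credentials and paths are placeholders):

    # MinIO deprecated MINIO_ACCESS_KEY/MINIO_SECRET_KEY in favor of these.
    # Access key must be at least 3 characters, secret key at least 8.
    docker run -d --name minio \
      -e MINIO_ROOT_USER=admin \
      -e MINIO_ROOT_PASSWORD=changeme123 \
      -v /mnt/user/appdata/minio:/data \
      -p 9000:9000 \
      minio/minio server /data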

  14. After the latest update the Minio container cannot start. I get the following error:

    ERROR Unable to validate credentials inherited from the shell environment: Invalid credentials
    
    > Please provide correct credentials
    HINT:
    Access key length should be at least 3, and secret key length at least 8 characters

     

  15. Is there something I can do to get all temperature monitors for my ASUS ROG STRIX B550-E GAMING motherboard?

    This is the only thing I am getting:

    [Screenshot: the sensors panel showing only three k10temp readings]

     

    I think the three k10temp readings are all CPU temperatures, but I'm not sure. They are almost always exactly the same. I would really like to also get motherboard temperatures.
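
    In the meantime I might try loading the Super I/O driver by hand (a sketch; which sensor chip this board actually has is an assumption on my part, and sensors-detect should confirm it):

    # probe for sensor chips, then force-load the Nuvoton Super I/O driver
    sensors-detect       # from lm-sensors; answer the prompts
    modprobe nct6775     # a common Super I/O driver on ASUS B550 boards
    sensors              # should now list board temps alongside k10temp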