Everything posted by BlueSialia

  1. So the root of the issue was obvious, and so was the solution. The Docker and libvirt images were precisely on the disk with the least free space. Literally less than 1 MB. Moving a couple of files from that disk to another fixed everything. I was a little afraid of doing that while the data recovery was taking place, but everything went well from what I can tell.
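     In case someone else lands here with the same symptoms, a quick way to spot this is to check the free space of each array disk and which image files back the loop devices (the paths below are the Unraid defaults; yours may differ):
     df -h /mnt/disk*     # free space per array disk
     losetup -l           # which image files back /dev/loop2, /dev/loop3, etc.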
  2. I had 4 disks in my array: one 8TB parity drive, two 8TB data drives, and a 2TB drive. I was rapidly approaching maximum usage, so I bought a 16TB drive, replaced the parity drive with it, and recalculated parity. Then, because I didn't monitor things properly, I hit 100% usage on the array. So, with the parity recalculation finished less than 24 hours earlier, I replaced the 2TB drive with the 8TB drive I had previously been using as parity. But now, while the replacement drive is being filled with the data from the replaced drive, Docker won't start and no VMs show up in the VMs panel. Extra information: I also replaced an NVIDIA 1060 GPU with an NVIDIA 4070 at the same time I replaced the 2TB drive, and I enabled the Docker option about keeping user-defined networks. I've been using a user-defined network for months with that disabled without issue (I think), but I enabled it now because I noticed it was a thing. unblue-diagnostics-20231024-2022.zip
  3. Did you figure it out? I just replaced an old, small disk in my array with a larger one and I have the same issue as you. Same libvirt log, just different VM names, as you would expect.
  4. Recently I discovered that my server was crashing/freezing because the Ryzen CPU doesn't play nice with Linux's C-states, so I disabled them in the BIOS. Which means my server now has a power consumption of 150 watts while idle. My system information:
     M/B: ASUSTeK COMPUTER INC. ROG STRIX X570-F GAMING Version Rev X.0x - s/n: 190754725200579
     BIOS: American Megatrends Inc. Version 4021. Dated: 08/09/2021
     CPU: AMD Ryzen 9 3900X 12-Core @ 3800 MHz
     HVM: Enabled
     IOMMU: Enabled
     Cache: 768 KiB, 6 MB, 64 MB
     Memory: 64 GiB DDR4 (max. installable capacity 128 GiB)
     Network: bond0: fault-tolerance (active-backup), mtu 1500
     Kernel: Linux 5.19.17-Unraid x86_64
     OpenSSL: 1.1.1s
     Attached are my BIOS settings. Any clue as to what I can do to reduce the power consumption? BIOS_setting.txt
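     For anyone who wants to compare: the C-states the kernel exposes, and how much time each core spends in them, can be read from sysfs (cpu0 shown as an example; with C-states disabled in the BIOS you should only see the shallow ones listed):
     grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
     grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/time   # residency in microseconds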
  5. I finally found a solution, but a very undesirable one. The issue was this: my Ryzen CPU doesn't play nice with Unraid's (Linux) C-states. I disabled them in the BIOS. Now, as expected, my server has a very high power consumption even at idle.
  6. It's been a year since this started happening to me, and the issue is still the same. At some point the server just dies somehow: the hardware is still powered on, but the system isn't even part of the network anymore. And I have no idea how to debug this problem. I think the only thing I can do is turn off every Docker container and VM that isn't critical, wait, and see if it exhibits the same behavior. If by any chance it manages to reach an uptime of a month, then I can slowly re-enable what I turned off, like one per 2 weeks or so. Even if this works, it'll mean months of having most of my things disabled. I'm just hopeless.
  7. I didn't upload the entire file. I just posted the log entries from one of the days the server crashed.
  8. I still face this issue. I replaced the USB drive, reformatted the drives, redid my settings. I basically set up Unraid from zero again. I cannot get an uptime past 15 days; it ends up crashing. I am now on vacation and cannot access my Plex or any other services hosted on it because it is down. I connect to my home network with a VPN and can't even ping the server. I have no idea how to debug this. There is nothing in the syslog server.
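     Once I'm back home, one sanity check I can think of is to verify that the remote syslog pipeline itself works by writing a test entry and confirming it arrives at the syslog server:
     logger -t crash-debug "manual test entry"
     If test entries arrive but nothing shows up around the crashes, the machine is probably dying too abruptly to flush anything out.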
  9. I was suspecting that, especially because of my config change, which makes me think the flash drive may be dying. How would I go about trying to determine that? It's not possible to run any of the tests that can be executed on a cache or array drive on the flash drive, is it?
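     For what it's worth, the closest I've come up with myself, in the absence of proper SMART tests for a USB stick, is a raw read pass over the whole device plus a read-only filesystem check (assuming the flash shows up as /dev/sda, as it does on my system):
     dd if=/dev/sda of=/dev/null bs=1M status=progress   # full read pass; watch dmesg for I/O errors
     fsck.vfat -n /dev/sda1                              # -n = check only, change nothing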
  10. I'll check the BIOS settings. Although, would it even make sense for that to be the cause when I've used the same BIOS settings for years and the crashing only started a month and a half ago? Nope, nothing. At least not knowingly. I actually don't remember changing anything in Unraid for a long while. I have my Plex, my VMs that I start with Wake-on-LAN... I have the Dashboard open all the time, but just because I like looking at it, haha.
  11. So it happened again. It happened on the 12th, and this is every log entry from that day:
     Aug 12 01:01:34 UnBlue emhttpd: read SMART /dev/sdd
     Aug 12 01:02:23 UnBlue root: /etc/libvirt: 27.6 MiB (28966912 bytes) trimmed on /dev/loop3
     Aug 12 01:02:23 UnBlue root: /var/lib/docker: 2.9 GiB (3095732224 bytes) trimmed on /dev/loop2
     Aug 12 01:02:23 UnBlue root: /mnt/cache: 242.4 GiB (260238061568 bytes) trimmed on /dev/nvme0n1p1
     Aug 12 02:00:01 UnBlue crond[1341]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
     Aug 12 02:03:55 UnBlue emhttpd: spinning down /dev/sdd
     Aug 12 02:40:55 UnBlue emhttpd: read SMART /dev/sdd
     Aug 12 03:42:45 UnBlue emhttpd: spinning down /dev/sdd
     Aug 12 17:52:17 UnBlue emhttpd: read SMART /dev/sdd
     Aug 12 18:52:40 UnBlue emhttpd: spinning down /dev/sdd
     As far as I can tell, there is nothing weird there. The next log entry is from today (I was away, so I couldn't restart the server earlier). There is one thing that did catch my eye. On restart I have the notification: unassigned.devices.plg: An update is available. Click here to install version 2022.08.12. I can't say it for sure, but I believe every time my server has crashed I've found a pending update for that same plugin.
  12. Things got weird. I don't know why, but my VM settings changed. The VM manager tells me my default VM storage path does not exist. It's set to `/mnt/user/domains/` and, in fact, that share does not exist. I used `/mnt/user/vdisks/` (yup, I got into this because of Linus Tech Tips). Setting it back to vdisks fixes it, but the fact that my configuration just changed scares me. I feel like the system is slowly dying.
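     For reference, shares are just the top-level directories under /mnt/user, so a quick way to confirm which ones actually exist is:
     ls -d /mnt/user/*/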
  13. After noticing the word "diagnostics" turned into a hyperlink, I discovered the persistent logs feature of Unraid. Neat trick of the forums! Hopefully that will tell me more next time it happens.
  14. So, basically the title. For three mornings now I've woken up to my server being down. The computer is powered on, but the Web GUI is unreachable and my Plex is down too. I can't even ping the machine. I've restarted the machine and downloaded the diagnostics, but the logs only contain entries since the last boot. So basically only a few minutes. As far as I can see, I won't find any error log of the crash there. What can I do? unblue-diagnostics-20220731-2137.zip
  15. I have 2 Nvidia GPUs: a 1070 and a 670. I was using both in two Windows 10 machines without issue. A few days ago I decided I wanted to try Windows 11 and change some things regarding how my VMs and their disks were configured. So I updated Unraid to the latest Release Candidate. I also updated my motherboard BIOS because it was old, and went through every setting in the BIOS to make sure everything was fine. Now passthrough of my 670 doesn't work no matter what I do. Using OVMF the VM boots but the GPU presents the Code 43 error, which is normal, but using SeaBIOS the VM doesn't do anything: complete black screen with one core stuck at 100%. And I guess Windows never even boots, because the router doesn't assign any IP to the VM. This happens with both Windows 10 and 11, so my previous VM doesn't work any longer. The 1070 works fine. The vBIOS of the 1070 supports UEFI but the vBIOS of the 670 does not; because of that, my Windows 10 VMs were OVMF for the 1070 and SeaBIOS for the 670. I've noticed my Unraid boots in UEFI mode. I honestly don't know how it was before. But I wonder if Unraid being in UEFI doesn't play nicely with using SeaBIOS. Could that be what stops me from using my 670? In case it is, is there something I should be wary of if I were to disable the UEFI mode of Unraid and boot in Legacy mode?
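     For anyone wondering how to tell which mode their own Unraid booted in: the kernel only creates the EFI directory in sysfs on a UEFI boot, so a quick check is:
     [ -d /sys/firmware/efi ] && echo "UEFI boot" || echo "legacy/CSM boot"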
  16. I've never been into overclocking, so I have no idea how risky each thing is. I knew XMP was an overclock, but since it was a suggestion from the system itself and I didn't know about the limit for Ryzen CPUs, I guess I didn't worry at all. Anyway, thank you for the help. I hope I don't see further corruption after undoing the overclock.
  17. Hmmm, I didn't know my CPU wasn't meant to run 3200 MT/s RAM when using 4 sticks. It's true the default in the BIOS was lower. To increase it I used an XMP profile (is that the name for it?) that the BIOS itself suggested to me, so I thought it was safe. Assuming that's the cause and the RAM modules themselves and the other hardware are healthy, should I then tune it back to 2667 MT/s to avoid further corruption? Or is there some other recommended procedure that would allow me to keep it at 3200 MT/s?
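     In case it helps, the speed the BIOS is actually applying to each stick (versus the stick's rated maximum) can be read from the DMI tables, assuming dmidecode is available on the system:
     dmidecode -t memory | grep -E "Speed|Part Number"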
  18. Sure! Here it is. unblue-diagnostics-20211230-1018.zip
  19. PREAMBLE So, without actually experiencing any issues, I went to the syslog and saw many entries like the following:
     Dec 29 18:00:21 UnBlue kernel: BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 42601, gen 0
     All of those point to the same device: md1.
     root@UnBlue:~# df -h
     Filesystem      Size  Used Avail Use% Mounted on
     rootfs           32G  1.6G   30G   5% /
     devtmpfs         32G     0   32G   0% /dev
     tmpfs            32G     0   32G   0% /dev/shm
     cgroup_root     8.0M     0  8.0M   0% /sys/fs/cgroup
     tmpfs           128M  2.7M  126M   3% /var/log
     /dev/sda1        15G  947M   14G   7% /boot
     overlay          32G  1.6G   30G   5% /lib/modules
     overlay          32G  1.6G   30G   5% /lib/firmware
     tmpfs           1.0M     0  1.0M   0% /mnt/disks
     tmpfs           1.0M     0  1.0M   0% /mnt/remotes
     /dev/md1        7.3T  5.9T  1.5T  80% /mnt/disk1
     /dev/md2        7.3T  1.4T  6.0T  18% /mnt/disk2
     /dev/md3        1.9T  372M  1.9T   1% /mnt/disk3
     /dev/nvme0n1p1  932G  580G  351G  63% /mnt/cache
     shfs             17T  7.2T  9.3T  44% /mnt/user0
     shfs             17T  7.2T  9.3T  44% /mnt/user
     /dev/loop2       24G  4.7G   20G  20% /var/lib/docker
     /dev/loop3      1.0G  5.2M  904M   1% /etc/libvirt
     So I went and checked the disk1 drive. A SMART test shows no errors, but a scrub does find checksum errors:
     UUID: 328d4811-6fe2-4785-a6c3-1d05a7cf6133
     Scrub started: Wed Dec 29 12:42:49 2021
     Status: finished
     Duration: 7:59:57
     Total to scrub: 5.80TiB
     Rate: 211.30MiB/s
     Error summary: csum=7411
       Corrected: 0
       Uncorrectable: 0
       Unverified: 0
     That sounds like a lot of errors. I can see the files with those errors in the syslog now. There are quite a lot of them too, spread among different folders: Plex videos, some Steam games, 2 of my VM disks... That's probably irrelevant, though. I have no idea if those files are actually fine and the error is on the checksum side, so I don't intend to fix them with scrub. And I can delete all the problematic files; it will take some time to do it safely and make sure I don't lose anything from the VMs.
     WHAT THIS POST IS ACTUALLY ABOUT This post is about figuring out what caused those errors. What can I do about it? Some extra information: all of the drives that form the array (parity disk included) are new to the system (a few months old). Up until recently I had only a parity drive and one other disk, but they started to report some "Reallocated sector count" that eventually turned into a few "Offline uncorrectable", so I replaced them and took the opportunity to expand the array. Is it possible I've carried some issue from the old drives over to these?
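     Something I plan to keep an eye on while I figure this out: BTRFS keeps per-device error counters that persist until explicitly reset, so any new corruption should be easy to spot:
     btrfs device stats /mnt/disk1       # show the counters (the corrupt count from the syslog lives here)
     btrfs device stats -z /mnt/disk1    # reset them once the cause is fixed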
  20. I'm interested in this as well. I want to use a newer and physically smaller USB drive for the Unraid OS and start from the default state. It's been a year since I started using Unraid, and I feel like I've done many things inefficiently. And I don't trust myself to "just remove whatever you don't want" because I will probably miss something.
  21. Hello. Was this bug reintroduced at some point? I'm using unRAID 6.8.3 and I have this issue.
  22. Oh, okay. Are those numbers the expected result then? Thank you for indulging me anyway.
  23. I didn't know that rsync was slow. Anyway, I tried:
     root@UnBlue:/mnt/user/vdisks/BlueSialia - Windows 10# pv vdisk1.img > Test.img
     ^C.9GiB 0:00:16 [ 740MiB/s] [===>      ] 8% ETA 0:02:51
     It fluctuated between 400MiB/s and 900MiB/s. Faster, but still 4 to 5 times slower than the advertised speed.
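     To rule out the page cache skewing those numbers, the same read can be repeated with direct I/O, which bypasses the cache entirely (a sketch; bs=1M kept from the pv test above):
     dd if=vdisk1.img of=/dev/null bs=1M iflag=direct status=progress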