Hollandex

Everything posted by Hollandex

  1. Is there a way to make the Palworld server NOT a community server? I just want to host it on my LAN for my family. I don't have 8211 forwarded, obviously, but I'm curious if that's enough to keep my server off the community list. (See the launch-command sketch after this list.)
  2. I switched to a single drive using XFS, with an aggressive backup schedule (see the backup-script sketch after this list). I haven't had a single corruption issue since I switched. I'd like to move to ZFS at some point, but I don't see a reason to yet. If it ain't broke...
  3. My thread history will tell the tale. haha Long story short, I've never had much luck with it. I kept getting file corruption errors, to the point of having to format the entire cache pool, which sort of negates the redundancy of BTRFS if I'm having to format the whole thing anyway. RAM seemed like the likely culprit, even though 12 hours of memtest showed the RAM working just fine. So yeah, BTRFS seems neat and apparently works great for 99% of the people that use it. But not me.
  4. Yeah, I'm running Mover now to get everything off the cache so I can format it again. I might abandon BTRFS and go to XFS with an aggressive backup schedule or something. At the very least, I'm curious if I run into any data corruption issues outside of BTRFS. I appreciate all your help. I believe you've been the one replying to every help thread I've started. Thank you.
  5. Yeah, I ran memtest for about 12 hours a while back without any errors at all. This really seems to be a BTRFS issue. Obviously, my hardware is playing some part in it, but this RAM works fine in every other application. But when it comes to BTRFS, something isn't right. I'm sort of at a point where I don't care about redundancy on my cache drive. It hasn't helped anyway. I've had my entire cache pool go corrupt and become unrecoverable, leading me to lose everything on it. So I'm tempted to just use XFS and have an aggressive backup schedule or something.
  6. Correct. At the moment, all the cache has on it is domains, system, and appdata. No other directories.
  7. Not really sure where this type of topic should live. Hopefully, this is the correct spot. I'm curious why our only option for redundant cache pools is BTRFS. For instance, why couldn't we use XFS and let Unraid handle the parity like it does with the array? I'm not against BTRFS. Just curious why that's the only option provided for cache pools.
  8. Sorry, I realize what you're saying now. I ran a scrub and got 6 errors that it can't fix. The files are odd. Things like `usr/lib/freerdp2/libparallel-client.so` and `etc/ssl/certs/ca-certificates.crt`. I'm not sure those files are even on the cache drives, are they?
  9. I swapped the RAM sticks, again, just to make sure I wasn't crazy. Corruption errors started almost immediately after starting Unraid. This is how it happens every time: it's fine for a while, then they slowly start happening, and they get faster and more frequent until something gets really corrupt, like my docker image. Then I have to format the drives completely and start the process all over. Another interesting thing I've noticed is that the errors seem to double on one of the drives. For instance, here's my current dev stats output. It shows 4 errors on nvme0 and 8 on nvme1. It's not always double, but it often is, or close to it.

         [/dev/nvme1n1p1].write_io_errs    0
         [/dev/nvme1n1p1].read_io_errs     0
         [/dev/nvme1n1p1].flush_io_errs    0
         [/dev/nvme1n1p1].corruption_errs  8
         [/dev/nvme1n1p1].generation_errs  0
         [/dev/nvme0n1p1].write_io_errs    0
         [/dev/nvme0n1p1].read_io_errs     0
         [/dev/nvme0n1p1].flush_io_errs    0
         [/dev/nvme0n1p1].corruption_errs  4
         [/dev/nvme0n1p1].generation_errs  0
  10. Yup, I did that. I pass the -z flag every time I run it.
  11. Yup. As the text you quoted says, "I've disabled all overclocks".
  12. Okay, ran with one DIMM for a while without any issue. Switched to the other DIMM, no issue for about 2 weeks. Then, tonight, I started getting corruption errors again. So I figured the DIMM was bad, swapped back to the "good" DIMM, and I'm still getting errors. So now I'm not sure what's going on. Bad mobo?
  13. Pulled one stick out and have been running for a few days now with no corruption errors. Not sure if that means the stick I pulled out is bad or what. Figured I'll give it another few days and if there are still no corruption errors, I'll swap the sticks and see if it's a bad stick, or if it's something with running 2 sticks on the mobo.
  14. I'm running two 2TB NVMe drives for my cache pool. After some random amount of writes, `btrfs dev stats /mnt/cache` will report corruption_errs on both drives. When I scrub the cache, I will sometimes get checksum errors, sometimes not. I don't get any other errors from the scrub. The files in question pass their individual md5 checksums (if I have them) and I can safely copy them off the cache. Another odd thing is that it reports only 186.35GiB total to scrub. That is nowhere near the full 2TB, but maybe I don't understand the scrub output.

          UUID:             71382277-c2da-416b-86b5-6725b66b58d1
          Scrub started:    Mon Nov 14 21:09:36 2022
          Status:           finished
          Duration:         0:00:33
          Total to scrub:   186.35GiB
          Rate:             6.00GiB/s
          Error summary:    no errors found

      I've disabled all overclocks on RAM and CPU on the motherboard, as far as I can tell. It's an Asus ROG Maximus XIII Hero, so if there's something I should be checking in the BIOS, let me know. I've also run memtest86 against the RAM for ~12 hours with no errors. Is there anything I can do to diagnose the cause of this? (See the diagnostics sketch after this list.) At best, it results in `btrfs dev stats` reporting errors. At worst, I've had my docker image get corrupt and not start, my VMs get corrupt and not start, and one time my entire cache pool got borked and I lost everything on it.

      diagnostics-20221114-2118.zip
  15. SOLVED: Based on one random forum thread I found, I tried installing and then uninstalling the Dynamix Temperature Sensor plugin. That fixed it. Why? Your guess is as good as mine. But if anyone else runs into this issue, try installing that plugin (found in CA) and then uninstalling it.

      Original question: I recently upgraded to Unraid 6.11.2 and now there's this Airflow section on my Dashboard. It shows 1 fan at 0 RPM. No idea what fan it thinks it's seeing. There are 6 fans in the computer in total, all on, so none of them are at 0 RPM. I don't have any fan control plugins installed. Where did this Airflow section come from, and how do I get rid of it?
  16. My docker image got corrupted again. Probably due to cache drive errors, but I'm still unsure. I recreated the docker image and, so far, everything is good. I ran btrfs dev stats against my cache drives and they both had a high number of corruption errors. No idea how old they were, but I cleared them out. Then I ran a scrub against the pool and got 5 checksum errors (all but 1 fixed, explained below).

          UUID:             f0eb0645-ca4a-418e-bc12-95393fa57c50
          Scrub started:    Sun Nov 6 12:42:38 2022
          Status:           finished
          Duration:         0:02:09
          Total to scrub:   762.89GiB
          Rate:             5.91GiB/s
          Error summary:    csum=5
            Corrected:      0
            Uncorrectable:  0
            Unverified:     0

      Also, corruption errors are starting to come back on one of the drives. They both had corruption errors prior to this, but currently only one does, since I cleared them an hour-ish ago.

          [/dev/nvme0n1p1].write_io_errs    0
          [/dev/nvme0n1p1].read_io_errs     0
          [/dev/nvme0n1p1].flush_io_errs    0
          [/dev/nvme0n1p1].corruption_errs  2
          [/dev/nvme0n1p1].generation_errs  0
          [/dev/nvme1n1p1].write_io_errs    0
          [/dev/nvme1n1p1].read_io_errs     0
          [/dev/nvme1n1p1].flush_io_errs    0
          [/dev/nvme1n1p1].corruption_errs  0
          [/dev/nvme1n1p1].generation_errs  0

      On to the questions! First, I have a user script running hourly that should have warned me about the btrfs errors, but it didn't. Is there something incorrect with this? (See the revised script sketch after this list.)

          #!/bin/bash
          if mountpoint -q /mnt/cache; then
            btrfs dev stats /mnt/cache
            if [[ $? -ne 0 ]]; then
              /usr/local/emhttp/webGui/scripts/notify -i warning -s "ERRORS on cache pool"
            fi
          fi

      Second question: how do I fix the checksum error? I saw that the system logs mentioned the corrupted files. They weren't critical, so I deleted them and scrubbed again. I still get 1 checksum error, and nothing in the syslog is pointing to a file. Any idea what this could be and/or how to fix it?

          Nov 6 13:38:26 Sanctuary ool www[7491]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' '-r'
          Nov 6 13:38:26 Sanctuary kernel: BTRFS info (device nvme0n1p1): scrub: started on devid 2
          Nov 6 13:38:26 Sanctuary kernel: BTRFS info (device nvme0n1p1): scrub: started on devid 1
          Nov 6 13:38:32 Sanctuary kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
          Nov 6 13:39:04 Sanctuary kernel: BTRFS info (device nvme0n1p1): scrub: finished on devid 2 with status: 0
          Nov 6 13:39:05 Sanctuary kernel: BTRFS info (device nvme0n1p1): scrub: finished on devid 1 with status: 0

      And last question: is there any way to diagnose the corruption issues? This is the third time I've had the docker image get corrupted, and every time there have been some cache drive errors, too. One time, the drives were completely borked and I had to format them. I ran memtest for over 12 hours with no errors. I doubt the drives are going bad; they were both bought a couple of years ago when I built this system. I know btrfs can have issues if the RAM is overclocked on Ryzen systems. I do have XMP turned on, but I'm not overclocking the RAM beyond its rated speeds, and this is an Intel system. No idea if any of that matters.
  17. I'll pull back the XMP profile on the RAM and see if that does the trick. Thanks for your help!
  18. I wanted to mention that, last time this happened, I ran MemTest for about 12 hours with no errors. And I ran extended SMART tests on both NVMe drives. No errors. Is there any way to get raid functionality out of cache drives without BTRFS? I'd be curious to see if this issue is specific to BTRFS. If not, I may go to a single XFS cache drive with nightly backups.
  19. I posted a while back about my BTRFS cache pool getting borked. At the time, the solution was to reformat the drives and set them up again. I did that, and all was well for a couple of months. Then, corruption again. So, I did my usual thing: format the drives, set them up as a pool again, use the CA Backup plugin to restore my appdata, and I was off to the races. I added a UserScript that checks the BTRFS pool hourly (as suggested by JorgeB), and within a few hours I started to get corruption errors on both of the NVMe drives.

          root@Sanctuary:~# btrfs dev stats /mnt/cache
          [/dev/nvme0n1p1].write_io_errs    0
          [/dev/nvme0n1p1].read_io_errs     0
          [/dev/nvme0n1p1].flush_io_errs    0
          [/dev/nvme0n1p1].corruption_errs  4
          [/dev/nvme0n1p1].generation_errs  0
          [/dev/nvme1n1p1].write_io_errs    0
          [/dev/nvme1n1p1].read_io_errs     0
          [/dev/nvme1n1p1].flush_io_errs    0
          [/dev/nvme1n1p1].corruption_errs  2
          [/dev/nvme1n1p1].generation_errs  0

          root@Sanctuary:~# btrfs fi usage -T /mnt/cache
          Overall:
              Device size:          3.64TiB
              Device allocated:     310.06GiB
              Device unallocated:   3.33TiB
              Device missing:       0.00B
              Used:                 95.50GiB
              Free (estimated):     1.77TiB  (min: 1.77TiB)
              Free (statfs, df):    1.77TiB
              Data ratio:           2.00
              Metadata ratio:       2.00
              Global reserve:       173.05MiB  (used: 64.00KiB)
              Multiple profiles:    no

                              Data       Metadata   System
          Id  Path            RAID1      RAID1      RAID1     Unallocated
          --  --------------  ---------  ---------  --------  -----------
           1  /dev/nvme0n1p1  153.00GiB  2.00GiB    32.00MiB  1.67TiB
           2  /dev/nvme1n1p1  153.00GiB  2.00GiB    32.00MiB  1.67TiB
          --  --------------  ---------  ---------  --------  -----------
              Total           153.00GiB  2.00GiB    32.00MiB  3.33TiB
              Used            47.56GiB   195.88MiB  48.00KiB

      I read somewhere that maybe a "scrub" is in order. I assume that's "btrfs scrub /mnt/cache"? I didn't want to start blindly typing commands, though. (See the scrub sketch after this list.) Should I run a scrub? And, if so, do I scrub the cache pool or an individual drive?

      I realized there's an option to scrub in the cache settings. I did that; no errors found. But the size doesn't look right at all. My cache is two 2TB drives, and there is only about 300GB in use. It should have been scrubbing nearly the full 2TB, right?

          UUID:             f0eb0645-ca4a-418e-bc12-95393fa57c50
          Scrub started:    Tue May 3 13:45:16 2022
          Status:           finished
          Duration:         0:00:18
          Total to scrub:   95.76GiB
          Rate:             5.32GiB/s
          Error summary:    no errors found

      Any other suggestions? Or any other output that might be helpful? Thanks!
  20. UPDATE: I ran 8 passes of MemTest across ~14 hours. Zero errors. I might try more passes later but, for now, I'm satisfied there aren't any issues with the RAM. I ran an extended self-test on both NVMe drives. No issues found. So, at this point, I have no idea why or how my btrfs pool got corrupted, which kind of sucks. I'd love to pinpoint a reason so I can feel assured it won't happen again. I formatted each drive as XFS and then put them back in a btrfs pool (this was the only way I could get Unraid to let me format the drives from btrfs to btrfs). I nuked my docker vdisk, just in case it was the culprit. And now everything is back to running like normal. Thanks to both of you, Squid and JorgeB, for the help!
  21. Okay, running memtest now. If the problem persists, since the cache is acting as if it's read-only, will Mover be able to correctly move the contents off the cache and onto the array? Or should I just manually do a copy/paste from /mnt/cache to /mnt/disk1 (for instance)? (See the rsync sketch after this list.) I think I'll format the cache pool either way, just to be safe, so I want to make sure I get the appdata/domains/system files properly backed up.
  22. I rebooted to no avail, but I then shut the system down entirely so I could mess with the components inside the case. When I started it back up, everything was working fine again. Docker starts, VMs work, etc. So... does this still sound like a RAM problem? I'll probably take the server offline tonight and let memtest run while I sleep. Edit: Spoke too soon. It started to work. Now it's all failing again, in the same way.
  23. I also just noticed that I can't start a VM. I get this error: "unable to open /mnt/user/domains/EndeavourOS/vdisk1.img: Read-only file system". The cache drives are mounted correctly and not even close to full.
  24. Diagnostics attached. I woke up today to Docker containers not working. If I tried to stop/restart any container, I got "Execution Error". I tried stopping Docker, deleting the vdisk, and starting Docker back up. Now, the Docker page says Docker failed to start. No idea what's going on. The server was working great last night. Never turned it off. Now it's borked. sanctuary-diagnostics-20220316-1131.zip
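
On the Palworld LAN-only question above: a minimal launch-command sketch, under the assumption that community-list registration is opt-in via a launch argument (`EpicApp=PalServer` on the builds I'm aware of) rather than something the server does by default. Argument names can change between server versions, so treat this as a sketch, not a recipe.

```bash
# Sketch only (Linux dedicated server; argument names are assumptions and may
# vary by version). Launching WITHOUT a community-server argument such as
# EpicApp=PalServer should keep the server off the community list, and with
# port 8211 unforwarded it stays reachable only on the LAN.
./PalServer.sh -port=8211
```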
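
A backup-script sketch for the "single XFS drive with an aggressive backup schedule" mentioned above, imagined as a nightly User Scripts job. The destination path and the idea of stopping containers first are assumptions for illustration, not details taken from these posts.

```bash
#!/bin/bash
# Sketch: nightly copy of the cache-only shares to an array disk.
# /mnt/disk1/backups is a hypothetical destination; adjust to taste.
# Stopping Docker/VMs first avoids copying files while they are being written.
DEST=/mnt/disk1/backups/cache
mkdir -p "$DEST"
for share in appdata domains system; do
    rsync -a --delete "/mnt/cache/$share/" "$DEST/$share/"
done
```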
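
A diagnostics sketch for the "is there anything I can do to diagnose the cause of this" question above: the non-destructive checks that come up across these posts are the per-device btrfs counters and the NVMe SMART data. Device names here are examples; the actual devices are the two NVMe drives in the pool.

```bash
# Sketch: read-only health checks for the cache pool and its NVMe drives.
btrfs dev stats /mnt/cache     # per-device counters (corruption_errs, etc.)
smartctl -a /dev/nvme0n1       # SMART/health data for the first NVMe drive
smartctl -a /dev/nvme1n1       # and the second
```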
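
A revised script sketch for the hourly user script quoted above: `btrfs dev stats` exits 0 even when the counters are non-zero, so the `$?` test never triggers as written. This variant assumes the same mount point and the Unraid `notify` path shown in the post, and relies on the `--check` option, which makes `btrfs device stats` return a non-zero exit code whenever any counter is non-zero.

```bash
#!/bin/bash
# Sketch: hourly check of the cache pool's btrfs device counters.
# --check makes the command exit non-zero if any counter is non-zero,
# which is the behaviour the original $? test was assuming.
if mountpoint -q /mnt/cache; then
    if ! btrfs device stats --check /mnt/cache; then
        /usr/local/emhttp/webGui/scripts/notify -i warning -s "ERRORS on cache pool"
    fi
fi
```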
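
A scrub sketch for the "I assume that's btrfs scrub /mnt/cache?" question above, assuming the pool is mounted at /mnt/cache as in these posts. A scrub is run against the mounted pool (it covers all member devices), and it only reads allocated data and metadata, which is why "Total to scrub" tracks space in use rather than the raw 2TB of each drive.

```bash
# Sketch: scrub the whole pool via its mount point (covers both devices).
btrfs scrub start /mnt/cache     # starts a background scrub
btrfs scrub status /mnt/cache    # progress and error summary
# Add -r to "scrub start" for a read-only scrub that reports but does not repair.
```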
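
An rsync sketch for the manual-copy question above (getting appdata/domains/system from /mnt/cache to /mnt/disk1 before reformatting the pool). The paths are the ones named in the post; stopping Docker and VMs first, so the files aren't changing mid-copy, is an assumption on my part.

```bash
# Sketch: one-time copy of the cache-only shares to an array disk.
# Stop Docker and VMs first so appdata/domains/system are not in use.
rsync -avh /mnt/cache/appdata/ /mnt/disk1/appdata/
rsync -avh /mnt/cache/domains/ /mnt/disk1/domains/
rsync -avh /mnt/cache/system/  /mnt/disk1/system/
```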