97WaterPolo

Members
  • Posts: 31
  1. Even if it dropped offline, the restart should've fixed it, right? That one bad drive is consistently failing reads and writes in syslog as well. If I reformat it there's a strong chance I'll lose the data on both drives, so if data preservation is my main goal, swapping it out and letting it rebuild would be best? It looks like the cache pool is mounted read-only, since a btrfs scrub instantly aborts whether I run it through the GUI or the command line. Are you saying that if I stop the array, remove the failing drive from the pool, and then restart the array, it'll resume just fine since the good drive is still in the pool? I thought I needed at least two drives for it to be a cache pool, and that if I remove one it won't start up since it's supposed to be mirrored.
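     For reference, a quick way to confirm the read-only state and the per-device error counters from the command line (a sketch, assuming the pool is mounted at /mnt/cache as on a default Unraid setup):

       # Check whether btrfs flipped the pool to read-only (look for "ro" in the mount flags)
       mount | grep /mnt/cache
       # Per-device read/write/corruption counters for the pool
       btrfs device stats /mnt/cache
       # A scrub aborts immediately on a read-only filesystem, which matches the behaviour above
       btrfs scrub status /mnt/cache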
  2. Since only one of the drives is failing, can I just swap out that one drive and see if it rebuilds from the good drive onto the new one? I did do an rsync onto my array, but it completed with some file errors, so I'm not sure I have a perfect copy. I do use CA Backup, but I never realized it was wiping all of the previous runs, so I only have a backup from this past Monday. I'm not sure when the corruption started, so I'd like to keep wiping and reformatting as a last resort in case my backup is corrupted. Will swapping out the bad drive with the new one and letting the system rebuild the pool work?
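     Since the earlier rsync finished with file errors, a checksum-based dry run can show exactly which files didn't copy cleanly before deciding anything; a sketch with illustrative paths (both the source and the destination below are assumptions, not taken from the post):

       # -c compares checksums, -i itemizes differences, --dry-run changes nothing
       rsync -avci --dry-run /mnt/cache/ /mnt/user/cache_backup/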
  3. Hi @JorgeB, I did do a restart, but once I re-enabled anything reading/writing from the cache drive it started throwing errors in syslog, and I'm seeing errors again in btrfs dev stats after wiping it. I think that specific drive might be shot? Is there any issue if I swap it out for a new 2TB NVMe SSD, or should I try to wipe and reformat the bad drive? I also tried scrubbing that drive and I'm getting an error code of -30. I also see a Fix Common Problems error saying it is unable to write to the cache; I feel like after the reboot I'm in a worse state, as before it could still read/write from the cache pool. I do, however, see the device back in the dashboard as accessible and reading a temperature again. What are your thoughts on migrating to 6.12.X to use ZFS instead of btrfs? aincrad-diagnostics-20240424-2120.zip
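     For what it's worth, error -30 corresponds to errno 30 (EROFS, read-only filesystem), which lines up with the "unable to write to cache" warning. A sketch of the reset-and-recheck cycle described above (assuming the pool is mounted at /mnt/cache; nvme1n1p1 is the device named earlier in the thread):

       # Zero the per-device error counters
       btrfs device stats -z /mnt/cache
       # Generate some cache I/O, then re-check; counters that climb again point at that device
       btrfs device stats /mnt/cache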
  4. Hi everyone, my server (6.11.5) has been running like a dream for so long I forgot it exists. Anyway, I hopped on today and got a bunch of errors on one of my cache drives. I ran btrfs dev stats and noticed my nvme1n1p1 had a bunch of errors; I zeroed them out, ran a btrfs scrub, and still have numerous errors. From what I've seen on other forum posts it looks like my drive is on the way out. I ordered a new 980 PRO 2TB NVMe and it should be here tomorrow, but I'd like some advice on how to migrate over.
     Right now all my docker containers and VMs are running without any issue and seem to be writing to Cache 2 (the drive without any errors). I'm planning on leaving everything running until tomorrow evening when the new drive arrives from Amazon (I assume this is okay since the other drive is operating fine). Once I receive it I'm planning on doing the following:
       1. Stop all VMs/Docker containers
       2. Change all my shares to "Yes" where the Cache setting is Only/Prefer (I am still on 6.11.5)
       3. Run the mover to get everything off the cache drives and onto my array (see the check sketched below)
       4. Shut down the server
       5. Replace the bad drive with the new one from Amazon
       6. Start the array and assign the new drive to the pool
       7. Let the pool rebuild, then revert my changes to the shares from above and run the mover
       8. Restart VMs/Docker containers
     Does this seem like an appropriate checklist to get my cache pool back up and running without any data loss?
     I've also been reading up on ZFS for the cache pool, but it looks like for stable support I'd do best upgrading to 6.12.X and then setting up my cache pool as ZFS instead of btrfs. I assume it's worth getting my existing cache pool back up and running first before I do any migration, but if ZFS is less prone to errors for a cache pool, would it make sense to stop at step 6 and do the following instead?
       6. Start the array and assign the new drive to the pool
       7. Upgrade UnraidOS to 6.12.X
       8. Set up the new cache pool as ZFS
       9. Revert the changes to the shares and then run the mover
       10. Restart VMs/Docker containers
     My only hesitation is that it seems like a major jump in versions from 6.11 to 6.12; should I wait until my array is in a stable state before I do the upgrade? Is it even worth upgrading if the only thing I want out of it is ZFS for my cache pool? aincrad-diagnostics-20240423-2306.zip
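     A quick way to confirm the mover actually emptied the pool before the shutdown in step 4 (a sketch, assuming the pool is mounted at /mnt/cache; anything still listed here hasn't been moved to the array yet):

       # List whatever is still living on the cache pool after the mover finishes
       du -sh /mnt/cache/* 2>/dev/null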
  5. Hi everyone, I set up a little test scenario below that illustrates the symptoms of what I'm experiencing. I have the following setup:
       • Virtiofs mode mapping of "/mnt/user/Backup/Logs/" => "logs"
       • fstab entry of "logs /mnt/logs virtiofs ro,relatime,sync 0 0"
       • Full rwx on the test file
     I executed the commands back and forth between the Unraid host (root) and the virtual machine (alexander), in this order:
       1. root "ls -l" to display the current directory on Unraid OS
       2. alexander "ls -l" to display the current directory on the virtual machine
       3. root "cat testfile" to display the content of the file on Unraid
       4. alexander "cat testfile" to display the content of the file on the virtual machine
       5. root "sudo nano testfile" and append a string in nano
       6. root "cat testfile" to display the new content after the nano edit
       7. alexander "cat testfile" still shows the old content of the file
       8. alexander "ls -l" relists the directory and refreshes some cache
       9. alexander "cat testfile" now shows the new content from the Unraid drive
     I have tried this routine numerous times and the VM always seems to keep the old file until I run "ls -l" or some background process I'm not aware of refreshes it. So far the only thing that reliably refreshes the file is "ls -l"; I could run "cat testfile" 10+ times and it wouldn't change, then the moment an "ls -l" was run it changed instantly. One thing I thought might be an issue was the share being on a cache pool, but upon checking, cache is disabled for my "Backup" share. Any help would be greatly appreciated!! Thank you!
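     A minimal way to reproduce the staleness without involving nano (a sketch; the paths and usernames are the ones from the post, nothing else is assumed):

       # On the Unraid host (root):
       echo "change $(date)" >> /mnt/user/Backup/Logs/testfile

       # In the VM (alexander): cat keeps returning the old content...
       cat /mnt/logs/testfile
       # ...until a directory listing (or other metadata lookup) refreshes the cached view
       ls -l /mnt/logs
       cat /mnt/logs/testfile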
  6. Had another MCE error today; I checked the logs and got an unhelpful message. Is there a way to check what went wrong without running mcelog, since my CPU doesn't support it?
       May 7 04:30:08 AINCRAD root: Fix Common Problems: Error: Machine Check Events detected on your server
       May 7 04:30:08 AINCRAD root: mcelog: ERROR: AMD Processor family 25: mcelog does not support this processor. Please use the edac_mce_amd module instead.
       May 7 04:30:08 AINCRAD root: CPU is unsupported
       May 7 04:30:12 AINCRAD root: Fix Common Problems: Warning: Docker Update Patch not installed
     aincrad-diagnostics-20230507-2218.zip
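     Given that error, one hedged option is to lean on the kernel-side decoder the message itself points to, instead of mcelog (standard Linux commands, nothing Unraid-specific):

       # Load the AMD EDAC decoder suggested by the error message (if it isn't loaded already)
       modprobe edac_mce_amd
       # Decoded machine-check details should then show up in the kernel log when the next event fires
       dmesg | grep -iE "mce|edac"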
  7. Hi, I logged on today and saw this error in my Fix Common Problems tab! I've seen it once before about a month ago and cleared it, thinking it was a fluke since I had recently done a restart. Now that the server has been running for a while and I got the error again, I'm hoping someone could point me in the right direction. I tried running mcelog, but I got: mcelog: ERROR: AMD Processor family 25: mcelog does not support this processor. Please use the edac_mce_amd module instead. CPU is unsupported. I have also attached my diagnostics in the hope someone could help! I haven't noticed any issues or failures since I got the message. Thank you!! aincrad-diagnostics-20230430-2239.zip
  8. Hi everyone, I'm a little nervous given the topics of the forum threads I've found from searching, especially the one starting with "TL;DR If you're seeing constant logs from avahi-daemon, beware, you probably got hacked."
     What has happened recently:
       • For the past few months I've had random crashes where UnraidOS would freeze up. I thought it was my build, docker containers, etc., but nothing has stopped it. I've disabled C-States, pinned docker containers to cores, and altered the power idling, and I still have random crashes ranging from 1 day to 3 months apart (increasing more recently).
       • Starting today my syslog has been spammed by avahi-daemon, which led me on a search to the attached forum posts.
       • I checked my ifconfig and found a bunch of network interfaces I've never seen before (I'm used to eth0, lo, wg0, and br0). I had a bond0 and multiple br-XXXXXXX interfaces.
       • I've had issues connecting to my server via WireGuard and Tailscale. I was able to a few weeks ago, but I tried this morning and got no connection.
       • I was unable to ping anything, by hostname (google.com) or by IP (142.250.68.110).
       • Docker Community Apps throws an error saying it can't retrieve a feed.
       • Fix Common Problems reports that it can't connect to github.com.
       • I recently got alerts from my "Deco" app, which notifies me whenever a new device joins the network, and I've been getting a few "UNKNOWN DEVICE HAS JOINED THE NETWORK" alerts. This pops up from time to time so I never thought about it till now.
     Exposure of the UnraidOS server (192.168.68.114):
       • Port forwardings:
         • 192.168.68.114 (UnraidOS server) Internal: 6881, External: 6881 (nothing running on that port)
         • 192.168.68.114 (UnraidOS server) Internal: 51820, External: 51820 (WireGuard VPN service)
         • 192.168.68.48 (Nginx) Internal: 8080, External: 80 (Nginx Proxy Manager)
         • 192.168.68.48 (Nginx) Internal: 4443, External: 443 (Nginx Proxy Manager)
       • Nginx Proxy Manager: all of my docker containers are on br0 with static IPs assigned, and I route the services I want to expose outside my network (like Jellyfin, Gitea, etc.) through it. All have SSL certs.
       • I have numerous shares on my network and they all require a username and password to access.
     I only recently started working on my UnraidOS instance again in the last couple of days. For the most part I leave it alone, but one of the things I wanted to do was create VLANs, either at the router level, the Unraid level, or the software level, so I enabled Virtual Machines and installed the stock CentOS ISO to play around with. I've made some VLANs via Unraid and also tried some via my router, but none of it has worked, so I've mostly reversed whatever I did.
     Any advice would be appreciated; my server crashed at 3 AM this morning so I had to do an unclean shutdown, and my parity is currently rebuilding.
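     For auditing the unexpected interfaces and the actual exposure, a couple of standard checks (a sketch; note that Docker itself creates br-<id> bridges for user-defined networks and veth pairs for containers, and Unraid creates bond0 when interface bonding is enabled, so those names alone aren't proof of compromise):

       # Compact list of every interface and its state
       ip -br link show
       # What is actually listening on the host, and which process owns each socket
       ss -tulpn
       # Map the br-<id> bridges back to Docker networks
       docker network ls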
  9. @kodyorris Did you ever figure out what happened with your UnraidOS box? I just started getting those logs in my console today and I have no clue what's happening.
  10. Hi @kris_wk, do you have an example of the Avahi logs you were talking about? I recently had some Avahi logs start spamming my syslog and I'm not exactly sure what they are. Google takes me to this thread and it's kind of shocking to read. My errors are the following:
       Apr 5 20:18:16 AINCRAD avahi-daemon[7137]: Joining mDNS multicast group on interface vethde33b3b.IPv6 with address fe80::3c9e:1dff:fe6b:2d30.
       Apr 5 20:18:16 AINCRAD avahi-daemon[7137]: New relevant interface vethde33b3b.IPv6 for mDNS.
       Apr 5 20:18:16 AINCRAD avahi-daemon[7137]: Registering new address record for fe80::3c9e:1dff:fe6b:2d30 on vethde33b3b.*.
       Apr 5 20:18:19 AINCRAD kernel: br-870a6a64b157: port 6(vethde2abeb) entered disabled state
       Apr 5 20:18:19 AINCRAD kernel: vethcf927ee: renamed from eth0
       Apr 5 20:18:19 AINCRAD avahi-daemon[7137]: Interface vethde2abeb.IPv6 no longer relevant for mDNS.
       Apr 5 20:18:19 AINCRAD avahi-daemon[7137]: Leaving mDNS multicast group on interface vethde2abeb.IPv6 with address fe80::c7:e4ff:fed0:ed63.
       Apr 5 20:18:19 AINCRAD kernel: br-870a6a64b157: port 6(vethde2abeb) entered disabled state
       Apr 5 20:18:19 AINCRAD kernel: device vethde2abeb left promiscuous mode
       Apr 5 20:18:19 AINCRAD kernel: br-870a6a64b157: port 6(vethde2abeb) entered disabled state
       Apr 5 20:18:19 AINCRAD avahi-daemon[7137]: Withdrawing address record for fe80::c7:e4ff:fed0:ed63 on vethde2abeb.
       Apr 5 20:18:21 AINCRAD kernel: veth6b779e3: renamed from eth0
       Apr 5 20:18:21 AINCRAD kernel: br-870a6a64b157: port 7(vethde33b3b) entered disabled state
       Apr 5 20:18:21 AINCRAD avahi-daemon[7137]: Interface vethde33b3b.IPv6 no longer relevant for mDNS.
       Apr 5 20:18:21 AINCRAD avahi-daemon[7137]: Leaving mDNS multicast group on interface vethde33b3b.IPv6 with address fe80::3c9e:1dff:fe6b:2d30.
       Apr 5 20:18:21 AINCRAD kernel: br-870a6a64b157: port 7(vethde33b3b) entered disabled state
       Apr 5 20:18:21 AINCRAD kernel: device vethde33b3b left promiscuous mode
       Apr 5 20:18:21 AINCRAD kernel: br-870a6a64b157: port 7(vethde33b3b) entered disabled state
       Apr 5 20:18:21 AINCRAD avahi-daemon[7137]: Withdrawing address record for fe80::3c9e:1dff:fe6b:2d30 on vethde33b3b.
       Apr 5 20:18:21 AINCRAD kernel: br-870a6a64b157: port 6(veth06c5b41) entered blocking state
       Apr 5 20:18:21 AINCRAD kernel: br-870a6a64b157: port 6(veth06c5b41) entered disabled state
       Apr 5 20:18:21 AINCRAD kernel: device veth06c5b41 entered promiscuous mode
       Apr 5 20:18:21 AINCRAD kernel: br-870a6a64b157: port 6(veth06c5b41) entered blocking state
       Apr 5 20:18:21 AINCRAD kernel: br-870a6a64b157: port 6(veth06c5b41) entered forwarding state
       Apr 5 20:18:22 AINCRAD kernel: eth0: renamed from veth3087c0e
       Apr 5 20:18:22 AINCRAD kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth06c5b41: link becomes ready
       Apr 5 20:18:23 AINCRAD avahi-daemon[7137]: Joining mDNS multicast group on interface veth06c5b
  11. What does switching from Macvlan to ipvlan do? Will I still be able to bind my docker containers to specific IPs on my network?
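     For context on the second part of the question, Docker's ipvlan driver still allows assigning fixed LAN addresses to containers; a generic sketch (the subnet, gateway, parent interface, network name, and container are illustrative, not taken from this server):

       # Create an ipvlan network bound to the physical NIC
       docker network create -d ipvlan \
         --subnet=192.168.68.0/24 --gateway=192.168.68.1 \
         -o parent=eth0 my_ipvlan
       # Attach a container with a fixed IP, just as with a macvlan/br0 custom network
       docker run -d --network my_ipvlan --ip 192.168.68.200 --name demo nginx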
  12. Hi everyone, I've been having a lot of issues with my UnraidOS server randomly freezing, with nothing triggering it as far as I know. The freezes occur at random intervals (from 1 day to 3 months apart) and under random load (from the middle of the day when no one is using it to the middle of the night when others are connecting to the server). I have searched the forums countless times trying to figure out what is wrong with my build and can't find anything that points me in the right direction. I would love some input on what else to try, because every time I attempted one of the fixes below it would work for a while and then randomly crash again, which is quite heartbreaking because I keep thinking I've fixed it. About 4 months ago I updated to 6.11.5 and updated some hardware, and the crashes have been happening for the last 2 months or so. I used to leave my server alone in the corner of the room and only touch it when I wanted to add something new, but the number of crashes and the uncertainty recently have really been bothering me and I'd love to get some help!
     Hardware specs:
       • AMD Ryzen 9 5950X 16-Core @ 3400 MHz
       • X570S AERO G
       • 4 x 32GB @ 2133 MHz
       • 2 x 2TB Samsung SSD 970 EVO (cache pool)
       • 2 x 18TB HDDs for parity
       • 3 x 4TB HDDs for data
       • 4 x 8TB HDDs for data
       • NVIDIA GeForce GTX 1060 3GB (for Tdarr encoding, used rarely)
     Things I have attempted:
       • Disabled the XMP profile
       • Memtest86, ran with no errors whatsoever
       • Disabled C-States globally
       • Pinned my docker containers to specific CPU cores
     This is the first time I've had a crash with the Unraid server hooked up to another monitor with a syslog tail (in the past I've used the syslog server and that never captured any useful information), which is why I have this screenshot. Following up on some forum research, I saw a post here reference something to do with docker and switching to ipvlan after 6.10+, but the first URL is broken; is there any information regarding this? I have a good chunk of docker containers, but the majority of everything is on br0 with custom IPs. I have Virtual Machines enabled but I don't have any running. Any help would be greatly appreciated, thanks in advance! aincrad-diagnostics-20230328-1832.zip
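     Since the broken link apparently pointed at the macvlan/ipvlan discussion, one thing worth grepping any captured log for is the kernel call trace that typically accompanies those macvlan-related crashes (a hedged sketch; the path is a placeholder for wherever your syslog copy lands):

       # Look for macvlan-related kernel traces in a saved syslog copy
       grep -iE "macvlan|call trace" /path/to/saved/syslog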
  13. Implemented a monthly scrub after running a scrub once more with no errors. Thank you for the input on the schedule and utilization! Much appreciated
  14. Got it, thank you for the input! After I restarted my system and the pool got mounted read-only, I stopped the array and mounted just the cache. I was then able to delete the syslog file, and I ran a btrfs scrub which fixed a bunch of errors (thankfully no uncorrectable ones); then I did a "btrfs device stats -z /mnt/cache" to zero out the counters. Since then it has been running smoothly with no issues (last few days). Re-do the cache as in change all my shares to the array and then reformat both of the drives in my pool? Thank you for clarifying how it allocates chunks; I didn't realize it dynamically adds and removes them as needed. Since the figure was rather close to the size of my old hard drives (around 256GB), I thought it was something related to that rather than filesystem corruption. I have set up alerts so that I will know if there are ever errors and can run a scrub. Do you think it's still worth moving the cache to the array and then reformatting my pool? EDIT: Upon checking my pool I see that balance and scrub are disabled; should I enable them on some sort of schedule?
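     On the scheduling question at the end, a hedged option that doesn't depend on any GUI scheduler is a monthly cron entry (for example via the User Scripts plugin) that scrubs the pool and then logs the device counters; the path assumes the pool is mounted at /mnt/cache, and the btrfs binary path may need adjusting for cron's environment:

       # At 03:00 on the 1st of each month: foreground scrub, then print per-device error counters
       0 3 1 * * btrfs scrub start -B /mnt/cache && btrfs device stats /mnt/cache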
  15. Thank you! I've attached my fdisk -l result; I do have it as the full partition size. How do I increase the currently allocated btrfs size in UnraidOS so that it can fully use the whole drive?
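     If the partition already spans the drive but btrfs still reports the old size, the usual filesystem-side fix is a resize to the maximum (a sketch, assuming the pool is mounted at /mnt/cache; on a multi-device pool the device id from "btrfs filesystem show" goes in front of "max"):

       # Show the devices in the pool and their ids/sizes
       btrfs filesystem show /mnt/cache
       # Grow device id 1 to fill its partition (replace 1 with the id reported above)
       btrfs filesystem resize 1:max /mnt/cache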