cintay Posted February 28

Hello, I'm at a loss here. I've been having headaches with my UnRAID setup. For context, I moved my entire UnRAID setup from an older Supermicro build to a new build with the following specs (not sure if relevant):

MOBO: ASUS Z790
CPU: i7-14700K
RAM: Crucial Pro 4x48GB

When I transferred the old USB over, I had to create a new UnRAID USB and transfer the license, because my old USB was 3.0. I bought a new Cruzer Blade USB, transferred the license over, then copied all the contents of the old USB onto the new one. This worked for a little while, until the internet cut out. The server is on a UPS (and the power didn't cut out either), so I don't think that affected anything. After rebooting, the server kept giving me issues. After countless hours of disabling Docker containers, reconfiguring new cache pools, etc., I'm at a loss. The Docker service will not run at all; I just get "Docker Service failed to start." I restarted into safe mode and get the same thing.

Since the old UnRAID install was on a different version, I also downloaded the Docker patch from CA. I also changed the Docker network setting to ipvlan (it was macvlan before the switch), though I did that before the crashing and Docker failing to load. I've attached my most recent diagnostics file. Any help is extremely appreciated. Thank you for reading.

tower-diagnostics-20240228-1133.zip
cintay Posted February 28 (Author)

Additional note: I've also seen the posts suggesting that the docker image is corrupted. Docker was working fine last night after a restart, but when I woke up this morning the UnRAID webUI was only showing the "UNRAID" logo and the background, which prompted me to restart the server, leading to this. My cache drive(s) are also not full. Even though it was working before I restarted the server this morning, is this potentially the culprit?
trurl Posted February 28

Won't directly fix the problem, but you should clean all this up:

Feb 28 11:30:01 Tower root: Fix Common Problems: Warning: Share system set to cache-only, but files / folders exist on the array
Feb 28 11:30:01 Tower root: Fix Common Problems: Warning: Share mediawiki_old set to not use the cache, but files / folders exist on the cache drive
Feb 28 11:30:01 Tower root: Fix Common Problems: Warning: Share scratchshare set to not use the cache, but files / folders exist on the cache drive
Feb 28 11:30:04 Tower root: Fix Common Problems: Other Warning: Background notifications not enabled
Feb 28 11:30:04 Tower root: Fix Common Problems: Warning: Share appdata set to use pool cache, but files / folders exist on the secondarycache pool
Feb 28 11:30:04 Tower root: Fix Common Problems: Warning: Share appdata set to use pool cache, but files / folders exist on the superfast pool
Feb 28 11:30:04 Tower root: Fix Common Problems: Warning: Share domains set to use pool cache, but files / folders exist on the secondarycache pool
Feb 28 11:30:04 Tower root: Fix Common Problems: Warning: Share frigate set to use pool cache, but files / folders exist on the secondarycache pool
Feb 28 11:30:04 Tower root: Fix Common Problems: Warning: Share neo_nextcloud set to use pool superfast, but files / folders exist on the cache pool
Feb 28 11:30:04 Tower root: Fix Common Problems: Warning: Share system set to use pool cache, but files / folders exist on the secondarycache pool
trurl Posted February 28

Why do you have a 100G docker.img as xfs? The default 20G btrfs should work just fine, maybe a little larger if you have a lot of containers.
trurl Posted February 28 (Solution)

You have btrfs csum errors on sdc, which is probably the reason for the corruption, and a very good reason to run memtest. Do memtest before doing anything else. You don't even want to attempt to run any computer unless memory is working perfectly. Everything goes through RAM: the OS and other executable code, your data, everything. The CPU can't do anything with anything until it is loaded into RAM.
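For reference, csum errors like these can be inspected from the Unraid terminal with the standard btrfs-progs tools. This is a sketch; the mount point /mnt/cache is an assumption, so substitute whichever pool holds sdc:

```shell
# Per-device error counters; the csum error column is the one to watch.
btrfs device stats /mnt/cache

# Re-read and verify every checksum on the pool; -B waits for completion.
btrfs scrub start -B /mnt/cache
btrfs scrub status /mnt/cache
```

If the counters keep climbing after a memory fix, the drive or cabling is the next suspect.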
cintay Posted February 28 (Author)

Thank you so much for your response.

43 minutes ago, trurl said:
"Won't directly fix the problem, but you should clean all this up."

Noted. I will fix this once I fix the biggest problem. Hopefully I can fix that.

42 minutes ago, trurl said:
"Why do you have 100G docker.img as xfs? Default 20G btrfs should work just fine."

I think this was a remnant from a previous UnRAID setup where one of my containers was blowing up the Docker image; rather than fixing it, I made the docker.img extremely large. That was probably a mistake. Maybe it's a good reason to create a new docker.img, but I'm just too scared to lose all my configuration.

34 minutes ago, trurl said:
"You have btrfs csum errors on sdc, which is probably the reason for corruption. Do memtest before doing anything else."

Sounds good. I started a memtest86 run using a spare USB I had, following the instructions from this Corsair guide. The guide says it will take an hour for 8 GB. Seeing that I have 192GB of RAM, does that mean it'll take an entire day? I hope not, but I guess that's what I'll have to do.

Small aside: I may have mixed the RAM kits I bought (I bought two 2x48GB kits). Perhaps that could be a problem? I don't understand, though; everything was working fine... until it wasn't.
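For what it's worth, scaling the guide's figure of one hour per 8 GB linearly gives a rough answer to the "entire day" question. The linear scaling is an assumption; actual MemTest86 pass time varies with CPU and memory speed:

```shell
# Naive linear estimate from the guide's ~1 hour per 8 GB figure.
ram_gb=192
hours=$(( ram_gb / 8 ))
echo "estimated memtest time: ${hours} hours"
```

So roughly 24 hours, i.e. about a full day for a complete run at that rate.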
trurl Posted February 28

1 hour ago, cintay said:
"create a new docker.img -- I'm just too scared to lose all my configuration."

That is going to be my recommendation after you get everything else taken care of. Nothing to be scared of.

https://docs.unraid.net/unraid-os/manual/docker-management/#re-create-the-docker-image-file
https://docs.unraid.net/unraid-os/manual/docker-management/#re-installing-docker-applications
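The linked docs boil down to a short sequence. A sketch of it, with the caveat that the docker.img path below is the common default and may differ on a given system (check Settings > Docker for the actual location):

```shell
# 1. Settings > Docker > Enable Docker: No  (stops the Docker service)

# 2. Delete the old image (default path assumed; verify under Settings > Docker):
rm /mnt/user/system/docker/docker.img

# 3. Settings > Docker > Enable Docker: Yes
#    Unraid creates a fresh image (20G btrfs by default).

# 4. Apps > Previous Apps: reinstall containers from their saved templates,
#    so ports, path mappings, and variables are restored automatically.
```

Container settings live in the templates on the flash drive and the appdata share, not inside docker.img, which is why deleting the image loses nothing but the pulled layers.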
cintay Posted February 29 (Author)

Thank you for your help. It turns out the memtest is failing. I returned my RAM, bought new sets of the exact same ones, and it's still failing. My MOBO and CPU are brand new, so I'm not entirely sure what's going on. I checked the motherboard's memory compatibility list and my kits weren't on it, so I bought a 2x48GB kit that was listed as compatible with my MOBO, but that also came up with errors in memtest. I think I'm going insane. Do you have any ideas? Thanks for your help.
JonathanM Posted February 29

11 minutes ago, cintay said:
"Do you have any ideas?"

Just because something is brand new doesn't mean it's good. At this level of consumer equipment, you are the final QC. It's cheaper to deal with consumer returns than it is to thoroughly test every unit before shipping it.
trurl Posted February 29

Are you sure you aren't overclocking?
cintay Posted March 1 (Author)

Thank you so much for all your help! I ended up resolving the problem (for now, at least; I hope it stays that way). I returned the motherboard, CPU, and RAM and purchased entirely different sets.

Set that failed:
CPU: Intel i7-14700K
MOBO: ASUS Z790-V PRIME
RAM: 2 x Crucial Pro 96GB kits (totaling 192GB)

I returned that set and replaced the parts with the exact same models (except the motherboard, which I changed to the "P" version because the "V" ran out of stock):
CPU: Intel i7-14700K
MOBO: ASUS Z790-P PRIME
RAM: 2 x Crucial Pro 96GB kits (totaling 192GB)

I then returned that and replaced it with this:
CPU: Intel i7-14700K
MOBO: Gigabyte Z790 AORUS Elite X
RAM: 2 x TeamGroup T-Create Expert 64GB (2 x 32GB) kits (now totaling 128GB)

This combination seems to work better. I think what was happening was one of (or some combination of):

- The first MOBO didn't officially list the Crucial RAM as compatible, though this should have been solved by my later purchase of a listed kit
- The amount of RAM I was attempting to push was too much
- Somehow my first and second CPUs were faulty to some extent; the memory controller may have been damaged or nonfunctional
- Somehow the MOBOs in the first two sets were faulty to some extent

Regardless, I kept getting errors running memtest on the first two setups. I brought home the last set, ran the memtest, and woke up to a giant green "PASS". Thank goodness.

19 hours ago, trurl said:
"Are you sure you aren't overclocking?"

I ran the memtest with different configurations, some with XMP disabled (in ASUS's BIOS that means setting it to "manual", which I think turns off XMP; Gigabyte's BIOS makes turning off XMP a little more obvious). I also tried two sticks, four sticks, etc. I didn't try putting a single stick in each slot and testing; since I was still in the return period, I figured I'd just go ahead and return everything before it was too late.

21 hours ago, JonathanM said:
"Just because something is brand new doesn't mean it's good. For this level of consumer equipment, you are the final QC."

I think this was the comment that pushed me to return it all and find something else. Thanks so much for the final nudge. I was dreading rebuilding it, but I think it ended up for the better. The Microcenter rep even joked, "There's a reason we bundle those [motherboards] together."

On 2/28/2024 at 2:44 PM, trurl said:
"That is going to be my recommendation after you get everything else taken care of. Nothing to be scared of."

Thanks so much for those guides. I used them to rebuild the docker image, and I'm in the process of redownloading and reconfiguring my docker applications. It wasn't as painful as I anticipated, thankfully, but still a little tedious to go back and reconfigure the hardcoded URLs.
trurl Posted March 1

46 minutes ago, cintay said:
"reconfiguring my docker applications"

If you used Previous Apps there shouldn't be any need to reconfigure.
cintay Posted March 1 (Author)

Just now, trurl said:
"If you used Previous Apps there shouldn't be any need to reconfigure."

I realized that about halfway through. Over the years, I ended up downloading different versions of the same containers (linuxserver, binhex, etc.) and couldn't remember which ones I used. My last setup used a combination of binhex, linuxserver, and others, which made it harder to map. Thanks again for the help, though. It's not as bad as I thought it would be. I'm trying to fix all the errors I neglected before, so that if I ever have to redo this all over again it'll go smoother.

On 2/28/2024 at 12:02 PM, trurl said:
"Won't directly fix the problem, but you should clean all this up."

Question: to solve the "appdata set to use pool cache, but files / folders exist on __ pool" warnings, how do I go about that? My proposed steps are:

1. Change the appdata share's primary storage to "Cache"
2. Set secondary storage (didn't realize this was a thing) to "None"
3. Manually mv everything from /mnt/X/appdata to /mnt/cache/appdata (where X is each of the other pools)

Is that it?
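Step 3 above can be sketched as a small shell function. This is an illustration, not an official procedure: the function name is made up, the pool names superfast and secondarycache come from this thread's warnings, and it assumes Unraid's /mnt/&lt;pool&gt; layout (the root is a parameter so it can be dry-run elsewhere). Run it only with Docker stopped, since open files can't be moved:

```shell
# consolidate_appdata ROOT: merge appdata from the extra pools into the
# cache pool, then remove the now-empty source copies. ROOT is /mnt on Unraid.
consolidate_appdata() {
    root="$1"
    for pool in superfast secondarycache; do
        src="${root}/${pool}/appdata"
        dst="${root}/cache/appdata"
        if [ -d "${src}" ]; then
            mkdir -p "${dst}"
            # cp -a preserves permissions/ownership; "src/." copies contents,
            # including dotfiles, into dst rather than nesting a directory.
            cp -a "${src}/." "${dst}/" && rm -rf "${src}"
        fi
    done
}

# On the server, with Docker disabled in Settings:
#   consolidate_appdata /mnt
```

A copy-then-delete is used instead of a bare mv because mv across different pools is a copy under the hood anyway, and this way a failed copy leaves the source intact.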
trurl Posted March 2

8 hours ago, cintay said:
"Is that it?"

Yes, that is probably the simplest. I see you have Dynamix File Manager installed. But nothing can move open files, so you will have to disable Docker in Settings before you can work with these.
cintay Posted March 2 (Author)

15 hours ago, trurl said:
"Yes that is probably the simplest."

Okay, thank you so much!