MikaelTarquin Posted April 24, 2022
A few days ago I had a power outage. My UPS allowed the server to shut down gracefully, but I was unable to bring it back up the next day. It turned out the USB drive was bad. I didn't have a flash backup, so I made a new USB and used the registration tool to reclaim my license. Everything seemed to go very smoothly at first. Today I was notified that my Ombi page isn't working, and sure enough I can't log in either. While looking for possible causes, I noticed on my dashboard that my log is using 100% of its memory. I am unsure what is causing this, so I attached the diagnostics here. Would anyone be able to help me figure out why this is happening? Thank you!
Attachment: nnc-diagnostics-20220424-0943.zip
Squid Posted April 24, 2022
Looks like the root problem here is that the file system on your cache drive is corrupted. This is often caused by bad memory. Run a memtest. (Side note: if you have no plans to upgrade to a multiple-device pool, you're usually better off using XFS, as it's more forgiving on systems that are not 100% rock stable.)
MikaelTarquin Posted April 24, 2022 (edited)
OK, is the best way to do a memtest to start it from the boot menu and let it run for a few days? I replaced the cache drive very recently. It seems Plex and others are working; how should I best handle the corrupted file system?
EDIT: I saw this post from a few years back saying it's pointless to run memtest with ECC RAM. Is that true? My RAM is ECC (Dell PowerEdge T630). https://forums.unraid.net/topic/91204-how-to-run-memtest-headless/?do=findComment&comment=846406
itimpi Posted April 24, 2022
I think the version of memtest you can download from memtest86.com can handle ECC RAM.
MikaelTarquin Posted April 25, 2022
3 hours ago, itimpi said: "I think the version of memtest you can download from memtest86.com can handle ECC RAM."
Thanks! Is there a recommended way to run that on my Unraid server, or do I just need to run that on its own boot device?
JonathanM Posted April 25, 2022
10 minutes ago, MikaelTarquin said: "run that on its own boot device"
This.
JorgeB Posted April 25, 2022
Apr 23 05:08:42 NNC kernel: macvlan_broadcast+0x10e/0x13c [macvlan]
Apr 23 05:08:42 NNC kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]
If the server sometimes crashes, see below. Macvlan call traces are usually the result of having Docker containers with a custom IP address. Upgrading to v6.10 and switching to ipvlan might fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right), or see below for more info.
https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/
See also here: https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/
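For anyone who wants to check their own syslog for the same kind of entries, here is a minimal Python sketch. It assumes the log lines look like the two quoted in the post above (a `kernel:` prefix followed by a `function+0xoffset/0xsize [macvlan]` frame). The function name `find_macvlan_frames` and the regular expression are illustrative, not part of any Unraid tool.

```python
import re

# Matches kernel log lines whose call-trace frame belongs to the macvlan module,
# e.g. "kernel: macvlan_broadcast+0x10e/0x13c [macvlan]".
# The pattern is an assumption based on the two lines quoted in the thread.
MACVLAN_FRAME = re.compile(r"kernel: .*?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[macvlan\]")


def find_macvlan_frames(log_text: str) -> list[str]:
    """Return the function names of all macvlan call-trace frames in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = MACVLAN_FRAME.search(line)
        if match:
            frames.append(match.group(1))
    return frames


# The two lines quoted in the post above, used as sample input.
sample = """Apr 23 05:08:42 NNC kernel: macvlan_broadcast+0x10e/0x13c [macvlan]
Apr 23 05:08:42 NNC kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]"""

print(find_macvlan_frames(sample))
# prints ['macvlan_broadcast', 'macvlan_process_broadcast']
```

An empty result only means no lines matched this particular pattern; it does not rule out other network-related call traces.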
MikaelTarquin Posted June 14, 2022
I still need to run memtest and update Unraid to v6.10, but I have been busy with a move and unable to find the time. However, today I noticed my cache drive is throwing a SMART error again (Reallocated Sector Count). This exact thing happened almost exactly 1 year ago, and I was unable to solve the problem then short of buying a new SSD. Needless to say, seeing an expensive 2TB SSD throw SMART errors after only 1 year and ~30TB of writes is extremely upsetting.
If it's related, during the move I also discovered I was unable to boot my server (a Dell T630) until I moved a stick of RAM out of slot B1 (currently slots A1, A2, B2, and B3 are populated). Swapping other DIMMs didn't resolve the error; it was only when that slot was unpopulated that it got to BIOS. Am I just screwed?
Attachments: nnc-diagnostics-20220613-1909.zip, nnc-smart-20220613-1918.zip
JorgeB Posted June 14, 2022
The SMART test is failing, so the drive should be replaced.
MikaelTarquin Posted June 14, 2022
But I JUST replaced it 13 months ago. This drive has 9000 hours of use and only 30TB of writes. I don't want to just blindly replace expensive SSDs every 12 months if something in Unraid or my system is killing them.
JorgeB Posted June 14, 2022
It can't be Unraid killing the SSD. It could be if there were an unusually high amount of writes, but that's not the case.
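To put the write volume in perspective, here is a rough back-of-the-envelope calculation in Python using the two figures quoted in the thread (about 30TB written over about 9000 power-on hours). It assumes decimal units (1 TB = 1000 GB) and that writes were spread evenly over the power-on time. The function name `average_write_rate` is illustrative. Whether the resulting rate is within the drive's rated endurance depends on the manufacturer's specification, which the thread does not give.

```python
def average_write_rate(total_tb: float, power_on_hours: float) -> dict[str, float]:
    """Average write rate, assuming writes are spread evenly over power-on time."""
    total_gb = total_tb * 1000  # decimal units: 1 TB = 1000 GB (assumption)
    gb_per_hour = total_gb / power_on_hours
    return {"gb_per_hour": gb_per_hour, "gb_per_day": gb_per_hour * 24}


# Figures taken from the thread: ~30TB written, ~9000 hours of use.
rate = average_write_rate(total_tb=30, power_on_hours=9000)
print(f"{rate['gb_per_hour']:.2f} GB/hour, {rate['gb_per_day']:.0f} GB/day")
# prints 3.33 GB/hour, 80 GB/day
```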