takethecake Posted January 6, 2020 (edited)

So I recently got my first server up and running (for Plex, Sonarr, Radarr), and it's been working phenomenally for the last ~2 weeks, other than one particular quirk: every so often (it's happened three times total now), I'll try to log in to the server and it'll be unresponsive (can't ping it or load the login page). The first two times this happened, I rebooted the machine and still had the same problem - when I connected a monitor to the Unraid box I saw that the BIOS just didn't see the USB drive. So I moved the USB drive to another port, and when I rebooted, the server came up perfectly fine like nothing had happened. This last time, simply rebooting the server got it back up; I didn't need to relocate the drive.

So my question is: how can I figure out what is going on here? Some thoughts:

1) Where could I find relevant log files to see if there's some shutdown event that's occurring? (This is kind of a "troubleshooting this type of thing in general" question.)

2) Is there a preferred USB drive size, brand, etc.? I'm using an ADATA UV128 16GB USB 3.0 drive, and it's currently plugged into the USB 3.0 port on the front of my case. Previously I had it plugged into a back I/O shield USB 3.0 port, and before that internally using a female-USB-to-mobo-header cable. Both worked initially and then needed to be switched after the BIOS stopped recognizing the drive.

Thanks guys, loving Unraid and the community so far!!

Edited January 14, 2020 by takethecake
trurl Posted January 6, 2020

Plug it in to a USB2 port and leave it there.
itimpi Posted January 6, 2020

The normal recommendation is a USB2 drive in a USB2 port if possible, as they seem to be more reliable, and Unraid gets no perceptible performance gain from USB3 because it runs from RAM. 16GB is plenty of space.
takethecake Posted January 7, 2020 (Author)

Hmm, seems like it's still doing it even when plugged into a USB2 port. If I make a new boot drive using a different USB stick, can I avoid setting up the array and all the docker containers again?
trurl Posted January 8, 2020

If you have a good and current backup of your flash drive, you can always get everything back by preparing a new install on the same or a different flash drive and copying the config folder over from your backup. Everything about your configuration is in that config folder.
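For anyone who prefers to script the restore described above, here is a minimal sketch in Python. It only illustrates the "copy the config folder from the backup onto the freshly prepared drive" step; the function name and both paths are hypothetical, and it assumes the backup is an unpacked copy of the old flash drive with the config folder at its top level.

```python
import shutil
from pathlib import Path


def restore_config(backup_root: Path, flash_root: Path) -> Path:
    """Copy the config folder from a flash backup onto a freshly prepared flash drive.

    Both arguments are assumptions for illustration: backup_root is the
    unpacked backup, flash_root is where the new drive is mounted.
    """
    source = backup_root / "config"
    target = flash_root / "config"
    if not source.is_dir():
        raise FileNotFoundError(f"no config folder found in backup: {source}")
    # dirs_exist_ok=True (Python 3.8+) lets the backup overwrite the default
    # config folder that the fresh install already created on the drive.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

A call would look something like restore_config(Path("flash-backup"), Path("/Volumes/UNRAID")), with both paths replaced by wherever the backup and the new drive actually live on your machine.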
takethecake Posted January 8, 2020 (Author)

Got it, I wasn't quite sure how that worked but sounds easy enough! Thanks for the help, I really appreciate it!
JonathanM Posted January 10, 2020

The config transfer to a new flash drive is contingent on a paid license. The trial version requires setting things up again, for the obvious reasons. It's not clear from your posts whether you have a paid license yet, so I wanted to point that out.

Also, since the flash hardware GUID is linked with the license, when you change flash drives the license must be reissued to activate the new drive and deactivate the old one. This is an automated process through limetech's servers, with a 1 year restriction. If you need to change flash drives again within 1 year, you will need to contact them and ask for a manual reissue.
takethecake Posted January 10, 2020 (Author)

Ahh yeah, I hadn't acquired a license yet, so I learned that when I went through the process of making a new boot drive, and I just went ahead and bought one. But, of course, even after making the new boot drive with a new USB 2.0 drive and copying JUST the /config files over, this dumb problem returned. The new boot drive booted up great, all the dockers were working perfectly, my networking with my other PCs was great, and I thought I was set. And then I woke up this morning and the machine was unresponsive again.

When I click the Log button to get /tools/syslog, it only has entries from the current boot, so I can't see what might have happened to crash the computer into the unresponsive state.

Maybe I need a BIOS update or something? I haven't checked to see if my board needs one yet - it's an ASUS Prime A320I-K. I'll look into that right now...
trurl Posted January 10, 2020

7 minutes ago, takethecake said:
"When I click the Log button to get /tools/syslog, it only has entries from the current boot, so I can't see what might have happened to crash the computer to the unresponsive state."

Set up Syslog Server: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=781601
takethecake Posted January 11, 2020 (Author)

Okay, here's what I got last night. It happened again, naturally, though this time I could ping the server and get a response. The only line I got after I logged in one last time at 7:55pm to make sure everything looked good was the mover line below (the 03:30:04 entry) - exit status something something mover. Seems like there's some error with the mover that's causing this? That would explain why it happens regularly overnight (although strangely, when the problem started it was more like a weekly occurrence). The two lines at 8:42 are the beginning of the startup sequence after I rebooted the unresponsive machine.

Jan 10 19:55:00 MrPlex webGUI: Successful login user root from 172.16.0.9
Jan 11 03:30:04 MrPlex crond[1730]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Jan 11 08:42:03 MrPlex emhttpd: Starting services...
Jan 11 08:42:03 MrPlex emhttpd: shcmd (101): /etc/rc.d/rc.samba restart
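When reading a saved syslog like the one above, it can help to list every long silence rather than only the last line before the reboot. The following is a rough sketch in Python, not an Unraid tool: the function names are made up, the year has to be supplied because classic syslog timestamps omit it, and logs that span a New Year are not handled. Keep in mind that the last line before a silence is only the last thing logged, not necessarily the cause of the hang.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Parse the 'Jan 10 19:55:00' prefix of a syslog line; None if it does not match."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, year, min_gap=timedelta(hours=1)):
    """Return (last_line_before, first_line_after, gap) for each silence of at least min_gap."""
    gaps = []
    previous = None  # (timestamp, line) of the last line that parsed
    for line in lines:
        stamp = parse_syslog_time(line, year)
        if stamp is None:
            continue  # skip lines without a recognisable timestamp
        if previous is not None and stamp - previous[0] >= min_gap:
            gaps.append((previous[1], line, stamp - previous[0]))
        previous = (stamp, line)
    return gaps
```

Run over the four lines posted above with year=2020, this reports two silences: one between the 19:55:00 login and the 03:30:04 mover entry, and one between the mover entry and the 08:42:03 startup. On a quiet server, long gaps like the first one are normal, which is why a gap alone does not pin down when the machine actually stopped responding.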
trurl Posted January 11, 2020

Syslog snippets are seldom sufficient.

Possibly that mover entry was simply the last entry in syslog before the crash because nothing else was going on at the time. Do you have any particular reason to suspect that all your previous crashes happened at the scheduled mover time?

Have you done memtest?
takethecake Posted January 11, 2020 (Author)

I've always noticed it first thing in the morning, and I don't think it's ever happened during the day - that timing could suggest it being mover-related, but maybe not. I did manually run the mover just now and nothing bad happened, and mover logging didn't show anything suspicious.

I have not done memtest - looks like I need to do that for at least 24 hours?

One other thing I should have mentioned that just occurred to me - I've been getting a notice in my Unraid dashboard that my M.2 SSD cache drive is running too hot for Unraid's liking. It's running around 55 degC, which it's done since I installed it. I initially googled it and it seemed that it shouldn't be a problem, but maybe there's something related to that?
Squid Posted January 11, 2020

17 minutes ago, takethecake said:
"M.2 SSD cache drive is running too hot for Unraid's liking. It's running around 55 degC, which it's done since I installed it - I"

You can change the threshold by going to Main, Cache Devices and clicking on Cache.
JorgeB Posted January 12, 2020

16 hours ago, takethecake said:
"exit status 1 from user root /usr/local/sbin/mover"

Exit status 1 means mover was already running.
takethecake Posted January 12, 2020 (Author) (edited)

Alright, well, I upped the threshold yesterday so I don't have to worry about it anymore, but I don't feel any closer to solving this - it crashed again last night. The only thing I have to go on is to test my memory, which I'm going to do today, but that's still a bit of a shot in the dark. Is there anything I can look for in my diagnostics zip? Right now, assuming the memtest checks out okay, I think my best strategy is to just disable my dockers one at a time until the system stabilizes, so I'll start that and report back. I also added all my hardware to my signature in case any of my components are known for causing trouble...

Another thing I stumbled across while trying to figure this out is the "Fix Common Problems" plugin - so I'm installing that now to see if it helps dig up anything useful.

*Update - welp, looks like a HW error. The FCP plugin found machine check events, so I installed mcelog and lo and behold I got the following lovely lines:

Jan 12 08:33:18 MrPlex kernel: mce: [Hardware Error]: Machine check events logged
Jan 12 08:33:18 MrPlex kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: bea0000000000108
Jan 12 08:33:18 MrPlex kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff810725a4 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Jan 12 08:33:18 MrPlex kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1578843178 SOCKET 0 APIC 0 microcode 8001138

Debating whether to keep letting memtest run or to just turn everything off, re-seat every connector and the RAM, and then redo the diagnostics...

Edited January 12, 2020 by takethecake
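The long hex value on the "Machine Check" line above is the machine-check status register, and its top bits can be read without any special tooling. The sketch below is a hedged illustration in Python, not a replacement for mcelog: the function name and the regular expression are assumptions based only on the line format shown in this post, and it decodes just the architectural x86 flag bits 63 to 57 plus the low 16-bit error code. It says what kind of error the CPU flagged, not what caused it.

```python
import re

# Architectural flag bits in the upper part of an x86 machine-check status register.
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register contents are valid
    58: "ADDRV",  # ADDR register contents are valid
    57: "PCC",    # processor context may be corrupt
}

# Pattern assumed from the kernel log line quoted in the post above.
MCE_LINE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-fA-F]+)")


def decode_mce(line):
    """Extract CPU, bank and status flags from a kernel 'Machine Check' log line."""
    match = MCE_LINE.search(line)
    if match is None:
        return None
    cpu, bank, status_hex = match.groups()
    status = int(status_hex, 16)
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "cpu": int(cpu),
        "bank": int(bank),
        "flags": flags,
        "error_code": status & 0xFFFF,  # low 16 bits hold the MCA error code
    }
```

For the status bea0000000000108 on CPU 0, bank 5, this gives the flags VAL, UC, EN, MISCV, ADDRV and PCC with error code 0x0108: a valid, uncorrected error, which fits a machine that locks up rather than logging a warning and carrying on.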
takethecake Posted January 12, 2020 (Author)

And another update - I did some more googling on that error and discovered that there seems to be an issue with the 1000-series Ryzen processors and C-states. I exited the memtest (1 hr, no faults found), disabled C-states in my BIOS, and updated the BIOS for good measure (I made sure C-states were still disabled after the update). When I rebooted the server, I re-ran Fix Common Problems and the MCEs no longer came up. I won't really know for sure if this fixed the problem until tomorrow morning (as that's when it "strikes"... lol).
takethecake Posted January 13, 2020 (Author)

Welp, that seems to have solved it - quite the rabbit trail, but hopefully this thread can help out anyone in the future doing a 1000-series Ryzen build. Server couldn't be happier this morning.

So if I understand C-states correctly, what was happening was the computer tried to go into a sort of power-saving mode, and that's when it would crash?
JorgeB Posted January 13, 2020

2 minutes ago, takethecake said:
"So if I understand C-states correctly, what was happening was the computer tried to go into a sort of power-saving mode, and that's when it would crash?"

Correct. In more recent BIOS versions there's usually an option to fix that without disabling C-states: look for "Power Supply Idle Control" (or similar) and set it to "typical current idle" (or similar).
takethecake Posted January 13, 2020 (Author)

Oh cool, next time I power down I'll take a look around and see if I can find that - for now I'm just happy it's stable haha. Thanks!!
Curtis777 Posted January 15, 2022

On 1/13/2020 at 6:42 PM, takethecake said:
"Oh cool, next time I power down I'll take a look around and see if I can find that - for now I'm just happy it's stable haha. Thanks!!"

How is your motherboard doing? I am thinking of buying the same one for Unraid, with an AMD Ryzen 3 1200 and 2x16GB RAM.
takethecake Posted January 16, 2022 (Author)

So far so good! I haven't had any return of the problem from this thread; the only thing I've experienced recently is my NVMe drive running hot - I've had one unexplained crash in the last few months and I think that high temp was the reason.

The other thing I might have considered when building this rig is getting a mobo/CPU combo with built-in video support. Even though I'm running a headless server, when booting Unraid up for the first time I ran into an issue where, if I didn't have a PCI video card attached, I couldn't make the server accessible to a local browser so that I could log in. Even after setting everything up, when I tried removing the video card I couldn't get the server to boot. So now I just keep that video card in the server, and it keeps me from being able to use the PCI SATA expansion card I got to try to utilize some extra HDDs. Oh well, I can always just get bigger disks if I run out of room.