takethecake

Everything posted by takethecake

  1. So far so good! Haven't had any return of the problem I had in this thread; the only thing I've experienced recently is my NVMe drive running hot - I've had one unexplained crash in the last few months and I think that high temp was the reason. The other thing I might've considered when building this rig is getting a mobo/CPU combo that had built-in video support. Even though I'm running a headless server, when booting Unraid up for the first time I ran into an issue where if I didn't have a PCI video card attached, I couldn't make the server accessible to a local browser so that I could log in. Even after setting everything up, when I tried removing the video card I couldn't get the server to boot. So now I just keep that video card in the server, and it takes up the slot I'd need for the PCI SATA expansion card I got to try and utilize some extra HDDs. Oh well, I can always just get bigger disks if I run out of room.
  2. Oh cool, next time I power down I'll take a look around and see if I can find that - for now I'm just happy it's stable haha. Thanks!!
  3. Welp, that seems to have solved it - quite the rabbit trail, but hopefully this thread can help out anyone in the future doing a 1000-series Ryzen build. Server couldn't be happier this morning. So if I understand C-states correctly, what was happening was the computer tried to go into a sort of power-saving mode, and that's when it would crash?
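For anyone who wants to see which idle states (C-states) the kernel actually exposes on their box, here is a minimal sketch. It assumes a Linux system that provides the standard cpuidle sysfs directory under `/sys/devices/system/cpu/cpuN/cpuidle/`; the function name `list_idle_states` and the `sysfs_root` parameter are made up for illustration. If C-states are turned off in the BIOS, the list may be short or empty, depending on the platform.

```python
from pathlib import Path


def list_idle_states(cpu=0, sysfs_root="/sys"):
    """Return (state_dir, name, disabled) tuples for one CPU's idle states.

    sysfs_root is a hypothetical parameter so the function can be pointed
    at a test directory instead of the real /sys.
    """
    base = Path(sysfs_root) / "devices" / "system" / "cpu" / f"cpu{cpu}" / "cpuidle"
    if not base.is_dir():
        # No cpuidle directory: nothing is exposed for this CPU.
        return []

    states = []
    # Sort numerically so state10 comes after state2.
    for state_dir in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state_dir / "name").read_text().strip()
        disable_file = state_dir / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((state_dir.name, name, disabled))
    return states


if __name__ == "__main__":
    for state_dir, name, disabled in list_idle_states():
        print(f"{state_dir}: {name} ({'disabled' if disabled else 'enabled'})")
```

This only reads what the kernel reports; it does not change any power settings.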
  4. And another update - I did some more googling on that error and discovered that there seems to be an issue with the 1000-series Ryzen processors and C-states. I exited the memtest (1 hr, no faults found), disabled C-states in my BIOS, and updated the BIOS for good measure (made sure C-states were still disabled after the update). When I rebooted the server, I re-ran Fix Common Problems and the MCEs no longer came up. Won't really know for sure if this fixed the problem until tomorrow morning (as that's when it "strikes".... lol).
  5. Alright well I upped the threshold yesterday so I don't have to worry about it anymore, but I don't feel any closer to solving this - crashed again last night. Only thing I have to go on is to test my memory, which I'm going to do today, but that's still a bit of a shot in the dark. Is there anything I can look for in my diagnostics zip? Right now, assuming the memtest checks out okay, I think my best strategy is to just disable my dockers one at a time until the system stabilizes, so I'll start that and report back. I also added all my hardware to my signature in case any of my components are known for causing trouble... Another thing I stumbled across while trying to figure this out is the "Fix Common Problems" plugin - so I'm installing that now to see if it helps dig up anything useful.
     *Update - welp, looks like a HW error. FCP plugin found machine check events, so I installed mcelog and lo and behold I got the following lovely lines:
     Jan 12 08:33:18 MrPlex kernel: mce: [Hardware Error]: Machine check events logged
     Jan 12 08:33:18 MrPlex kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: bea0000000000108
     Jan 12 08:33:18 MrPlex kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff810725a4 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
     Jan 12 08:33:18 MrPlex kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1578843178 SOCKET 0 APIC 0 microcode 8001138
     Debating whether to keep letting memtest run or to just turn everything off and re-seat every connector and the RAM and then redo the diagnostics...
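If you want to pull apart a line like the `CPU 0: Machine Check: 0 Bank 5: bea0000000000108` entry above, the sketch below may help. It assumes the kernel line format quoted in the post and the standard x86 machine-check status layout for the top bits (63 = valid, 62 = overflow, 61 = uncorrected, 60 = enabled, 59 = MISC valid, 58 = ADDR valid, 57 = processor context corrupt). The function name `parse_mce_line` is made up for illustration, and this is not a substitute for mcelog or the vendor documentation; the lower, model-specific bits are left undecoded.

```python
import re

# Pattern for the "CPU n: Machine Check: m Bank b: <hex status>" line quoted in the post.
MCE_LINE = re.compile(
    r"CPU (?P<cpu>\d+): Machine Check: (?P<mc>\d+) "
    r"Bank (?P<bank>\d+): (?P<status>[0-9a-fA-F]+)"
)

# Architectural flag bits at the top of the 64-bit status value.
STATUS_FLAGS = {
    "valid": 63,
    "overflow": 62,
    "uncorrected": 61,
    "enabled": 60,
    "misc_valid": 59,
    "addr_valid": 58,
    "context_corrupt": 57,
}


def parse_mce_line(line):
    """Return a dict describing one machine-check line, or None if it does not match."""
    match = MCE_LINE.search(line)
    if match is None:
        return None
    status = int(match.group("status"), 16)
    info = {
        "cpu": int(match.group("cpu")),
        "bank": int(match.group("bank")),
        "status": status,
        # Low 16 bits hold the error code; its meaning is vendor/bank specific.
        "error_code": status & 0xFFFF,
    }
    for name, bit in STATUS_FLAGS.items():
        info[name] = bool((status >> bit) & 1)
    return info


if __name__ == "__main__":
    sample = ("Jan 12 08:33:18 MrPlex kernel: mce: [Hardware Error]: "
              "CPU 0: Machine Check: 0 Bank 5: bea0000000000108")
    for key, value in parse_mce_line(sample).items():
        print(key, value)
```

What bank 5 and the error code mean on a particular CPU still has to come from the processor documentation; the flags only say how the hardware classified the event.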
  6. I've always noticed it first thing in the morning, I don't think it's ever happened during the day - that timing could suggest it being mover-related but maybe not. I did manually run the mover just now and nothing bad happened, and mover logging didn't show anything suspicious. I have not done memtest - looks like I need to do that for at least 24 hours? One other thing I should have mentioned that just occurred to me - I've been getting a notice in my Unraid dashboard that my M.2 SSD cache drive is running too hot for Unraid's liking. It's running around 55 degC, which it's done since I installed it - I initially googled it and it seemed that it shouldn't be a problem but maybe there's something related to that?
  7. Okay here's what I got last night. It happened again naturally, though this time I could ping the server and get a response. The only line I got after I logged in one last time at 7:55pm to make sure everything looked good was the second one below - exit status something something mover. Seems like there's some error with the mover that's causing this? - would explain why it happens regularly overnight (although strangely, when the problem started it was more like a weekly occurrence). The two lines at 8:42 are the beginning of the startup sequence after I rebooted the unresponsive machine.
     Jan 10 19:55:00 MrPlex webGUI: Successful login user root from 172.16.0.9
     Jan 11 03:30:04 MrPlex crond[1730]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
     Jan 11 08:42:03 MrPlex emhttpd: Starting services...
     Jan 11 08:42:03 MrPlex emhttpd: shcmd (101): /etc/rc.d/rc.samba restart
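A rough way to narrow down when a hang happened is to look at the last thing logged before the startup lines. The sketch below does that for a saved syslog, assuming classic `Mon DD HH:MM:SS` timestamps and treating the `emhttpd: Starting services` line from the post as the boot marker (that choice of marker is an assumption, as are the names `silent_window` and `BOOT_MARKER`).

```python
from datetime import datetime

# Assumption: this startup message (taken from the log above) marks a fresh boot.
BOOT_MARKER = "emhttpd: Starting services"

SAMPLE = [
    "Jan 10 19:55:00 MrPlex webGUI: Successful login user root from 172.16.0.9",
    "Jan 11 03:30:04 MrPlex crond[1730]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null",
    "Jan 11 08:42:03 MrPlex emhttpd: Starting services...",
    "Jan 11 08:42:03 MrPlex emhttpd: shcmd (101): /etc/rc.d/rc.samba restart",
]


def parse_time(line):
    # Syslog timestamps carry no year, so strptime falls back to 1900.
    # That is fine for differences within one year, but not across New Year.
    return datetime.strptime(line[:15], "%b %d %H:%M:%S")


def silent_window(lines, marker=BOOT_MARKER):
    """Return (last line before boot, time between it and the boot line), or None."""
    previous = None
    for line in lines:
        if marker in line:
            if previous is None:
                return None
            return previous, parse_time(line) - parse_time(previous)
        previous = line
    return None


if __name__ == "__main__":
    result = silent_window(SAMPLE)
    if result is not None:
        last_line, gap = result
        print("Last entry before boot:", last_line)
        print("Silent for:", gap)
```

The hang could have happened at any point in that window, so this only bounds the time; it does not prove the last logged event caused it.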
  8. Ahh yeah I hadn't acquired a license yet, so I learned that when I went through the process of making a new boot drive and just went ahead and bought one. But, of course, even after making the new boot drive with a new USB 2.0 drive, and copying JUST the /config files over, this dumb problem returned. The new boot drive booted up great, all the dockers were working perfectly, my networking with my other PCs was great, I thought I was set. And then I woke up this morning and the machine was unresponsive again. When I click the Log button to get /tools/syslog, it only has entries from the current boot, so I can't see what might have happened to put the computer into the unresponsive state. Maybe I need a BIOS update or something? I haven't checked to see if my board needs one yet - it's an ASUS Prime A320I-K. I'll look into that right now.
  9. Got it, I wasn't quite sure how that worked but sounds easy enough! Thanks for the help, I really appreciate it!
  10. Hmm, seems like it's still doing it even when plugged into a USB2 port. If I make a new boot drive using a different USB stick, can I avoid setting up the array and all the docker containers again?
  11. So I recently got my first server up and running (for Plex, Sonarr, Radarr), and it's been working phenomenally for the last ~2 weeks, other than this one particular quirk: every so often (it's happened three times total now), I'll try and log into the server and it'll be unresponsive (can't ping or load login page). The first two times this happened, I rebooted the machine and still had the same problem - when I connected a monitor to the Unraid box I saw it just didn't see the USB drive in the BIOS. So I moved the USB drive to another port, and when I rebooted the server came up perfectly fine like nothing had happened. This last time, simply rebooting the server got it back up; I didn't need to relocate the drive this time.
      So my question is how can I figure out what is going on here? Some thoughts:
      1) Where could I find relevant log files for seeing if there's some shutdown event that's occurring? (this is kind of a "troubleshooting this type of thing in general" type of question)
      2) Is there a preferred USB drive size, brand, etc? I'm using an ADATA UV128 16GB USB3.0 drive, and it's currently plugged into the USB 3.0 port on the front of my case. Previously I had it plugged into a back I/O shield USB 3.0 port, and then internally using a female-USB-to-mobo-header cable; both worked initially and then needed to be switched after the BIOS stopped recognizing them.
      Thanks guys, loving Unraid and the community so far!!