cmalec

Members
  • Posts

    6

cmalec's Achievements

Noob (1/14)

0

Reputation

  1. I have seen no 'hard' crashes since running memtest. However, I have noticed my uptime getting reset when it should not. I have not restarted the server since the memtest, so uptime should be almost 14 days, but it shows only 2 days. I am not sure why or how that happened. My array is set to auto-start, but I did not restart anything myself, so I am unsure how it could have crashed and restarted on its own. It may be an entirely unrelated error or glitch, but I am attaching logs from the last 3 days anyway. So no answers, but more or less no issues right now. I am posting this so that anyone who finds the thread doesn't think I fixed it and never posted the solution. scratch_12.txt
  2. Ran another pass, the 4th, and still no errors. Exited and booted back into Unraid.
  3. Memtest ran overnight and is still running. I will try to keep it running until later today.
  4. Attached diagnostics: edgerunner-diagnostics-20221228-0827.zip
  5. I recently built a tower for my Unraid server (end of October) and it has been randomly "shutting down". It runs fine and then I notice everything is down. This has happened a few times but not consistently; sometimes it runs for days or weeks without issue. When it crashes I can't reach the server, but the tower is still powered on (lights, fans, etc. still going). I run Unraid headless, so there is nothing I can really do except reboot. After a reboot the array starts without issue. I started capturing logs but did not see anything in them that suggested a problem. There are large gaps in my logs, which I assume are normal because they also occur when the server is running normally. Once, while rebooting and trying to figure out what was wrong, I got a hardware error. I am not sure if this is the root cause or if it happens every time. I am unsure how I would capture these hardware logs if I'm not literally looking at the screen.

     mcelog --cpu core_i5 --ascii --file /mnt/user/data/mce.txt
     Hardware event. This is not a software error.
     CPU 0 BANK 0 TSC 9d6db98fb
     MCG status:
     MCi status:
     Machine check not valid
     Corrected error
     MCA: No Error
     STATUS 0 MCGSTATUS 0
     (Fields were incomplete)

     which was from

     mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 6: f200000000900402
     mce: [Hardware Error]: RIP !INEXACT! 10: <ffffffff810b5c67> {do_raw_spin_lock+0xb/0x1a}
     mce: [Hardware Error]: TSC 9d6db98fb
     mce: [Hardware Error]: PROCESSOR 0:a0671 TIME 1671733698 SOCKET 0 APIC 0 microcode 54

     The syslog is attached as well. For example, the server was up on 12/22 at 9am but was found down at around 9:50, and I restarted things. There are no logs during this time. Any suggested next steps or ideas are welcome. syslog-192.168.1.77(12-22).log
  6. A few days ago I booted up a freshly built tower with a fresh Unraid boot stick. Everything spun up and worked fine. I set up the array and some cache pools and started migrating data with rsync. Everything was going fairly smoothly until I realized I couldn't SSH to the tower. I tried to reboot, but then the USB drive was not detected, which led me to toggle the BIOS setting for CSM on/off, and that got the boot stick back. I got back to the webUI but still had no SSH. While poking around the settings I accidentally switched on SSL in Settings > Access Management and lost the webUI. I shut down, took out the boot stick, plugged it into a laptop, and turned off SSL and turned on SSH in the config. I also deleted the ssh and ssl folders, as mentioned in other threads as a way to fix missing SSH. Now when I try to boot I get nothing. The machine seems to start, but nothing reaches the monitor, just a 'no signal' message. The keyboard and tower lights are on and the fans spin, so the hardware is powered. But nothing seems to boot: the tower never gets its IP back or becomes reachable. So I'm assuming it has to be the motherboard (ASUS Prime Z590-A) or BIOS, but everything was running before, so maybe it is some setting I changed? Any ideas? Is there any way to revert my Unraid settings while keeping the array data intact? I am also trying to find out whether there is a way to reset the BIOS settings. Open to any other options or things to try.
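
For anyone reading the machine check lines quoted in post 5: the value after "Bank 6:" is a raw MCi_STATUS register, and the TIME field is a Unix timestamp. Below is a minimal Python sketch of how those two fields could be unpacked. The function names (decode_mci_status, mce_time_utc) are made up for this illustration and are not part of mcelog or Unraid. The bit positions follow the architectural layout in the Intel SDM; what the resulting error code means on a particular CPU still has to be looked up there, so treat this as a reading aid rather than a diagnosis.

```python
from datetime import datetime, timezone

# Architectural flag bits of IA32_MCi_STATUS (Intel SDM, machine-check chapter).
# Ordered from the highest bit down so the output reads left to right.
FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled for this bank
    59: "MISCV",  # MCi_MISC register holds extra information
    58: "ADDRV",  # MCi_ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupted
}


def decode_mci_status(status: int) -> dict:
    """Split a raw MCi_STATUS value into flags and error-code fields.

    Hypothetical helper written for this post, not an mcelog API.
    """
    return {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


def mce_time_utc(epoch_seconds: int) -> str:
    """Convert the TIME field of an mce log line to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    info = decode_mci_status(0xF200000000900402)  # value from post 5
    print(info["flags"])
    print(hex(info["mca_error_code"]), hex(info["model_specific_code"]))
    print(mce_time_utc(1671733698))  # TIME value from post 5
```

Converting the TIME field this way gives a timestamp that can be lined up against the gaps in the syslog, keeping in mind that the result is in UTC and the syslog is probably in local time.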