sedoro

  1. Hi, it's been 14 days of uptime with no problems or errors. Two parity checks completed with 0 errors. It seems the problem was somehow related to Unraid 6.7.x. I hope it gets fixed in a future update. I find the "Hardware event. This is not a software error." message quite misleading.
  2. So I've been able to complete a parity check (25 hours, 2.943 errors) by downgrading Unraid to 6.6.7. The system has been up for 1 day 2 hours now. Maybe I've just been lucky, but I have a good feeling, as I tried a parity check about 20 times before on version 6.7.x with no luck. No more MCE errors either. I'll run another parity check in a few days, and if the system doesn't reboot I'll add [Solved] to the title.
  3. It's been a month since I had the first Hardware Error, and it has only gotten worse. The system has been rebooting randomly since the end of July (Kernel Panic reboots - see attached capture). I haven't been able to perform a parity check, as the system always reboots before it finishes (10 TB, usually 25 hours), and I know there are parity errors, so I'm living on the edge now. When not running a parity check, the longest period without a reboot has been 4 days, but it is so random that sometimes it reboots before I can even start the array again.

     This is what I've ruled out and why:

     RAM: I removed all sticks but 1 and ran the system. Same reboots. I did this with 3 different sticks and different slots.
     PSU: I have dual PSUs and have tried with only 1 at a time, with the same result.
     APU: Ran the system directly from AC. Same results.
     Latest Unraid upgrade: The problems started, more or less, when I upgraded to 6.7.2. I downgraded to 6.7.1 but the reboots continue as before.

     I also removed both CPUs, looked for dust or bent pins, and applied new thermal grease afterwards.

     I contacted the retailer and after some hardware tests they said this: Could it be related to buggy microcode or to a software problem? They say I could try downgrading to 6.3.2, as that seemed to be the point of discussion in that thread. What do you think? Is it worth trying?

     Also, two days ago I got a new Hardware Error:

     Thanks all for your help. PS: Title changed to reflect the new symptoms. syslog
  4. Thanks for the answers. I tried memtest from the boot menu; the system rebooted but nothing happened. After the reboot, everything was fine until this morning, when I received another hardware error. This one is different. What should I do next? syslog290719
  5. Hello everyone, I've had my server since last Christmas, so I'm quite new to this, and this morning I woke up to a hardware error in my system: As far as I understand, there's a defective memory module. So should I just remove/replace this module? The syslog says it's the Channel 1, DIMM 0 module. I've attached the diagram of my MB. Would Channel 1 DIMM 0 correspond to CPU1_DIMM_C0? Thanks in advance for your help! syslog tower-diagnostics-20190727-0756.zip