Darqfallen
-
Posts
174 -
Report Comments posted by Darqfallen
-
-
New motherboard will be installed this weekend. I will start a new thread if there are any issues.
Cheers.
-
-
Crashed again with a kernel panic. By chance I was at the console when it happened.
-
Removed the old RAM and installed 64 GB of new RAM. Re-updated to 6.8.0-RC5. If this fails as well, then it's a new motherboard.
-
That was done; the memory was shifted into different channels and retested. Perhaps I will look for some new memory and see, though DDR3-1600 or DDR3-1866 ECC RAM is expensive.
-
1 minute ago, limetech said:
That syslog is from something else. Clearly this is a h/w issue and not Unraid OS.
Then why would there be no problem in 6.7.0?
-
Changed Status to Open
Changed Priority to Urgent
-
What's also interesting is what is picked up by the IPMI
1306 11/04/2019 21:21:18 Unknown BIOS POST Progress Progress - Asserted
1305 11/04/2019 21:19:01 Unknown BIOS POST Progress Progress - Asserted
1304 11/04/2019 21:18:52 Unknown BIOS POST Progress Progress - Asserted
1303 11/04/2019 21:18:52 Unknown BIOS POST Progress Progress - Asserted
1302 11/04/2019 21:18:45 Unknown BIOS POST Progress Progress - Asserted
1301 11/04/2019 21:18:38 Unknown BIOS POST Progress Progress - Asserted
1300 11/04/2019 21:18:31 Unknown BIOS POST Progress Progress - Asserted
1299 11/04/2019 21:17:15 Unknown BIOS POST Progress Progress - Asserted
1298 11/04/2019 21:17:11 Unknown BIOS POST Progress Progress - Asserted
1297 11/04/2019 21:17:02 Unknown BIOS POST Progress Progress - Asserted
1296 11/04/2019 21:17:02 Unknown BIOS POST Progress Progress - Asserted
1295 11/04/2019 21:17:02 Unknown BIOS POST Progress Progress - Asserted
1294 11/04/2019 21:17:01 Unknown BIOS POST Progress Progress - Asserted
1293 11/04/2019 21:16:51 Unknown BIOS POST Progress Progress - Asserted
1292 11/04/2019 21:16:49 Unknown BIOS POST Progress Progress - Asserted
1291 10/25/2030 22:31:12 Unknown Progress - Asserted - Asserted
1290 11/26/2031 15:22:24 Unknown OS Critical Stop Progress - Asserted - Asserted - Asserted
1289 05/14/2023 08:49:52 Unknown [undefined] Progress - Asserted - Asserted - Asserted - Asserted
1288 09/09/2028 04:57:20 Unknown [undefined] Progress - Asserted - Asserted - Asserted - Asserted - Asserted
1287 04/03/1987 11:36:16 Unknown OS Critical Stop Progress - Asserted - Asserted - Asserted - Asserted - Asserted - Asserted
1286 12/30/2025 09:38:56 Unknown [undefined] Progress - Asserted - Asserted - Asserted - Asserted - Asserted - Asserted - Asserted
1285 11/04/2019 21:16:16 Unknown OS Critical Stop Run-Time Stop - Asserted
Perhaps there is some update in the microcode that causes my CPUs to not play nice with the RAM.
I've also included the syslog that is logging to the USB drive.
The server crashed again as I was writing this so I've downgraded back to 6.7.0
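For anyone reading an event log like the one above, here is a minimal Python sketch that flags entries whose timestamp is far from the time of the crash. It assumes each entry is one line of the form `ID MM/DD/YYYY HH:MM:SS text`, as in the listing; the function names and the one-day window are my own choices for illustration, not part of any IPMI tool.

```python
import re
from datetime import datetime, timedelta

# Assumed entry layout, taken from the listing above: "ID MM/DD/YYYY HH:MM:SS text"
ENTRY = re.compile(
    r"^(?P<id>\d+) (?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (?P<text>.*)$"
)

def parse_entry(line):
    """Return (id, timestamp, text) for one log line, or None if it does not match."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(m.group("ts"), "%m/%d/%Y %H:%M:%S")
    return int(m.group("id")), stamp, m.group("text")

def suspicious_ids(lines, reference, window=timedelta(days=1)):
    """IDs of entries whose timestamp is more than `window` away from `reference`."""
    flagged = []
    for line in lines:
        entry = parse_entry(line)
        if entry is not None and abs(entry[1] - reference) > window:
            flagged.append(entry[0])
    return flagged

sample = [
    "1292 11/04/2019 21:16:49 Unknown BIOS POST Progress Progress - Asserted",
    "1291 10/25/2030 22:31:12 Unknown Progress - Asserted - Asserted",
    "1287 04/03/1987 11:36:16 Unknown OS Critical Stop Progress - Asserted",
    "1285 11/04/2019 21:16:16 Unknown OS Critical Stop Run-Time Stop - Asserted",
]
crash_time = datetime(2019, 11, 4, 21, 16, 16)  # timestamp of entry 1285
flagged = suspicious_ids(sample, crash_time)    # [1291, 1287]
```

Entries 1286 through 1291 in the full log would all be flagged this way, which at least separates the garbled records from the normal POST progress records.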
-
Updated to RC5 to see if there's any difference. Lasted ~23 hours before a crash and reboot.
Diagnostics attached.
-
So something in the kernel patch in 6.7.2 and 6.8.0-RC1 is causing my hardware to start throwing memory errors. And a memtest will not show bad memory because it is ECC memory. How do you fully test ECC RAM?
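One place corrected ECC errors can surface on Linux is the EDAC counters under `/sys/devices/system/edac/mc`. The sketch below reads the `ce_count` (corrected) and `ue_count` (uncorrected) files for each memory controller. It assumes an EDAC driver for the memory controller is loaded, which is not confirmed for this server or for Unraid; the function name is made up for illustration.

```python
from pathlib import Path

def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": n, "ue_count": n}} for each mcN directory.

    Returns an empty dict when the directory is missing, for example when
    no EDAC driver is loaded (an assumption about why it would be absent).
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc.name] = entry
    return counts
```

Calling `read_edac_counts()` before and after a long uptime and comparing the numbers would show whether corrected errors are accumulating even when a memtest pass reports nothing.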
-
Why wouldn’t it show up in a memtest then?
-
But it doesn’t happen in 6.7.0. So I’m thinking it’s a software issue.
-
Ran a memtest for 8 hours. No errors.
-
I'm sorry, but I didn't realize that the diagnostics didn't include the whole syslog. I posted my diagnostics from when it was crashing above. Attached is my whole syslog, including the crashes: syslog.zip
I'm not sure what else to do. If I leave the server running on anything newer than 6.7.0, it crashes and reboots every 1-4 hours.
-
Server kept randomly restarting, downgraded to 6.7.0 and everything is stable again.
-
Should it just show up in the syslog?
-
Memtest passed with no errors, so I’m not quite sure what to do next. Should I just downgrade back to 6.7.0, where I know it was stable?
-
Changed Status to Open
-
Server ran for 2 hours then spat out these errors and rebooted.
Oct 13 14:04:31 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 14:04:42 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 14:10:39 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 14:10:52 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 14:34:08 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 15:04:12 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 15:04:14 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 15:09:48 Dirge kernel: mce_notify_irq: 1 callbacks suppressed
Oct 13 15:09:48 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 15:13:16 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 15:15:25 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 15:19:14 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 15:19:15 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 15:21:30 Dirge kernel: mce_notify_irq: 1 callbacks suppressed
Oct 13 15:21:30 Dirge kernel: mce: [Hardware Error]: Machine check events logged
Oct 13 15:35:23 Dirge kernel: mce: [Hardware Error]: Machine check events logged
I am not sure what these errors mean, so I'm running a memtest to see if any memory modules have failed.
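To see how often these events fire, a rough Python sketch like the following can count the "Machine check events logged" lines in a syslog and measure the gaps between them. Syslog lines carry no year, so the year is passed in as an assumption (2019 here), and the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches the syslog lines shown above; the hostname may be anything.
MCE_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"mce: \[Hardware Error\]: Machine check events logged"
)

def mce_timestamps(lines, year=2019):
    """Timestamps of every 'Machine check events logged' line (year is assumed)."""
    stamps = []
    for line in lines:
        m = MCE_LINE.match(line)
        if m:
            stamps.append(
                datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            )
    return stamps

def summarize(lines, year=2019):
    """Count the events and report the gaps between consecutive ones, in seconds."""
    stamps = mce_timestamps(lines, year)
    gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    return {"count": len(stamps), "gaps_s": gaps}

sample = [
    "Oct 13 14:04:31 Dirge kernel: mce: [Hardware Error]: Machine check events logged",
    "Oct 13 14:04:42 Dirge kernel: mce: [Hardware Error]: Machine check events logged",
    "Oct 13 15:09:48 Dirge kernel: mce_notify_irq: 1 callbacks suppressed",
]
result = summarize(sample)  # {"count": 2, "gaps_s": [11.0]}
```

Comparing the count per hour between a stable release and a crashing one would give a concrete number to put in the bug report.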
-
Changed Status to Retest
-
Yes, 6.7.2. Nope, the only reason I know it restarts is that I get the email saying a parity check has started.
I opened her up, pulled every stick of RAM, and swapped each with its opposite bank. I then went into the BIOS and noticed the PC3-10600 RAM was running at PC3-12800 speed (1333 MHz vs 1600 MHz), so I've dropped it back down to the proper speed and will monitor for stability. I will let you know in 12 hours or so if it's still stable.
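For reference, the two speeds relate to the module names by simple arithmetic: a DDR3 DIMM has a 64-bit (8-byte) data bus, so peak bandwidth in MB/s is the transfer rate in MT/s times 8. A small Python sketch, with a function name made up for illustration:

```python
def peak_bandwidth_mb_s(transfers_mt_s, bus_bytes=8):
    """Peak module bandwidth in MB/s for a given transfer rate in MT/s."""
    return transfers_mt_s * bus_bytes

# DDR3-1333 actually runs at 1333 1/3 MT/s; the module name rounds down to PC3-10600.
rated = peak_bandwidth_mb_s(4000 / 3)
# DDR3-1600 gives exactly 12800 MB/s, hence PC3-12800.
overclocked = peak_bandwidth_mb_s(1600)
```

So the modules were being driven about 20% faster than their rating (1600 / 1333.33 is roughly 1.2).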
-
2 minutes ago, itimpi said:
Have you checked that all fans are working. Random restarts can happen if the CPU is overheating. Other common causes are power supply and RAM issues.
Fans are working great, all fans reporting >5000 rpm. Ambient temp in the room is 25C, System Ambient is 30C, CPU1 36C, CPU2 40C. I didn't have this issue in 6.7.0. I had a similar issue in 6.7.2, but didn't have the time to work on it, so I downgraded to 6.7.0 again.
-
I unplugged it, server still randomly restarts.
6.8.0-rc1 server randomly restarts
in Prereleases
Changed Status to Closed