Crashed again, stepping in the dark again

January 2

Hi there,

It's been some time since I last posted about a crash, but it has now happened again. It also happened in the meantime, but I didn't post about it then.

What happened in the meantime:

server crashed again

I removed the two 8GB DIMMs from my system
I memtest'ed only the one 32GB DIMM: pass
I memtest'ed only the 2x 8GB Kit: pass
left the 32GB Kit inside
server crashed
switched to the 32GB DIMM
(time passed, and for some reason I thought the 2x 8GB must be bad, because they are the oldest, so I sold them - they are gone)
because of a random talk I had with a co-worker about the crash, our IT guy gave me a 4x 16GB Kit, because all our machines are maxed out anyway and it just sits on the shelf...
I installed this 64GB kit, memtest'ed it: pass
time went on and it felt kinda good, as if this whole thing is over

Now I am sitting here, the server not even doing anything really (maybe one person watching Plex, and except for that just "being"), suddenly the fans blow and the piezo speaker goes off. Oh well.
I attached two logs from earlier crashes; I guess there's nothing of value in there. I also attached diagnostics from just now, and two photos of the memtest (which only add to the frustration, as I'd love to find a hard reason why this happens, haha)

Given the log data, my guess is that a hardware failure is still the most likely reason this happens. But I don't know how much different memory I can get and test without selling a kidney; the CPU rarely gets hot and is cooled quite well, and both the mainboard and CPU are only roughly one year old. It somehow never happens when any "heavy" tasks run on the machine; two days ago I had 7 parallel Plex streams (which is very high for me), no issue.
Could it have something to do with powertop --auto-tune, since it happens when the server is "mostly idle" (even though it doesn't happen in the "near complete idle" at night either)?

I am out of ideas :(

syslog-192.168.178.8_crash260201-2255.log syslog-192.168.178.8_crash251213-1548.log lunas-diagnostics-20260102-2320.zip


January 3

Community Expert

I'm afraid there's still nothing relevant logged. It could be the Intel 13th/14th gen issue; this mostly affects the K models, but there have been confirmed cases of non-K CPUs having the issue as well.

Look for a BIOS update; it can sometimes help with that if the CPU is not too far gone. Failing that, it would be good to test with a different CPU.


January 3

Community Expert

You might try a different PSU, too, just to rule out flaky power


January 3

Author

Hi @JorgeB , thanks as always for getting back to me on this. A BIOS update is a very good idea; I am currently on F10, if I may trust the UI (I will have to wait for the parity check that just started before I can enter the BIOS). At least F12 was already out when I got the mainboard, and now we are at F16. Some changelogs mention "general improvements and stability", not just support for a new CPU or so, so... fingers crossed :) Getting a different CPU is likely not easy, but I can probably try. So I understood I shouldn't look for 13th/14th gen K models; is there an issue with 12th or 15th gen K models? Just so I know what to look out for.

Hi @Michael_P , thanks for the suggestion! The PSU is also quite new; I bought it last year together with this mainboard and CPU, on the quest for higher efficiency. This is not to say "it can't be that because it's new". I think I still have the old PSU from before around somewhere (or I'll ask IT if they also have one just sitting on the shelf).

One of the big gripes I have is that it's not linked to a clear case/action. This way, testing a new config can take just two days (as last post) or a month (like now), haha


January 5

Author

sigh... the parity check went through, 0 errors, all fair enough.

I forgot that my regular parity check starts sometime after midnight on the first Monday of the month, but I just let it run. I was just about to go to bed... and I heard that suspicious sound of fans and HDDs, followed by the piezo speaker. 3% into the new parity check, if I may trust the last UI refresh.

The BIOS will be updated in two days or so, then. Maybe I'll already have a look at Intel 12th gen models.

Best Regards


January 9

Author

Okay. I shut down the server and installed the newest BIOS. I also made the BIOS settings less aggressive in terms of power saving/C-states etc. I messed something up while booting, so I shut the machine down again, only to realize it had already booted too far, which triggered an unclean shutdown. That parity check finished some hours ago. Now, almost one hour ago, the server crashed again. In the meantime, I set up an external syslog server just in case the server doesn't write everything properly, but to no avail: no valuable info there either.

Third parity check is the charm I hope (god my HDDs will hate me)

So, back at square 1.

I looked into that Intel thing; it seems my CPU was never affected by that bug (at least not "officially", no idea what else might have been reported in this forum). Because the times are as shitty as they are, an i5-12400 costs more than I paid for this i5-13500 a year ago, with the 12500 and 12600 being even more expensive (they all lack E-cores, but I have a feeling that unRAID doesn't use them anyway?)

If you really think this could solve this @JorgeB , I'd definitely consider this switch, as I am somewhat out of ideas. I also have to see if I have that older PSU I used before the current one somewhere (at least I don't recall that I sold it)

Best Regards

//edit I just noticed that the crash last Monday was at a similar (but not the same) time as today, around 01:40h. Today is also Friday, so the same weekday as the crash that made me start this thread (but that one was much earlier, at 22:55h). Probably just coincidences, but yeah, grasping at straws here

Edited January 9 by CameraRick
added info regarding timing (only appended in the end)


January 9

Community Expert

5 hours ago, CameraRick said:
it seems my CPU was never affected by that bug (at least not "officially"

While K models are far more common, I've seen multiple cases in the forum with non-K CPUs. The issues were resolved after replacing the CPU.


January 9

Author

Hi @JorgeB , thanks for confirming this. I also found a Reddit post where disabling C-States seems to have added stability, which is something I had already thought about. Can't try it because of the parity check tho (I have to disable reboot after crash, huh).

So, I will probably try to switch to a 12400, and try to cut my losses.


January 9

Community Expert

27 minutes ago, CameraRick said:
So, I will probably try to switch to a 12400, and try to cut my losses.

I'd give the PSU a shot first


January 9

Author

Hi there @Michael_P , the issue I face now is that I can return the i5-13500, but only within two weeks. I contacted the seller because of the official Intel-issues, and they allowed the return, but now I am on a deadline. So I will order a new CPU now anyway (but still contemplating which one, as prices are... strange).
Right now I have spare vacation days, so I actually have time to arrange all this, but these constant crashes are stressing me out a lot :( All the parity checks as well; one crash always costs two days in which I can't do a thing. I will still try to find the older PSU and see if I can use it anyway - it might have too few power connectors for this setup by now :(

Best Regards

//edit @JorgeB I ordered a 12600K now, because it was effectively the same price as the 12400. I read a lot, and the K's are apparently no issue in 12th gen. I will likely limit the TDP, but I don't like paying the same for less. Hopefully it arrives soon! Thanks again, you two!

Edited January 9 by CameraRick
added info on purchase


January 17

Author

A little update on this matter.

To catch up a little: I started to experiment with BIOS settings left and right to find the most stable setup; some of them seemed to make it worse, and sometimes the server then crashed after mere hours. Disabling C-States didn't help, not using powertop --auto-tune didn't do anything, and deactivating the ASPM fixes (for the SATA controllers) didn't do a thing. Or rather, it only led to more crashes, not fewer.

So as mentioned, I ordered the i5-12600K and installed it at the beginning of this week. I started with moderate BIOS settings, but since yesterday I have also re-enabled all the C-State stuff and use powertop --auto-tune again. ASPM is still not in (one at a time). What I want to get at: I haven't had a single crash or instability since the CPU swap. It seems to just work. So thanks @JorgeB , I do think this was the culprit (even tho I am not 100% trusting the peace, but for the time being, I think this is fixed)
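
When changing one setting at a time like this, it can help to record which power-saving state is actually in effect for each test run, so a later crash can be matched to a configuration. Here is a minimal sketch, assuming a Linux host that exposes the usual sysfs entries (/sys/module/pcie_aspm/parameters/policy and /sys/devices/system/cpu/cpu0/cpuidle/state*/). The function names are made up for illustration and nothing here is specific to Unraid:

```python
from pathlib import Path


def read_text(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def active_aspm_policy(sys_root=Path("/sys")):
    """Return the active PCIe ASPM policy, i.e. the entry shown in [brackets]."""
    raw = read_text(sys_root / "module/pcie_aspm/parameters/policy")
    if raw is None:
        return None
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return raw  # unexpected format: return it unchanged


def cpuidle_states(sys_root=Path("/sys"), cpu="cpu0"):
    """Map each idle state name of one CPU to True (enabled) or False (disabled)."""
    base = sys_root / "devices/system/cpu" / cpu / "cpuidle"
    states = {}
    for state_dir in sorted(base.glob("state*")):
        name = read_text(state_dir / "name")
        if name is not None:
            # The "disable" file contains "1" when the state has been turned off.
            states[name] = read_text(state_dir / "disable") != "1"
    return states


def snapshot(sys_root=Path("/sys")):
    """Collect the settings into one dict that can be logged next to each test run."""
    return {
        "aspm_policy": active_aspm_policy(sys_root),
        "cpuidle_enabled": cpuidle_states(sys_root),
    }


if __name__ == "__main__":
    print(snapshot())
```

Saving this output after every boot would at least show whether a BIOS or powertop change really took effect. It does not explain a crash by itself.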


January 20

Author

Okay, I was too early with this. Man I look dumb doing a triple post now, ha

sorry I had to unsolve it, Jorge :(

As I mentioned, I restarted the machine with more "aggressive" C-State policies and powertop --auto-tune. Two days ago, the machine crashed again. I restarted with the same BIOS settings but without --auto-tune; the parity check went through, and a few minutes ago it crashed again.
I do believe the CPU played some part in this, so having switched it is not a waste, but something dawned on me yesterday.

The logs don't show any obvious errors, but almost all of them show something similar in the last few lines:

13th of December

Dec 13 15:40:03 LuNAS emhttpd: spinning down /dev/sdh

Dec 13 15:40:55 LuNAS emhttpd: read SMART /dev/sdh

2nd of January

Jan 2 22:45:51 LuNAS emhttpd: spinning down /dev/sdg

18th of January

2026-01-18T16:29:50+01:00 LuNAS emhttpd: read SMART /dev/sdj

2026-01-18T16:30:06+01:00 LuNAS emhttpd: spinning down /dev/sdk

2026-01-18T16:30:06+01:00 LuNAS emhttpd: spinning down /dev/sde

today, 20th of January

2026-01-20T14:39:59+01:00 LuNAS emhttpd: spinning down /dev/sdc

2026-01-20T14:40:50+01:00 LuNAS emhttpd: spinning down /dev/sdo

2026-01-20T14:42:16+01:00 LuNAS emhttpd: read SMART /dev/sdi

There were more crashes, but I didn't archive them properly and now I have trouble finding out the exact time stamps to look through it. In any case, I recall they always looked like this. Which is not suspicious... but maybe it is?
It's always when spinning down disks, or reading SMART data. So could this, somehow, cause the crashes? Does this interfere with e.g. C-States so much?
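
To judge whether those last lines are unusual at all, the tails of the archived logs can be tallied and compared with how often the same message appears in the rest of a log. Here is a minimal sketch, assuming only the two timestamp formats shown in the excerpts above. The function names are hypothetical:

```python
import re
from collections import Counter

# Matches both timestamp styles from the excerpts:
#   "Dec 13 15:40:03 LuNAS emhttpd: spinning down /dev/sdh"
#   "2026-01-18T16:29:50+01:00 LuNAS emhttpd: read SMART /dev/sdj"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\S+|[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<proc>[^:\s]+):\s+(?P<msg>.*)$"
)
DEVICE_RE = re.compile(r"\s*/dev/\S+$")  # trailing device name, e.g. " /dev/sdh"


def tail_events(lines, n=3):
    """Return (process, message without device) for the last n parseable lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            events.append((match["proc"], DEVICE_RE.sub("", match["msg"])))
    return events[-n:]


def tally_tails(logs, n=3):
    """Count which events end the logs; `logs` maps a label to a list of lines."""
    counts = Counter()
    for lines in logs.values():
        counts.update(tail_events(lines, n))
    return counts


def baseline_share(lines, event):
    """Fraction of all parseable lines in one log that are the given event."""
    parsed = tail_events(lines, n=len(lines))
    if not parsed:
        return 0.0
    return sum(1 for e in parsed if e == event) / len(parsed)


if __name__ == "__main__":
    logs = {
        "2025-12-13": [
            "Dec 13 15:40:03 LuNAS emhttpd: spinning down /dev/sdh",
            "Dec 13 15:40:55 LuNAS emhttpd: read SMART /dev/sdh",
        ],
        "2026-01-02": ["Jan 2 22:45:51 LuNAS emhttpd: spinning down /dev/sdg"],
    }
    print(tally_tails(logs))
```

If baseline_share() shows that spin-down and SMART lines already make up most of an idle log, then seeing them last before a crash says very little. If they are rare elsewhere, the pattern is more interesting.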

Since late December, I have been developing a Docker container for a fan controller. At some point (I can't recall exactly when, but probably already around New Year's) I set it up to read SMART data directly, because disks.ini seems to hold cached data that is not updated often. This could match some of the crashes (the ones I don't have log data for in the list above). Otherwise, could the ASM1166 cards be the culprit?


January 20

Community Expert

57 minutes ago, CameraRick said:
So could this, somehow, cause the crashes?

Very much doubt that is the problem; it's just normal to see a lot of those logged, but you can temporarily disable spin down and retest.


January 20

Author

Spindown is one aspect, but reading SMART data is another.

I noticed that disks don't wake up when you read temp through smartutil, but maybe that does something? I noticed these crashes mainly in times of "idle", so it somehow feels connected. I just don't know how :(

For now I will relax the C-States again, but it feels like there is some hardware issue that will be expensive to fix 🙈


January 20

Community Expert

4 minutes ago, CameraRick said:
but reading SMART data another

That just means that Unraid thinks the drive has just spun up.


January 20

Author

the Docker reads the SMART data through smartutil, but it doesn't wake up drives. Maybe reading the data could also cause issues? Mind you, it reads this data multiple times a minute. For now, I will only read from disks.ini
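
For reference, polling of this kind could be throttled roughly as follows. This is only a sketch under several assumptions: that "smartutil" here means smartctl from smartmontools, that its JSON output (-j) carries a top-level temperature.current field for the disk, and that a five-minute cache is acceptable for a fan controller. The class and function names are hypothetical, and whether SMART polling has anything to do with the crashes is unconfirmed:

```python
import json
import subprocess
import time


def parse_temperature(smartctl_json):
    """Return temperature.current (in °C) from smartctl JSON output, or None."""
    try:
        data = json.loads(smartctl_json)
    except json.JSONDecodeError:
        return None
    temperature = data.get("temperature") if isinstance(data, dict) else None
    if isinstance(temperature, dict):
        return temperature.get("current")
    return None


def read_temperature(device):
    """Query one disk; '-n standby' makes smartctl skip disks that are asleep."""
    proc = subprocess.run(
        ["smartctl", "-n", "standby", "-A", "-j", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes for many non-fatal states
    )
    # For a sleeping disk no attributes are read, so this yields None.
    return parse_temperature(proc.stdout)


class ThrottledReader:
    """Cache each disk's value so the disk is queried at most once per interval."""

    def __init__(self, reader, min_interval_s=300.0, clock=time.monotonic):
        self._reader = reader
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._cache = {}  # device -> (time of last query, value)

    def get(self, device):
        now = self._clock()
        cached = self._cache.get(device)
        if cached is not None and now - cached[0] < self._min_interval_s:
            return cached[1]
        value = self._reader(device)
        self._cache[device] = (now, value)
        return value


if __name__ == "__main__":
    reader = ThrottledReader(read_temperature)
    print(reader.get("/dev/sdc"))  # device name taken from the log excerpts
```

With this, the fan loop can still run several times a minute while each disk is only queried once per interval.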

I'm just grasping at straws I guess :(


January 27

Author

Ah well. It ran stable, till it didn't 😀

2026-01-27T14:08:25+01:00 LuNAS emhttpd: spinning down /dev/sde

2026-01-27T14:09:40+01:00 LuNAS emhttpd: spinning down /dev/sdc

2026-01-27T14:23:24+01:00 LuNAS emhttpd: spinning down /dev/sdd

Those are the last three lines in the log. It's not even always the same disk; that would at least give a hint at a broken one or something :/

I think I was on the server remotely when it happened. I clicked on "archive" in the message center, and then it showed the loading circle for an unusually long time. I didn't think much of it, and then received an error from my DAVx backup, which is usually a sign the server is down. Went back to the unRAID tab, still loading, and after F5 it was (of course) not loading at all.

So, I relaxed the C-States, I didn't run powertop --auto-tune, and I didn't have the Docker fetch data with smartutil.

I am really out of ideas, except it being some hardware defect I am incapable of finding :/
