Unraid random reboots/crashes

December 11, 2023

Hello everyone, I'm hoping to get some help with my random reboot problem. I just built this system in November and I’ve been struggling to solve this on my own. My server seems to crash or reboot itself after anywhere from 5 to 72 hours of uptime. I have the array set to not start itself on boot, so I will come back to my server and see it just sitting, waiting for me to start the array, with notifications about unclean shutdowns. It seems more likely to occur when I’m doing tasks with a lot of disk activity, like a Parity Sync or running Mover (with many TBs to move).

The server is plugged into an APC UPS with fresh batteries which report good health, and I have NUT set up with a shutdown method.

So far, I’ve done the following:

Changed the CPU from an i5-13500 to an i7-13700K (upgrade unrelated to this issue).
Removed an old disk with questionable SMART data, without alleviation.
Replaced the TIM on my HBA.
Reseated the RAM.
I'm currently running memtest, but do not have results yet.

I mirrored the syslog to flash and will attach it.

System Specs:

CPU: Intel Core i5-13500 > Intel Core i7-13700K
Motherboard: ASRock Z690 Extreme ATX LGA1700
RAM: G.Skill Ripjaws V 64 GB (4 x 16 GB) DDR4-3200 CL16
Appdata Storage: Western Digital Black SN850X 2 TB M.2-2280 PCIe 4.0 x4 NVMe Solid State Drive
Download Cache: 2x Samsung 870 QVO 2.5", 1x Samsung 850 EVO 2.5"
PSU: SeaSonic FOCUS Plus Platinum 750 W 80+ Platinum
HBA: LSI 9300-16i SAS in IT Mode

Please let me know if I’ve missed anything important or if more info is needed. Thank you.

syslog

Edited December 11, 2023 by Shomesomesho

December 11, 2023

Community Expert

If memtest doesn't find anything, try running the server with just one stick of RAM. If it still crashes, try a different stick. That will basically rule out bad RAM or a board issue with all 4 DIMMs loaded.

December 11, 2023

Thank you @JorgeB, I will try that. I realize this isn't an Unraid-specific question, but how long/how many passes would you advise memtest run?

Edit: I'm also seeing these messages in my log file:

Dec 11 01:11:11 Unraid kernel: mpt3sas_cm1: SAS host is non-operational !!!!
Dec 11 01:11:12 Unraid kernel: mpt3sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
Dec 11 01:11:12 Unraid kernel: mpt3sas_cm0: fault_state(0x2667)!
Dec 11 01:11:12 Unraid kernel: mpt3sas_cm0: sending diag reset !!
Dec 11 01:11:12 Unraid kernel: mpt3sas_cm1 fault info from func: mpt3sas_base_make_ioc_ready
Dec 11 01:11:12 Unraid kernel: mpt3sas_cm1: fault_state(0x2667)!
Dec 11 01:11:12 Unraid kernel: mpt3sas_cm1: sending diag reset !!

And then right after the above messages:

Dec 11 01:13:36 Unraid kernel: Linux version 6.1.49-Unraid (root@Develop-612) (gcc (GCC) 12.2.0, GNU ld version 2.40-slack151) #1 SMP PREEMPT_DYNAMIC Wed Aug 30 09:42:35 PDT 2023
Dec 11 01:13:36 Unraid kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
Dec 11 01:13:36 Unraid kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Dec 11 01:13:36 Unraid kernel: BIOS-provided physical RAM map:

Was this the reboot/crash point?

Edited December 11, 2023 by Shomesomesho

December 11, 2023

Community Expert

1 hour ago, Shomesomesho said:

SAS host is non-operational !!!!

This is an HBA problem. Make sure it's well seated and sufficiently cooled; you can also try a different PCIe slot.

1 hour ago, Shomesomesho said:

Dec 11 01:13:36 Unraid kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Dec 11 01:13:36 Unraid kernel: BIOS-provided physical RAM map:

These should not be a problem.

1 hour ago, Shomesomesho said:

but how long/how many passes would you advise memtest run?

Ideally 24 hours, but 2 or 3 passes are usually enough to find issues if the problem is big enough.

December 11, 2023

11 minutes ago, JorgeB said:

This is an HBA problem. Make sure it's well seated and sufficiently cooled; you can also try a different PCIe slot.

I'll try moving it to a different slot. The card has a fan mounted directly on the heatsink, but I've not found a way to monitor temps.

11 minutes ago, JorgeB said:

These should not be a problem.

Ah, sorry. I meant: are these the lines logged when Unraid first boots, showing that this was the point where the server rebooted?

11 minutes ago, JorgeB said:

Ideally 24 hours, but 2 or 3 passes are usually enough to find issues if the problem is big enough.

Thank you for all your help.

Edited December 11, 2023 by Shomesomesho

December 11, 2023

Community Expert

1 hour ago, Shomesomesho said:

Dec 11 01:13:36 Unraid kernel: Linux version 6.1.49-Unraid (root@Develop-612) (gcc (GCC) 12.2.0, GNU ld version 2.40-slack151) #1 SMP

Yes, this line means a new boot.
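For anyone wanting to check a longer mirrored syslog for the same pattern, here is a rough Python sketch. It looks for the "Linux version" banner that marks a new boot and lists any mpt3sas fault lines logged just before it. The function name `find_reboots`, the 20-line lookback window and the keyword list are my own assumptions for illustration, not anything Unraid provides, and the sample lines are copied (the boot banner shortened) from the excerpt above.

```python
import re

# The kernel prints its version banner once, at the start of every boot.
BOOT_MARKER = re.compile(r"kernel: Linux version ")
# mpt3sas messages of the kind quoted in this thread (keyword list is a guess).
HBA_FAULT = re.compile(r"kernel: mpt3sas_cm\d+.*(non-operational|fault|diag reset)")


def find_reboots(lines, context=20):
    """Return one entry per boot banner, with any HBA fault lines found
    in the `context` lines logged immediately before it."""
    reboots = []
    for i, line in enumerate(lines):
        if BOOT_MARKER.search(line):
            before = lines[max(0, i - context):i]
            faults = [entry for entry in before if HBA_FAULT.search(entry)]
            reboots.append({"boot_line": line, "faults_before": faults})
    return reboots


# Lines taken from the log excerpt in this thread (boot banner truncated).
SAMPLE = [
    "Dec 11 01:11:11 Unraid kernel: mpt3sas_cm1: SAS host is non-operational !!!!",
    "Dec 11 01:11:12 Unraid kernel: mpt3sas_cm0: fault_state(0x2667)!",
    "Dec 11 01:11:12 Unraid kernel: mpt3sas_cm0: sending diag reset !!",
    "Dec 11 01:13:36 Unraid kernel: Linux version 6.1.49-Unraid (root@Develop-612)",
    "Dec 11 01:13:36 Unraid kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot",
]

for reboot in find_reboots(SAMPLE):
    # The first 15 characters of a syslog line are the timestamp.
    print("Boot at:", reboot["boot_line"][:15])
    for fault in reboot["faults_before"]:
        print("  preceded by:", fault)
```

To run it against a real log, read the file into a list with `open(path).read().splitlines()` and pass that to `find_reboots`. Where the mirrored syslog lives on the flash drive depends on your setup, so treat any path as something to confirm on your own system. A boot banner with no fault lines before it just means nothing matching was logged in the lookback window, not that the HBA is in the clear.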

December 13, 2023

Memtest passed with no errors. I moved the HBA to a different slot and it's been up for 24 hours so far. If it crashes again, I'll remove 3 sticks of RAM and test each individually.

December 15, 2023

3 days, 14 hours of uptime since just moving the HBA out of the PCIe 5.0 slot and into the PCIe 4.0 slot. I don't want to get hopeful too soon, but this seems to have been the problem. Can anyone explain why the 5.0 slot would have an issue with my HBA? Should I have set the 5.0 slot in the BIOS to 3.0 link speed? Should I be setting the 4.0 slot to 3.0 link speed?
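This doesn't answer the why, but one way to at least see what link the card negotiated in a given slot is to read the PCIe link attributes that Linux exposes in sysfs. Below is a minimal Python sketch, assuming a Linux system where `/sys/bus/pci/devices/<address>/current_link_speed` and its sibling attributes are present. The helper names `read_attr`, `pcie_links` and `below_max` are made up for this example.

```python
from pathlib import Path


def read_attr(dev, name):
    """Read one sysfs attribute, or return 'unknown' if it is missing or unreadable."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return "unknown"


def pcie_links(root="/sys/bus/pci/devices"):
    """Collect negotiated and maximum link speed/width for every PCI device."""
    base = Path(root)
    if not base.is_dir():
        return []  # not Linux, or sysfs is not mounted
    rows = []
    for dev in sorted(base.iterdir()):
        rows.append({
            "address": dev.name,
            "current_speed": read_attr(dev, "current_link_speed"),
            "max_speed": read_attr(dev, "max_link_speed"),
            "current_width": read_attr(dev, "current_link_width"),
            "max_width": read_attr(dev, "max_link_width"),
        })
    return rows


def below_max(row):
    """True if the negotiated speed or width differs from the device maximum."""
    pairs = [("current_speed", "max_speed"), ("current_width", "max_width")]
    return any(
        "unknown" not in (row[cur], row[top]) and row[cur] != row[top]
        for cur, top in pairs
    )


for row in pcie_links():
    if row["current_speed"] == "unknown":
        continue  # device does not report a PCIe link
    note = "  <-- below device maximum" if below_max(row) else ""
    print(f'{row["address"]}: {row["current_speed"]} x{row["current_width"]} '
          f'(max {row["max_speed"]} x{row["max_width"]}){note}')
```

You would still need to match the PCI address to the HBA yourself, for example from `lspci` output. A link running below the device maximum is not proof of a fault on its own, since some devices lower their link speed when idle, so this is only a starting point for comparing slots.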

December 15, 2023

Community Expert

10 minutes ago, Shomesomesho said:

Can anyone explain why the 5.0 slot would have an issue with my HBA?

I can't.

January 4, 2024

Solution

On 12/11/2023 at 11:41 AM, JorgeB said:

you can also try a different PCIe slot.

System hasn't had a similar crash since. Going to say this was the problem. Thank you for your help!
