I have a brand-new UnRAID server that keeps locking up right before the parity sync finishes, and I'm unsure what's going on. I've replaced every component except the hard drives (which are known to be in working order).
The host CPU is a Ryzen 5 5600G, and C-states are disabled in the BIOS.
The most recent crash (detected by Uptime Kuma) was around 2024-08-19 19:08:55. Here is the syslog from around that time:
Aug 19 19:00:01 nas crond[1542]: failed parsing crontab for user root: Invalid frequency setting of /usr/local/emhttp/plugins/ca.update.applications/scripts/updateApplications.php >/dev/null 2>&1
Aug 19 19:08:05 nas kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Aug 19 19:08:05 nas kernel: sd 7:0:1:0: Power-on or device reset occurred
Aug 19 19:08:06 nas kernel: ixgbe 0000:04:00.0: Adapter removed
Aug 19 19:08:06 nas kernel: ixgbe 0000:04:00.0: Warning firmware error detected FWSM: 0xFFFFFFFF
Aug 19 19:08:08 nas kernel: br0: port 1(eth0) entered disabled state
Aug 19 19:08:11 nas ntpd[1517]: Deleting interface #1 br0, 10.0.1.5#123, interface stats: received=193, sent=193, dropped=0, active_time=30806 secs
Aug 19 19:08:11 nas ntpd[1517]: 162.159.200.1 local addr 10.0.1.5 -> <null>
Aug 19 19:08:11 nas ntpd[1517]: 216.239.35.4 local addr 10.0.1.5 -> <null>
Aug 19 19:08:35 nas kernel: sd 7:0:4:0: attempting task abort!scmd(0x0000000092884838), outstanding for 30065 ms & timeout 30000 ms
Aug 19 19:08:35 nas kernel: sd 7:0:4:0: [sdf] tag#1732 CDB: opcode=0x88 88 00 00 00 00 02 76 d5 7c a8 00 00 04 00 00 00
Aug 19 19:08:35 nas kernel: scsi target7:0:4: handle(0x000d), sas_address(0x4433221104000000), phy(4)
Aug 19 19:08:35 nas kernel: scsi target7:0:4: enclosure logical id(0x500605b007eda110), slot(7)
Aug 19 19:08:35 nas kernel: sd 7:0:4:0: task abort: SUCCESS scmd(0x0000000092884838)
Aug 19 19:08:35 nas kernel: sd 7:0:5:0: attempting task abort!scmd(0x000000002ca9186f), outstanding for 30105 ms & timeout 30000 ms
Aug 19 19:08:35 nas kernel: sd 7:0:5:0: [sdg] tag#1735 CDB: opcode=0x88 88 00 00 00 00 02 76 d5 7c a8 00 00 04 00 00 00
Aug 19 19:08:35 nas kernel: scsi target7:0:5: handle(0x000e), sas_address(0x4433221105000000), phy(5)
Aug 19 19:08:35 nas kernel: scsi target7:0:5: enclosure logical id(0x500605b007eda110), slot(6)
Aug 19 19:08:35 nas kernel: sd 7:0:5:0: task abort: SUCCESS scmd(0x000000002ca9186f)
Aug 19 19:08:35 nas kernel: sd 7:0:4:0: Power-on or device reset occurred
Aug 19 19:08:35 nas kernel: sd 7:0:5:0: Power-on or device reset occurred
Aug 19 19:08:36 nas kernel: sd 7:0:1:0: Power-on or device reset occurred
Aug 19 19:08:36 nas kernel: sd 7:0:4:0: Power-on or device reset occurred
Aug 19 19:08:36 nas kernel: sd 7:0:5:0: Power-on or device reset occurred
Aug 19 19:09:34 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:34 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:10:13 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
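For reference, the CDB bytes in the two aborted commands above can be decoded by hand. This is a minimal sketch, assuming the standard 16-byte READ(16) layout (opcode 0x88, 8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13); the helper name `decode_read16` is made up for illustration.

```python
def decode_read16(cdb_hex: str) -> dict:
    """Decode a SCSI READ(16) CDB given as space-separated hex bytes."""
    raw = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(raw) != 16 or raw[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        "lba": int.from_bytes(raw[2:10], "big"),      # starting logical block
        "blocks": int.from_bytes(raw[10:14], "big"),  # number of blocks to read
    }


# CDB copied from the sdf and sdg "task abort" lines (both are identical).
cdb = "88 00 00 00 00 02 76 d5 7c a8 00 00 04 00 00 00"
info = decode_read16(cdb)
print(hex(info["lba"]), info["blocks"])  # 0x276d57ca8 1024
```

Both sdf and sdg were reading 1024 blocks from the same LBA when the commands timed out, which is what I'd expect from a parity sync reading all disks in lockstep.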
There are a few "curious" lines:
Aug 19 19:08:05 nas kernel: sd 7:0:1:0: Power-on or device reset occurred
Aug 19 19:08:06 nas kernel: ixgbe 0000:04:00.0: Adapter removed
Aug 19 19:08:06 nas kernel: ixgbe 0000:04:00.0: Warning firmware error detected FWSM: 0xFFFFFFFF
Aug 19 19:08:08 nas kernel: br0: port 1(eth0) entered disabled state
Aug 19 19:08:11 nas ntpd[1517]: Deleting interface #1 br0, 10.0.1.5#123, interface stats: received=193, sent=193, dropped=0, active_time=30806 secs
Aug 19 19:08:11 nas ntpd[1517]: 162.159.200.1 local addr 10.0.1.5 -> <null>
Aug 19 19:08:11 nas ntpd[1517]: 216.239.35.4 local addr 10.0.1.5 -> <null>
I've confirmed that the NIC works in another system (it's a genuine Intel X520-DA2). It isn't only the network that goes down, either: when I try to log in directly on the console, the login times out after about 60 seconds.
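To line up the timing, here is a rough sketch that pulls the first kernel message per driver out of a few of the lines above and back-dates the aborted I/O using its "outstanding for ... ms" value. The regexes and the `first_seen` helper are my own and only cover the lines pasted here; the year is taken from the Uptime Kuma timestamp, and syslog only has one-second resolution, so the results are approximate.

```python
import re
from datetime import datetime, timedelta

# A subset of the syslog lines from this post.
LOG = """\
Aug 19 19:08:05 nas kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Aug 19 19:08:05 nas kernel: sd 7:0:1:0: Power-on or device reset occurred
Aug 19 19:08:06 nas kernel: ixgbe 0000:04:00.0: Adapter removed
Aug 19 19:08:08 nas kernel: br0: port 1(eth0) entered disabled state
Aug 19 19:08:35 nas kernel: sd 7:0:4:0: attempting task abort!scmd(0x0000000092884838), outstanding for 30065 ms & timeout 30000 ms
"""

LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: (\w+)")
OUTSTANDING = re.compile(r"outstanding for (\d+) ms")


def parse_stamp(text, year=2024):
    # Syslog timestamps carry no year, so it has to be supplied.
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def first_seen(log):
    """Map each kernel driver/prefix to the time of its first message."""
    seen = {}
    for line in log.splitlines():
        m = LINE.match(line)
        if m:
            seen.setdefault(m.group(2), parse_stamp(m.group(1)))
    return seen


seen = first_seen(LOG)
gap = (seen["ixgbe"] - seen["mpt2sas_cm0"]).total_seconds()

# Estimate when the aborted read was originally issued.
abort_line = LOG.splitlines()[-1]
stamp = parse_stamp(LINE.match(abort_line).group(1))
issued = stamp - timedelta(milliseconds=int(OUTSTANDING.search(abort_line).group(1)))

print(f"HBA -> NIC gap: {gap:.0f} s, aborted read issued around {issued:%H:%M:%S}")
```

So the HBA (mpt2sas) and the NIC (ixgbe) both complain within about one second of each other, and the reads that were aborted at 19:08:35 had been stuck since roughly that same moment.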
Any ideas? I've let the array rebuild twice, and both times it locked up right at the end with a similar syslog.